Formatting XML in SQL Server 2008

Since SQL SERVER 2005 it is possible to store XML data in database by using dedicated data type - XML. This type solved many programming problems such storing XML data as simple text which was very unpleasant to maintenance and detecting errors in document structure. Now stroring XML data is very simple and requires only the creation of column in XML data type. But sometimes we may want to do reverse operation. I mean generating XML document or fragment from non XML columns in a single SELECT statement. Of course SQL Server meets our expectations and allow to transform any results of SELECT statement (which always returns table) to XML structure.

To start transforming selected results to XML format we must use FOR XML clause right after last statemnt in normal SELECT. For example instead of using:

Code Snippet

SELECT ProductNumber,Name,ListPrice,Color
FROM Production.Product

we should use:

Code Snippet

USE [AdventureWorks]
SELECT ProductNumber,Name,ListPrice,Color
FROM Production.Product
FOR XML

But if we want run this query server going to return an error 'Incorrect syntax near 'XML'. So now we realize that it is so easy - sorry :) We just tell SQL Server to prepare XML for us but we didn`t tell them how to do that. In SQL Server 2005 and above there is a short list of modes in this topic:

FOR XML RAW - generate a single <row> for each single row in result table returned by SELECT statement.
FOR XML AUTO - generate nesting structure of XML document based on structure of the SELECT statement.
FOR XML EXPLICIT - allow to generate complex structure which we can control. This provides a lot of flexibility but does so at the cost of much more greater complexity and performance.
FOR XML PATH - as EXPLICIT its allow to customize output XML but here we can do it much more easier - queries are less complex.

Ok, so now we are able to fix our last query so lets do this by using each of them:

Code Snippet

USE [AdventureWorks]
SELECT ProductNumber,Name,ListPrice,Color FROM Production.Product
FOR XML RAW 
 
<row ProductNumber="AR-5381" Name="Adjustable Race" ListPrice="0.0000" />
<row ProductNumber="BA-8327" Name="Bearing Ball" ListPrice="0.0000" />
<row ProductNumber="BE-2349" Name="BB Ball Bearing" ListPrice="0.0000" />
------------------------------------------------------------------------------
SELECT ProductNumber,Name,ListPrice,Color FROM Production.Product
FOR XML AUTO 
 
<Production.Product ProductNumber="AR-5381" Name="Adjustable Race" ListPrice="0.0000" />
<Production.Product ProductNumber="BA-8327" Name="Bearing Ball" ListPrice="0.0000" />
<Production.Product ProductNumber="BE-2349" Name="BB Ball Bearing" ListPrice="0.0000" />
------------------------------------------------------------------------------
SELECT ProductNumber,Name,ListPrice,Color FROM Production.Product
FOR XML EXPLICIT 
 
--Msg 6803, Level 16, State 1, Line 1
--FOR XML EXPLICIT requires the first column to hold positive integers that represent XML tag IDs.
------------------------------------------------------------------------------
SELECT ProductNumber,Name,ListPrice,Color FROM Production.Product
FOR XML PATH 
<row>
  <ProductNumber>AR-5381</ProductNumber>
  <Name>Adjustable Race</Name>
  <ListPrice>0.0000</ListPrice>
</row>
<row>
------------------------------------------------------------------------------

You can see that three FOR XML type returns different XML structure (RAW, AUTO, PATH) and one returns error message (EXPLICIT). To explain behavior of each type we need to look at them more precisely.

RAW

This mode returns a single XML fragment (not whole document) which consist of many <row ... /> nodes with many attributes (each column or alias from base SELECT command related to one attribute in output XML) but this was only default behavior. We may tell SQL SERVER to generate sub-nodes insted of attributes by using FOR XML RAW, ELEMENTS clause.

Code Snippet

SELECT ProductNumber,Name,ListPrice,Color FROM Production.Product
FOR XML RAW, ELEMENTS 
 
<row>
  <ProductNumber>AR-5381</ProductNumber>
  <Name>Adjustable Race</Name>
  <ListPrice>0.0000</ListPrice>
</row>
<row>
  <ProductNumber>BA-8327</ProductNumber>
  <Name>Bearing Ball</Name>
  <ListPrice>0.0000</ListPrice>
</row>

We may use much more options in RAW mode and these are:

FOR XML RAW ('custom_name') - rename <row /> node to name given in brackets but we can using white space in node name
FOR XML RAW, ROOT - result XML has exactly one root node. By default root is named <root /> but this may be changes to custom name in the same way as in case <row/> (ex. FOR XML RAW ('node'), ROOT ('my_root'))

If You want to use RAW mode to obtain hierarchical structure this is not god idea....

AUTO

This mode by default creates list on nodes without root, but unlike the RAW by default generates nodes named the same as in SELECT statement. Additionally allow to create nesting structure of XML document depends of the structure of the SELECT. The root,elements and custom root name rules are the same as in RAW mode. To demonstrate how AUTO mode works lets focus on following example.

Code Snippet

SELECT pm.Name, p.ProductNumber,p.Name,p.Color 
FROM Production.ProductModel as pm
JOIN Production.Product as p on p.ProductModelID = pm.ProductModelID
FOR XML AUTO,ROOT ('Inventory')
 
<Inventory>
  <pm Name="HL Road Frame">
    <p ProductNumber="FR-R92B-58" Name="HL Road Frame - Black, 58" />
    <p ProductNumber="FR-R92R-58" Name="HL Road Frame - Red, 58" Color="Red" />
  </pm>
  <pm Name="Sport-100">
    <p ProductNumber="HL-U509-R" Name="Sport-100 Helmet, Red" Color="Red" />
    <p ProductNumber="HL-U509" Name="Sport-100 Helmet, Black" Color="Black" />
  </pm>
  <pm Name="Mountain Bike Socks">
    <p ProductNumber="SO-B909-M" Name="Mountain Bike Socks, M" Color="White" />
  </pm>
</Inventory>

As we can see by using AUTO mode we are able to generate hierarchical XML structure. But here we have several problem to fix. Firstly we not descriptive node names. This because AUTO mode uses name/aliases from original SELECT to name result nodes. To fix this just use more descriptive aliases for columns. Second things is sorting, because we don`t use any ordering nodes in result XML are non sorted and whole document looks like unpleasant. Just use ORDER BY clause to fix problem. Last problem is that first product 'FR-R92B-58' in their node haven`t attribute Color. Such thing may occur if value in database was set to NULL and SQL SERVER omit this value. Fix is very simple, just add special keyword XSINIL (we can use this only with ELEMENTS directive) at the end of FOR XML - this will add the schema to handle xsi:nil="true". After all changes result presents as follow.

Code Snippet

SELECT [ProductModel].Name, [Product].ProductNumber,[Product].Name,[Product].Color 
FROM Production.ProductModel as [ProductModel]
JOIN Production.Product as [Product] on [Product].ProductModelID = [ProductModel].ProductModelID
ORDER BY [ProductModel].Name,[Product].Name
FOR XML AUTO , ROOT('inventory'), ELEMENTS XSINIL 
 
<inventory xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <ProductModel>
    <Name>All-Purpose Bike Stand</Name>
    <Product>
      <ProductNumber>ST-1401</ProductNumber>
      <Name>All-Purpose Bike Stand</Name>
      <Color xsi:nil="true" />
    </Product>
  </ProductModel>
  <ProductModel>
    <Name>Bike Wash</Name>
    <Product>
      <ProductNumber>CL-9009</ProductNumber>
      <Name>Bike Wash - Dissolver</Name>
      <Color xsi:nil="true" />
    </Product>
  </ProductModel>
  <ProductModel>
    <Name>Cable Lock</Name>
    <Product>
      <ProductNumber>LO-C100</ProductNumber>
      <Name>Cable Lock</Name>
      <Color xsi:nil="true" />
    </Product>
  </ProductModel>
  <ProductModel>
    <Name>Chain</Name>
    <Product>
      <ProductNumber>CH-0234</ProductNumber>
      <Name>Chain</Name>
      <Color>Silver</Color>
    </Product>
  </ProductModel>
  <ProductModel>
    <Name>Classic Vest</Name>
    <Product>
      <ProductNumber>VE-C304-L</ProductNumber>
      <Name>Classic Vest, L</Name>
      <Color>Blue</Color>
    </Product>
  </ProductModel>
</inventory>

EXPLICIT

This mode gives the greatest control over result XML document but requires work the most. It`s good in the situation where you want to have hierarchical and complex document. Because this mode requires special table structure returned by SELECT statement good idea is firstly create appropriate table and then add for command to construct XML. But what 'special table structure' does mean? As we can see in picture 1. if we want to use EXPLICIT mode we have to create table that contains:

TAG column at first position: this column is used by SQL SERVER to detect nesting level of each row.
Parent: this column at second position tell SQL SERVER into which node each of children node is located. For top level nodes (not root node!) we pass NULL and for each next node in nesting hierarchy we increment this value at 1 point.
Rest of columns: each columns from position 3 in table will be converted into node or attribute in result XML. But as You can see naming of those columns play the very important role and general pattern of naming looks NodeName!TagID!AttributeName for store value as attribute or NodeName!TagID!AttributeName!Element for store value as element text - please notice that in this case 'Element' is constant.

Pic.1. 'Special table structure'.

Now when we have our 'special table' we can add FOR XML EXPLICIT clause to produce XML document (additionally with the root element). Please remembaer abut sorting to produce more readable output.

Code Snippet

SELECT 1 as TAG, NULL as Parent,
             [ProductModel].ProductModelID as [Model!1!ID],
             [ProductModel].Name as [Model!1!Name],
             NULL as [Product!2!Number!Element],
             NULL as [Product!2!Name],
             NULL as [Product!2!Color]
FROM Production.ProductModel as [ProductModel]
JOIN Production.Product as [Product] on [Product].ProductModelID = [ProductModel].ProductModelID
UNION ALL
SELECT 2, 1,
            [ProductModel].ProductModelID,
             NULL,
             [Product].ProductNumber,
             [Product].Name,
             [Product].Color 
FROM Production.ProductModel as [ProductModel]
JOIN Production.Product as [Product] on [Product].ProductModelID = [ProductModel].ProductModelID
ORDER BY [Model!1!ID]
FOR XML EXPLICIT, ROOT ('Inventory')
 
<Inventory>
  <Model ID="1" Name="Classic Vest">
    <Product Name="Classic Vest, S" Color="Blue">
      <Number>VE-C304-S</Number>
    </Product>
    <Product Name="Classic Vest, M" Color="Blue">
      <Number>VE-C304-M</Number>
    </Product>
    <Product Name="Classic Vest, L" Color="Blue">
      <Number>VE-C304-L</Number>
    </Product>
    <Product Name="AWC Logo Cap" Color="Multi">
      <Number>CA-1098</Number>
    </Product>
  </Model>
  <Model ID="2" Name="Cycling Cap" />
  <Model ID="3" Name="Full-Finger Gloves">
    <Product Name="Full-Finger Gloves, S" Color="Black">
      <Number>GL-F110-S</Number>
    </Product>
    <Product Name="Full-Finger Gloves, M" Color="Black">
      <Number>GL-F110-M</Number>
    </Product>
    <Product Name="Full-Finger Gloves, L" Color="Black">
      <Number>GL-F110-L</Number>
    </Product>
    <Product Name="Half-Finger Gloves, S" Color="Black">
      <Number>GL-H102-S</Number>
    </Product>
    <Product Name="Half-Finger Gloves, M" Color="Black">
      <Number>GL-H102-M</Number>
    </Product>
    <Product Name="Half-Finger Gloves, L" Color="Black">
      <Number>GL-H102-L</Number>
    </Product>
  </Model>
</Inventory>

PATH

We can read in MSDN 'PATH mode is also a simpler way to introduce additional nesting for representing complex properties' and whole is true. PATH mode is much more easier way than EXPLICITY mode to create complex XML. This mode based on additional technology called XPATH which allow to query XML data but in this case reverse application was used. Its means that XML document hierarchy is contructed at runtime based on column name from the underlying SELECT statement.

The easiest way to introduce custom attributes to XML output is to add column aliases to each column in base SELECT statement (result as below). The problem with outputted XML is that we have a lot of <row /> node which is obviously not a professional, additionally we haven`t root node but as You can see by using @ before column alias we may create a attribute of current node ex. @Name. Next problem is that we don`t have any nested nodes because if we add some XPATX name like 'node/test' the result will be far from true.

Code Snippet


SELECT         [ProductModel].ProductModelID as '@ID',
             [ProductModel].Name as '@Name'
FROM Production.ProductModel as [ProductModel]
ORDER BY  [ProductModel].ProductModelID
FOR XML PATH
 
<row ID="1" Name="Classic Vest" />
<row ID="2" Name="Cycling Cap" />
<row ID="3" Name="Full-Finger Gloves" />
<row ID="4" Name="Half-Finger Gloves" />
<row ID="5" Name="HL Mountain Frame" />
<row ID="6" Name="HL Road Frame" />
<row ID="7" Name="HL Touring Frame" />

To fix problems listed above we have to make some changes in base SELECT query. Let`s assume that we want to have root node named 'Models'. As You can read earlier in this article this problem can be solve by adding ROOT('Models') at the end of FOR XML PATH statement. Furthermore we want to remove odd <row /> nodes from output XML replacing them with 'Model' nodes - this can be done easily by adding name after FOR XML PATH ex. FOR XML PATH ('Model'). Last this we want to do is to attach products list in each 'Model' node. This is not as intuitive as we can think and to solve this problem we must be familiar with TYPE SQL keyword. By using it we can create SQL type content (hyperlinked in Management Studio) in sub-query so we are able to return base data time in the same row as XML. But in this case we need to use TYPE keyword to create sub-nodes collection in our output document. Because we cant to add list of products to each product model (example below) we need to use sub-query inside in which we want to make a list of products where each products contains two attributes (name and color) and list (sub-nodes) of product number. To achieve this we need to make a sub-query in our SELECT statement and inside it we have to make correlated query (in example. [Product].ProductModelID = [ProductModel].ProductModelID). Whole sub-query result need an alias which will be parent node for whole product list in output example. Our sub-query need to return XML element so to to this right we have to use TYPE keyword with FOR XML PATH mode.

Code Snippet


SELECT         [ProductModel].ProductModelID as '@ID',
             [ProductModel].Name as '@Name',
             (SELECT 
                 [Product].Name  as '@Name',
                 [Product].Color as '@Color',
                 [Product].ProductNumber as 'Number'
              FROM  Production.Product [Product] 
              WHERE [Product].ProductModelID = [ProductModel].ProductModelID
              FOR XML PATH ('Product'), TYPE
              ) as 'Products'
FROM Production.ProductModel as [ProductModel]
  ORDER BY[ProductModel].ProductModelID
FOR XML PATH('Model'),ROOT('Models')
 
<Models>
  <Model ID="1" Name="Classic Vest">
    <Products>
      <Product Name="Classic Vest, S" Color="Blue">
        <Number>VE-C304-S</Number>
      </Product>
      <Product Name="Classic Vest, M" Color="Blue">
        <Number>VE-C304-M</Number>
      </Product>
      <Product Name="Classic Vest, L" Color="Blue">
        <Number>VE-C304-L</Number>
      </Product>
    </Products>
  </Model>
  <Model ID="2" Name="Cycling Cap">
    <Products>
      <Product Name="AWC Logo Cap" Color="Multi">
        <Number>CA-1098</Number>
      </Product>
    </Products>
  </Model>
</Models>

After making appropriate adjustments our query is ready to deploy on server to returns pretty looks XML document.

Thank You...

Web Notes

Search This Blog

Formatting XML in SQL Server 2008

Labels

Popular posts from this blog

Using Hortonworks Hive in .NET

Creating common partial class with Entity Framework

Persisting Enum in database with Entity Framework