Skip to main content

Formatting XML in SQL Server 2008

Since SQL SERVER 2005 it is possible to store XML data in database by using dedicated data type - XML. This type solved many programming problems such storing XML data as simple text which was very unpleasant to maintenance  and detecting errors in document structure. Now stroring XML data is very simple and requires only the creation of column in XML data type. But sometimes we may want to do reverse operation. I mean generating XML document or fragment from non XML columns in a single SELECT statement. Of course SQL Server meets our expectations and allow to transform any results of SELECT statement (which always returns table) to XML structure.

To start transforming selected results to XML format we must use FOR XML clause right after last statemnt in normal SELECT. For example instead of using:

Code Snippet
  1. SELECT ProductNumber,Name,ListPrice,Color
  2. FROM Production.Product

we should use:

Code Snippet
  1. USE [AdventureWorks]
  2. SELECT ProductNumber,Name,ListPrice,Color
  3. FROM Production.Product
  4. FOR XML

But if  we want run  this query server going to return an error 'Incorrect syntax near 'XML'. So now we realize that it is so easy - sorry :) We just tell SQL Server to prepare XML for us but we didn`t tell them how to do that. In SQL Server 2005 and above there is a short list of modes in this topic:

  • FOR XML RAW - generate a single <row> for each single row in result table returned by SELECT statement.
  • FOR XML AUTO - generate nesting structure of XML document based on structure of the SELECT statement. 
  • FOR XML EXPLICIT - allow to generate complex structure which we can control. This provides a lot of flexibility but does so at the cost of much more greater complexity and performance.
  • FOR XML PATH - as EXPLICIT its allow to customize output XML  but here we can do it much more easier - queries are less complex.
Ok, so now we are able to fix our last query so lets do this by using each of them:

Code Snippet
  1. USE [AdventureWorks]
  2. SELECT ProductNumber,Name,ListPrice,Color FROM Production.Product
  5. <row ProductNumber="AR-5381" Name="Adjustable Race" ListPrice="0.0000" />
  6. <row ProductNumber="BA-8327" Name="Bearing Ball" ListPrice="0.0000" />
  7. <row ProductNumber="BE-2349" Name="BB Ball Bearing" ListPrice="0.0000" />
  8. ------------------------------------------------------------------------------
  9. SELECT ProductNumber,Name,ListPrice,Color FROM Production.Product
  12. <Production.Product ProductNumber="AR-5381" Name="Adjustable Race" ListPrice="0.0000" />
  13. <Production.Product ProductNumber="BA-8327" Name="Bearing Ball" ListPrice="0.0000" />
  14. <Production.Product ProductNumber="BE-2349" Name="BB Ball Bearing" ListPrice="0.0000" />
  15. ------------------------------------------------------------------------------
  16. SELECT ProductNumber,Name,ListPrice,Color FROM Production.Product
  19. --Msg 6803, Level 16, State 1, Line 1
  20. --FOR XML EXPLICIT requires the first column to hold positive integers that represent XML tag IDs.
  21. ------------------------------------------------------------------------------
  22. SELECT ProductNumber,Name,ListPrice,Color FROM Production.Product
  24. <row>
  25.   <ProductNumber>AR-5381</ProductNumber>
  26.   <Name>Adjustable Race</Name>
  27.   <ListPrice>0.0000</ListPrice>
  28. </row>
  29. <row>
  30. ------------------------------------------------------------------------------

You can see that three FOR XML type returns different XML structure (RAW, AUTO, PATH) and one returns error message (EXPLICIT). To explain behavior of each type we need to look at them more precisely.

This mode returns a  single XML fragment (not whole document) which consist of many <row ... /> nodes with many attributes (each column or alias from base SELECT command related to one attribute in output XML) but this was only default behavior. We may tell SQL SERVER to generate sub-nodes insted of attributes by using FOR XML RAW, ELEMENTS clause.

Code Snippet
  1. SELECT ProductNumber,Name,ListPrice,Color FROM Production.Product
  4. <row>
  5.   <ProductNumber>AR-5381</ProductNumber>
  6.   <Name>Adjustable Race</Name>
  7.   <ListPrice>0.0000</ListPrice>
  8. </row>
  9. <row>
  10.   <ProductNumber>BA-8327</ProductNumber>
  11.   <Name>Bearing Ball</Name>
  12.   <ListPrice>0.0000</ListPrice>
  13. </row>
We may use much more options in RAW mode and these are:

  • FOR XML RAW ('custom_name') - rename <row />  node to name given in brackets but we can using white space in node name
  • FOR XML RAW, ROOT - result XML has exactly one root node. By default root is named <root /> but this may be changes to custom name in the same way as in case <row/> (ex. FOR XML RAW ('node'), ROOT ('my_root'))
If You want to use RAW mode to obtain hierarchical structure this is not god idea....

This mode by default creates list on nodes without root, but unlike the RAW by default generates nodes named the same as in SELECT statement. Additionally allow to create nesting structure of XML document depends of the structure of the SELECT. The root,elements and custom root name rules are the  same as in RAW mode. To demonstrate how AUTO mode works lets focus on following example.

Code Snippet
  1. SELECT pm.Name, p.ProductNumber,p.Name,p.Color
  2. FROM Production.ProductModel as pm
  3. JOIN Production.Product as p on p.ProductModelID = pm.ProductModelID
  4. FOR XML AUTO,ROOT ('Inventory')
  6. <Inventory>
  7.   <pm Name="HL Road Frame">
  8.     <p ProductNumber="FR-R92B-58" Name="HL Road Frame - Black, 58" />
  9.     <p ProductNumber="FR-R92R-58" Name="HL Road Frame - Red, 58" Color="Red" />
  10.   </pm>
  11.   <pm Name="Sport-100">
  12.     <p ProductNumber="HL-U509-R" Name="Sport-100 Helmet, Red" Color="Red" />
  13.     <p ProductNumber="HL-U509" Name="Sport-100 Helmet, Black" Color="Black" />
  14.   </pm>
  15.   <pm Name="Mountain Bike Socks">
  16.     <p ProductNumber="SO-B909-M" Name="Mountain Bike Socks, M" Color="White" />
  17.   </pm>
  18. </Inventory>

As we can see by using AUTO mode we are able to generate hierarchical XML  structure. But here we have several problem to fix. Firstly we not descriptive node names. This because AUTO mode uses name/aliases from original SELECT to name result nodes. To fix this just use more descriptive aliases for columns. Second things is sorting, because we don`t use any ordering nodes in result XML are non sorted and whole document looks  like unpleasant. Just use ORDER BY clause to fix problem. Last problem is that   first product  'FR-R92B-58' in their node haven`t attribute Color. Such thing may occur if value in database was set to NULL and SQL SERVER omit this value. Fix is very simple, just add special keyword XSINIL (we can use this only with ELEMENTS directive) at the end of FOR XML - this will add the schema to handle  xsi:nil="true". After  all changes result presents as follow.

Code Snippet
  1. SELECT [ProductModel].Name, [Product].ProductNumber,[Product].Name,[Product].Color
  2. FROM Production.ProductModel as [ProductModel]
  3. JOIN Production.Product as [Product] on [Product].ProductModelID = [ProductModel].ProductModelID
  4. ORDER BY [ProductModel].Name,[Product].Name
  7. <inventory xmlns:xsi="">
  8.   <ProductModel>
  9.     <Name>All-Purpose Bike Stand</Name>
  10.     <Product>
  11.       <ProductNumber>ST-1401</ProductNumber>
  12.       <Name>All-Purpose Bike Stand</Name>
  13.       <Color xsi:nil="true" />
  14.     </Product>
  15.   </ProductModel>
  16.   <ProductModel>
  17.     <Name>Bike Wash</Name>
  18.     <Product>
  19.       <ProductNumber>CL-9009</ProductNumber>
  20.       <Name>Bike Wash - Dissolver</Name>
  21.       <Color xsi:nil="true" />
  22.     </Product>
  23.   </ProductModel>
  24.   <ProductModel>
  25.     <Name>Cable Lock</Name>
  26.     <Product>
  27.       <ProductNumber>LO-C100</ProductNumber>
  28.       <Name>Cable Lock</Name>
  29.       <Color xsi:nil="true" />
  30.     </Product>
  31.   </ProductModel>
  32.   <ProductModel>
  33.     <Name>Chain</Name>
  34.     <Product>
  35.       <ProductNumber>CH-0234</ProductNumber>
  36.       <Name>Chain</Name>
  37.       <Color>Silver</Color>
  38.     </Product>
  39.   </ProductModel>
  40.   <ProductModel>
  41.     <Name>Classic Vest</Name>
  42.     <Product>
  43.       <ProductNumber>VE-C304-L</ProductNumber>
  44.       <Name>Classic Vest, L</Name>
  45.       <Color>Blue</Color>
  46.     </Product>
  47.   </ProductModel>
  48. </inventory>

This mode gives the greatest control over result XML document but requires work the most. It`s good in the situation where you want to have hierarchical and complex document. Because this mode requires special table structure returned by SELECT  statement good idea is firstly create appropriate table and then add for command to construct XML. But what 'special table structure' does mean? As we can see in picture 1. if we want to use EXPLICIT mode we have to create table that contains:

  • TAG column at first position: this column is used by SQL SERVER to detect nesting level of each row.
  • Parent: this column at second position tell SQL SERVER into which node each of children node is located. For top level nodes (not root node!) we pass NULL and for each next node in nesting hierarchy we increment this value at 1 point.
  • Rest of columns: each columns from position 3 in table  will be converted into node or attribute in result XML. But as You can see naming of those columns play the very important role and general pattern of naming looks NodeName!TagID!AttributeName for store value as attribute or  NodeName!TagID!AttributeName!Element for store value as element text - please notice that in this case 'Element' is constant.

Pic.1. 'Special table structure'.
Now when we have our 'special table' we can add FOR XML EXPLICIT clause to produce XML  document (additionally with the root element). Please remembaer abut sorting to produce more readable output.

Code Snippet
  1. SELECT 1 as TAG, NULL as Parent,
  2.              [ProductModel].ProductModelID as [Model!1!ID],
  3.              [ProductModel].Name as [Model!1!Name],
  4.              NULL as [Product!2!Number!Element],
  5.              NULL as [Product!2!Name],
  6.              NULL as [Product!2!Color]
  7. FROM Production.ProductModel as [ProductModel]
  8. JOIN Production.Product as [Product] on [Product].ProductModelID = [ProductModel].ProductModelID
  10. SELECT 2, 1,
  11.             [ProductModel].ProductModelID,
  12.              NULL,
  13.              [Product].ProductNumber,
  14.              [Product].Name,
  15.              [Product].Color
  16. FROM Production.ProductModel as [ProductModel]
  17. JOIN Production.Product as [Product] on [Product].ProductModelID = [ProductModel].ProductModelID
  18. ORDER BY [Model!1!ID]
  19. FOR XML EXPLICIT, ROOT ('Inventory')
  21. <Inventory>
  22.   <Model ID="1" Name="Classic Vest">
  23.     <Product Name="Classic Vest, S" Color="Blue">
  24.       <Number>VE-C304-S</Number>
  25.     </Product>
  26.     <Product Name="Classic Vest, M" Color="Blue">
  27.       <Number>VE-C304-M</Number>
  28.     </Product>
  29.     <Product Name="Classic Vest, L" Color="Blue">
  30.       <Number>VE-C304-L</Number>
  31.     </Product>
  32.     <Product Name="AWC Logo Cap" Color="Multi">
  33.       <Number>CA-1098</Number>
  34.     </Product>
  35.   </Model>
  36.   <Model ID="2" Name="Cycling Cap" />
  37.   <Model ID="3" Name="Full-Finger Gloves">
  38.     <Product Name="Full-Finger Gloves, S" Color="Black">
  39.       <Number>GL-F110-S</Number>
  40.     </Product>
  41.     <Product Name="Full-Finger Gloves, M" Color="Black">
  42.       <Number>GL-F110-M</Number>
  43.     </Product>
  44.     <Product Name="Full-Finger Gloves, L" Color="Black">
  45.       <Number>GL-F110-L</Number>
  46.     </Product>
  47.     <Product Name="Half-Finger Gloves, S" Color="Black">
  48.       <Number>GL-H102-S</Number>
  49.     </Product>
  50.     <Product Name="Half-Finger Gloves, M" Color="Black">
  51.       <Number>GL-H102-M</Number>
  52.     </Product>
  53.     <Product Name="Half-Finger Gloves, L" Color="Black">
  54.       <Number>GL-H102-L</Number>
  55.     </Product>
  56.   </Model>
  57. </Inventory>

We can read in MSDN 'PATH mode is also a simpler way to introduce additional nesting for representing complex properties' and whole is true. PATH mode is much more easier way than EXPLICITY mode to create complex XML. This mode based on additional technology called XPATH which allow to query XML data but in this case reverse application was used. Its  means that XML document hierarchy is contructed at runtime based on column name from the underlying SELECT statement.

The easiest way  to introduce custom attributes to XML output is to add column aliases to each column in base SELECT statement (result as below). The problem with outputted XML is that we have a lot of <row /> node which is obviously not a professional, additionally we haven`t root node but as You can see by using @ before column alias we may create a attribute of current node ex. @Name. Next problem is that we don`t have any nested nodes because if we add some XPATX name like 'node/test' the result will be far from true.
Code Snippet
  1. SELECT         [ProductModel].ProductModelID as '@ID',
  2.              [ProductModel].Name as '@Name'
  3. FROM Production.ProductModel as [ProductModel]
  4. ORDER BY  [ProductModel].ProductModelID
  7. <row ID="1" Name="Classic Vest" />
  8. <row ID="2" Name="Cycling Cap" />
  9. <row ID="3" Name="Full-Finger Gloves" />
  10. <row ID="4" Name="Half-Finger Gloves" />
  11. <row ID="5" Name="HL Mountain Frame" />
  12. <row ID="6" Name="HL Road Frame" />
  13. <row ID="7" Name="HL Touring Frame" />

To fix problems listed above we have to make some changes in base SELECT query. Let`s assume that we want to have root node named 'Models'. As You can read earlier in this article this problem can be solve by adding ROOT('Models') at the end of FOR XML PATH statement. Furthermore we want to remove odd <row /> nodes from output XML replacing them with 'Model' nodes - this can be done easily by adding  name after FOR XML PATH ex. FOR XML PATH ('Model'). Last this we want to do is to attach products list in each 'Model' node. This is not as intuitive as we can think and to solve this problem we must be familiar  with TYPE  SQL keyword. By using it we can create SQL type content (hyperlinked in Management Studio) in sub-query so we are able to return base data time in the same row as XML. But in this case we need to use TYPE keyword to create sub-nodes collection in our output document. Because we cant to add list of products to each product model (example below) we need to use sub-query inside in which we want to make a list of products where each products contains two attributes (name and color) and list (sub-nodes) of  product number. To achieve this we need to make a sub-query in our SELECT statement and inside it we have to make correlated query (in example. [Product].ProductModelID = [ProductModel].ProductModelID). Whole sub-query result need an alias which will be parent node for whole product list in output example. Our sub-query need to return XML element so to to this right we have to use TYPE keyword with FOR XML PATH mode.

Code Snippet
  1. SELECT         [ProductModel].ProductModelID as '@ID',
  2.              [ProductModel].Name as '@Name',
  3.              (SELECT
  4.                  [Product].Name  as '@Name',
  5.                  [Product].Color as '@Color',
  6.                  [Product].ProductNumber as 'Number'
  7.               FROM  Production.Product [Product]
  8.               WHERE [Product].ProductModelID = [ProductModel].ProductModelID
  9.               FOR XML PATH ('Product'), TYPE
  10.               ) as 'Products'
  11. FROM Production.ProductModel as [ProductModel]
  12.   ORDER BY[ProductModel].ProductModelID
  13. FOR XML PATH('Model'),ROOT('Models')
  15. <Models>
  16.   <Model ID="1" Name="Classic Vest">
  17.     <Products>
  18.       <Product Name="Classic Vest, S" Color="Blue">
  19.         <Number>VE-C304-S</Number>
  20.       </Product>
  21.       <Product Name="Classic Vest, M" Color="Blue">
  22.         <Number>VE-C304-M</Number>
  23.       </Product>
  24.       <Product Name="Classic Vest, L" Color="Blue">
  25.         <Number>VE-C304-L</Number>
  26.       </Product>
  27.     </Products>
  28.   </Model>
  29.   <Model ID="2" Name="Cycling Cap">
  30.     <Products>
  31.       <Product Name="AWC Logo Cap" Color="Multi">
  32.         <Number>CA-1098</Number>
  33.       </Product>
  34.     </Products>
  35.   </Model>
  36. </Models>

After making appropriate adjustments our query is ready to deploy on server to returns pretty looks XML document.

Thank You...

Popular posts from this blog

Playing with a .NET types definition

In the last few days I spent some time trying to unify structure of one of the project I`m currently working on. Most of the changes were about changing variable types because it`s were not used right way. That is why in this post I want to share my observations and practices with you. First of all we need to understand what ' variable definition ' is and how it`s different from ' variable initialization '. This part should be pretty straightforward:   variable definition  consist of data type and variable name only <data_type> <variable_name> ; for example int i ; . It`s important to understand how variable definition affects your code because it behaves differently depends weather you work with value or reference types. In the case of value types after defining variable it always has default value and it`s never null value. However after defined reference type variable without initializing it has null value by default. variable initialization  is

Using Newtonsoft serializer in CosmosDB client

Problem In some scenarios engineers might want to use a custom JSON serializer for documents stored in CosmosDB.  Solution In CosmosDBV3 .NET Core API, when creating an instance of  CosmosClient one of optional setting in  CosmosClientOptions is to specify an instance of a Serializer . This serializer must be JSON based and be of  CosmosSerializer type. This means that if a custom serializer is needed this should inherit from CosmosSerializer abstract class and override its two methods for serializing and deserializing of an object. The challenge is that both methods from  CosmosSerializer are stream based and therefore might be not as easy to implement as engineers used to assume - still not super complex.  For demonstration purpose as or my custom serializer I'm going to use Netwonsoft.JSON library. Firstly a new type is needed and this must inherit from  CosmosSerializer.  using  Microsoft.Azure.Cosmos; using  Newtonsoft.Json; using  System.IO; using  System.Text; ///   <

Using Hortonworks Hive in .NET

A few months ago I decided to learn a big data. This sounds very complex and of course it is. All these strange names which actually tells nothing to person who is new in these area combined with different way of looking at data storage makes entire topic even more complex. However after reading N blogs and watching many, many tutorials today I finally had a chance to try to write some code. As in last week I managed to setup a Hortonworks distribution of Hadoop today I decided to connect to it from my .NET based application and this is what I will describe in this post. First things first I didn`t setup entire Hortonworks ecosystem from scratch - I`d love to but for now it`s far beyond my knowledge thus I decided to use a sandbox environment provided by Hortonworks. There are multiple different VMs available to download but in my case I`ve choose a Hyper-V. More about setting this environment up you can read here . Picture 1. Up and running sandbox environment. Now whe