
Processing XML in SQL Server 2008

Our applications often use data from external suppliers. This data, usually received over the Internet, is written in XML and has a different structure from our own data model. It is still very important to us, so we want to process it and extract the values we need. Of course, we could process the XML documents in CLR or in a plain .NET project and then pass the results to the database, but we should think about the performance of each solution. For example, if we receive a 100 MB XML document from a supplier and process it in the application, we still need to send the extracted data to our database, which means passing it over the network, and that is very costly.

Now assume that the same XML processing can be done on the SQL Server side. Looks great, doesn't it? So let's begin.

First of all, we should learn a bit more about three things:
  • master.dbo.sp_xml_preparedocument: parses the text passed to it as an XML document and checks that it is well formed. The first (OUTPUT) parameter, hdoc, returns a handle to the parsed document cached in memory.
  • OPENXML (keyword): produces a rowset from the XML document identified by the handle (its idoc parameter). A row pattern lets you process just a fragment of the document.
  • master.dbo.sp_xml_removedocument: removes the in-memory document identified by the handle and invalidates the handle.
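
Before the full example, here is a minimal sketch of how the three pieces fit together. The document, the variable names (@doc, @handle) and the Root/Item element names are made up for illustration only; the flag value 1 asks OPENXML for attribute-centric mapping, so each column in the WITH clause is read from the attribute with the same name.

```sql
-- Hypothetical two-row document, used only for this sketch
DECLARE @doc nvarchar(200) =
    N'<Root><Item Id="1" Name="First" /><Item Id="2" Name="Second" /></Root>';
DECLARE @handle int;

-- 1. Parse the text and get a handle to the in-memory document
EXEC sp_xml_preparedocument @handle OUTPUT, @doc;

-- 2. Expose every /Root/Item element as a row (1 = attribute-centric mapping)
SELECT Id, Name
FROM OPENXML(@handle, '/Root/Item', 1)
WITH (Id   int,
      Name nvarchar(50));

-- 3. Release the memory held by the parsed document
EXEC sp_xml_removedocument @handle;
```

The column names in the WITH clause must match the attribute names exactly, including case, because XML names are case sensitive.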
Now we can use the elements listed above to create a simple import stored procedure. Let's assume that we are going to import the following XML fragment (not an entire XML document!):

Code Snippet
DECLARE @t AS xml = '<Car Brand="Audi">
  <Model Name="A1">
    <Type TypeName="Sendan">
      <EngineType Vol="1.6" Fuel="Benzine" Version="Standard" BasePrince="80000" />
      <EngineType Vol="1.8" Fuel="Benzine" Version="Standard" BasePrince="85000" />
      <EngineType Vol="1.8" Fuel="Benzine" Version="Full" BasePrince="95000" />
      <EngineType Vol="1.9" Fuel="Diseal" Version="Standard" BasePrince="95000" />
      <EngineType Vol="1.9" Fuel="Diseal" Version="Full" BasePrince="105000" />
    </Type>
    <Type TypeName="Coupe">
      <EngineType Vol="1.6" Fuel="Benzine" Version="Standard" BasePrince="81000" />
      <EngineType Vol="1.8" Fuel="Benzine" Version="Standard" BasePrince="86000" />
      <EngineType Vol="1.8" Fuel="Benzine" Version="Full" BasePrince="96000" />
      <EngineType Vol="1.9" Fuel="Diseal" Version="Standard" BasePrince="96000" />
      <EngineType Vol="1.9" Fuel="Diseal" Version="Full" BasePrince="106000" />
    </Type>
  </Model>
  <Model Name="A4">
    <Type TypeName="Sendan">
      <EngineType Vol="1.6" Fuel="Benzine" Version="Standard" BasePrince="110000" />
      <EngineType Vol="1.8" Fuel="Benzine" Version="Standard" BasePrince="115000" />
      <EngineType Vol="1.8" Fuel="Benzine" Version="Full" BasePrince="115000" />
      <EngineType Vol="1.9" Fuel="Diseal" Version="Standard" BasePrince="115000" />
      <EngineType Vol="1.9" Fuel="Diseal" Version="Full" BasePrince="125000" />
    </Type>
    <Type TypeName="AllRoad">
      <EngineType Vol="1.6" Fuel="Benzine" Version="Standard" BasePrince="110000" />
      <EngineType Vol="1.8" Fuel="Benzine" Version="Standard" BasePrince="115000" />
      <EngineType Vol="1.8" Fuel="Benzine" Version="Full" BasePrince="115000" />
      <EngineType Vol="1.9" Fuel="Diseal" Version="Standard" BasePrince="115000" />
      <EngineType Vol="1.9" Fuel="Diseal" Version="Full" BasePrince="125000" />
    </Type>
  </Model>
</Car>';

Now it's time for our table. This example uses only one table (which does not comply with 2NF or 3NF!).

Code Snippet
CREATE TABLE dbo.Cars
(
    CarID      int IDENTITY(1,1) PRIMARY KEY,
    CarBrand   nvarchar(50) NOT NULL,
    ModelName  nvarchar(50) NOT NULL,
    TypName    nvarchar(50) NOT NULL,
    Engine     float        NOT NULL,
    FuelType   nvarchar(10) NOT NULL,
    CarVersion nvarchar(50) NOT NULL,
    BasePrince int          NOT NULL
);
GO

Code Snippet
CREATE PROCEDURE dbo.ImportCars
    @data xml
AS
BEGIN
    DECLARE @handle int; -- document handle

    -- Parse the document and get a handle to it
    EXEC master.dbo.sp_xml_preparedocument @handle OUTPUT, @data;

    -- Read the XML and insert the selected values
    INSERT INTO dbo.Cars (CarBrand, ModelName, TypName,
        Engine, FuelType, CarVersion, BasePrince)
    SELECT CarBrand, Model, TypeName, Engine, Fuel, CarVersion, Price
    FROM OPENXML(@handle, '/Car/Model/Type/EngineType')
    WITH   (CarBrand   nvarchar(50) '../../../@Brand', -- three nodes up
            Model      nvarchar(50) '../../@Name',     -- two nodes up
            TypeName   nvarchar(50) '../@TypeName',    -- one node up
            Engine     float        '@Vol',            -- attribute of the current node
            Fuel       nvarchar(10) '@Fuel',
            CarVersion nvarchar(50) '@Version',
            Price      int          '@BasePrince');

    -- Remove the parsed XML from memory
    EXEC master.dbo.sp_xml_removedocument @handle;
END
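
One thing worth considering is what happens when the statement between the prepare and remove calls fails: the parsed document then stays in memory because the remove call is never reached. Below is a rough sketch of one way to guard against that with TRY...CATCH, which is available in SQL Server 2008. The tiny @data document and the error re-raising style are my own assumptions for illustration, not part of the procedure above.

```sql
-- Hypothetical small document, used only for this sketch
DECLARE @data xml = N'<Car Brand="Audi"><Model Name="A1" /><Model Name="A4" /></Car>';
DECLARE @handle int;

EXEC sp_xml_preparedocument @handle OUTPUT, @data;

BEGIN TRY
    -- Any work against the handle goes here
    SELECT Brand, ModelName
    FROM OPENXML(@handle, '/Car/Model')
    WITH (Brand     nvarchar(50) '../@Brand', -- attribute of the parent node
          ModelName nvarchar(50) '@Name');    -- attribute of the current node

    -- Normal path: release the document
    EXEC sp_xml_removedocument @handle;
END TRY
BEGIN CATCH
    -- Error path: release the document as well, then re-raise the error
    EXEC sp_xml_removedocument @handle;

    DECLARE @msg nvarchar(2048) = ERROR_MESSAGE();
    RAISERROR(N'%s', 16, 1, @msg);
END CATCH;
```

RAISERROR is used here instead of THROW because THROW was only introduced in later versions of SQL Server.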


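Now let's try our procedure. Note that @t must be declared in the same batch as the EXEC call, because variables do not survive a GO batch separator:

EXEC dbo.ImportCars @t;
SELECT * FROM dbo.Cars;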
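
As a quick sanity check, a grouping query like the sketch below can confirm that every EngineType element ended up as a row. It assumes dbo.Cars was empty before the import and that the sample fragment was imported exactly once; the EngineVariants alias is just an illustrative name.

```sql
-- Count imported engine variants per model and body type
SELECT ModelName, TypName, COUNT(*) AS EngineVariants
FROM dbo.Cars
GROUP BY ModelName, TypName
ORDER BY ModelName, TypName;
```

Under those assumptions the sample fragment should give four groups of five rows each, 20 rows in total, since each of the four Type elements contains five EngineType elements.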

Thank you.
