
Processing XML in SQL Server 2008

Our applications often use data from external suppliers. This data, usually received over the Internet, is written in XML and has a structure different from our own data model. It is still very important to us, so we want to process it and extract the values we need. Of course we can process XML documents in CLR or in a plain .NET project and then pass the results to the database, but we should think about the performance of each solution. For example, if we receive a 100 MB XML document from a supplier and process it in the application, we still need to send the extracted data to our database, which means passing it over the network, and that is very costly...

Now assume that the same operation, processing the XML document, can be done on the SQL Server side. Looks great, doesn't it? So let's begin.

First of all, we should learn a bit more about three things:
  • master.dbo.sp_xml_preparedocument: parses the text passed to it as an XML document and checks that it is well-formed. The first parameter (an OUTPUT parameter, 'idoc') returns a handle to the XML document cached in memory.
  • OPENXML (keyword): produces a rowset (table) from the XML document identified by that in-memory handle. It also lets us process just a fragment of an XML document.
  • master.dbo.sp_xml_removedocument: removes everything associated with the passed 'idoc' handle and frees the memory it used.
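
The three pieces are used in the order prepare, query, release. Here is a minimal sketch of that pattern, assuming a small hypothetical <Items> document (its element and attribute names are invented for illustration and are not part of the import described in this post):

```sql
-- Hypothetical sample document, for illustration only
DECLARE @doc nvarchar(max) =
    N'<Items><Item Id="1" Name="First" /><Item Id="2" Name="Second" /></Items>';
DECLARE @handle int;

-- 1. Parse the text and get a handle to the in-memory document
EXEC sp_xml_preparedocument @handle OUTPUT, @doc;

-- 2. Expose the <Item> elements as a rowset
SELECT Id, Name
FROM OPENXML(@handle, '/Items/Item', 1)
WITH (Id   int          '@Id',
      Name nvarchar(50) '@Name');

-- 3. Release the parsed document
EXEC sp_xml_removedocument @handle;
```

The third OPENXML argument (flags, here 1) selects attribute-centric mapping. The explicit column patterns in the WITH clause take precedence over it.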
Now we can use the elements listed above to create a simple import stored procedure. Let's assume that we are going to import the following XML document fragment (not an entire XML document!):

DECLARE @t xml = '<Car Brand="Audi">
  <Model Name="A1">
    <Type TypeName="Sendan">
      <EngineType Vol="1.6" Fuel="Benzine" Version="Standard" BasePrince="80000" />
      <EngineType Vol="1.8" Fuel="Benzine" Version="Standard" BasePrince="85000" />
      <EngineType Vol="1.8" Fuel="Benzine" Version="Full" BasePrince="95000" />
      <EngineType Vol="1.9" Fuel="Diseal" Version="Standard" BasePrince="95000" />
      <EngineType Vol="1.9" Fuel="Diseal" Version="Full" BasePrince="105000" />
    </Type>
    <Type TypeName="Coupe">
      <EngineType Vol="1.6" Fuel="Benzine" Version="Standard" BasePrince="81000" />
      <EngineType Vol="1.8" Fuel="Benzine" Version="Standard" BasePrince="86000" />
      <EngineType Vol="1.8" Fuel="Benzine" Version="Full" BasePrince="96000" />
      <EngineType Vol="1.9" Fuel="Diseal" Version="Standard" BasePrince="96000" />
      <EngineType Vol="1.9" Fuel="Diseal" Version="Full" BasePrince="106000" />
    </Type>
  </Model>
  <Model Name="A4">
    <Type TypeName="Sendan">
      <EngineType Vol="1.6" Fuel="Benzine" Version="Standard" BasePrince="110000" />
      <EngineType Vol="1.8" Fuel="Benzine" Version="Standard" BasePrince="115000" />
      <EngineType Vol="1.8" Fuel="Benzine" Version="Full" BasePrince="115000" />
      <EngineType Vol="1.9" Fuel="Diseal" Version="Standard" BasePrince="115000" />
      <EngineType Vol="1.9" Fuel="Diseal" Version="Full" BasePrince="125000" />
    </Type>
    <Type TypeName="AllRoad">
      <EngineType Vol="1.6" Fuel="Benzine" Version="Standard" BasePrince="110000" />
      <EngineType Vol="1.8" Fuel="Benzine" Version="Standard" BasePrince="115000" />
      <EngineType Vol="1.8" Fuel="Benzine" Version="Full" BasePrince="115000" />
      <EngineType Vol="1.9" Fuel="Diseal" Version="Standard" BasePrince="115000" />
      <EngineType Vol="1.9" Fuel="Diseal" Version="Full" BasePrince="125000" />
    </Type>
  </Model>
</Car>';

Now it's time for our table. For this example there is only one table (not compliant with 2NF and 3NF!).

CREATE TABLE dbo.Cars
(
    CarID      int IDENTITY(1,1) PRIMARY KEY,
    CarBrand   nvarchar(50) NOT NULL,
    ModelName  nvarchar(50) NOT NULL,
    TypName    nvarchar(50) NOT NULL,
    Engine     float        NOT NULL,
    FuelType   nvarchar(10) NOT NULL,
    CarVersion nvarchar(50) NOT NULL,
    BasePrince int          NOT NULL
);
GO

CREATE PROCEDURE dbo.ImportCars
    @data xml
AS
BEGIN
    DECLARE @handle int; -- handle to the parsed XML document

    -- Parse the document and keep it in memory
    EXEC master.dbo.sp_xml_preparedocument @handle OUTPUT, @data;

    -- Read the XML and insert the selected values;
    -- the WITH columns are matched to the INSERT column list by position
    INSERT INTO dbo.Cars (CarBrand, ModelName, TypName,
        Engine, FuelType, CarVersion, BasePrince)
    SELECT * FROM OPENXML(@handle, 'Car/Model/Type/EngineType')
    WITH   (CarBrand   varchar(50)  '../../../@Brand', -- three nodes up
            Model      varchar(50)  '../../@Name',     -- two nodes up
            TypeName   varchar(50)  '../@TypeName',    -- one node up
            Engine     float        '@Vol',            -- attribute of the current node
            Fuel       nvarchar(10) '@Fuel',
            CarVersion nvarchar(50) '@Version',
            Price      int          '@BasePrince');

    -- Remove the XML document from memory
    EXEC master.dbo.sp_xml_removedocument @handle;
END
GO
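
One thing to keep in mind: a handle returned by sp_xml_preparedocument stays valid for the whole session unless sp_xml_removedocument is called. If the INSERT fails with an error that stops the procedure, the parsed document is therefore left in memory. Below is a possible variant that releases the handle on both paths. It is only a sketch, the procedure name dbo.ImportCarsSafe is hypothetical, and RAISERROR is used because THROW is not available in SQL Server 2008:

```sql
CREATE PROCEDURE dbo.ImportCarsSafe   -- hypothetical name, not part of the original example
    @data xml
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @handle int;

    -- Parsing happens outside TRY, so the CATCH block always has a valid handle
    EXEC master.dbo.sp_xml_preparedocument @handle OUTPUT, @data;

    BEGIN TRY
        INSERT INTO dbo.Cars (CarBrand, ModelName, TypName,
            Engine, FuelType, CarVersion, BasePrince)
        SELECT CarBrand, Model, TypeName, Engine, Fuel, CarVersion, Price
        FROM OPENXML(@handle, 'Car/Model/Type/EngineType')
        WITH   (CarBrand   nvarchar(50) '../../../@Brand',
                Model      nvarchar(50) '../../@Name',
                TypeName   nvarchar(50) '../@TypeName',
                Engine     float        '@Vol',
                Fuel       nvarchar(10) '@Fuel',
                CarVersion nvarchar(50) '@Version',
                Price      int          '@BasePrince');

        -- Normal path: release the parsed document
        EXEC master.dbo.sp_xml_removedocument @handle;
    END TRY
    BEGIN CATCH
        -- Error path: release the document first, then re-raise the error
        EXEC master.dbo.sp_xml_removedocument @handle;

        DECLARE @msg nvarchar(2048) = ERROR_MESSAGE();
        RAISERROR('%s', 16, 1, @msg);
    END CATCH
END
GO
```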


Now let's try our procedure (the DECLARE @t statement shown earlier must run in the same batch as the EXEC):

EXEC dbo.ImportCars @t;
SELECT * FROM dbo.Cars;
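
As a quick sanity check, assuming dbo.Cars was empty before the import: the sample fragment has four Model/Type combinations with five EngineType elements each. Grouping the imported rows should therefore return four groups of five, 20 rows in total. The column alias EngineVariants is just an illustrative name:

```sql
-- Count imported engine variants per model and body type
SELECT ModelName, TypName, COUNT(*) AS EngineVariants
FROM dbo.Cars
GROUP BY ModelName, TypName
ORDER BY ModelName, TypName;
```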

Thank You.
