
Processing XML in SQL Server 2008

Our applications often use data from external suppliers. This data, mostly received over the Internet, usually arrives as XML and has a structure different from our own data model. It is still very important to us, so we want to process it and extract the values we need. Of course we can process the XML documents in CLR code or in a plain .NET project and then pass the results to the database, but we should think about the performance of each solution. For example, if we receive a 100 MB XML document from a supplier and process it in the application, we still need to send the extracted data to the database over the network, which can be very costly.

Now assume that the same operation, processing the XML document, can be done on the SQL Server side. Looks great, doesn't it? So let's begin.

First of all, we should learn a bit more about three things:
  • master.dbo.sp_xml_preparedocument: parses the text passed in as an XML document, checking that it is well formed. The first parameter ('idoc', of OUTPUT type) returns a handle to the parsed XML cached in memory.
  • OPENXML (keyword): generates a rowset (table) from the XML identified by that in-memory handle. A row pattern lets you process just a fragment of the XML document.
  • master.dbo.sp_xml_removedocument: removes everything associated with the passed 'idoc' handle and frees the memory it used.
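Before moving on to the real data, here is a minimal sketch of how these three pieces fit together. The `<Items>` document and the `Id` and `Name` columns are made up purely for illustration; the flag value 1 asks OPENXML for attribute-centric mapping, so each column in the WITH clause is filled from the attribute with the same name.

```sql
-- Hypothetical sample document, used only to show the call sequence
DECLARE @doc nvarchar(max) =
    N'<Items><Item Id="1" Name="First" /><Item Id="2" Name="Second" /></Items>';
DECLARE @idoc int;

-- 1) Parse the text and get a handle to the in-memory document
EXEC master.dbo.sp_xml_preparedocument @idoc OUTPUT, @doc;

-- 2) Expose the <Item> elements as rows; flag 1 = attribute-centric mapping
SELECT Id, Name
FROM OPENXML(@idoc, '/Items/Item', 1)
WITH (Id   int,
      Name nvarchar(50));
-- returns two rows: (1, First) and (2, Second)

-- 3) Release the parsed document
EXEC master.dbo.sp_xml_removedocument @idoc;
```

Column names in the WITH clause are matched against XML names, which are case sensitive, so `Id` would not pick up an attribute called `id`.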
Now we can use the elements listed above to create a simple import stored procedure. Let's assume that we are going to import the following XML document fragment (not an entire XML document!):

Code Snippet
  DECLARE @t xml = N'<Car Brand="Audi">
    <Model Name="A1">
      <Type TypeName="Sedan">
        <EngineType Vol="1.6" Fuel="Benzine" Version="Standard" BasePrince="80000" />
        <EngineType Vol="1.8" Fuel="Benzine" Version="Standard" BasePrince="85000" />
        <EngineType Vol="1.8" Fuel="Benzine" Version="Full" BasePrince="95000" />
        <EngineType Vol="1.9" Fuel="Diesel" Version="Standard" BasePrince="95000" />
        <EngineType Vol="1.9" Fuel="Diesel" Version="Full" BasePrince="105000" />
      </Type>
      <Type TypeName="Coupe">
        <EngineType Vol="1.6" Fuel="Benzine" Version="Standard" BasePrince="81000" />
        <EngineType Vol="1.8" Fuel="Benzine" Version="Standard" BasePrince="86000" />
        <EngineType Vol="1.8" Fuel="Benzine" Version="Full" BasePrince="96000" />
        <EngineType Vol="1.9" Fuel="Diesel" Version="Standard" BasePrince="96000" />
        <EngineType Vol="1.9" Fuel="Diesel" Version="Full" BasePrince="106000" />
      </Type>
    </Model>
    <Model Name="A4">
      <Type TypeName="Sedan">
        <EngineType Vol="1.6" Fuel="Benzine" Version="Standard" BasePrince="110000" />
        <EngineType Vol="1.8" Fuel="Benzine" Version="Standard" BasePrince="115000" />
        <EngineType Vol="1.8" Fuel="Benzine" Version="Full" BasePrince="115000" />
        <EngineType Vol="1.9" Fuel="Diesel" Version="Standard" BasePrince="115000" />
        <EngineType Vol="1.9" Fuel="Diesel" Version="Full" BasePrince="125000" />
      </Type>
      <Type TypeName="AllRoad">
        <EngineType Vol="1.6" Fuel="Benzine" Version="Standard" BasePrince="110000" />
        <EngineType Vol="1.8" Fuel="Benzine" Version="Standard" BasePrince="115000" />
        <EngineType Vol="1.8" Fuel="Benzine" Version="Full" BasePrince="115000" />
        <EngineType Vol="1.9" Fuel="Diesel" Version="Standard" BasePrince="115000" />
        <EngineType Vol="1.9" Fuel="Diesel" Version="Full" BasePrince="125000" />
      </Type>
    </Model>
  </Car>';

Now it's time for our table. For this example there is only one table (not compliant with 2NF and 3NF!).

Code Snippet
  CREATE TABLE dbo.Cars
  (
      CarID      int IDENTITY(1,1) PRIMARY KEY,
      CarBrand   nvarchar(50) NOT NULL,
      ModelName  nvarchar(50) NOT NULL,
      TypName    nvarchar(50) NOT NULL,
      Engine     float        NOT NULL,
      FuelType   nvarchar(10) NOT NULL,
      CarVersion nvarchar(50) NOT NULL,
      BasePrince int          NOT NULL
  );
  GO

Code Snippet
  CREATE PROCEDURE dbo.ImportCars
      @data xml
  AS
  BEGIN
      DECLARE @handle int; -- document handle returned by sp_xml_preparedocument

      -- Parse the XML and get a handle to its in-memory representation
      EXEC master.dbo.sp_xml_preparedocument @handle OUTPUT, @data;

      -- Read the XML and insert the selected values, one row per EngineType
      INSERT INTO dbo.Cars (CarBrand, ModelName, TypName,
          Engine, FuelType, CarVersion, BasePrince)
      SELECT CarBrand, Model, TypeName, Engine, Fuel, CarVersion, Price
      FROM OPENXML(@handle, '/Car/Model/Type/EngineType')
      WITH  (CarBrand   nvarchar(50) '../../../@Brand',   -- three nodes up
             Model      nvarchar(50) '../../@Name',       -- two nodes up
             TypeName   nvarchar(50) '../@TypeName',      -- one node up
             Engine     float        '@Vol',              -- attribute of the current node
             Fuel       nvarchar(10) '@Fuel',
             CarVersion nvarchar(50) '@Version',
             Price      int          '@BasePrince');

      -- Remove the parsed XML from memory
      EXEC master.dbo.sp_xml_removedocument @handle;
  END
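One thing worth keeping in mind: if the INSERT raises an error that stops the procedure, the call to sp_xml_removedocument may never be reached and the parsed document can stay in memory for the rest of the session. A possible way to guard against that is sketched below. The procedure name dbo.ImportCarsSafe is hypothetical, and the error handling uses RAISERROR because THROW is not available in SQL Server 2008. Treat it as one option under these assumptions, not as the only correct pattern.

```sql
-- Sketch only: dbo.ImportCarsSafe is a hypothetical variant of dbo.ImportCars
CREATE PROCEDURE dbo.ImportCarsSafe
    @data xml
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @handle int;
    EXEC master.dbo.sp_xml_preparedocument @handle OUTPUT, @data;

    BEGIN TRY
        INSERT INTO dbo.Cars (CarBrand, ModelName, TypName,
            Engine, FuelType, CarVersion, BasePrince)
        SELECT CarBrand, Model, TypeName, Engine, Fuel, CarVersion, Price
        FROM OPENXML(@handle, '/Car/Model/Type/EngineType')
        WITH  (CarBrand   nvarchar(50) '../../../@Brand',
               Model      nvarchar(50) '../../@Name',
               TypeName   nvarchar(50) '../@TypeName',
               Engine     float        '@Vol',
               Fuel       nvarchar(10) '@Fuel',
               CarVersion nvarchar(50) '@Version',
               Price      int          '@BasePrince');
    END TRY
    BEGIN CATCH
        -- Release the document first, then report the original error text
        EXEC master.dbo.sp_xml_removedocument @handle;

        DECLARE @msg nvarchar(2048) = ERROR_MESSAGE();
        RAISERROR('%s', 16, 1, @msg);
        RETURN;
    END CATCH

    -- Normal path: release the document after a successful insert
    EXEC master.dbo.sp_xml_removedocument @handle;
END
```

The message is passed through a '%s' placeholder so that any percent signs in the original error text are not interpreted as format specifiers.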


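Now let's try our procedure. Run it in the same batch as the DECLARE @t statement above, because a local variable is visible only within its own batch:

  EXEC dbo.ImportCars @t;
  SELECT * FROM dbo.Cars;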
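
As a quick sanity check, the sample fragment contains 2 models with 2 types each and 5 EngineType elements per type, so one import into an empty dbo.Cars table should produce 20 rows. The queries below are a small sketch of how that could be verified; the column aliases are arbitrary.

```sql
-- Total rows imported: 20 after a single import into an empty table
SELECT COUNT(*) AS ImportedRows
FROM dbo.Cars;

-- Rows per model and type: four groups with 5 variants each
SELECT ModelName, TypName, COUNT(*) AS Variants
FROM dbo.Cars
GROUP BY ModelName, TypName
ORDER BY ModelName, TypName;
```

Running the import a second time adds another 20 rows, because the procedure only inserts and does not check for existing data.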

Thank You.
