Sometimes the solutions we build store data from external suppliers. This data is stored in a database and then presented to the end user. The problem occurs when our suppliers use an XML-based technology to send data packages to us. The XML standard does not allow some special characters in text (node attribute or value), so each occurrence of such a character is encoded as an entity. Each of us should know that and try to decode those entities before writing the data to the database and/or presenting it to the user. But in reality many things can go wrong, and in some rows of our product database these entities may appear.
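A quick way to see whether a table is affected is to search for the entity patterns directly. The snippet below is only a sketch: the @Sample table variable and its rows are made up for illustration, and the two entities checked (&amp;amp; and &amp;#039;) are just examples, so the WHERE clause would need one condition per entity you expect.

```sql
-- Hypothetical sample data standing in for a real product table
DECLARE @Sample TABLE ([Code] nvarchar(50), [Desc] nvarchar(max));

INSERT INTO @Sample ([Code], [Desc]) VALUES
    (N'A1', N'Nuts &amp; bolts'),     -- contains an encoded ampersand
    (N'A2', N'Plain text'),           -- clean row
    (N'A3', N'Driver&#039;s seat');   -- contains an encoded apostrophe

-- Rows that still contain an undecoded entity (A1 and A3 here)
SELECT [Code], [Desc]
FROM @Sample
WHERE [Desc] LIKE N'%&amp;%'
   OR [Desc] LIKE N'%&#039;%';
```

Against a real table, the same WHERE clause can be reused to limit any later clean-up to the rows that actually need it.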
Once you recognize the problem, you can do three things, but only two of them are workable. The first idea (the wrong one) is to create a CLR stored procedure or function that calls System.Web.HttpUtility.HtmlDecode. The problem is that you can't add a reference to System.Web in CLR projects, so this idea cannot be carried out.
The second idea is to create a standalone console application and implement the whole algorithm inside it. That algorithm has to select all records from the database (for example, 10 GB of data), then do the replacement and an update on every single row. This is very quick to implement, but it is not efficient and hurts server performance even if you schedule the task to run at night.
The third idea is to create a T-SQL query (stored procedure) that replaces each occurrence of the special characters in each record. The catch is that you need to implement a whole translation dictionary that maps each encoded special character (key) to its normal counterpart (value).
An example of such a dictionary might look like this:
DECLARE @Dictionary TABLE --creating a table variable
(
[key] nvarchar(10), --special character pattern
[value] nvarchar(50) --normal sign
)
--two example entries
INSERT INTO @Dictionary([key], [value]) VALUES ('&amp;','&'),('&#039;','''')
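One detail worth keeping in mind as the dictionary grows: the order in which entries are applied can change the result. The sketch below is only an illustration with a made-up sample string. It shows that decoding &amp;amp; before the other entities decodes the text twice, while decoding it last does not. Since a table variable has no inherent row order, enforcing this in the dictionary would probably need an extra ordering column, which is an assumption beyond the original query.

```sql
-- Hypothetical input: the supplier sent the literal text "&#039;", encoded once
DECLARE @s nvarchar(100) = N'Tom &amp;#039; Jerry';

SELECT
    -- &amp; decoded first: the freshly produced "&#039;" is decoded again
    REPLACE(REPLACE(@s, N'&amp;', N'&'), N'&#039;', N'''') AS AmpFirst, -- Tom ' Jerry
    -- &amp; decoded last: exactly one level of encoding is removed
    REPLACE(REPLACE(@s, N'&#039;', N''''), N'&amp;', N'&') AS AmpLast;  -- Tom &#039; Jerry
```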
After creating the dictionary table variable, you can create a cursor (e.g. 'HtmlReplace') over the table you want to update. Inside this cursor you need to read the whole dictionary, so you have to create a nested cursor (e.g. 'TempCursor'). As you can see, we process every single row of our huge table, and for each row we walk through each key-value pair in the dictionary, replacing every occurrence of the key with its value. After the whole row has been processed, we can do a simple update.
For example, say we have a single table named dbo.Object with only two columns, 'Code' and 'Desc'. In the 'Desc' column many rows contain encoded XML special characters. Using the T-SQL query presented below, we can replace them with their plain counterparts.
--NOTE: run this in the same batch as the @Dictionary declaration above
DECLARE @Code nvarchar(50)
DECLARE @Text nvarchar(max)
--temporary dictionary values
DECLARE @key nvarchar(10)
DECLARE @value nvarchar(50)

DECLARE HtmlReplace CURSOR LOCAL FAST_FORWARD FOR
    SELECT [Code],[Desc] FROM dbo.[Object]
OPEN HtmlReplace
FETCH NEXT FROM HtmlReplace INTO @Code, @Text
WHILE @@FETCH_STATUS = 0 --this cursor runs over the dbo.Object table
BEGIN
    PRINT 'Before:' + @Text
    DECLARE TempCursor CURSOR LOCAL FAST_FORWARD FOR
        SELECT [key],[value] FROM @Dictionary
    OPEN TempCursor
    FETCH NEXT FROM TempCursor INTO @key, @value
    WHILE @@FETCH_STATUS = 0 --this cursor walks over the dictionary, replacing each occurrence of the current key
    BEGIN
        --REPLACE already substitutes every occurrence, so no PATINDEX loop is needed
        --(such a loop would never end if a value contained its own key)
        SET @Text = REPLACE(@Text,@key,@value)
        FETCH NEXT FROM TempCursor INTO @key, @value
    END
    CLOSE TempCursor
    DEALLOCATE TempCursor
    PRINT 'After:' + @Text
    --final update
    UPDATE dbo.[Object] SET [Desc] = @Text WHERE [Code] = @Code
    FETCH NEXT FROM HtmlReplace INTO @Code, @Text
END
CLOSE HtmlReplace
DEALLOCATE HtmlReplace
This is only an idea, so if you want, you can extend this query into a user-defined function.
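As a rough illustration of that user-defined function, here is a minimal sketch. The name dbo.DecodeEntities is hypothetical, the list of entities is deliberately short, and it swaps the dictionary table for a fixed chain of REPLACE calls, so treat it as a starting point and not as a complete decoder. GO is the batch separator used by tools such as SSMS and sqlcmd; CREATE FUNCTION has to be the first statement in its batch.

```sql
-- Hypothetical helper; the function name and the entity list are assumptions
CREATE FUNCTION dbo.DecodeEntities (@Text nvarchar(max))
RETURNS nvarchar(max)
AS
BEGIN
    -- Only a few entities are handled; extend the chain as needed.
    -- &amp; goes last so that '&amp;lt;' decodes to '&lt;', not '<'.
    RETURN REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(@Text,
        N'&#039;', N''''),
        N'&quot;', N'"'),
        N'&lt;',   N'<'),
        N'&gt;',   N'>'),
        N'&amp;',  N'&');
END
GO

-- Usage on a literal: returns  Tom & Jerry's <b>
SELECT dbo.DecodeEntities(N'Tom &amp; Jerry&#039;s &lt;b&gt;') AS Decoded;

-- Usage on the dbo.Object table from this post, touching only rows
-- that look like they contain an entity ('&' followed later by ';')
UPDATE dbo.[Object]
SET [Desc] = dbo.DecodeEntities([Desc])
WHERE [Desc] LIKE N'%&%;%';
```

With a function like this, the update becomes a single set-based statement, which may be easier on the server than the nested cursors, although that would need to be measured on real data.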
Thank you.