Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

How to encode XML in T SQL without the additional XML overhead

I have a database which (For whatever reason) has a column containing pipe delimited data.

I want to parse this data quickly, so I've thought of converting this column (nvarchar) into an XML by replacing the pipes with XML attributes and putting it into an XML data typed column somewhere else.

It works, except in the case where that column had a character that required encoding, such a '<' character.

I found I could encode XML using FOR XML clause, however, that appears to inject some XML tags around the data.

For example: (this gives error on bad character)

SELECT CAST('<f>' + replace(value,'|','</f><f>') + '</f>' AS XML)
FROM TABLE

this gives xml encoded value, but wraps it in "< value> < /value>" tag

  SELECT value
   FROM table
  FOR XML PATH('')

Any ideas on how I can get the XML encoded value without this extra tag added, so I can convert the pipe format to XML after it's done (preferably in one swoop)?

EDIT: since people are asking, this is what 5 potential rows of data might look like

foo
foo|bar
foo|bar|1
foo||
baz|

And the results would be

Col1, Col2, Col3
foo,null,null
foo,bar,null
foo,bar,1
foo,null,null
baz,null,null

I'm achieving this by using the resulting XML type in a sub query such as: (it can be up to 4 columns pr 3 pipes in any given row)

    SELECT 
          *,
         x.query('f[1]').value('.','nVarChar(2048)')  Col1
        ,x.query('f[2]').value('.','nVarChar(2048)')  Col2
        ,x.query('f[3]').value('.','nvarchar(2048)')  Col3
        ,x.query('f[4]').value('.','nvarchar(2048)')  Col4
    FROM
    (
        SELECT *,
        CAST('<f>' + REPLACE(Value,'|','</f><f>') + '</f>' AS XML) as x
     FROM  table
    ) y

@srutzky makes a great point. No, I do not need to do XML here at all. If I can find a fast & clean way to parse pipes in a set based operation, I'll do that. Will review the SQL# documentation...

like image 609
drew Avatar asked Jan 22 '26 17:01

drew


2 Answers

SELECT CAST('<values><f>' + 
    REPLACE(
        REPLACE(
            REPLACE(
                REPLACE(
                    REPLACE(value,'&','&amp;')
                    ,'"','&quot;')
                ,'<','&lt;')
            ,'>','&gt;')
        ,'|','</f><f>') + '</f></values>' AS XML)
FROM TABLE;
like image 139
Bruce Dunwiddie Avatar answered Jan 25 '26 10:01

Bruce Dunwiddie


You could try the following BUT you need to make sure the content is "xml safe", in other words the content does not contain values which xml will reject (look into xml element content parsing). Try the following...it's test script to see if it does what you want..

UPDATE:

ok, it might help if I read the question all the way through...2 steps...split the pipes and then xml all the split items...try this:

Create the following function:

CREATE FUNCTION [dbo].[udf_SPLIT]
(
  @s nvarchar(max),
  @trimPieces bit,
  @returnEmptyStrings bit,
  @delimiter nvarchar(10)
)
RETURNS @t TABLE (val nvarchar(max))
AS
BEGIN
    DECLARE @i int, @j int

    SELECT @i = 0, @j = (LEN(@s) - LEN(REPLACE(@s,@delimiter,'')))

    ;WITH cte AS
    (
        SELECT i = @i + 1,
            s = @s,
            n = substring(@s, 0, charindex(@delimiter, @s)),
            m = substring(@s, charindex(@delimiter, @s)+1, len(@s) - charindex(@delimiter, @s))
            UNION ALL
        SELECT i = cte.i + 1,
            s = cte.m, 
            n = substring(cte.m, 0, charindex(@delimiter, cte.m)),
            m = substring(cte.m, charindex(@delimiter, cte.m) + 1, len(cte.m)-charindex(@delimiter, cte.m))
        FROM cte
        WHERE i <= @j
    )
    INSERT INTO @t (val)
        SELECT [pieces]
        FROM (
            SELECT CASE 
                WHEN @trimPieces = 1 THEN LTRIM(RTRIM(CASE WHEN i <= @j THEN n ELSE m END))
                ELSE CASE WHEN i <= @j THEN n ELSE m END
                END AS [pieces]
            FROM cte
            ) t
        WHERE (@returnEmptyStrings = 0 AND LEN(pieces) > 0)
        OR (@returnEmptyStrings = 1)
        OPTION (maxrecursion 0)

RETURN

END

next try the following to test...

DECLARE @str nvarchar(500) = 'test|<html>this</html>|boogie woogie| SDGDSFG| game<br /> on |working| this|'

SELECT REPLACE(
        REPLACE(
            REPLACE(
                REPLACE([val],'&','&amp;')
                ,'"','&quot;')
            ,'<','&lt;')
        ,'>','&gt;')
     AS [f]
FROM [dbo].[udf_SPLIT](@str,1,0,'|')
FOR XML PATH('')

If not totally correct, hopefully will put you on right path...

HTH

Dave

like image 38
Dave Avatar answered Jan 25 '26 09:01

Dave



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!