Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

STRING_SPLIT with string position

Tags:

sql

sql-server

I have a table with two columns of comma-separated strings. The way the data is formatted, the number of comma-separated items in both columns is equal, and the first value in colA is related to the first value in colB, and so on. (It's obviously not a very good data format, but it's what I'm working with.)

If I have the following row (PrimaryKeyID | column1 | column2):

1 | a,b,c | A,B,C

then in this data format, a & 1 are logically related, b & 2, etc.

I want to use STRING_SPLIT to split these columns, but using it twice obviously crosses them with each other, resulting in a total of 9 rows.

1 | a | A
1 | b | A    
1 | c | A    
1 | a | B    
1 | b | B    
1 | c | B    
1 | a | C
1 | b | C    
1 | c | C

What I want is just the 3 "logically-related" columns

1 | a | A
1 | b | B
1 | c | C

However, STRING_SPLIT(myCol,',') doesn't appear to save the String Position anywhere.

I have done the following:

SELECT tbl.ID,
      t1.Column1Value,
      t2.Column2Value
FROM myTable tbl
INNER JOIN (
   SELECT t.ID, 
       ss.value AS Column1Value, 
       ROW_NUMBER() OVER (PARTITION BY t.ID ORDER BY t.ID) as StringOrder
   FROM myTable t
   CROSS APPLY STRING_SPLIT(t.column1,',') ss
) t1 ON tbl.ID = t1.ID
INNER JOIN (
   SELECT t.ID, 
       ss.value AS Column2Value, 
       ROW_NUMBER() OVER (PARTITION BY t.ID ORDER BY t.ID) as StringOrder
   FROM myTable t
   CROSS APPLY STRING_SPLIT(t.column2,',') ss
) t1 ON tbl.ID = t2.ID AND t1.StringOrder = t2.StringOrder

This appears to work on my small test set, but in my opinion there is no reason to expect it to work guaranteed every time. The ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) is obviously a meaningless ordering, but it appears that, in absence of any real ordering, STRING_SPLIT is returning the values in the "default" order that they were already in. Is this "expected" behaviour? Can I count on this? Is there any other way of accomplishing what I'm attempting to do?

I got what I wanted (I think) with the following UDF. However it's pretty slow. Any suggestions?

CREATE FUNCTION fn.f_StringSplit(@string VARCHAR(MAX),@delimiter VARCHAR(1))
RETURNS @r TABLE
(
    Position INT,
    String VARCHAR(255)
)
AS
BEGIN

    DECLARE @current_position INT
    SET @current_position = 1

    WHILE CHARINDEX(@delimiter,@string) > 0 BEGIN
    
        INSERT INTO @r (Position,String) VALUES (@current_position, SUBSTRING(@string,1,CHARINDEX(@delimiter,@string) - 1))

        SET @current_position = @current_position + 1
        SET @string = SUBSTRING(@string,CHARINDEX(@delimiter,@string) + 1, LEN(@string) - CHARINDEX(@delimiter,@string))
        
    END

    --add the last one
    INSERT INTO @r (Position, String) VALUES(@current_position,@string)

    RETURN
END
like image 853
mark Avatar asked Nov 06 '25 01:11

mark


1 Answers

The only way I've discovered to expressively maintain the order of the String_Split() function this is using the Row_Number() function with a literal value in the "order by".

For example:

declare @Version nvarchar(128)
set @Version = '1.2.3';

with V as (select value v, Row_Number() over (order by (select 0)) n from String_Split(@Version, '.'))
    select
        (select v from V where n = 1) Major,
        (select v from V where n = 2) Minor,
        (select v from V where n = 3) Revision

Returns:

Major Minor Revision
----- ----- ---------
1     2     3        

Update: if you are using a newer version of SQL Server, you can now provide an optional third bit argument which indicates that and ordinal column should also be included in the result. See my other answer here for more details.

like image 81
N8allan Avatar answered Nov 07 '25 15:11

N8allan