
How can you sort string value or array in SQL

Hello Stackoverflow SQL experts,

What I am looking for: a way to sort the values inside a comma-separated string in Snowflake SQL.

Example: My table looks something like this:

---------------------
| ID   | REFS       |
---------------------
| ID1  | 'ANN,BOB'  |
| ID2  | 'BOB,ANN'  |
---------------------

As you can see, ID1 and ID2 are both referred by Ann and Bob. But because the names were entered in different orders, the two rows aren't recognized as the same group. Is there a way to sort the string/list values in REFS to clean them up, so that when I do counts and GROUP BYs the result would be:

--------------------------
| REFS       | COUNT(ID) |
--------------------------
| 'ANN,BOB'  | 2         |
--------------------------

instead of:

--------------------------
| REFS       | COUNT(ID) |
--------------------------
| 'ANN,BOB'  | 1         |
| 'BOB,ANN'  | 1         |
--------------------------

What I have tried:

  • TO_ARRAY(REFS) - But this just creates two lists, ['ANN','BOB'] and ['BOB','ANN']
  • SPLIT(REFS,',') - This also just creates

I have other REF lists containing all sorts of combinations.

'BOB,CHRIS,ANN'
'BOB,CHRIS'
'CHRIS'
'DAVE,ANN'
'ANN,ERIC'
'FRANK,BOB'
...
jcsql asked Sep 02 '25 05:09


2 Answers

You should fix the data model! Storing multiple values in a string is a bad idea. That said, you can split, unnest, and reaggregate. I think this works in Snowflake:

select t.*,
       (select listagg(s.value, ',') within group (order by s.value)
        from table(split_to_table(t.refs, ',')) s
       ) normalized_refs
from t;
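
A join-based variant of the same split, sort, and re-aggregate idea may be easier to plug straight into a GROUP BY. This is a minimal sketch rather than a tested query: the inline `data` CTE stands in for the real table, and the names `names`, `normalized`, `normalized_refs`, and `id_count` are illustrative only.

```sql
-- Sample rows standing in for the real table (assumption).
WITH data(id, refs) AS (
    SELECT * FROM VALUES
        ('ID1', 'ANN,BOB'),
        ('ID2', 'BOB,ANN')
),
-- One row per (id, name); TRIM guards against stray spaces after commas.
names AS (
    SELECT d.id, TRIM(s.value) AS val
    FROM data d, LATERAL SPLIT_TO_TABLE(d.refs, ',') s
),
-- Rebuild each id's list in alphabetical order.
normalized AS (
    SELECT id,
           LISTAGG(val, ',') WITHIN GROUP (ORDER BY val) AS normalized_refs
    FROM names
    GROUP BY id
)
-- Count ids per normalized list.
SELECT normalized_refs, COUNT(id) AS id_count
FROM normalized
GROUP BY normalized_refs
ORDER BY normalized_refs;
```

With the two sample rows, both ids should fall into the single group ANN,BOB.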
Gordon Linoff answered Sep 04 '25 21:09


WITH data(id, refs) as (
    SELECT * FROM VALUES
        ('ID1', 'ANN,BOB'),
        ('ID2', 'BOB,ANN'),
        ('ID3', 'CHRIS,BOB,ANN')
)
SELECT order_arry, count(distinct(id)) as count 
FROM (

  SELECT array_agg(val) WITHIN GROUP (ORDER BY val) over (partition by id) as order_arry, id
  FROM (
    SELECT d.id, trim(s.value) as val
    FROM data d, lateral split_to_table(d.refs, ',') s
  )
)
GROUP BY 1 ORDER BY 1;

gives:

ORDER_ARRY                  COUNT
[ "ANN", "BOB" ]            2
[ "ANN", "BOB", "CHRIS" ]   1

But as Gordon notes, the PARTITION BY is not needed: aggregating with GROUP BY on id yields one row per id, so the DISTINCT is not needed either:

SELECT ordered_arry, count(id) as count 
FROM (

  SELECT id, array_agg(val) WITHIN GROUP (ORDER BY val) as ordered_arry
  FROM (
    SELECT d.id, trim(s.value) as val
    FROM data d, lateral split_to_table(d.refs, ',') s
  )
  GROUP BY 1
)
GROUP BY 1 ORDER BY 1;
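
If the grouped key should come back as a plain string such as ANN,BOB rather than an array, the sorted array can be joined again with ARRAY_TO_STRING. The following is a sketch under the same sample data; the alias `refs_sorted` is illustrative and not part of the original answer.

```sql
WITH data(id, refs) AS (
    SELECT * FROM VALUES
        ('ID1', 'ANN,BOB'),
        ('ID2', 'BOB,ANN'),
        ('ID3', 'CHRIS,BOB,ANN')
)
-- Join each sorted array back into a comma-separated string, then count ids.
SELECT ARRAY_TO_STRING(ordered_arry, ',') AS refs_sorted,
       COUNT(id) AS count
FROM (
    -- One sorted array of names per id.
    SELECT id, ARRAY_AGG(val) WITHIN GROUP (ORDER BY val) AS ordered_arry
    FROM (
        SELECT d.id, TRIM(s.value) AS val
        FROM data d, LATERAL SPLIT_TO_TABLE(d.refs, ',') s
    )
    GROUP BY 1
)
GROUP BY 1 ORDER BY 1;
```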
Simeon Pilgrim answered Sep 04 '25 21:09