Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Aggregate data from multiple rows into single row

In my table each row has some data columns Priority column (for example, timestamp or just an integer). I want to group my data by ID and then in each group take latest not-null column. For example I have following table:

id  A       B       C       Priority
1   NULL    3       4       1
1   5       6       NULL    2
1   8       NULL    NULL    3
2   634     346     359     1
2   34      NULL    734     2

Desired result is :

id  A   B   C   
1   8   6   4   
2   34  346 734 

In this example table is small and has only 5 columns, but in real table it will be much larger. I really want this script to work fast. I tried do it myself, but my script works for SQLSERVER2012+ so I deleted it as not applicable.

Numbers: table could have 150k of rows, 20 columns, 20-80k of unique ids and average SELECT COUNT(id) FROM T GROUP BY ID is 2..5

Now I have a working code (thanks to @ypercubeᵀᴹ), but it runs very slowly on big tables, in my case script can take one minute or even more (with indices and so on).

How can it be speeded up?

SELECT 
    d.id,
    d1.A,
    d2.B,
    d3.C
FROM 
    ( SELECT id
      FROM T
      GROUP BY id
    ) AS d
  OUTER APPLY
    ( SELECT TOP (1) A
      FROM T 
      WHERE id = d.id
        AND A IS NOT NULL
      ORDER BY priority DESC
    ) AS d1 
  OUTER APPLY
    ( SELECT TOP (1) B
      FROM T 
      WHERE id = d.id
        AND B IS NOT NULL
      ORDER BY priority DESC
    ) AS d2 
  OUTER APPLY
    ( SELECT TOP (1) C
      FROM T 
      WHERE id = d.id
        AND C IS NOT NULL
      ORDER BY priority DESC
    ) AS d3 ;

In my test database with real amount of data I get following execution plan: enter image description here

like image 904
Alex Zhukovskiy Avatar asked Nov 22 '25 08:11

Alex Zhukovskiy


2 Answers

This should do the trick, everything raised to the power 0 will return 1 except null:

DECLARE @t table(id int,A int,B  int,C int,Priority int)
INSERT @t
VALUES (1,NULL,3   ,4   ,1),
(1,5   ,6   ,NULL,2),(1,8   ,NULL,NULL,3),
(2,634 ,346 ,359 ,1),(2,34  ,NULL,734 ,2)

;WITH CTE as
(
  SELECT id, 
  CASE WHEN row_number() over 
    (partition by id order by Priority*power(A,0) desc) = 1 THEN A END A,
  CASE WHEN row_number() over 
    (partition by id order by Priority*power(B,0) desc) = 1 THEN B END B,
  CASE WHEN row_number() over 
    (partition by id order by Priority*power(C,0) desc) = 1 THEN C END C
  FROM @t
)
SELECT id, max(a) a, max(b) b, max(c) c
FROM CTE
GROUP BY id

Result:

id  a   b   c
1   8   6   4
2   34  346 734
like image 127
t-clausen.dk Avatar answered Nov 24 '25 23:11

t-clausen.dk


One alternative that might be faster is a multiple join approach. Get the priority for each column and then join back to the original table. For the first part:

select id,
       max(case when a is not null then priority end) as pa,
       max(case when b is not null then priority end) as pb,
       max(case when c is not null then priority end) as pc
from t
group by id;

Then join back to this table:

with pabc as (
      select id,
             max(case when a is not null then priority end) as pa,
             max(case when b is not null then priority end) as pb,
             max(case when c is not null then priority end) as pc
      from t
      group by id
     )
select pabc.id, ta.a, tb.b, tc.c
from pabc left join
     t ta
     on pabc.id = ta.id and pabc.pa = ta.priority left join
     t tb
     on pabc.id = tb.id and pabc.pb = tb.priority left join
     t tc
     on pabc.id = tc.id and pabc.pc = tc.priority ;

This can also take advantage of an index on t(id, priority).

like image 31
Gordon Linoff Avatar answered Nov 24 '25 21:11

Gordon Linoff



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!