BACKGROUND:
I have data that looks like this
date        src    subsrc   subsubsrc   param1  param2
2020-02-01  src1    ksjd    dfd8        47      31    
2020-02-02  src1    djsk    zmnc        44      95    
2020-02-03  src2    skdj    awes        92      100   
2020-02-04  src2    mxsf    kajs        80      2     
2020-02-05  src3    skdj    asio        46      53    
2020-02-06  src3    dekl    jdqo        19      18    
2020-02-07  src3    dskl    dqqq        69      18    
2020-02-08  src4    sqip    riow        64      46    
2020-02-09  src5    ss01    qwep        34      34    
I am trying to aggregate for all time, last 30 days and last 90 days (no rolling sum)
So my final data would look like this:
src     subsrc  subsubsrc   p1_all  p1_30   p1_90   p2_all  p2_30   p2_90
src1    ksjd    dfd8        7       1       7       98      7        98
src1    djsk    zmnc        0       0       0       0       0         0
src2    skdj    awes        12      12      12      4       4         4
src2    mxsf    kajs        6       6       6       31      31       31
src3    skdj    asio        0       0       0       0       0         0
src3    dekl    jdqo        20      20      20      17      17        17
src3    dskl    dqqq        3       3       3       4       4         4
src4    sqip    qwep        0       0       0       0       0         0
src5    ss01    qwes        15      15      15      2       2         2
ABOUT DATA:
WHAT I HAVE TRIED:
This is what I have come up with:
SELECT src, subsubsrc, subsubsrc,
SUM(param1) as param1_all,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 30 THEN param1 END) as param1_30,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 90 THEN param1 END) as param1_90,
SUM(param2) as param2_all,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 30 THEN param2 END) as param2_30,
SUM(CASE WHEN DATE_DIFF(CURRENT_DATE,date,day) <= 90 THEN param2 END) as param2_90,
FROM `MY_TABLE`
GROUP BY src
ORDER BY src
This actually works but I can anticipate how long this query is going to become for multiple sources and even more param columns.
I have been trying something called "Filtered aggregate functions (or manual pivot)" explained HERE. But I am unable to understand/implement it for my case.
Also I have looked at dozens of answers and most of them are running sums for each day OR are complicated cases of this basic calculation. Maybe I am not searching it correctly.
As you can see I am newbie in SQL and would really appreciate any help.
Your query looks quite good; conditional aggregation is the canonical method to pivot a dataset.
One way to possibly increase performance would be to change the date filter in the conditional expressions: using a date function precludes the use of an index.
Instead, you could phrase this as:
select 
    src, 
    subsrc, 
    subsubsrc,
    sum(param1) as param1_all,
    sum(case when date >= current_date - interval 30 day then param1 end) as param1_30,
    sum(case when date >= current_date - interval 90 day then param1 end) as param1_90,
    sum(param2) as param2_all,
    sum(case when date >= current_date - interval 30 day then param2 end) as param2_30,
    sum(case when date >= current_date - interval 90 day then param2 end) as param2_90
from my_table
group by src, subsrc, subsubsrc
order by src, subsrc, subsubsrc
For performance, the following index may be helpul: (src, subsrc, subsubsrc, date).
Note that I included all three non-aggregated columns (src, subsrc, subsubsrc) in the group by clause: starting MySQL 5.7, this is by default mandatory (although you can play around with sql modes to alter that behavior) - and most other databases implement the same constraint.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With