I want to calculate the cumulative percentiles for a given partition/group in SQL. For example the input data looks like -
CustID Product ID quantity_purchased
1 111 2
2 111 3
3 111 2
4 111 5
1 222 2
2 222 6
4 222 7
6 222 2
I want to get cumulative percentiles on each product ID group. The output should be -
Product ID min 25% 50% 75% max
111 2 2 2.5 3.5 5
222 2 2 2.5 5.25 7
So, basically for Product ID 111 I need to take the percentiles of quantity_purchased for only product ID 111 but as I proceed further in the column the percentiles should be cumulative meaning for product ID 222 the percentiles will be calculated considering quantity_purchased values of both Product ID 111 and product ID 222 (2,3,2,5,2,6,7,2). Similarly, if there was product ID 333 in data then for product ID 333, I would compute the percentiles based on all of the quantity_purchased values associated with product 111, product 222, and product 333 and store the result in the product 333 row. How to achieve this using SQL?
This is very curious, but I think you need to expand the data for each product id:
select t.product_id, min(t2.quantity_purchased), max(t2.quantity_purchased),
percentile_cont(0.25) within group (order by t2.quantity_purchased),
percentile_cont(0.50) within group (order by t2.quantity_purchased),
percentile_cont(0.75) within group (order by t2.quantity_purchased)
from t join
t t2
on t2.product_id <= t.product_id
group by t1.product_id;
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With