Given data:
| grp | data1 | data2 | data3 |
|---|---|---|---|
| a | 2 | 1 | 2 |
| a | 4 | 6 | 3 |
| b | 3 | 2 | 1 |
| b | 7 | 3 | 5 |
Expected output:
| grp | sum(data1) | sum(data2)/sum(data1) | sum(data3)/sum(data1) |
|---|---|---|---|
| a | 6 | 1.166666667 | 0.83 |
| b | 10 | 0.5 | 0.6 |
Assume the custom aggregation can depend on multiple columns and is not always a simple division. I know this is possible with a SQL query, but I am interested in an answer that uses apply and an aggregate function, if possible.
You can use groupby + assign here to generate the required aggregations, and swap in whatever aggregate function you need.
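    # df is the DataFrame shown under "Given data"
    g = df.groupby('grp')
    # 'sum' can be replaced by a custom function: .agg(custom_agg_func)
    g[['data1']].agg('sum').assign(
        # each lambda receives the aggregated frame (data1 summed per group)
        sum2=lambda agg: g['data2'].sum() / agg['data1'],
        # .sum() here can likewise be any aggregation of your choice
        sum3=lambda agg: g['data3'].sum() / agg['data1'],
    )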
         data1      sum2      sum3
    grp
    a        6  1.166667  0.833333
    b       10  0.500000  0.600000
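Since the question asks specifically about apply, here is a minimal sketch of the same result computed with `groupby(...).apply`, where one function sees all columns of a group at once. The function name `ratios` and the output labels `sum1`, `sum2`, `sum3` are assumptions made for illustration; the body of `ratios` can hold any multi-column logic, not just division.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "grp": ["a", "a", "b", "b"],
    "data1": [2, 4, 3, 7],
    "data2": [1, 6, 2, 3],
    "data3": [2, 3, 1, 5],
})


def ratios(group: pd.DataFrame) -> pd.Series:
    """Hypothetical custom aggregation that depends on several columns."""
    total = group["data1"].sum()
    return pd.Series({
        "sum1": total,
        "sum2": group["data2"].sum() / total,
        "sum3": group["data3"].sum() / total,
    })


# Select the value columns explicitly so the grouping column
# is not passed into the function.
result = df.groupby("grp")[["data1", "data2", "data3"]].apply(ratios)
print(result)
```

This is usually slower than the groupby + assign version on large frames, because `ratios` runs once per group in Python, but it keeps all of the per-group logic in one place.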