I know that in pandas, I can do something like this, where I apply multiple aggregations to the same column:
import pandas as pd
df = pd.DataFrame({'id':[1,1,2,2], 'x1':[0,1,0,1], 'x2':[1,0,1,0],'x3':[0,1,0,1], 'x4':[1,0,1,0]})
df.groupby('id').agg({'x1':['sum', 'max'], 'x2':['sum','max']})
Is there a syntax shortcut to do a similar thing, except this time apply the same aggregation to multiple columns? However, I am also looking to perform more than one type of aggregation.
Valid Syntax Example
df.groupby('id').agg({'x1':sum, 'x2':sum, 'x3':mean, 'x4':mean})
Desired Outcome Example
df.groupby('id').agg({['x1', 'x2']:sum, ['x3', 'x4']:mean})
I know this isn't a valid key-value pair, but hopefully illustrates what I'm aiming for. As to why I want to do this, my current aggregation statement is getting long and I am looking for ways to shorten it.
If want use list in keys of dictionary it is not valid in python.
Close, what you need is specify columns after groupby, but it working only for one aggregate function:
df.groupby('id')['x1', 'x2'].sum()
Or:
df.groupby('id')['x1', 'x2'].agg('sum')
If want some more dynamic solution one is create dictionary of tuples and then flatten values, only is necessary all values unique in tuples, because dict by definition has unique keys:
d = {('x1', 'x2'):['sum','max'], ('x3', 'x4'):'mean'}
d1 = {x:v for k, v in d.items() for x in k}
print (d1)
{'x1': ['sum', 'max'], 'x2': ['sum', 'max'], 'x3': 'mean', 'x4': 'mean'}
print (df.groupby('id').agg(d1))
x1 x2 x3 x4
sum max sum max mean mean
id
1 1 1 1 1 0.5 0.5
2 1 1 1 1 0.5 0.5
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With