Suppose I group my DataFrame A by key:
import pandas as pd

A = pd.DataFrame({'key': ['II', 'I', 'I', 'III', 'II'],
                  'Z': ['a', 'b', 'c', 'd', 'e'],
                  'd': [1, 2, 0, 2, 0],
                  'e': [0, 2, 0, 3, 0],
                  'f': [0, 3, 0, 4, 0]})
And I want a different aggregation for each column, e.g.:
sum() for f
max() for e
mean() for d
string concatenation for Z (ae, bc, d)
As I'm not able to extract columns separately from a DataFrameGroupBy, I have to split A into 4 different DataFrames with columns [key, Z], [key, d], [key, e], [key, f] before the groupby, apply a different aggregation to each, then merge by key.
This seems a little ridiculous and needs a lot of code. Is there a more elegant way?
You can pass agg a dict that maps each column to its aggregate function:
df = A.groupby('key').agg({'f': 'sum', 'e': 'max', 'd': 'mean', 'Z': ''.join})
print(df)
       d   Z  f  e
key
I    1.0  bc  3  2
II   0.5  ae  0  0
III  2.0   d  4  3
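The column order in the printed result (d, Z, f, e) likely reflects the dictionary ordering of the Python and pandas versions in use at the time. Here is a minimal self-contained sketch, assuming a reasonably recent pandas, that pins the order by selecting the columns after aggregating. The names spec and df are only illustrative.

```python
import pandas as pd

A = pd.DataFrame({
    'key': ['II', 'I', 'I', 'III', 'II'],
    'Z': ['a', 'b', 'c', 'd', 'e'],
    'd': [1, 2, 0, 2, 0],
    'e': [0, 2, 0, 3, 0],
    'f': [0, 3, 0, 4, 0],
})

# One aggregation per column; ''.join concatenates the strings in each group.
spec = {'f': 'sum', 'e': 'max', 'd': 'mean', 'Z': ''.join}

# Selecting the columns afterwards fixes their order explicitly.
df = A.groupby('key').agg(spec)[['f', 'e', 'd', 'Z']]
print(df)
```

Selecting columns at the end is optional. It only matters when downstream code relies on column position rather than column name.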
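You can also use agg with a nested dictionary to name the resulting columns. Note that this nested-dictionary renaming was deprecated in pandas 0.20 and removed in pandas 1.0, where it raises a SpecificationError, so the following only works on older versions.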
f = dict(
    f={'Sum of f': 'sum'},
    e={'Max of e': 'max'},
    d={'Mean of d': 'mean'},
    Z={'Concat of Z': 'sum'},
)
A.groupby('key').agg(f)
           f        e         d           Z
    Sum of f Max of e Mean of d Concat of Z
key
I          3        2       1.0          bc
II         0        0       0.5          ae
III        4        3       2.0           d
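On pandas 0.25 or later, named aggregation is one way to aggregate and name the columns in a single call. The sketch below is not part of the original answer. It assumes the same DataFrame A as in the question, and the variable name named is only illustrative. Because the output names contain spaces, they are passed by unpacking a dict rather than as plain keyword arguments.

```python
import pandas as pd

A = pd.DataFrame({
    'key': ['II', 'I', 'I', 'III', 'II'],
    'Z': ['a', 'b', 'c', 'd', 'e'],
    'd': [1, 2, 0, 2, 0],
    'e': [0, 2, 0, 3, 0],
    'f': [0, 3, 0, 4, 0],
})

# Named aggregation: output_name=(input_column, aggregation).
# The result has flat column labels, not a MultiIndex.
named = A.groupby('key').agg(**{
    'Sum of f': ('f', 'sum'),
    'Max of e': ('e', 'max'),
    'Mean of d': ('d', 'mean'),
    'Concat of Z': ('Z', ''.join),
})
print(named)
```

Unlike the nested-dictionary form, this produces single-level column labels, so no later rename or flattening step is needed.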
However, if you don't want the pd.MultiIndex columns, it's probably easier to aggregate with a flat dict and then use rename:
f = dict(
    f='sum',
    e='max',
    d='mean',
    Z='sum',
)
m = dict(
    f='Sum of f',
    e='Max of e',
    d='Mean of d',
    Z='Concat of Z',
)
A.groupby('key').agg(f).rename(columns=m)
     Sum of f  Max of e  Mean of d Concat of Z
key
I           3         2        1.0          bc
II          0         0        0.5          ae
III         4         3        2.0           d