Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Python pandas perform same aggregation on multiple columns

I know that in pandas, I can do something like this, where I apply multiple aggregations to the same column:

import pandas as pd
df = pd.DataFrame({'id':[1,1,2,2], 'x1':[0,1,0,1], 'x2':[1,0,1,0],'x3':[0,1,0,1], 'x4':[1,0,1,0]})
df.groupby('id').agg({'x1':['sum', 'max'], 'x2':['sum','max']})

Is there a syntax shortcut to do a similar thing, except this time apply the same aggregation to multiple columns? However, I am also looking to perform more than one type of aggregation.

Valid Syntax Example

df.groupby('id').agg({'x1':sum, 'x2':sum, 'x3':mean, 'x4':mean})

Desired Outcome Example

df.groupby('id').agg({['x1', 'x2']:sum, ['x3', 'x4']:mean})

I know this isn't a valid key-value pair, but hopefully illustrates what I'm aiming for. As to why I want to do this, my current aggregation statement is getting long and I am looking for ways to shorten it.

like image 829
ZachTurn Avatar asked Nov 29 '25 22:11

ZachTurn


1 Answers

If want use list in keys of dictionary it is not valid in python.


Close, what you need is specify columns after groupby, but it working only for one aggregate function:

df.groupby('id')['x1', 'x2'].sum()

Or:

df.groupby('id')['x1', 'x2'].agg('sum')

If want some more dynamic solution one is create dictionary of tuples and then flatten values, only is necessary all values unique in tuples, because dict by definition has unique keys:

d = {('x1', 'x2'):['sum','max'], ('x3', 'x4'):'mean'}
d1 = {x:v for k, v in d.items() for x in k}
print (d1)
{'x1': ['sum', 'max'], 'x2': ['sum', 'max'], 'x3': 'mean', 'x4': 'mean'}

print (df.groupby('id').agg(d1))
    x1      x2       x3   x4
   sum max sum max mean mean
id                          
1    1   1   1   1  0.5  0.5
2    1   1   1   1  0.5  0.5
like image 81
jezrael Avatar answered Dec 02 '25 11:12

jezrael



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!