I'm trying to get the mean for each column while grouped by id. But I don't get it to work as I want to.
The data:
ID Property3 Property2 Property3
1 10.2 ... ...
1 20.1
1 51.9
1 15.8
1 12.5
...
1203 104.4
1203 11.5
1203 19.4
1203 23.1
What I got so far:
I got those two tries. But they are both just for one column and I don't know how to do it for more then just one.:
data.groupby('id')['property1'].apply(lambda grp: grp.nlargest(100).mean())
1 37.897989
2 33.059432
3 34.926530
4 33.036137
data.groupby('id').agg({'property1': {lambda grp: grp.nlargest(100).mean()}})
id property1 <lambda>
1 37.897989
2 33.059432
3 34.926530
4 33.036137
What I want:
Idealy, I would like to have a dataframe as follows:
ID Property3 Property2 Property3
1 37.8 5.6 2.3
2 33.0 1.5 10.4
3 34.9 91.5 10.3
4 33.0 10.3 14.3
So that each row contains the mean values for the 100 biggest values for EACH column grouped by id.
Use GroupBy.agg with omit columns for processing all columns in DataFrame without ID:
df = data.groupby('ID').agg(lambda grp: grp.nlargest(100).mean()).reset_index()
print (df)
ID Property1 Property2 Property3
0 1.0 22.1 ... ...
1 1203.0 39.6 ... ...
Or specify columns after groupby:
df = (data.groupby('ID')['Property1','Property2','Property3']
.agg(lambda grp: grp.nlargest(100).mean())
.reset_index())
Use GroupBy.agg with as_index=False
so that id is a column:
data.groupby('id',as_index=False).agg(lambda grp: grp.nlargest(100).mean())
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With