I have two dataframes with observations on rows and features (or group membership) on columns, e.g.:
> data_df
a b c
A 1 2 1
B 0 1 3
C 0 0 1
D 2 1 1
E 1 1 1
> mask_df
g1 g2
A 0 1
B 1 0
C 1 0
D 1 0
E 0 1
I want to group and aggregate (by sum) the values in the first dataframe (data_df
) conditional on the binary values (masks) in the second dataframe (mask_df
). The result should be the following (groups x features):
> aggr_df
a b c
g1 2 2 5
g2 2 3 2
Is there a way in pandas to group the first dataframe (data_df
) using the masks contained in a second dataframe (mask_df
) in a single command?
You can do this cheaply with dot
and groupby
:
data_df.groupby(mask_df.dot(mask_df.columns)).sum()
a b c
g1 2 2 5
g2 2 3 2
Where,
mask_df.dot(mask_df.columns)
A g2
B g1
C g1
D g1
E g2
dtype: object
Which works well assuming each row always has exactly one column set to 1
.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With