I have a data frame that looks like this:
Id Category Score
1 M 0.2
2 C 0.4
2 M 0.3
1 C 0.1
2 M 0.3
1 M 0.2
1 C 0.1
1 C 0.1
2 C 0.4
I want to group by Id, find the maximum Score in each group, and create a new column called Category_Label whose value in every row is the Category found at the index of that group's maximum Score.
The output should look like this:
Id Category Score Category_Label
1 M 0.2 M
2 C 0.4 C
2 M 0.3 C
1 C 0.1 M
2 M 0.3 C
1 M 0.2 M
1 C 0.1 M
1 C 0.1 M
2 C 0.4 C
In other words, the new column Category_Label should equal the Category of the row with the highest Score among all rows that share the same Id (for example, among all the rows with Id 1).
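To pin the requirement down precisely, here is a slow but explicit plain-Python definition of Category_Label that any vectorized solution can be checked against. It is a sketch, not part of the original question: the helper name reference_labels is hypothetical, and it assumes that on ties the first row with the maximum Score wins (which is what pandas' idxmax does).

```python
import pandas as pd

# Input data from the question's first table.
df = pd.DataFrame({
    "Id": [1, 2, 2, 1, 2, 1, 1, 1, 2],
    "Category": ["M", "C", "M", "C", "M", "M", "C", "C", "C"],
    "Score": [0.2, 0.4, 0.3, 0.1, 0.3, 0.2, 0.1, 0.1, 0.4],
})


def reference_labels(frame: pd.DataFrame) -> list:
    """Hypothetical helper: explicit (slow) definition of Category_Label."""
    best = {}  # Id -> (highest Score seen so far, its Category)
    for id_, cat, score in zip(frame["Id"], frame["Category"], frame["Score"]):
        # Strict '>' keeps the first row on ties, mirroring idxmax.
        if id_ not in best or score > best[id_][0]:
            best[id_] = (score, cat)
    # Every row gets the Category of its Id's best row.
    return [best[id_][1] for id_ in frame["Id"]]


print(reference_labels(df))
# ['M', 'C', 'C', 'M', 'C', 'M', 'M', 'M', 'C']
```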
I tried this:
df[df['Category_Label']] == df.loc[df.groupby(['Id','Category'])['Score'].transform(lambda a: a.max())],'Category' ]
But I am far off! I looked into this question and this one, but they were not helpful enough.
Use:
- idxmax to find where the max positions are
- transform to broadcast them across all indices
- loc to grab the Category values

df.assign(
    Category_Label=df.loc[
        df.groupby('Id').Score.transform('idxmax'),
        'Category'
    ].values
)
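The same approach, split into its two steps and made self-contained, shows the intermediate result: transform('idxmax') returns, for every row, the index label of the max-Score row in that row's Id group. This sketch assumes the frame has a unique index (such as the default RangeIndex); with duplicate labels, .loc would return extra rows and the lengths would no longer match. The names max_pos and out are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": [1, 2, 2, 1, 2, 1, 1, 1, 2],
    "Category": ["M", "C", "M", "C", "M", "M", "C", "C", "C"],
    "Score": [0.2, 0.4, 0.3, 0.1, 0.3, 0.2, 0.1, 0.1, 0.4],
})

# Step 1: for each row, the index label of the highest-Score row in its Id group.
max_pos = df.groupby("Id")["Score"].transform("idxmax")
print(max_pos.tolist())  # [0, 1, 1, 0, 1, 0, 0, 0, 1]

# Step 2: look up Category at those labels. .loc returns a Series indexed by the
# repeated labels, so .values is used to assign the result by position instead.
out = df.assign(Category_Label=df.loc[max_pos, "Category"].values)
print(out)
```

Note that df.assign returns a new DataFrame and leaves df unchanged; assign the result back (df = df.assign(...)) if you want to keep the column.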
Id Category Score Category_Label
0 1 M 0.2 M
1 2 C 0.4 C
2 2 M 0.3 C
3 1 C 0.1 M
4 2 M 0.3 C
5 1 M 0.2 M
6 1 C 0.1 M
7 1 C 0.1 M
8 2 C 0.4 C
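An alternative that some find easier to read, offered here as a different technique rather than the answer above: build an explicit Id-to-Category lookup from each group's max-Score row, then map every row's Id through it. It produces the same labels on this data; ties again go to the first max row because it also relies on idxmax. The names best_rows and label_by_id are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": [1, 2, 2, 1, 2, 1, 1, 1, 2],
    "Category": ["M", "C", "M", "C", "M", "M", "C", "C", "C"],
    "Score": [0.2, 0.4, 0.3, 0.1, 0.3, 0.2, 0.1, 0.1, 0.4],
})

# One row per Id: the row holding that Id's maximum Score (first one on ties).
best_rows = df.loc[df.groupby("Id")["Score"].idxmax()]

# Lookup Series: Id -> Category of that Id's best row.
label_by_id = best_rows.set_index("Id")["Category"]

# Broadcast back to every row by mapping each row's Id through the lookup.
df["Category_Label"] = df["Id"].map(label_by_id)
print(df)
```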