I have pandas dataframe as:
df
Id      Name        CaseId       Value 
82      A1          case1.01     37.71 
1558    A3          case1.01     27.71 
82      A1          case1.06     29.54 
1558    A3          case1.06     29.54 
82      A1          case1.11     12.09 
1558    A3          case1.11     32.09 
82      A1          case1.16     33.35 
1558    A3          case1.16     33.35 
For each Id, Name pair I need to select the CaseId with maximum value.
i.e. I am seeking the following output:
Id      Name        CaseId       Value 
82      A1          case1.01     37.71
1558    A3          case1.16     33.35
I tried the following:
import pandas as pd
pd.pivot_table(df, index=['Id', 'Name'], columns=['CaseId'], values=['Value'], aggfunc=[np.max])['amax']
But all it does is for each CaseId as column it gives maximum value and not the results that I am seeking above.
sort_values + drop_duplicates
df.sort_values('Value').drop_duplicates(['Id'],keep='last')
Out[93]: 
     Id Name    CaseId  Value
7  1558   A3  case1.16  33.35
0    82   A1  case1.01  37.71
Since we post same time , adding more method
df.sort_values('Value').groupby('Id').tail(1)
Out[98]: 
     Id Name    CaseId  Value
7  1558   A3  case1.16  33.35
0    82   A1  case1.01  37.71
This should work:
df = df.sort_values('Value', ascending=False).drop_duplicates('Id').sort_index()
Output:
     Id Name    CaseId  Value
0    82   A1  case1.01  37.71
7  1558   A3  case1.16  33.35
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With