I have a dataframe:
df_test = pd.DataFrame({'col': ['paris', 'paris', 'nantes', 'berlin', 'berlin', 'berlin', 'tokyo'],
'id_res': [12, 12, 14, 28, 8, 4, 89]})
col id_res
0 paris 12
1 paris 12
2 nantes 14
3 berlin 28
4 berlin 8
5 berlin 4
6 tokyo 89
I want to create a "check" column whose values are as follows:
The output I want is therefore:
col id_res check
0 paris 12 False
1 paris 12 False
2 nantes 14 False
3 berlin 28 True
4 berlin 8 False
5 berlin 4 False
6 tokyo 89 False
I tried with groupby but no satisfactory result. Can anyone help me plz
Create 2 boolean masks then combine them and find the highest id_res
value per col
:
m1 = df['col'].duplicated(keep=False)
m2 = ~df['id_res'].duplicated(keep=False)
df['check'] = df.index.isin(df[m1 & m2].groupby('col')['id_res'].idxmax())
print(df)
# Output
col id_res check
0 paris 12 False
1 paris 12 False
2 nantes 14 False
3 berlin 28 True
4 berlin 8 False
5 berlin 4 False
6 tokyo 89 False
Details:
>>> pd.concat([df, m1.rename('m1'), m2.rename('m2')])
col id_res check m1 m2
0 paris 12 False True False
1 paris 12 False True False
2 nantes 14 False False True
3 berlin 28 True True True # <- group to check
4 berlin 8 False True True # <- because
5 berlin 4 False True True # <- m1 and m2 are True
6 tokyo 89 False False True
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With