I have the following pandas DataFrame df:
import pandas as pd
mydictionary = {'id': ['11X', '11X', '22X', '33A'],
                'grade': [68, 74, 77, 78],
                'checkdate': ["2019-12-26", "2019-12-27", "2019-12-26", "2019-12-25"]}
df = pd.DataFrame(mydictionary)
I want to sort values by checkdate and drop duplicates by id while keeping the newest entries.
The expected result is this one:
 id  grade   checkdate
11X     74  2019-12-27
22X     77  2019-12-26
33A     78  2019-12-25
I know how to sort values:
df.sort_values("checkdate")
Also, I know how to drop duplicates:
df.drop_duplicates(subset=["id"], keep='first', inplace=True)
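But how do I put these two things together?
You can chain the two calls: sort by checkdate first, then drop duplicates on id with keep='last', so the newest row for each id is the one that survives: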
import pandas as pd
mydictionary = {'id': ['11X', '11X', '22X', '33A'],
                'grade': [68, 74, 77, 78],
                'checkdate': ["2019-12-26", "2019-12-27", "2019-12-26", "2019-12-25"]}
df = pd.DataFrame(mydictionary)
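# Parse the strings as datetimes so the sort is chronological
df['checkdate'] = pd.to_datetime(df['checkdate'])
# Sort oldest to newest, then keep the last (newest) row for each id
df2 = df.sort_values(by=['checkdate']).drop_duplicates('id', keep='last')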
print(df2)
Result:
    id  grade  checkdate
3  33A     78 2019-12-25
2  22X     77 2019-12-26
1  11X     74 2019-12-27
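Note that the result above is ordered by checkdate, not by id as in the expected output. If the row order matters, here is a minimal sketch of two ways to get it back. The names latest and latest_alt are only illustrative, and the second variant (groupby plus idxmax) is an alternative technique, not part of the answer above:

```python
import pandas as pd

df = pd.DataFrame({
    'id': ['11X', '11X', '22X', '33A'],
    'grade': [68, 74, 77, 78],
    'checkdate': ["2019-12-26", "2019-12-27", "2019-12-26", "2019-12-25"],
})
df['checkdate'] = pd.to_datetime(df['checkdate'])

# Variant 1: same sort + drop_duplicates chain, then restore the original
# row order with sort_index() and renumber the rows.
latest = (
    df.sort_values('checkdate')
      .drop_duplicates('id', keep='last')
      .sort_index()
      .reset_index(drop=True)
)

# Variant 2 (alternative): for each id, look up the index label of the row
# with the maximum checkdate and select those rows directly.
latest_alt = df.loc[df.groupby('id')['checkdate'].idxmax()].reset_index(drop=True)

print(latest)
```

On this sample data both variants select the same three rows. If an id has several rows with the same checkdate, the two variants may pick different rows, so it is worth deciding on a tie-breaker (for example, sorting by a second column) before relying on either.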