
How to sort values by date and drop duplicates by a column?

Tags:

python

pandas

I have the following pandas DataFrame df:

import pandas as pd

mydictionary = {'id': ['11X', '11X', '22X', '33A'],
    'grade': [68, 74, 77, 78],
    'checkdate': ["2019-12-26", "2019-12-27", "2019-12-26", "2019-12-25"]}

df = pd.DataFrame(mydictionary)

I want to sort values by checkdate and drop duplicates by id while keeping the newest entries.

The expected result is this one:

id    grade   checkdate
11X   74      2019-12-27
22X   77      2019-12-26
33A   78      2019-12-25

I know how to sort values:

df.sort_values("checkdate")

Also, I know how to drop duplicates:

df.drop_duplicates(subset=["id"], keep='first', inplace=True)

But how do I put these two steps together?

Fluxy asked Dec 09 '25 01:12



1 Answer

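You can chain the two calls: sort by checkdate first, then drop duplicates on id with keep='last' so that the newest row for each id is the one that survives: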

import pandas as pd

mydictionary = {'id': ['11X', '11X', '22X', '33A'],
                'grade': [68, 74, 77, 78],
                'checkdate': ["2019-12-26", "2019-12-27", "2019-12-26", "2019-12-25"]}

df = pd.DataFrame(mydictionary)

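# Parse the date strings into real datetimes before sorting
df['checkdate'] = pd.to_datetime(df['checkdate'])
# Ascending sort puts the newest row of each id last, so keep='last' retains it
df2 = df.sort_values(by=['checkdate']).drop_duplicates('id', keep='last')
print(df2)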

Result:

    id  grade  checkdate
3  33A     78 2019-12-25
2  22X     77 2019-12-26
1  11X     74 2019-12-27
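
Note that the result above is ordered by checkdate, while the expected table in the question lists the rows in their original order (11X, 22X, 33A). If that order matters, one possible sketch is to restore it with sort_index() after dropping the duplicates. The helper name latest_per_id and its parameters are hypothetical, introduced only for illustration, and the stable sort is an assumption made so that ties on the same date resolve predictably:

```python
import pandas as pd


def latest_per_id(frame, id_col="id", date_col="checkdate"):
    """Hypothetical helper: keep the newest row per id, in original row order."""
    out = frame.copy()
    # Parse dates so the sort is chronological
    out[date_col] = pd.to_datetime(out[date_col])
    # mergesort is stable, so rows sharing a date keep their relative order
    out = out.sort_values(by=date_col, kind="mergesort")
    # After an ascending sort, the last row per id is the newest one
    out = out.drop_duplicates(subset=[id_col], keep="last")
    # Restore the original row order, then renumber the index from 0
    return out.sort_index().reset_index(drop=True)


mydictionary = {'id': ['11X', '11X', '22X', '33A'],
                'grade': [68, 74, 77, 78],
                'checkdate': ["2019-12-26", "2019-12-27", "2019-12-26", "2019-12-25"]}

df = pd.DataFrame(mydictionary)
result = latest_per_id(df)
print(result)
```

With the sample data, the rows come out in the order 11X, 22X, 33A, with grades 74, 77 and 78, which matches the expected table in the question.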
Rene answered Dec 10 '25 15:12



