Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Remove/Drop duplicates EXCEPT where row contains certain string

I need to drop all duplicates using Pandas, except for the ones where the cell contains a certain string.

Given that DF is:

NAME     ID        
Joe      110
Joe      123
Joe     PENDING
Mary    PENDING
Mary     110
Justin   123

I need to keep rows where 'ID' is PENDING, and at the same time drop the rest of the duplicates.

Desired output looks like this:

NAME     ID        
Joe      110
Joe      123
Joe     PENDING
Mary    PENDING
like image 805
VRumay Avatar asked Oct 28 '25 18:10

VRumay


1 Answers

You could use duplicated:

import pandas as pd

data = [['Joe', 110],
        ['Joe', 123],
        ['Joe', 'PENDING'],
        ['Mary', 'PENDING'],
        ['Mary', 110],
        ['Justin', 123]]

df = pd.DataFrame(data=data, columns=['NAME', 'ID'])

print(df[~df.duplicated('ID') | (df['ID'] == 'PENDING')])

As an alternative you could do:

print(df[df.ID.duplicated(keep='last') | df.ID.eq('PENDING')])

Output

   NAME       ID
0   Joe      110
1   Joe      123
2   Joe  PENDING
3  Mary  PENDING
like image 68
Dani Mesejo Avatar answered Oct 31 '25 08:10

Dani Mesejo