Remove/Drop duplicates EXCEPT where row contains certain string

Question

I need to drop all duplicates using Pandas, except for the ones where the cell contains a certain string.

Given that DF is:

NAME     ID        
Joe      110
Joe      123
Joe     PENDING
Mary    PENDING
Mary     110
Justin   123

I need to keep rows where 'ID' is PENDING, and at the same time drop the rest of the duplicates.

Desired output looks like this:

NAME     ID        
Joe      110
Joe      123
Joe     PENDING
Mary    PENDING

Dani Mesejo · Accepted Answer

You could use duplicated:

import pandas as pd

data = [['Joe', 110],
        ['Joe', 123],
        ['Joe', 'PENDING'],
        ['Mary', 'PENDING'],
        ['Mary', 110],
        ['Justin', 123]]

df = pd.DataFrame(data=data, columns=['NAME', 'ID'])

print(df[~df.duplicated('ID') | (df['ID'] == 'PENDING')])

As an alternative you could do:

print(df[df.ID.duplicated(keep='last') | df.ID.eq('PENDING')])

Output

   NAME       ID
0   Joe      110
1   Joe      123
2   Joe  PENDING
3  Mary  PENDING

Remove/Drop duplicates EXCEPT where row contains certain string

Tags:

python

python-3.x

pandas

duplicates

VRumay

1 Answers

Dani Mesejo

Recent Activity

Donate For Us

Remove/Drop duplicates EXCEPT where row contains certain string

Tags:

python

python-3.x

pandas

duplicates

VRumay

1 Answers

Dani Mesejo

Related questions

Recent Activity

Donate For Us