Drop_duplicates in a dataframe and keep the one with a specific column value

Question

I have a DataFrame df:

columnA columnB columnC columnD columnE
A        B         10      C       C
A        B         10      D       A
B        C         20      A       A
B        A         20      D       A
B        A         20      D       C

I want to drop duplicates whenever several rows share the same values in columnA, columnB and columnC. In my case the duplicates are:

columnA columnB columnC columnD columnE
A        B         10      C       C
A        B         10      D       A
B        A         20      D       A
B        A         20      D       C

How can I keep, from each set of duplicate rows, the one where columnE is equal to C? The output for the full DataFrame should be:

columnA columnB columnC columnD columnE
A        B         10      C       C
B        C         20      A       A
B        A         20      D       C

jezrael · Accepted Answer

You can use DataFrame.sort_values with a key that puts the rows where columnE is C first, followed by DataFrame.drop_duplicates, which keeps the first row of each group. To restore the original order, add DataFrame.sort_index:

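# Rows where columnE == 'C' get the key False and sort first;
# kind='stable' keeps tied rows in their original order.
out = (df.sort_values('columnE', key=lambda x: x.ne('C'), kind='stable')
         .drop_duplicates(['columnA','columnB','columnC'])
         .sort_index())
print(out)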
  columnA columnB  columnC columnD columnE
0       A       B       10       C       C
2       B       C       20       A       A
4       B       A       20       D       C
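
To see why the sort key works, here is a minimal self-contained sketch that rebuilds the frame from the question and prints the intermediate steps. The variable names is_not_c and ordered are only illustrative, and kind="stable" is passed explicitly so that tied rows keep their original relative order. The key argument needs pandas 1.1.0 or later.

```python
import pandas as pd

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "columnA": ["A", "A", "B", "B", "B"],
    "columnB": ["B", "B", "C", "A", "A"],
    "columnC": [10, 10, 20, 20, 20],
    "columnD": ["C", "D", "A", "D", "D"],
    "columnE": ["C", "A", "A", "A", "C"],
})

# The sort key is a boolean Series: False where columnE == 'C', True elsewhere.
is_not_c = df["columnE"].ne("C")
print(is_not_c.tolist())  # [False, True, True, True, False]

# False sorts before True, so the 'C' rows move to the top.
ordered = df.sort_values("columnE", key=lambda s: s.ne("C"), kind="stable")
print(ordered.index.tolist())  # [0, 4, 1, 2, 3]

# drop_duplicates keeps the first row of each (columnA, columnB, columnC) group,
# which is now the 'C' row whenever the group has one.
out = ordered.drop_duplicates(["columnA", "columnB", "columnC"]).sort_index()
print(out.index.tolist())  # [0, 2, 4]
```

A group without any C row, such as (B, C, 20) here, simply keeps its first row.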

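Or use SeriesGroupBy.idxmax on the boolean mask to get, for each group, the index of the first row where columnE is C (or the group's first row if none matches), select those rows with DataFrame.loc, and use Series.sort_values to keep the original ordering: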

# True marks the preferred rows; idxmax returns the first True per group
idx = df['columnE'].eq('C').groupby([df['columnA'],df['columnB'],df['columnC']]).idxmax()
out = df.loc[idx.sort_values()]
print(out)
  columnA columnB  columnC columnD columnE
0       A       B       10       C       C
2       B       C       20       A       A
4       B       A       20       D       C
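
If you need this in more than one place, the idxmax approach can be wrapped in a small helper. The sketch below is only an illustration: the name keep_preferred and its parameters are hypothetical, it assumes a unique index whose sorted order matches the original row order (true for the default RangeIndex), and casting the mask to int is a defensive choice rather than something the approach requires.

```python
import pandas as pd

def keep_preferred(df, subset, column, value):
    """Hypothetical helper: deduplicate on `subset`, preferring rows where column == value."""
    # 1 for preferred rows, 0 otherwise (cast to int as a defensive choice).
    mask = df[column].eq(value).astype(int)
    # idxmax gives the label of the first maximum per group: the first preferred
    # row, or the group's first row when nothing matches.
    idx = mask.groupby([df[c] for c in subset]).idxmax()
    # Assumes a unique index whose sorted order is the original row order.
    return df.loc[idx.sort_values()]

# Example frame from the question.
df = pd.DataFrame({
    "columnA": ["A", "A", "B", "B", "B"],
    "columnB": ["B", "B", "C", "A", "A"],
    "columnC": [10, 10, 20, 20, 20],
    "columnD": ["C", "D", "A", "D", "D"],
    "columnE": ["C", "A", "A", "A", "C"],
})

out = keep_preferred(df, ["columnA", "columnB", "columnC"], "columnE", "C")
print(out.index.tolist())  # [0, 2, 4]
```

Passing a different value, for example "A", keeps the A row of each duplicate group instead.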
