Remove duplicate and original row - Pandas

Tags:

python

pandas

I am trying to drop every row that has a duplicate, including the original occurrence, from the DataFrame df below.

  sales_id sales_line
    100      1
    100      1
    200      1
    300      2
    300      2
    400      3
    500      1
    500      1
    600      5

The expected output is shown below.

sales_id sales_line
    200      1
    400      3
    600      5

Any help would be greatly appreciated.
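A short sketch, rebuilding df from the table above, of why a plain drop_duplicates() call does not produce this output. With the default keep="first", one copy of each duplicated row survives, so 100, 300 and 500 still appear once each.

```python
import pandas as pd

# Rebuild the DataFrame from the question
df = pd.DataFrame({
    "sales_id": [100, 100, 200, 300, 300, 400, 500, 500, 600],
    "sales_line": [1, 1, 1, 2, 2, 3, 1, 1, 5],
})

# Default keep="first": the first copy of each duplicated row is kept
default_result = df.drop_duplicates()
print(default_result["sales_id"].tolist())
# [100, 200, 300, 400, 500, 600]
```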

asked Dec 28 '25 22:12 by moe_95


2 Answers

Use DataFrame.drop_duplicates with keep=False to remove every row that is duplicated across all columns, including its first occurrence:

df = df.drop_duplicates(keep=False)
print(df)
   sales_id  sales_line
2       200           1
5       400           3
8       600           5
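An equivalent formulation, offered as a sketch rather than as part of the original answer, builds a boolean mask with DataFrame.duplicated(keep=False), which flags every row that has an identical twin, and then keeps only the unflagged rows. This form is handy when you want to inspect or reuse the mask itself.

```python
import pandas as pd

df = pd.DataFrame({
    "sales_id": [100, 100, 200, 300, 300, 400, 500, 500, 600],
    "sales_line": [1, 1, 1, 2, 2, 3, 1, 1, 5],
})

# True for every row that appears more than once (all copies, not just later ones)
mask = df.duplicated(keep=False)

# Invert the mask to keep only rows that occur exactly once
unique_rows = df[~mask]
print(unique_rows)
#    sales_id  sales_line
# 2       200           1
# 5       400           3
# 8       600           5
```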
answered Dec 31 '25 11:12 by jezrael


You can use DataFrame.drop_duplicates(subset=None, keep='first', inplace=False); the values shown are the defaults.

In your case, the important part is keep=False, which removes every copy of a duplicated row instead of keeping the first one.
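For reference, keep accepts "first" (the default), "last" or False. The sketch below rebuilds the question's data and records which original row labels survive under each setting; only False removes both copies of a duplicated pair.

```python
import pandas as pd

df = pd.DataFrame({
    "sales_id": [100, 100, 200, 300, 300, 400, 500, 500, 600],
    "sales_line": [1, 1, 1, 2, 2, 3, 1, 1, 5],
})

# Collect the surviving row labels for each accepted value of keep
results = {keep: df.drop_duplicates(keep=keep).index.tolist()
           for keep in ("first", "last", False)}

for keep, labels in results.items():
    print(repr(keep), labels)
# 'first' [0, 2, 3, 5, 6, 8]
# 'last' [1, 2, 4, 5, 7, 8]
# False [2, 5, 8]
```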

import pandas as pd


data = {
    'sales_id': [100, 100, 200, 300, 300, 400, 500, 500, 600],
    'sales_line': [1, 1, 1, 2, 2, 3, 1, 1, 5],
}

df = pd.DataFrame(data)
print('Source DataFrame:\n', df)

df_dropped = df.drop_duplicates(subset=['sales_id', 'sales_line'], keep=False)
print('Result DataFrame:\n', df_dropped)
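On the question's data, passing subset=['sales_id', 'sales_line'] gives the same result as omitting subset, because every duplicate matches on both columns. To show where subset matters, this sketch appends one hypothetical row (100, 2) that is not in the original data, and also uses ignore_index=True (available since pandas 1.0) to renumber the result from 0.

```python
import pandas as pd

# Question data plus one HYPOTHETICAL extra row (100, 2), added only to
# show how subset changes what counts as a duplicate
df = pd.DataFrame({
    "sales_id": [100, 100, 200, 300, 300, 400, 500, 500, 600, 100],
    "sales_line": [1, 1, 1, 2, 2, 3, 1, 1, 5, 2],
})

# Duplicates judged on both columns: (100, 2) occurs once, so it survives
by_both = df.drop_duplicates(subset=["sales_id", "sales_line"], keep=False)

# Duplicates judged on sales_id alone: all three rows with id 100 are removed,
# and ignore_index=True relabels the remaining rows 0, 1, 2
by_id = df.drop_duplicates(subset=["sales_id"], keep=False, ignore_index=True)

print(by_both["sales_id"].tolist())  # [200, 400, 600, 100]
print(by_id["sales_id"].tolist())    # [200, 400, 600]
```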
answered Dec 31 '25 11:12 by Daemon Painter


