
Time-efficient way of dropping duplicates in a large dataframe of different types

Say I have this dataframe:

col1  col2
'a'   [1,2,3]
'a'   [1,2,3]
'b'   [4,5,6]

and I want to drop the duplicates (in this case, one of the first two rows). How would I accomplish this in a time-efficient, Pythonic manner? (My full dataframe is millions of rows and 7 columns.)

asked Oct 23 '25 by Hanuman95

1 Answer

Lists are unhashable, so `drop_duplicates` cannot compare them directly. You can convert the list column to something hashable (a tuple) and then drop the duplicates.

Note that `inplace=True` will overwrite your dataframe:

# Convert each list in col2 to a hashable tuple
df["col2"] = df["col2"].map(tuple)
# Remove duplicate rows, modifying df in place
df.drop_duplicates(inplace=True)
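Putting this together, a minimal runnable sketch using the question's column names (the data is the example from the question):

```python
import pandas as pd

# Frame matching the question's example
df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": [[1, 2, 3], [1, 2, 3], [4, 5, 6]],
})

# Lists are unhashable, so convert col2 to tuples first
df["col2"] = df["col2"].map(tuple)

# Now every column is hashable; keep the first of each duplicate group
deduped = df.drop_duplicates()
print(deduped)  # rows ('a', (1, 2, 3)) and ('b', (4, 5, 6)) remain
```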
answered Oct 26 '25 by woblob
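If you would rather keep the lists in `col2` intact, a variant (a sketch, not from the answer) is to build the hashable view only for duplicate detection via `duplicated()` and use it as a boolean mask:

```python
import pandas as pd

df = pd.DataFrame({
    "col1": ["a", "a", "b"],
    "col2": [[1, 2, 3], [1, 2, 3], [4, 5, 6]],
})

# assign() returns a copy with tuples, used only to find duplicates;
# the original df still holds lists in col2
mask = df.assign(col2=df["col2"].map(tuple)).duplicated()
deduped = df[~mask]
print(deduped)  # first 'a' row and the 'b' row, col2 still lists
```

This avoids mutating the original dataframe, at the cost of one temporary copy of the converted column.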


