pandas duplication removing nans

I am trying to check for duplicates using df['name_duplicated'] = df.duplicated('name', keep=False). However, this treats every row where name is NaN as a duplicate.

Does anyone know how to get around this?

I tried df[pd.isnull(df['name'])]['name_duplicated'] = False, but I get an error.

As3adTintin asked Oct 23 '25 17:10



1 Answer

You could also check for NaNs and combine that check with the result of the duplicated call using a boolean AND, so a row is flagged only when its name is both duplicated and not null:

df['name_duplicated'] = df.duplicated('name', keep=False) & df['name'].notnull()
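
To see the effect, here is a minimal sketch on a made-up frame (the sample values and the extra column name_duplicated_loc are assumptions for illustration, not from the question). It also shows a .loc-based variant of the attempt in the question; chained indexing such as df[mask]['col'] = ... typically assigns into a temporary copy, which is the likely source of the error.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: "a" appears twice, and two names are missing.
df = pd.DataFrame({"name": ["a", "b", "a", np.nan, np.nan]})

# duplicated() on its own treats the two NaN rows as duplicates of each other.
raw = df.duplicated("name", keep=False)

# AND-ing with notnull() keeps the flag only where a name is actually present.
df["name_duplicated"] = raw & df["name"].notnull()

# Alternative in the spirit of the question's attempt: set the flag first,
# then clear it for NaN names with a single .loc assignment on df itself.
df["name_duplicated_loc"] = raw
df.loc[df["name"].isnull(), "name_duplicated_loc"] = False

print(df)
```

Both columns end up identical; the one-line & version just avoids the second assignment.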
philngo answered Oct 25 '25 05:10
