The need to rows that have NaN values in them but are also duplicates. For example this table:
    A   B   C
0   foo 2   3
1   foo nan nan
2   foo 1   4
3   bar nan nan
4   foo nan nan
Should become this:
    A   B   C
0   foo 2   3
2   foo 1   4
3   bar nan nan
How can i do that?
Use boolean indexing:
df = df[~df['A'].duplicated(keep=False) | df[['B','C']].notnull().any(axis=1)]
print (df)
     A    B    C
0  foo  2.0  3.0
2  foo  1.0  4.0
3  bar  NaN  NaN
Explanation:
Test column A for not duplicates - duplicated with ~ for invert boolean mask:
print (~df['A'].duplicated(keep=False))
0    False
1    False
2    False
3     True
4    False
Name: A, dtype: bool
Check non missing values in B,C columns:
print (df[['B','C']].notnull())
       B      C
0   True   True
1  False  False
2   True   True
3  False  False
4  False  False
And then at least one True per row with DataFrame.any:
print (df[['B','C']].notnull().any(axis=1))
0     True
1    False
2     True
3    False
4    False
dtype: bool
Chain together by | for bitwise OR:
print (~df['A'].duplicated(keep=False) | df[['B','C']].notnull().any(axis=1))
0     True
1    False
2     True
3     True
4    False
dtype: bool
Slightly different to jezrael's solution:
>>> df                                                                                                     
     A    B    C
0  foo  2.0  3.0
1  foo  NaN  NaN
2  foo  1.0  4.0
3  bar  NaN  NaN
4  foo  NaN  NaN
>>>
>>> df.drop(index=df[df.duplicated(keep=False)].isnull().any(1).index)                                  
     A    B    C
0  foo  2.0  3.0
2  foo  1.0  4.0
3  bar  NaN  NaN
Steps:
>>> df.duplicated(keep=False)                                                                            
0    False
1     True
2    False
3    False
4     True
dtype: bool
>>>
>>> df[df.duplicated(keep=False)]                                                                       
      A   B   C
1  foo NaN NaN
4  foo NaN NaN
>>>
>>> df[df.duplicated(keep=False)].isnull()                                                                 
       A     B     C
1  False  True  True
4  False  True  True
>>>
>>> df[df.duplicated(keep=False)].isnull().any(1).index                                                          
Int64Index([1, 4], dtype='int64')
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With