I have the following data, which I load into a dataframe:
data = [
{'col1': 11, 'col2': 111, 'col3': 1111},
{'col1': 22, 'col2': 222, 'col3': 2222},
{'col1': 33, 'col2': 333, 'col3': 3333},
{'col1': 44, 'col2': 444, 'col3': 4444}
]
and the following list:
lst = [(11, 111), (22, 222), (99, 999)]
I would like to keep only the rows of my data whose (col1, col2) pair does not appear in lst.
The result for the example above would be:
[
{'col1': 33, 'col2': 333, 'col3': 3333},
{'col1': 44, 'col2': 444, 'col3': 4444}
]
How can I achieve that?
import pandas as pd
df = pd.DataFrame(data)
list_df = pd.DataFrame(lst)
# command like ??
# df.subtract(list_df)
If you need to test by pairs, build a MultiIndex from both columns, check it against lst with Index.isin, and invert the mask with ~ for boolean indexing:
df = df[~df.set_index(['col1','col2']).index.isin(lst)]
print(df)
   col1  col2  col3
2    33   333  3333
3    44   444  4444
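For reference, here is a self-contained sketch of the same idea that also converts the result back to a list of dicts, as in the question. The names pairs, in_lst, filtered and records are only illustrative, and MultiIndex.from_frame is used here in place of set_index(...).index; both should give the same pairs.

```python
import pandas as pd

data = [
    {'col1': 11, 'col2': 111, 'col3': 1111},
    {'col1': 22, 'col2': 222, 'col3': 2222},
    {'col1': 33, 'col2': 333, 'col3': 3333},
    {'col1': 44, 'col2': 444, 'col3': 4444},
]
lst = [(11, 111), (22, 222), (99, 999)]

df = pd.DataFrame(data)

# Build a MultiIndex of (col1, col2) pairs, one entry per row.
pairs = pd.MultiIndex.from_frame(df[['col1', 'col2']])

# Boolean array: True where the row's pair appears in lst.
in_lst = pairs.isin(lst)

# Keep only the rows whose pair is NOT in lst.
filtered = df[~in_lst]

# Back to the list-of-dicts shape used in the question.
records = filtered.to_dict('records')
print(records)
```

Pairs in lst that never occur in the data, such as (99, 999), are simply ignored.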
Or use a left join via merge with the indicator parameter:
mask = df.merge(list_df,
                left_on=['col1','col2'],
                right_on=[0,1],
                indicator=True,
                how='left')['_merge'].eq('left_only')
df = df[mask]
print(df)
   col1  col2  col3
2    33   333  3333
3    44   444  4444
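A slightly more defensive sketch of the merge approach, under two assumptions that are not in the answer above: list_df is given the column names col1 and col2 so that on= can be used, and duplicate pairs are dropped first, because a repeated pair in lst would otherwise duplicate rows in the merged frame and make the mask longer than df.

```python
import pandas as pd

data = [
    {'col1': 11, 'col2': 111, 'col3': 1111},
    {'col1': 22, 'col2': 222, 'col3': 2222},
    {'col1': 33, 'col2': 333, 'col3': 3333},
    {'col1': 44, 'col2': 444, 'col3': 4444},
]
lst = [(11, 111), (22, 222), (99, 999)]

df = pd.DataFrame(data)

# Assumption: name the lookup columns like the ones in df, and drop
# duplicate pairs so the left join keeps exactly one row per row of df.
list_df = pd.DataFrame(lst, columns=['col1', 'col2']).drop_duplicates()

# indicator=True adds a '_merge' column: 'left_only' marks rows of df
# that found no matching pair in list_df.
merged = df.merge(list_df, on=['col1', 'col2'], how='left', indicator=True)

# Convert to a plain boolean array so the mask is applied by position
# and does not depend on df having a default 0..n-1 index.
mask = merged['_merge'].eq('left_only').to_numpy()

result = df[mask]
print(result)
```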
You can build a tuple from col1 and col2 for each row and check whether those tuples are in lst, then drop the rows where the result is True.
df = df.drop(df.apply(lambda x: (x['col1'], x['col2']), axis=1)
               .isin(lst)
               .loc[lambda s: s]
               .index)
With this solution you don't even have to convert the list into a dataframe.
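As a variation on the tuple idea, the row-wise apply can be swapped for zip over the two columns with a set lookup, which is usually cheaper on larger frames. This is a different technique from the answer above, shown only as a sketch; the names wanted and keep are illustrative.

```python
import pandas as pd

data = [
    {'col1': 11, 'col2': 111, 'col3': 1111},
    {'col1': 22, 'col2': 222, 'col3': 2222},
    {'col1': 33, 'col2': 333, 'col3': 3333},
    {'col1': 44, 'col2': 444, 'col3': 4444},
]
lst = [(11, 111), (22, 222), (99, 999)]

df = pd.DataFrame(data)

# A set gives constant-time membership tests for the pairs.
wanted = set(lst)

# One boolean per row: True if the (col1, col2) pair is not in lst.
keep = [pair not in wanted for pair in zip(df['col1'], df['col2'])]

# A list of booleans works as a positional boolean mask.
result = df[keep]
print(result)
```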