
Python Pandas remove rows containing values from a list

Tags: python, pandas

I am comparing two large CSVs with pandas, both containing contact information. I want to remove every row from one CSV whose email address also appears in the other CSV.

So if I had

DF1

name phone email
1    1     [email protected]
2    2     [email protected]
3    3     [email protected]

DF2

name phone email
x    y     [email protected]
a    b     [email protected]

I would be left with

DF3

name phone email
1    1     [email protected]

I don't care about any columns except the email addresses. This seems like it would be easy, but I'm really struggling with this one.

Here is what I have, but I don't think this is even close:

import pandas as pd

def remove_warm_list_duplicates(dataframe):
    '''Remove rows that have emails from the warmlist'''
    # on_bad_lines='skip' replaces error_bad_lines=False (removed in pandas 2.0)
    warm_list = pd.read_csv(r'warmlist/' + 'warmlist.csv'
                            , encoding="ISO-8859-1"
                            , on_bad_lines='skip')
    warm_list_emails = warm_list['Email Address'].tolist()
    # Keep only rows whose email is not in the warm list, and return the result
    return dataframe[~dataframe['Email Address'].isin(warm_list_emails)]
jacobherrington asked Nov 30 '25 14:11



1 Answer

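You can use the pandas Series.isin() method and negate the resulting boolean mask with ~, so that only the rows of df1 whose email does not appear in df2 are kept: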

df3 = df1[~df1['email'].isin(df2['email'])]

Resulting df

    name    phone   email
0   1       1       [email protected]
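
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the two frames from the question and applies the same filter. The email addresses are obscured in the post, so the example.com addresses below are placeholders I made up. The column names follow the DF1/DF2 tables rather than the 'Email Address' column used in the asker's function.

```python
import pandas as pd

# Placeholder addresses (hypothetical): the real ones are obscured in the post.
df1 = pd.DataFrame({
    "name": [1, 2, 3],
    "phone": [1, 2, 3],
    "email": ["keep@example.com", "drop1@example.com", "drop2@example.com"],
})
df2 = pd.DataFrame({
    "name": ["x", "a"],
    "phone": ["y", "b"],
    "email": ["drop1@example.com", "drop2@example.com"],
})

# isin() returns True for each row of df1 whose email appears in df2;
# ~ inverts that mask so only the non-matching rows are selected.
mask = df1["email"].isin(df2["email"])
df3 = df1[~mask]

print(df3)
```

The filtered frame keeps the original row labels from df1. If a fresh 0-based index is wanted, df3.reset_index(drop=True) should do it.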
Vaishali answered Dec 02 '25 05:12



