Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Drop rows that do not contain a string within a value using pandas

I want to use python pandas to drop rows in a spreadsheet that do not contain "CAT" within the "Equipment" column. Here is what I have so far:

import pandas as pd
import openpyxl
import warnings
import xlrd

warnings.filterwarnings("ignore")

file_name = input("Enter File Name: ")
df = pd.read_csv(file_name)


for i in range(len(df["Equipment"])):
    if "CAT" in df["Equipment"][i]:
        print (df["Equipment"][i])
    else:
        df.drop([i])

df.to_excel("new_mp.xlsx")

The script prints out the correct values in the terminal but does not drop the rows in the new spreadsheet "new_mp.xlsx". Can someone point me in the right direction?

Example Data:

Equipment
CAT 259B3 1818 OE Skid Steer 2011 CAT
T-14 Towmaster Trailer 3124 OE Trailer 2008
CAT 938M Loader RPO RENTAL 2017 CAT 938M
Rental Water Truck 55571 Rental Water Truck international water truck
like image 345
cbennett423 Avatar asked Aug 31 '25 00:08

cbennett423


2 Answers

You don't need a loop here; you can do this with str.contains:

v = df[df["Equipment"].str.contains('CAT')]
print(v)
                                  Equipment
0     CAT 259B3 1818 OE Skid Steer 2011 CAT
2  CAT 938M Loader RPO RENTAL 2017 CAT 938M

Or, use query with engine='python':

v = df.query('Equipment.str.contains("CAT")', engine='python')
print(v)
                                  Equipment
0     CAT 259B3 1818 OE Skid Steer 2011 CAT
2  CAT 938M Loader RPO RENTAL 2017 CAT 938M

Finally, write to disk:

v.to_excel("new_mp.xlsx")
like image 163
cs95 Avatar answered Sep 03 '25 00:09

cs95


You can use this to filter the rows.

df[df['Equipment'].apply(lambda x: 'CAT' in x)].to_excel("new_mp.xlsx")

Edit:

Alright, let me explain this inside out :

lambda x: 'CAT' in x takes as input x and returns True if 'CAT' is in x.

Now the df['Equipment'].apply method will invoke the lambda function mentioned above on every value of the series. The result will be a boolean array which is then passed on to the df as boolean mask.

Hope I didn't overdo it.

like image 25
Omkar Sabade Avatar answered Sep 03 '25 01:09

Omkar Sabade