I want to use python pandas to drop rows in a spreadsheet that do not contain "CAT" within the "Equipment" column. Here is what I have so far:
import pandas as pd
import openpyxl
import warnings
import xlrd
warnings.filterwarnings("ignore")
file_name = input("Enter File Name: ")
df = pd.read_csv(file_name)
for i in range(len(df["Equipment"])):
if "CAT" in df["Equipment"][i]:
print (df["Equipment"][i])
else:
df.drop([i])
df.to_excel("new_mp.xlsx")
The script prints out the correct values in the terminal but does not drop the rows in the new spreadsheet "new_mp.xlsx". Can someone point me in the right direction?
Example Data:
Equipment
CAT 259B3 1818 OE Skid Steer 2011 CAT
T-14 Towmaster Trailer 3124 OE Trailer 2008
CAT 938M Loader RPO RENTAL 2017 CAT 938M
Rental Water Truck 55571 Rental Water Truck international water truck
You don't need a loop here; you can do this with str.contains
:
v = df[df["Equipment"].str.contains('CAT')]
print(v)
Equipment
0 CAT 259B3 1818 OE Skid Steer 2011 CAT
2 CAT 938M Loader RPO RENTAL 2017 CAT 938M
Or, use query
with engine='python'
:
v = df.query('Equipment.str.contains("CAT")', engine='python')
print(v)
Equipment
0 CAT 259B3 1818 OE Skid Steer 2011 CAT
2 CAT 938M Loader RPO RENTAL 2017 CAT 938M
Finally, write to disk:
v.to_excel("new_mp.xlsx")
You can use this to filter the rows.
df[df['Equipment'].apply(lambda x: 'CAT' in x)].to_excel("new_mp.xlsx")
Edit:
Alright, let me explain this inside out :
lambda x: 'CAT' in x
takes as input x and returns True if 'CAT' is in x.
Now the df['Equipment'].apply
method will invoke the lambda
function mentioned above on every value of the series. The result will be a boolean array which is then passed on to the df
as boolean mask.
Hope I didn't overdo it.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With