
How to remove rows in a dataframe that contain certain words in R?

I'm trying to remove rows in my dataframe that contain a certain word or certain sequences of words. For example:

df <- as.data.frame(read.xlsx("C:\\data.xlsx", 1, header=TRUE))
head(df)
#     NO    ARTICLE    
# 1   34    New York Times reports blabla
# 2   42    Financial Times reports blabla
# 3   21    Greenwire reports blabla
# 4    3    New York Times reports blabla
# 5   46    Newswire reports blabla

I want to remove the rows that contain the string "New York Times" or "Newswire" from my data.frame. I have tried different approaches using %in% or grep, but I'm not quite sure how to use them.

How do I do that?

cptn asked Dec 05 '25 05:12


1 Answer

Per my comment, use grepl, which returns a logical vector that is TRUE for every element of your vector in which the specified pattern is found. Negating it with ! keeps only the rows that do not match. In your case, something like:

df[!grepl('New York Times', df$ARTICLE), ]

should do the trick.
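
The line above only drops "New York Times" rows, while the question also mentions "Newswire". One way to handle both at once is to join the phrases with "|" so that grepl treats them as alternatives. The following is a minimal sketch, not necessarily the only approach; the sample data frame is rebuilt by hand from the question's output, and the names unwanted, pattern, and filtered are just illustrative.

```r
# Sample data mirroring the head() output in the question (illustrative only).
df <- data.frame(
  NO = c(34, 42, 21, 3, 46),
  ARTICLE = c(
    "New York Times reports blabla",
    "Financial Times reports blabla",
    "Greenwire reports blabla",
    "New York Times reports blabla",
    "Newswire reports blabla"
  ),
  stringsAsFactors = FALSE
)

# Join the unwanted phrases into one regular expression; "|" means "or".
unwanted <- c("New York Times", "Newswire")
pattern <- paste(unwanted, collapse = "|")

# grepl() is TRUE for rows whose ARTICLE matches; "!" keeps the other rows.
filtered <- df[!grepl(pattern, df$ARTICLE), ]
filtered
```

Note that the pattern is interpreted as a regular expression, so phrases containing characters such as "." or "(" would need escaping. Column names are also case-sensitive in R, so df$ARTICLE must match the header exactly.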

Thomas answered Dec 07 '25 22:12
