I'm trying to remove rows in my data frame that contain a certain word or certain sequences of words. For example:
df <- as.data.frame(read.xlsx("C:\\data.xlsx", 1, header = TRUE))
head(df)
# NO ARTICLE
# 1 34 New York Times reports blabla
# 2 42 Financial Times reports blabla
# 3 21 Greenwire reports blabla
# 4 3 New York Times reports blabla
# 5 46 Newswire reports blabla
I want to remove the rows that contain the string "New York Times" or "Newswire" from my data.frame. I have tried different approaches using %in% or grep, but I'm not quite sure how to use them.
How do I do that?
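Per my comment, use grepl, which returns a logical vector telling you, for each element of your vector, whether the specified pattern was found. In your case, something like:
df[!grepl('New York Times', df$ARTICLE), ]
should do the trick. The ! negates the match, so only rows that do not contain the string are kept. Spell the column name exactly as it appears in your data frame, since R is case-sensitive.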
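Since the question asks about two strings, here is a minimal sketch of one way to drop both in a single call, by joining them into one regular expression with | (alternation). The toy data frame is rebuilt by hand from the head(df) output in the question, and the names unwanted, pattern, and cleaned are made up for illustration:

```r
# Toy data mirroring the question's example (assumed column names NO and ARTICLE)
df <- data.frame(
  NO = c(34, 42, 21, 3, 46),
  ARTICLE = c("New York Times reports blabla",
              "Financial Times reports blabla",
              "Greenwire reports blabla",
              "New York Times reports blabla",
              "Newswire reports blabla"),
  stringsAsFactors = FALSE
)

# Hypothetical vector of phrases whose rows should be dropped
unwanted <- c("New York Times", "Newswire")

# Join the phrases into one regex: "New York Times|Newswire"
pattern <- paste(unwanted, collapse = "|")

# grepl() flags matching rows; ! keeps the rows that do NOT match
cleaned <- df[!grepl(pattern, df$ARTICLE), ]
print(cleaned)
```

Matching is case-sensitive by default, and the phrases are treated as regular expressions, so a phrase containing characters such as . or ( would need escaping before being pasted into the pattern.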