When using str_detect() you can use the | operator like so:
library(tidyverse)
example_df <- data.frame(
  letters = c("A B C", "C B E", "F C B", "A D E", "F G C")
)
example_df %>% filter(str_detect(letters, "B|C"))
And it will return all rows except the fourth (where letters = "A D E").
I want to do the same with str_detect(), but keep only the rows that contain a combination of letters (both B and C).
I imagined I could simply replace the | operator with the & operator, and that the following would return all rows except the last two:
example_df <- data.frame(
  letters = c("A B C", "C B E", "F C B", "A D E", "F G C")
)
example_df %>% filter(str_detect(letters, "B&C"))
However, this doesn't work. Does anyone know how I can make it work using str_detect() or another tidyverse method? (I can get it to work with grepl(), but I need a tidyverse solution.)
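Why the & attempt fails: in a regular expression | means "or", but & is an ordinary character, so the pattern "B&C" only matches the literal text "B&C". A quick check with stringr alone (the vector x is just an illustrative input):

```r
library(stringr)

# "&" is not a regex operator, so "B&C" is matched as the literal text "B&C"
x <- c("A B C", "C B E", "B&C")
literal_hits <- str_detect(x, "B&C")
literal_hits
#> [1] FALSE FALSE  TRUE
```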
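The & has no special meaning in a regular expression, so "B&C" only matches the literal text "B&C". You can get the "and" behaviour with Perl-style non-consuming lookaheads instead: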
library(tidyverse)
example_df <- data.frame(
  letters = c("A B C", "C B E", "F C B", "A D E", "F G C", "B B E")
)
example_df %>% filter(str_detect(letters, "(?=.*B)(?=.*C)"))
#> letters
#> 1 A B C
#> 2 C B E
#> 3 F C B
Created on 2022-03-23 by the reprex package (v2.0.1)
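The first lookahead, (?=.*B), checks that a B appears somewhere ahead of the current position without consuming any characters; the second, (?=.*C), then checks from the same position that a C appears too. A string therefore matches only if it contains both letters, in either order. str_detect() (which uses the ICU regex engine via stringi) accepts lookaheads by default, but to do the same sort of thing with base R functions you need the perl = TRUE option, e.g.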
grep("(?=.*B)(?=.*C)", example_df$letters, perl = TRUE, value = TRUE)
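Two further tidyverse-only variants, offered as a sketch rather than as part of the original answer: combining two separate str_detect() calls with the vectorised & operator (no lookaheads needed), and building the lookahead pattern from a vector of required letters. The variable name required is illustrative, and it assumes plain letters with no regex metacharacters (those would need escaping).

```r
library(dplyr)
library(stringr)

example_df <- data.frame(
  letters = c("A B C", "C B E", "F C B", "A D E", "F G C", "B B E")
)

# Variant 1: test each letter separately and combine the logical vectors with &
both_separate <- example_df %>%
  filter(str_detect(letters, "B") & str_detect(letters, "C"))

# Variant 2: build "(?=.*B)(?=.*C)" from a vector of letters that must all appear
required <- c("B", "C")  # illustrative; assumes no regex metacharacters
pattern <- str_c("(?=.*", required, ")", collapse = "")
pattern
#> [1] "(?=.*B)(?=.*C)"

both_lookahead <- example_df %>%
  filter(str_detect(letters, pattern))

# Both variants keep the rows "A B C", "C B E" and "F C B"
```

The first variant is often easier to read for two or three letters; the second scales to any number of required letters without repeating str_detect() calls.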