Let's say I want to find all words in which the letter "e" appears exactly two times. When I define this pattern:
pattern1 <- "e.*e"
grep(pattern1, stringr::words, value = T)
the regex also matches words such as "therefore", where "e" appears three times, because the pattern only requires at least two. The point is, I don't want my pattern to mean "at least"; I want it to mean "exactly n times".
This pattern...
pattern2 <- "e{2}"
...finds words with two letters "e", but only if they appear one right after the other ("feel", "agree", etc.). I'd like to combine these two patterns to find all words with an exact number of not necessarily consecutive appearances of the letter "e".
You may use:
^(?:[^e]*e){2}[^e]*$
See the regex demo. The (?:...) is a non-capturing group that allows quantifying a sequence of subpatterns and is thus easily adjustable to match 3, 4 or more specific sequences in a string.
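To illustrate how the quantifier can be adjusted, here is a minimal sketch that builds the same pattern for any count. The helper name `exactly_n` is hypothetical (it is not part of base R or stringr), and it assumes the character is a plain letter or digit that needs no escaping inside a bracket expression.

```r
# Hypothetical helper (an illustration, not a library function): build a
# pattern matching strings that contain exactly `n` occurrences of `ch`.
exactly_n <- function(ch, n) {
  stopifnot(nchar(ch) == 1, n >= 1)
  # Assumption: `ch` is a plain letter or digit, so it is safe to place it
  # inside [^...] without escaping.
  sprintf("^(?:[^%s]*%s){%d}[^%s]*$", ch, ch, n, ch)
}

x <- c("feel", "agre", "degree", "therefore")

exactly_n("e", 2)                          # same pattern as above
grep(exactly_n("e", 2), x, value = TRUE)   # words with exactly two "e"
grep(exactly_n("e", 3), x, value = TRUE)   # words with exactly three "e"
```

Only the number inside `{...}` changes between the two calls; the rest of the pattern stays the same.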
Details
^ - start of string
(?:[^e]*e){2} - 2 occurrences of
    [^e]* - any 0+ chars other than e
    e - an e
[^e]* - any 0+ chars other than e
$ - end of string
See the R demo below:
x <- c("feel", "agre", "degree")
rx <- "^(?:[^e]*e){2}[^e]*$"
grep(rx, x, value = TRUE)
## => [1] "feel"
Note that instead of value = T it is safer to use value = TRUE as T might be redefined in the code above.
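A small sketch of why this matters: `T` is an ordinary variable that can be reassigned, while `TRUE` is a reserved word. The reassignment below is contrived for illustration, and the names `with_T` and `with_TRUE` are made up for this example.

```r
x <- c("feel", "agre", "degree")
rx <- "^(?:[^e]*e){2}[^e]*$"

T <- FALSE                              # simulate an accidental earlier reassignment
with_T <- grep(rx, x, value = T)        # now acts like value = FALSE: returns indices
with_TRUE <- grep(rx, x, value = TRUE)  # unaffected: returns the matching strings

with_T
with_TRUE

rm(T)                                   # drop the shadowing variable; T is TRUE again
```

With the shadowed `T`, grep silently returns positions instead of words, which is easy to miss in a longer script.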
We can use a pattern to match zero or more characters that are not 'e' ([^e]*) from the start (^) of the string, followed by character 'e', then another set of characters that are not 'e' followed by 'e', and zero or more characters not an 'e' until the end ($) of the string.
res <- grep("^[^e]*e[^e]*e[^e]*$", stringr::words, value = TRUE)
stringr::str_count(res, "e")
#[1] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#[58] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
#[115] 2 2 2 2 2 2 2
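As a cross-check, the same words can be selected by counting occurrences directly instead of anchoring a regex. This is a different technique from the pattern above, sketched here in base R on a small made-up vector rather than on stringr::words.

```r
# Small sample vector chosen for illustration (not stringr::words)
x <- c("feel", "agre", "degree", "therefore", "even")

# Count literal "e" occurrences per word; a word without "e" yields 0
n_e <- lengths(regmatches(x, gregexpr("e", x, fixed = TRUE)))

by_count <- x[n_e == 2]                                    # filter on the count
by_regex <- grep("^[^e]*e[^e]*e[^e]*$", x, value = TRUE)   # anchored pattern

identical(by_count, by_regex)   # both approaches should select the same words
```

The counting approach makes it easy to change the target count or use a comparison such as `>=`, while the anchored regex keeps everything in a single grep call.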