
R Regex Repetition

Tags:

regex

r

Working through R4DS Strings chapter and am getting confused about the following regular expression example:

x <- "1888 is the longest year in Roman numerals: MDCCCLXXXVIII"

str_view(x, "C?")

This code returns no match

I understand that ? matches 0 or 1 times, and that repetition is "greedy" and will match the longest string possible, so why isn't one "C" matched?

Additionally, the below code matches the first "CC":

x <- "1888 is the longest year in Roman numerals: MDCCCLXXXVIII"

str_view(x, "CC?")

Thanks

Alexander Turner asked Dec 17 '25 08:12



1 Answer

I think it does return a match, but it's the empty string.

Explanation:

  1. The regex engine starts at the first character of the string and checks whether it matches.
  2. M does not match C.
  3. But wait, the C is optional.
  4. The empty string matches.
  5. Success!
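The steps above can be checked directly. This is a minimal sketch that uses base R's regexpr() and regmatches() instead of stringr (a swap made here so it runs without extra packages); the variable names are only illustrative.

```r
x <- "1888 is the longest year in Roman numerals: MDCCCLXXXVIII"

# Find the first match of "C?" (perl = TRUE selects the PCRE engine).
m <- regexpr("C?", x, perl = TRUE)

start   <- as.integer(m)            # position where the match begins
len     <- attr(m, "match.length")  # number of characters matched
matched <- regmatches(x, m)         # the matched text itself

# The match exists, but it sits at the very first position and is empty.
print(start)    # 1
print(len)      # 0
print(matched)  # ""
```

A zero-length match at position 1 is still a match, which is why nothing is visibly highlighted.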

On the other hand, CC? can't match at the start of the string because its first C is mandatory, so the engine has to step through the string until it finds the first C. It then matches that C plus one more C if one follows, which is why the first "CC" is matched.
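To see the difference, here is a similar sketch for CC?, again with base R's regexpr() standing in for stringr. The names m2, first_c, and matched2 are illustrative only.

```r
x <- "1888 is the longest year in Roman numerals: MDCCCLXXXVIII"

# Position of the first literal "C" in the string, for comparison.
first_c <- as.integer(regexpr("C", x, fixed = TRUE))

# "CC?" needs one mandatory C, then greedily takes a second C if present.
m2 <- regexpr("CC?", x, perl = TRUE)

start2   <- as.integer(m2)
len2     <- attr(m2, "match.length")
matched2 <- regmatches(x, m2)

print(start2 == first_c)  # TRUE: the match begins at the first C
print(len2)               # 2
print(matched2)           # "CC"
```

Because the pattern allows at most two Cs, the third C in "CCC" is left out of this first match.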

Moral: Never construct a regex where all tokens are optional, allowing an empty match (unless you're planning to do exactly that).
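One rough way to catch this before using a pattern is to test it against the empty string. The helper below, can_match_empty(), is a hypothetical name introduced for illustration, not part of base R or stringr, and the check is only a heuristic: it will not flag patterns that match emptily only in certain positions, such as a bare word boundary.

```r
# Hypothetical helper: TRUE if the pattern matches the empty string "",
# which means every token in it can be skipped.
can_match_empty <- function(pattern) {
  grepl(pattern, "", perl = TRUE)
}

optional_only <- can_match_empty("C?")   # every token optional
star_only     <- can_match_empty("C*")   # zero or more, so also optional
one_required  <- can_match_empty("CC?")  # first C is mandatory
plus_required <- can_match_empty("C+")   # at least one C is mandatory

print(c(optional_only, star_only, one_required, plus_required))
# TRUE TRUE FALSE FALSE
```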

Tim Pietzcker answered Dec 19 '25 21:12



