I found that a non-greedy regex match only behaves non-greedily when anchored to the front, not to the end:
$ echo abcabcabc | perl -ne 'print $1 if /^(a.*c)/'
abcabcabc
# OK, greedy match
$ echo abcabcabc | perl -ne 'print $1 if /^(a.*?c)/'
abc
# YES! non-greedy match
Now look at this, when anchoring to the end:
$ echo abcabcabc | perl -ne 'print $1 if /(a.*c)$/'
abcabcabc
# OK, greedy match
$ echo abcabcabc | perl -ne 'print $1 if /(a.*?c)$/'
abcabcabc
# What? Non-greedy becomes greedy?
Why is that? How come it doesn't print abc as before?
(The problem was found in my Go code, but is illustrated in Perl for simplicity.)
$ echo abcabcabc | perl -ne 'print $1 if /(a.*?c)$/'
abcabcabc
# What? Non-greedy becomes greedy?
Non-greedy means it'll match the fewest characters possible at the current location such that the entire pattern matches.
After matching a at position 0, bcabcab is the least .*? can match at position 1 while still satisfying the rest of the pattern.
"abcabcabc" =~ /a.*?c$/ in detail:
a matches 1 char (a).
.*? matches 0 chars (empty string).
c fails to match. Backtrack!
.*? matches 1 char (b).
c matches 1 char (c).
$ fails to match. Backtrack!
.*? matches 2 chars (bc).
c fails to match. Backtrack!
[... 3 through 6 chars fail in the same way, at either c or $ ...]
.*? matches 7 chars (bcabcab).
c matches 1 char (c).
$ matches 0 chars (empty string). Match successful!

"abcabcabc" =~ /a.*c$/ in detail (for contrast):
a matches 1 char (a).
.* matches 8 chars (bcabcabc).
c fails to match. Backtrack!
.* matches 7 chars (bcabcab).
c matches 1 char (c).
$ matches 0 chars (empty string). Match successful!

Tip: Avoid patterns with two instances of a non-greediness modifier. Unless you are using them as an optimization, there's a good chance they can match something you don't want them to match. This is relevant here because patterns implicitly start with \G(?s:.*?)\K (unless cancelled out by a leading ^, \A or \G).
What you want is one of the following:
/a[^a]*c$/
/a[^c]*c$/
/a[^ac]*c$/
You could also use one of the following:
/a(?:(?!a).)*c$/s
/a(?:(?!c).)*c$/s
/a(?:(?!a|c).)*c$/s
It would be inefficient and unreadable to use these latter three in this situation, but they will work with boundaries that are longer than one character.