Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

The behavior of /g mode matching

On this article, it mentioned

Make sure you are clear on the fact that an expression pattern is tested on each individual character. And that, just because the engine moves forward when following the pattern and looking for a match it still backtracks and examines each character in a string until a match is found or if the global flag is set until all characters are examined.

But what I tested in Javascript

"aaa@bbb".match(/a+@b+/g)

does not produce a result like:

["aaa@bbb", "aa@bbb", "a@bbb"]

It only produces ["aaa@bbb"]. It seems it does not examine each character to test the pattern. Can anyone can explain a little on matching steps ? Thanks.

like image 285
vancewang Avatar asked Dec 06 '25 19:12

vancewang


2 Answers

That explanation is mixing two distinct concepts that IMO should be kept separated

A) backtracking

When looking for a match the normal behavior for a quantifier (?, *, +) is to be "greedy", i.e. to munch as much as possible... for example in /(a+)([^b]+)/ tested with aaaacccc all the a will be part of group 1 even if of course they also match the char set [^b] (everything but b).

However if grabbing too much is going to prevent a match the RE rules require that the quantifier "backtracks" capturing less if this allows the expression to match. For example in (a+)([^b]+) tested with aaaa the group 1 will get only three as, to leave one for group 2 to match.

You can change this greedy behavior with "non-greedy quantifiers" like *?, +?, ??. In this case stills the engine does backtracking but with the reverse meaning: a non-greedy subexpression will eat as little as possible that allows the rest of expression to match. For example (a+)(a+b+) tested with aaabbb will leave two as for group 1 and abbb for group 2, but (a+?)(a+b+) with the same string instead will leave only one a for group 1 because this is the minimum that allows matching the remaining part.

Note that also because of backtracking the greedy/non-greedy options doesn't change if an expression has a match or not, but only how big is the match and how much goes to each sub-expression.

B) the "global" option

This is something totally unrelated to backtracking and simply means that instead of stopping at the first match the search must find all non-overlapping matches. This is done by finding the first match and then starting again the search after the end of the match.

Note that each match is computed using the standard regexp rules and there is no look-ahead or backtracking between different matches: in other words if making for example a greedy match shorter would give more matches in the string this option is not considered... a+[^b]+ tested with aaaaaa is going to give only one match even if g option is specified and even if the substrings aa, aa, aa would have been each a valid match for the regexp.

like image 133
6502 Avatar answered Dec 08 '25 07:12

6502


/g does not mean it will try to find every possible subset of characters in the input string which may match the given pattern. It means that once a match is found, it will continue searching for more substrings which may match the pattern starting from the previous match onward.

For example:

"aaa@bbb ... aaaa@bbbb".match(/a+@b+/g);

Will produce

["aaa@bbb", "aaaa@bbbb"]
like image 41
p.s.w.g Avatar answered Dec 08 '25 09:12

p.s.w.g



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!