Capturing groups and lookarounds

Question

I want to know how capturing (or non-capturing) groups affect lookarounds in regex. Here are two examples:

test (?:(?!<start).)+

test (?!<start).+

I would appreciate it if anybody could explain in detail how the regex engine interprets both cases.

Kobi · Accepted Answer

Look-arounds are zero-width. In that respect, it doesn't make much sense to place them on their own inside a capturing group: they don't capture anything more interesting than an empty string (much like \b vs. (\b)). (Edge cases involve back-referencing an optional group, but that isn't very interesting.)
Positive lookarounds - (?=...) and (?<=...) - can capture groups. For example, /(?=(\b\w+\b))/ will result in empty matches, where each match has a non-empty group. As another example, /(?<=(.))\1/ will match characters that follow identical characters.
Negative lookarounds - (?!...) and (?<!...) - cannot capture groups. That makes a lot of sense when you think about it: a negative lookaround succeeds only when its inner pattern fails to match, so there is nothing left to capture. They can still use capturing groups within them, though. For example, ^(?!.*(.).*\1).*$ will match a line that does not contain duplicated characters. (Again, how \1 behaves outside of the group in that case is not particularly interesting.)
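
As a rough illustration of the three example patterns above, here is a small sketch using Python's standard re module. The sample strings and variable names are made up for the demonstration, and other regex flavors may differ in the details.

```python
import re

# Positive lookahead with a capturing group: every match is empty,
# but group 1 holds the word that starts at that position.
lookahead = re.compile(r'(?=(\b\w+\b))')
words = [m.group(1) for m in lookahead.finditer('hi there')]
spans_empty = all(m.start() == m.end() for m in lookahead.finditer('hi there'))

# Positive lookbehind with a capturing group, back-referenced outside it:
# matches a character that equals the one right before it.
repeats = [m.start() for m in re.finditer(r'(?<=(.))\1', 'aabbbc')]

# Negative lookahead that uses a group internally: accepts only lines
# in which no character occurs twice.
no_dupes = re.compile(r'^(?!.*(.).*\1).*$')
unique_ok = no_dupes.match('abc') is not None
dupe_rejected = no_dupes.match('abca') is None

print(words, spans_empty, repeats, unique_ok, dupe_rejected)
```

The group inside the positive lookaround is what makes the empty matches useful; the group inside the negative lookahead only serves the \1 check within the assertion itself.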

Now, to your examples. Reading the intended assertion as the negative lookbehind (?<!start), the two patterns match different texts:

(?<!start).+ - Check that we are not right after the text start, and then match all remaining characters (of the line). Examples:
1. Input "start1234end": matches the whole input - the start position isn't after the word "start".
2. Input "before123startAfter": suppose the previous match was "before123start" (from a different pattern that allows that). The next match cannot start here, so it skips one character and matches "fter".
(?:(?<!start).)+ - Here, the lookbehind assertion is repeated for every character (for intuition: if a group (?:...)+ is a loop, the assertion is inside the loop). A character will not be matched if it comes directly after the string start:
1. Input "start1234end": the first match will be "start". The engine cannot match the next '1' (because it comes directly after start), so the match stops. The next match will be "234end".
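
To see the difference concretely, both patterns can be tried with Python's re module. This is a sketch that assumes the lookbehind reading (?<!start). The pos argument of search is used only to simulate "the previous match ended at this position", and the variable names are illustrative.

```python
import re

once = re.compile(r'(?<!start).+')          # assertion checked once, at the match start
per_char = re.compile(r'(?:(?<!start).)+')  # assertion checked before every character

# Assertion outside the loop: only the starting position is tested.
whole = once.findall('start1234end')

# Resume searching right after "before123start" (index 14). The lookbehind
# still sees the earlier text, so the match cannot begin at 'A'.
text = 'before123startAfter'
resumed = once.search(text, len('before123start'))
resumed_text = resumed.group()

# Assertion inside the loop: the character right after "start" is never matched.
pieces = per_char.findall('start1234end')

print(whole, resumed_text, pieces)
```

Moving the assertion inside the repeated group is what turns a single check at the start of the match into a check on every consumed character.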
