PHP uses the PCRE regex library, which does not support variable-length repetition in lookbehinds.
If a lookbehind contains such repetition (e.g., (?<=\d+)), PHP normally issues a warning like this:
Warning: preg_match_all(): Compilation failed: lookbehind assertion is not fixed length at offset 7 in lookbehind.php on line 10
However, I have found a case where compilation does not fail when I think it should.
These fail to compile, as expected:
/(?<=X*)a/
/(?<=X+)a/
/(?<=(X)*)a/
However, /(?<=(X)+)a/ does compile. This should be functionally equivalent to /(?<=(X){1,})a/, which also compiles. On the other hand, if I actually add an upper bound to that range (e.g., /(?<=(X){1,2})a/), it fails to compile. I think /(?<=(X)+)a/ and /(?<=(X){1,})a/ should also fail to compile, but they do not. Why not?
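To see which of these patterns your own PHP build accepts, each one can be handed to preg_match(), which returns false when compilation fails. This is a minimal sketch; the `compiles()` helper is a hypothetical name introduced here, not part of PHP. Expect the results to depend on the PCRE version: PHP 7.3+ bundles PCRE2, and newer PCRE2 releases (10.43 onward, as far as I know) accept some bounded variable-length lookbehinds such as (X){1,2}.

```php
<?php
// Hypothetical helper: true if the pattern compiles under the PCRE library
// bundled with the running PHP build.
function compiles(string $pattern): bool
{
    // preg_match() returns false on a compilation failure; @ hides the warning.
    return @preg_match($pattern, '') !== false;
}

// The patterns discussed in the question.
$patterns = [
    '/(?<=X*)a/',
    '/(?<=X+)a/',
    '/(?<=(X)*)a/',
    '/(?<=(X)+)a/',
    '/(?<=(X){1,})a/',
    '/(?<=(X){1,2})a/',
];

$results = [];
foreach ($patterns as $p) {
    $results[$p] = compiles($p);
    printf("%-20s %s\n", $p, $results[$p] ? 'compiles' : 'fails');
}
```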
Here's some code:
$str = 'aXaaXXaaaXXXaaaa';
$regex = '/(?<=((?:X)+))a+/';
preg_match_all($regex, $str, $matches, PREG_OFFSET_CAPTURE|PREG_SET_ORDER);
print_r($matches);
I've complicated the pattern slightly to add a capturing group around the multiple Xs. Here are my results:
Array (
[0] => Array (
[0] => Array (
[0] => aa
[1] => 2
)
[1] => Array (
[0] => X
[1] => 1
)
)
[1] => Array (
[0] => Array (
[0] => aaa
[1] => 6
)
[1] => Array (
[0] => X
[1] => 5
)
)
[2] => Array (
[0] => Array (
[0] => aaaa
[1] => 12
)
[1] => Array (
[0] => X
[1] => 11
)
)
)
It clearly matches the a's that follow the Xs, which is correct. However, subpattern 1 appears to match only one X, not all of them. If I add an a at the beginning of the lookbehind so that it must match all the Xs in between, here are my results:
$regex = '/(?<=(a(?:X)+))a+/';
Array (
[0] => Array (
[0] => Array (
[0] => aa
[1] => 2
)
[1] => Array (
[0] => aX
[1] => 0
)
)
)
It only matches once (where there is only one X). Effectively, (X)+ and (X){1,} are being reduced to (X){1} (which is allowable due to its fixed length).
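One way to test the "reduced to (X){1}" theory is to run genuinely fixed-length lookbehinds (exactly one X, and exactly aX) against the same subject: if the theory holds, they should reproduce the matches and capture offsets in the print_r output above. This is a sketch; the array keys `one_x` and `a_then_one_x` are illustrative names only.

```php
<?php
$str = 'aXaaXXaaaXXXaaaa';

// Fixed-length stand-ins for the two lookbehinds in the question.
$patterns = [
    'one_x'        => '/(?<=(X))a+/',  // stand-in for /(?<=((?:X)+))a+/
    'a_then_one_x' => '/(?<=(aX))a+/', // stand-in for /(?<=(a(?:X)+))a+/
];

$results = [];
foreach ($patterns as $name => $regex) {
    preg_match_all($regex, $str, $m, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);
    $results[$name] = $m;
    foreach ($m as $set) {
        // $set[0] is the whole match, $set[1] is group 1; each is [text, offset].
        printf("%s: match '%s' at %d, group 1 '%s' at %d\n",
            $name, $set[0][0], $set[0][1], $set[1][0], $set[1][1]);
    }
}
```

Both patterns produce exactly the results shown in the question, which is consistent with the quantifier being silently dropped.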
I hate to cry, "Bug!" as soon as I find something that doesn't do what I expect, but it sure seems like one. The pattern isn't rejected like I expect, and then it doesn't behave as I would expect it to even if it were a valid pattern.
So I ask:
Why does PCRE accept + here but not *?
Why does X+ fail while (X)+ is allowed?
Any insight is most appreciated. Thank you.
It's not a PHP bug. If it is a bug (and it does look like one), it is a PCRE bug and should be reported there. However, check the PCRE version in phpinfo() and compare it with the latest version. If it is not up to date, try running the same regexes directly in the latest PCRE before posting a bug report.
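Besides phpinfo(), the PCRE version is available from the built-in PCRE_VERSION constant. The sketch below extracts its numeric part and compares it with 8.32; that number is only the version used in the pcretest session in this answer, an assumed reference point and not necessarily the first release that rejects (?<=(X)+).

```php
<?php
// PCRE_VERSION holds the bundled PCRE library's version string (the value
// phpinfo() reports); the first space-separated token is the version number.
$pcre = explode(' ', PCRE_VERSION)[0];
echo "PCRE library: {$pcre}\n";

// Assumed reference point: the version from the pcretest run in this answer.
$atLeastTested = version_compare($pcre, '8.32', '>=');
echo $atLeastTested ? "at least 8.32\n" : "older than 8.32, consider upgrading\n";
```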
PCRE version 8.32-RC1 2012-08-08
re> /(?<=(X)+)a/
Failed: lookbehind assertion is not fixed length at offset 8
re>
It was probably a bug. Please update to the latest PCRE.
By the way, you can use \K to emulate a lookbehind of unlimited length.
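Applied to the question's subject, \K gives the capture the original lookbehind was meant to produce: (X+) matches and captures all of the Xs, and \K then resets the start of the reported match so that only the a's appear in group 0. A minimal sketch follows; note one difference from a true lookbehind: the Xs are consumed, so they cannot be part of the previous or next match.

```php
<?php
$str = 'aXaaXXaaaXXXaaaa';

// (X+) captures every X; \K discards them from the reported match text.
preg_match_all('/(X+)\Ka+/', $str, $matches, PREG_OFFSET_CAPTURE | PREG_SET_ORDER);

foreach ($matches as $set) {
    // $set[0] is the a's after \K, $set[1] is the captured run of Xs.
    printf("match '%s' at %d after '%s' at %d\n",
        $set[0][0], $set[0][1], $set[1][0], $set[1][1]);
}
```

Unlike the lookbehind version, group 1 here is X, XX and XXX at offsets 1, 4 and 9.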