I have a regex in a variable that includes a substring inside \Q...\E containing an opening bracket. I expected [ to be interpreted like a vanilla character by the parser, since it's inside a \Q...\E section.
This seems to be the case when the regex comes as a literal in the program, but the parser fails on it when it comes in a variable.
Here's a simplified example.
This works:
$r = qr/\Qa[b\E\d+/;
if ("a[b1" =~ $r) { print "match\n"; }
This fails:
$v='\Qa[b\E\d+';
$r=qr/$v/;
It dies at line 2 with
Unmatched [ in regex; marked by <-- HERE in m/\Qa[ <-- HERE b\E\d+/
Why would Perl reject this? And only when interpolated from a variable and not inline with the same regex?
I can't see anything explaining it in the FAQ's How do I match a regular expression that's in a variable? or perlop's Regexp Quote-Like Operators.
This is with Perl 5.14.2 (Ubuntu 12.04) if the version matters, with default settings.
\Q has nothing to do with regular expressions. When the regex engine sees \Q, it doesn't recognize it, spits out a warning, and treats it like a plain Q.
>perl -we"$re='\Qa'; qr/$re/"
Unrecognized escape \Q passed through in regex; marked by <-- HERE in m/\Q <-- HERE a/ at -e line 1.
Like interpolation, \Q is recognized by double-quoted string literals and similar. Like interpolation, it's gotta be part of a literal (Perl code) to work.
>perl -E"$pat=q{\Q!}; say qr/$pat/"
(?^u:\Q!)
>perl -E"$pat=qq{\Q!}; say qr/$pat/"
(?^u:\!)
>perl -E"$x='!'; $pat=q{$x}; say qr/$pat/"
(?^u:$x)
>perl -E"$x='!'; $pat=qq{$x}; say qr/$pat/"
(?^u:!)
Solutions:
$v = "\Qa[b\E\\d+";
$v = qr/\Qa[b\E\d+/;
$v = quotemeta('a[b') . '\d+';
A Perl regular expression is first evaluated as if it were a simple double-quoted string. Any embedded variables are interpolated, and escape sequences that don't originate from interpolated variables are processed. This is the point when special operators like \L, \U and \Q...\E are acted on.
The processing stops there in double-quoted strings, but in regular expressions the string is then compiled.
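To make the first phase visible on its own, here is a minimal sketch that compares a double-quoted and a single-quoted literal holding the same characters. The variable names $dq and $sq are only for illustration and are not from the question.

```perl
use strict;
use warnings;

# A double-quoted literal processes \Q...\E while Perl parses the source,
# so the bracket is already escaped in the resulting string.
my $dq = "\Qa[b\E";

# A single-quoted literal leaves the backslash sequences alone.
my $sq = '\Qa[b\E';

print "double-quoted: $dq\n";   # prints: double-quoted: a\[b
print "single-quoted: $sq\n";   # prints: single-quoted: \Qa[b\E

# quotemeta() is the run-time function behind \Q, so it yields the same text.
print "same as quotemeta\n" if $dq eq quotemeta('a[b');
```

Only the first string is safe to hand to the regex compiler as a pattern for the literal text a[b.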
In your example you have
$v = '\Qa[b\E\d+';
and because you have used single quotes, this string isn't changed at all.
You then interpolate it into a regular expression with
$r = qr/$v/;
but, because escape sequences inside interpolated variables are untouched, the string is passed as it is to the regex compiler, which reports that the expression is invalid because it contains an unmatched, unescaped open bracket. If you remove that bracket you still get a complaint; this time it is the warning Unrecognized escape \Q passed through in regex, showing that the \Q...\E hasn't been processed and reaches the compiler as literal characters.
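Because the pattern is compiled at run time when it comes from a variable, the failure is an ordinary run-time exception. A small sketch, assuming you want to inspect the error instead of letting the program die (the names $r and $err are just illustrative):

```perl
use strict;
use warnings;

my $v = '\Qa[b\E\d+';   # single quotes: \Q...\E is NOT processed here

# qr// compiles the interpolated text at run time, so eval can trap the error.
my $r   = eval { qr/$v/ };
my $err = $@;

print "compile failed: $err" unless defined $r;
```

The trapped message is the same Unmatched [ error quoted in the question.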
What would work is to change your assignment to $v to use double quotes instead, like this
my $v = "\Qa[b\E\\d+";
The backslash on \d has to be doubled up, otherwise it would just vanish. Now the \Q...\E has been acted on, and $v is equal to a\[b\d+. Compiling this as a regex works fine.
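As a quick check, the sketch below runs the test string from the question against three ways of building the pattern (double-quoted string, regex literal, quotemeta). It also shows one more variant, which is my own addition and not from the answers above: keep only the literal text in a variable and put \Q...\E in the regex literal itself, where it is processed. The names @patterns and $literal are hypothetical.

```perl
use strict;
use warnings;

my @patterns = (
    "\Qa[b\E\\d+",               # double-quoted string: \Q...\E handled by the literal
    qr/\Qa[b\E\d+/,              # regex literal: same handling, already compiled
    quotemeta('a[b') . '\d+',    # quotemeta() does the escaping at run time
);

for my $p (@patterns) {
    my $r = qr/$p/;
    print( ("a[b1" =~ $r) ? "match\n" : "no match\n" );
}

# Variant: the variable holds plain text; \Q...\E sits in the regex literal.
my $literal = 'a[b';
print "match\n" if "a[b1" =~ /\Q$literal\E\d+/;
```

All four checks should print match.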