I have come across a peculiarity in a plperl stored procedure on Postgres 9.2 with Perl 5.12.4.
The curious behavior can be reproduced using this "broken" SP:
CREATE FUNCTION foo(VARCHAR) RETURNS VARCHAR AS $$
    my ( $re ) = @_;
    $re = ''.qr/\b($re)\b/i;
    return $re;
$$ LANGUAGE plperl;
When executed:
# select foo('foo');
ERROR:  Unable to load utf8.pm into plperl at line 3.
BEGIN failed--compilation aborted.
CONTEXT:  PL/Perl function "foo"
However, if I move the qr// operation into an eval, it works:
CREATE OR REPLACE FUNCTION bar(VARCHAR) RETURNS VARCHAR AS $$
    my ( $re ) = @_;
    eval "\$re = ''.qr/\\b($re)\\b/i;";
    return $re;
$$ LANGUAGE plperl;
Result:
# select bar('foo');
       bar       
-----------------
 (?^i:\b(foo)\b)
(1 row)
Why does the eval bypass the automatic use utf8?
Why is use utf8 even required in the first place? My code is not in UTF8, which is said to be the only time one should use utf8.
If anything, I might expect the eval version to break without use utf8, in the case where the input to the script contained non-ASCII values. (Further testing shows that passing non-ASCII values to bar() does indeed cause the eval to fail with the same error)
On another machine, running DO 'elog(WARNING, join ", ", sort keys %INC)' language plperl; lists utf8.pm among the loaded modules:
WARNING: Carp.pm, Carp/Heavy.pm, Exporter.pm, feature.pm, overload.pm, strict.pm, unicore/Heavy.pl, unicore/To/Fold.pl, unicore/lib/Perl/SpacePer.pl, utf8.pm, utf8_heavy.pl, vars.pm, warnings.pm, warnings/register.pm
CONTEXT: PL/Perl anonymous code block
DO
But not so on the machine demonstrating the odd behavior:
WARNING: Carp.pm, Carp/Heavy.pm, Exporter.pm, feature.pm, overload.pm, overloading.pm, strict.pm, vars.pm, warnings.pm, warnings/register.pm
CONTEXT: PL/Perl anonymous code block
DO
This question is not about how to get my target machine to load utf8 automatically; I know how to do that. I'm curious why it seems to be necessary in the first place.
In the version that's failing, you're executing
$re = ''.qr/\b($re)\b/i
In the version that's succeeding, you're executing
$re = ''.qr/\b(foo)\b/i
It sounds like qr// needs utf8.pm when the pattern is compiled as a Unicode pattern (whatever that means), and the literal pattern in the second version isn't compiled as one.
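To make the difference concrete, here is a minimal standalone sketch (plain Perl, run outside plperl, so it does not reproduce the error itself). The variable names are made up for illustration, and $input stands in for the function argument. It only shows that the string eval interpolates the argument first, so the code it compiles contains a literal pattern:

```perl
use strict;
use warnings;

my $input = 'foo';    # hypothetical stand-in for the argument passed to foo()/bar()

# What foo() runs: the variable is still inside the pattern when qr// is
# evaluated, so the pattern is assembled at run time.
my $runtime = '' . qr/\b($input)\b/i;

# What bar() runs: the double-quoted string is interpolated first, so the
# source handed to eval already contains the literal text "foo".
my $source  = "''.qr/\\b($input)\\b/i";
my $literal = eval $source;
die $@ if $@;

print "eval source: $source\n";
print "run time:    $runtime\n";
print "literal:     $literal\n";
```

Outside the Safe compartment both forms stringify to the same pattern; the only difference is when the pattern text is put together.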
The failure to load utf8.pm is due to the limitations imposed by the Safe compartment created by plperl.
The fix is to load the module outside the Safe compartment.
The workaround is to use the more efficient
$re = '(?^u:\\b(?i:'.$re.')\\b)';
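As a quick check of what that line produces, here is a small standalone sketch in plain Perl, assuming Perl 5.14 or later (where the (?^u:...) syntax is available) and using 'foo' as a made-up input. It builds the string by concatenation only and then uses it as a pattern:

```perl
use strict;
use warnings;

my $re = 'foo';    # hypothetical stand-in for the function argument

# Plain string concatenation; no qr// object is created or stringified here.
my $pattern = '(?^u:\\b(?i:' . $re . ')\\b)';

print "$pattern\n";
print "matches\n" if 'a FOO b' =~ /$pattern/;
```

Note that the input is spliced in as-is, so any regex metacharacters in it stay active, just as they do in the original qr// version.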
I had the same issue and I fixed it by adding
plperl.on_init = 'use utf8; use re; package utf8; require "utf8_heavy.pl";'
to the postgresql.conf file.
I hope this will help someone.