
Why does Perl's m//g operator sometimes result in NULLs being introduced into text?

We ran into some strange results recently in one of our Perl scripts, where the NULL character (\0 in Perl) was being introduced into some text. We ultimately tracked it down to the /g modifier having been added to an m// match by accident. Until this happened, I wasn't even aware you could use /g with m//, as I had only ever used it with s///.

In any event, even though we have fixed the bug by removing the errant /g, I would love to know WHY this small script introduces a NULL character into the text! :-)

my $text = "01";

if ($text =~ m/(\d+)/g)
{
    $text = "A$1";
}

if ($text =~ m/\0/)
{
    print "Text contains NULL!\n";
}

Subtle changes that prevent the NULL from appearing: If I change the value of $text (e.g. to just "0" or just "1", or many other combinations), the NULL is no longer introduced. If I change the assigned value from "A$1" to just "$1", the NULL is no longer introduced. If I assign "A$1" to a totally different variable, no NULL is introduced into that variable. And if I remove the /g modifier from the m// match, the NULL is not introduced.

Can a Perl guru explain this behavior please? I could not find anything by googling.

asked Sep 07 '25 14:09 by Mason G. Zhwiti


2 Answers

if ($text =~ m/(\d+)/g)

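is wrong. Specifically, code of the form if (/.../g) is wrong. It makes no sense conceptually ("If match until it doesn't match"???) and can give undesired results: in scalar context, a //g match starts where the previous successful //g match on that variable ended (see pos), not at the start of the string.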

$_ = "01ab";
if (/(\d+)/g) { say $1; }   # 01
if (/(.*)/g)  { say $1; }   # ab!!!

Get rid of the "g".
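To make the difference concrete, here is a minimal sketch of the same "01ab" example that also records pos() after each match. The helper names captures_with_g and captures_without_g are made up for this illustration.

```perl
use strict;
use warnings;

# With /g in scalar context, each match resumes at pos() of the target.
sub captures_with_g {
    my ($str) = @_;
    my @seen;
    push @seen, [$1, pos($str)] if $str =~ /(\d+)/g;   # starts at offset 0
    push @seen, [$1, pos($str)] if $str =~ /(.*)/g;    # resumes after the digits
    return @seen;
}

# Without /g, every match starts at the beginning of the string.
sub captures_without_g {
    my ($str) = @_;
    my @seen;
    push @seen, $1 if $str =~ /(\d+)/;
    push @seen, $1 if $str =~ /(.*)/;
    return @seen;
}

for my $pair (captures_with_g("01ab")) {
    printf "with /g:    captured '%s', pos is now %d\n", @$pair;
}
printf "without /g: captured '%s'\n", $_ for captures_without_g("01ab");
```

With /g the two captures are "01" and "ab", as in the example above. Without /g they are "01" and "01ab", which is what the if was presumably meant to test.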


The buffer that holds a string is normally followed by a NUL. That NUL is not part of the string (note that CUR, the string's length, is 2 below).

$ perl -MDevel::Peek -e'Dump "01"'
SV = PV(0x88b4740) at 0x88d1368
  REFCNT = 1
  FLAGS = (PADTMP,POK,READONLY,pPOK)
  PV = 0x88d52f0 "01"\0
  CUR = 2
  LEN = 12

Your version of Perl appears to have a bug where it's matching that NUL when the starting position of the match is at the end of the string. No NULs are being inserted. Fortunately, if you fix your buggy code, you won't suffer from this bug.
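One way to check a given Perl is to run the question's script and compare what the regex reports with a check that does not involve the regex engine. This is only a sketch: run_question_script and contains_nul are names invented here, and using index() as the independent check is my own choice, not something taken from the bug report.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string's logical contents include a NUL.
# Uses index() so the answer does not depend on the regex engine.
sub contains_nul {
    my ($s) = @_;
    return index($s, "\0") >= 0;
}

# The question's script, wrapped in a function so the result can be inspected.
sub run_question_script {
    my $text = "01";
    if ($text =~ m/(\d+)/g) {
        $text = "A$1";
    }
    return $text;
}

my $result = run_question_script();
printf "length = %d, regex sees NUL: %s, index sees NUL: %s\n",
    length($result),
    ($result =~ m/\0/ ? "yes" : "no"),
    (contains_nul($result) ? "yes" : "no");
```

On a Perl that has the fix, the length should be 3 and both checks should say "no". A "yes" from the regex alone would be consistent with the match reading past the end of the string rather than with a NUL having been inserted.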


../perl/Porting/bisect.pl           \
   --target=miniperl --expect-fail  \
   --start=v5.13.0 --end=v5.14.0    \
   -e'
      my $text = "01";
      if ($text =~ m/(\d+)/g) { $text = "A$1"; }
      exit($text =~ m/\0/ ? 1 : 0);
   '

shows that it was fixed by 6f1401dc2acd2a2b85df22b0a74e5f7e6e0a33aa.

Based on git tag --contains 6f1401dc2acd2a2b85df22b0a74e5f7e6e0a33aa, 5.13.2 is the first dev release and 5.14.0 is the first production release to have the fix.
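If a script has to run on machines where the Perl version is uncertain, a small guard can report whether the running interpreter is at or past those releases. This is a sketch that takes the version numbers above at face value; has_nul_match_fix is a hypothetical name.

```perl
use strict;
use warnings;

# Assumption from the bisect result above: the fix first shipped in
# 5.13.2 (dev) and 5.14.0 (production). $] holds the running version
# as a number such as 5.014000.
sub has_nul_match_fix {
    my ($version) = @_;
    return $version >= 5.013002;
}

printf "perl %s: fix %s\n", $], has_nul_match_fix($]) ? "present" : "absent";
```

Removing the stray /g is still the real fix; the guard only reports whether the interpreter-level bug could be in play.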

answered Sep 10 '25 10:09 by ikegami


This is clearly a bug. Check whether it still happens on the latest version of Perl; if it does, here's how to file a bug report:

http://perldoc.perl.org/perlbug.html

answered Sep 10 '25 08:09 by Dan