Hi, I'm working with large string data in a Perl variable (it's a raw email body, so it can contain attachments), and I'm seeing an interesting issue with Perl's substr. It looks like a leak, or I'm doing something wrong (if so, what?). Consider this code:
#!/usr/local/bin/perl
use strict;
my $str = 'a'x10_000_000;
system("ps up $$"); # 22 MB used (why?)
#USER   PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
#alt  64398  0,0  0,2 33292 22700   7  S+J  22:41     0:00,03 /usr/local/bin/perl ./t.pl
substr($str, 0, 1)='';
system("ps up $$"); #No leak
#USER   PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
#alt  64398  0,0  0,2 33292 22732   7  S+J  22:41     0:00,04 /usr/local/bin/perl ./t.pl
substr($str, 500);
system("ps up $$"); # Leaked 10 MB (why?!)
#USER   PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
#alt  64398  0,0  0,3 43532 32520   7  S+J  22:41     0:00,05 /usr/local/bin/perl ./t.pl
my $a = substr($str, 500);
system("ps up $$"); # Leaked 10 MB + copied 10 MB
#USER   PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
#alt  64398  0,0  0,5 64012 52096   7  S+J  22:41     0:00,09 /usr/local/bin/perl ./t.pl
undef $a; #Free scalar's memory
system("ps up $$"); # Freed 10 MB
#USER   PID %CPU %MEM   VSZ   RSS  TT  STAT STARTED      TIME COMMAND
#alt  64398  0,0  0,4 53772 42308   7  S+J  22:41     0:00,09 /usr/local/bin/perl ./t.pl
# Total: leaked twice, 10 MB each time
Take a look at the substr($str, 500); statement. In addition to returning a copy of the string (that's OK), it leaks the same amount of memory, so if you use the return value it costs twice the memory, and one of those two chunks is lost for as long as the script runs... Also, it doesn't seem to be any kind of "internal buffer", since it leaks on each call.
Note that this 10 MB increase is not reusable memory, since subsequent calls take more and more memory.
Any suggestions on how to fix or avoid that?
My Perl version is 5.14.2; I get the same behavior at work (5.8.8).
If you stick a ps check before allocating $str you'll find about 2 megs is being used just to run perl.  Thus allocating $str takes 20 megs, double the size of the string being created.  If I had to guess what's happening, perl has to make a 10 meg string and then copy it into $str.  20 megs.  It holds onto that allocated memory to use later.
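A minimal sketch of that baseline check, assuming a Unix-like system. The helper rss_kb is hypothetical (not from the thread): it reads VmRSS from /proc/self/status where that exists (Linux) and otherwise falls back to ps -o rss= -p $$ (FreeBSD, macOS), returning undef if neither works. The repeat count is kept in a variable so the string is built at run time rather than possibly at compile time. Exact numbers will vary with platform and perl version.

```perl
use strict;
use warnings;

# Hypothetical helper: resident set size of this process in kB, or undef.
sub rss_kb {
    # Linux: the VmRSS line in /proc/self/status is already in kB
    if (open my $fh, '<', '/proc/self/status') {
        while (my $line = <$fh>) {
            return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
        }
    }
    # Elsewhere: ask ps for only the RSS column (in kB) of this PID
    my $out = `ps -o rss= -p $$ 2>/dev/null`;
    return (defined $out && $out =~ /(\d+)/) ? $1 : undef;
}

my $n      = 10_000_000;      # in a variable, so 'a' x $n runs at run time
my $before = rss_kb();
my $str    = 'a' x $n;
my $after  = rss_kb();

if (defined $before && defined $after) {
    printf "before: %d kB, after allocating \$str: %d kB (+%d kB)\n",
        $before, $after, $after - $before;
}
```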
substr($str, 0, 1)='' changes the string pointer of $str (you can see this with Devel::Peek), but the process memory does not increase. It's possible it reused the memory freed after allocating 'a' x 10_000_000.
my $a = substr($str, 500); has a similar issue.  The substr makes a new 10 meg string and then copies it into $a needing 20 megs.  Why it takes more system memory to do this I'm not sure.  It's possible perl allocated memory from the previous 10 meg chunk it got from the operating system and so wasn't a single 10 meg chunk any more and it had to ask the OS for more.
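If the goal is only to drop the first 500 characters, rather than keep both versions, one way to avoid the separate tail copy is to edit $str in place, the same way the question already does with substr($str, 0, 1)=''. This is a suggestion, not part of the answer above; a sketch using the four-argument form of substr, where $tail_copy exists only to show that both forms give the same string.

```perl
use strict;
use warnings;

my $str = 'a' x 10_000_000;
substr($str, 0, 500) = 'b' x 500;    # mark the head so its removal can be checked

# Copying form: builds a separate 9_999_500-character string.
my $tail_copy = substr($str, 500);

# In-place form: four-argument substr replaces the first 500 characters
# of $str with '' directly; no copy of the tail is made.
substr($str, 0, 500, '');

print length($str), "\n";    # 9999500
```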
undef $a definitely clears the C string associated with $a, you can see with Devel::Peek, but perl does not necessarily release the memory back to the operating system.
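Besides Devel::Peek, the core B module can show whether a scalar still owns a string buffer: B::svref_2object(\$x)->LEN reports the allocated size. A sketch, with buf_len as a hypothetical helper; only the undef case is pinned down, because whether assigning '' keeps the old buffer can depend on the perl version. Either way, a freed buffer goes back to perl's allocator, not necessarily to the operating system.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: allocated size in bytes of the string buffer behind
# a scalar ($_[0] aliases the caller's variable), or 0 if it has none.
sub buf_len {
    my $obj = B::svref_2object(\$_[0]);
    return $obj->can('LEN') ? $obj->LEN : 0;
}

my $str  = 'a' x 10_000_000;
my $copy = substr($str, 500);
my $len_full = buf_len($copy);     # at least 9_999_501 bytes

$copy = '';                        # empties the string; the buffer may be kept
my $len_empty = buf_len($copy);

undef $copy;                       # releases the buffer to perl's allocator
my $len_undef = buf_len($copy);    # 0

print "full: $len_full, after '': $len_empty, after undef: $len_undef\n";
```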
That's my best guess anyway.
Long story short, when memory is released by a process back to the operating system is complicated and operating systems do it differently. Here's one discussion specifically about perl and one about Linux.
According to perlglossary, buried deep in Perl's internals there is a thingie called the scratchpad:
The area in which a particular invocation of a particular file or subroutine keeps some of its temporary values, including any lexically scoped variables.
Here's the code generated by perl -MO=Concise leak.pl:
...
10    <;> nextstate(main 3 leak.pl:30) v:*,&,{,x*,x&,x$,$ ->11
15    <2> sassign vKS/2 ->16
13       <@> substr[t16] sK/2 ->14
-           <0> ex-pushmark s ->11
11          <0> padsv[$str:2,4] s ->12
12          <$> const(IV 500) s ->13
14       <0> padsv[$a:3,4] sRM*/LVINTRO ->15
16    <;> nextstate(main 4 leak.pl:35) v:*,&,{,x*,x&,x$,$ ->17
...
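Observe the [t16] on the substr op: that is the op's target, slot 16 of the scratchpad, where substr stores the result it builds from padsv[$str:2,4].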
Now, if I run the code with some debug flags (perl -DmX leak.pl, which needs a perl built with -DDEBUGGING), the source of the "leak" becomes clearer:
USER   PID  %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
stas 55970   1.5  0.3  2454528  21548 s001  S+    6:11PM   0:00.04 perl -DmX leak.pl
...
Pad 0x7f8a328062c8[0x7f8a3240d040] sv:      16 sv=0x7f8a32833298
0x7f8a3240a2a0: (02222) free
0x10d013000: (02223) malloc 9999500 bytes
...
Pad 0x7f8a328062c8[0x7f8a3240d040] sv:      15 sv=0x7f8a328332c8
0x7f8a3240a560: (02231) free
0x10d99d000: (02232) malloc 9999500 bytes
...
USER   PID  %CPU %MEM      VSZ    RSS   TT  STAT STARTED      TIME COMMAND
stas 55970   1.5  0.5  2474064  41084 s001  S+    6:11PM   0:00.06 perl -DmX leak.pl
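So, that's just Perl using the scratchpad: each substr op keeps its result in its own pad target, and that buffer stays allocated after the statement finishes.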
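If the extra memory really is the substr op's pad target, then executing the same substr op again should reuse that buffer instead of growing, which is a way to check the "it leaks each call" observation from the question. A rough sketch: rss_kb is the same hypothetical /proc-or-ps probe as before, and the measurements depend on the platform's allocator.

```perl
use strict;
use warnings;

# Hypothetical helper: RSS of this process in kB, or undef if unavailable.
sub rss_kb {
    if (open my $fh, '<', '/proc/self/status') {
        while (my $line = <$fh>) {
            return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
        }
    }
    my $out = `ps -o rss= -p $$ 2>/dev/null`;
    return (defined $out && $out =~ /(\d+)/) ? $1 : undef;
}

my $str = 'a' x 10_000_000;
my @rss;

# One substr op in the source, executed five times: if it only refills
# its own pad target, RSS should stay roughly flat after the first run.
for (1 .. 5) {
    no warnings 'void';
    substr($str, 500);    # void context, as in the question
    push @rss, rss_kb();
}

if (grep { !defined } @rss) {
    print "RSS not available on this system\n";
} else {
    print "RSS after each run (kB): @rss\n";
}
```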