When you store a large string in a scalar, perl allocates the memory to store that string and associates it with the scalar. It keeps that memory allocated even if you assign a much shorter value to the same scalar. Use the functional form of undef to let perl reuse that memory for something else. This is important when you want to reuse the variable or the lifetime of the variable is very long.
Joel Berger told me about this trick at the Nordic Perl Workshop when we were talking about non-blocking reads from a stream where a line might be much longer than the underlying IO buffer size. You’d need to store the partial line in a buffer until the rest of it arrived. The problem is that one large line could force the interpreter to use a huge chunk of memory that it wouldn’t reuse for anything else. Do this enough in a long-running program and you have a leak problem.
Joel pointed me to Mojo issue 1256 that reported such a problem with Mojo::IOLoop::Stream. A buffer variable would hold incoming data before it went on its way. That variable stuck around instead of going out of scope so it could store data between reads. If one read stored a much larger string than normal, the scalar wouldn’t give up that memory. Paul Evans showed the way to fix it.
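The pattern described above can be sketched in a few lines. This is not the code from Mojo::IOLoop::Stream, nor the fix that Paul Evans proposed; the LineBuffer class, its feed method, and the 64 KiB threshold are all invented for illustration. It only shows the idea of a long-lived buffer that hands back an oversized allocation once it has nothing left to hold:

```perl
use strict;
use warnings;

package LineBuffer;    # hypothetical class, not part of Mojolicious

# Assumed threshold: release the buffer if it ever grew past this many octets
my $RELEASE_ABOVE = 64 * 1024;

sub new {
    my ($class) = @_;
    return bless { buffer => '', peak => 0 }, $class;
}

# Accept a chunk of input and return any complete lines it finishes
sub feed {
    my ( $self, $chunk ) = @_;

    # .= on an undefined scalar starts a fresh string without a warning
    $self->{buffer} .= $chunk;
    my $size = length $self->{buffer};
    $self->{peak} = $size if $size > $self->{peak};

    # Pull complete lines off the front; a partial line stays in the buffer
    my @lines;
    push @lines, $1 while $self->{buffer} =~ s/\A([^\n]*)\n//;

    # Nothing left to hold: give back an oversized allocation
    if ( $self->{buffer} eq '' and $self->{peak} > $RELEASE_ABOVE ) {
        undef $self->{buffer};
        $self->{peak} = 0;
    }

    return @lines;
}

package main;

my $reader = LineBuffer->new;
my @none   = $reader->feed( 'x' x 1_000_000 );    # a partial, very long line
my @lines  = $reader->feed("\nshort\n");          # the rest of it arrives
printf "%d lines, first is %d octets\n", scalar @lines, length $lines[0];
```

Releasing only when the buffer is empty and has been large keeps the common case cheap: a buffer that stays small keeps its allocation and avoids repeated allocate and free cycles.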
Consider a scalar that stores a large string. The interpreter allocates and manages all of the memory needed to hold that string and points the scalar at it. Devel::Peek’s Dump can show you the low-level details of a variable:
    use Devel::Peek;
    $Devel::Peek::pv_limit = 20; # truncate the string in output
    my $buffer = 'Buster' x 1e6;
    Dump $buffer;
The output shows the scalar data structure (the SV) and the pointer value (PV) that it points to. CUR shows the number of octets in the current string and LEN shows the number of octets allocated to hold it. This isn’t surprising:
    SV = PV(0x7fcbba003e70) at 0x7fcbba01bc80
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x10834b000 "BusterBusterBusterBu"...\0
      CUR = 6000000
      LEN = 6000002
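If you want those two numbers inside a program rather than in a dump on standard error, the core B module can read them from the same structure. This is a minimal sketch: the helper name cur_and_len is made up for this example, it assumes the scalar already holds a string, and the exact LEN you get depends on your perl version and build.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: return the CUR and LEN fields of a string scalar.
# It expects a reference to a scalar that already holds a string (a PV).
sub cur_and_len {
    my ($ref) = @_;
    my $sv = B::svref_2object($ref);
    return ( $sv->CUR, $sv->LEN );
}

my $buffer = 'Buster' x 1e6;
my ( $cur, $len ) = cur_and_len( \$buffer );

# CUR is the length of the string; LEN is the size of the allocation
printf "CUR = %d LEN = %d\n", $cur, $len;
```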
After you use that large string you might replace it with a smaller string. You don’t need all that memory and you might expect that Perl will free some of it since the long string is gone:
    use Devel::Peek;
    $Devel::Peek::pv_limit = 20; # truncate the string in output
    my $buffer = 'Buster' x 1e6;
    $buffer = 'Buster';
    Dump $buffer;
That’s not the case. You had a long string there and you might have a long string again so the interpreter keeps that pointer just in case. The LEN is the same although only six octets are used. This ties up a bunch of memory:
    SV = PV(0x7fb957002e70) at 0x7fb957817880
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x102936000 "Buster"\0
      CUR = 6
      LEN = 6000002
An empty string is as good a string as any other. You might think that would release the memory:
    use Devel::Peek;
    $Devel::Peek::pv_limit = 20; # truncate the string in output
    my $buffer = 'Buster' x 1e6;
    $buffer = '';
    Dump $buffer;
Now the length is 0 but all the memory is still there. The LEN hasn’t changed:
    SV = PV(0x7fc28e003e70) at 0x7fc28e01bc80
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x10ee99000 ""\0
      CUR = 0
      LEN = 6000002
Assigning undef does a slightly different thing:
    use Devel::Peek;
    $Devel::Peek::pv_limit = 20; # truncate the string in output
    my $buffer = 'Buster' x 1e6;
    $buffer = undef;
    Dump $buffer;
The pointer still holds the old string but the POK and pPOK flags have disappeared. That signals that the SV can’t trust the value in the pointer even though the buffer itself hasn’t changed:
    SV = PV(0x7fed79003e70) at 0x7fed79806e80
      REFCNT = 1
      FLAGS = ()
      PV = 0x108347000 "BusterBusterBusterBu"...\0
      CUR = 6000000
      LEN = 6000002
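You can see the same two facts without reading flags by comparing defined with the allocated length. This is a small sketch that assumes the core B module; allocated is a helper name made up for this example.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: the LEN field (allocated octets) of a string scalar
sub allocated { B::svref_2object( $_[0] )->LEN }

my $buffer = 'Buster' x 1e6;
my $before = allocated( \$buffer );

$buffer = undef;    # assignment, not the functional form
my $after = allocated( \$buffer );

# The value is gone as far as Perl code can tell, but the allocation remains
printf "defined: %s\n", defined $buffer ? 'yes' : 'no';
printf "LEN before: %d, after: %d\n", $before, $after;
```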
Assign another value to $buffer after you assigned undef:
    use Devel::Peek;
    $Devel::Peek::pv_limit = 20; # truncate the string in output
    my $buffer = 'Buster' x 1e6;
    $buffer = undef;
    $buffer = 'Ginger';
    Dump $buffer;
Setting the new value reuses the same memory and restores the flags that signal that the value is okay to use:
    SV = PV(0x7f8fa4803e70) at 0x7f8fa481bc80
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x101ec1000 "Ginger"\0
      CUR = 6
      LEN = 6000002
To release that memory, use undef in its functional form rather than assigning it:
    use Devel::Peek;
    $Devel::Peek::pv_limit = 20; # truncate the string in output
    my $buffer = 'Buster' x 1e6;
    Dump $buffer;
    undef $buffer;
    Dump $buffer;
    $buffer = 'Ginger';
    Dump $buffer;
After the undef there’s no PV. When you assign a new value, the interpreter can use different memory for it. Your process’s memory footprint probably will not get smaller but the interpreter can use that memory for something else.
In the last part of the output you see that the LEN is a more appropriate value:
    SV = PV(0x7ff0ae803e70) at 0x7ff0af007280
      REFCNT = 1
      FLAGS = (POK,pPOK)
      PV = 0x10c87d000 "BusterBusterBusterBu"...\0
      CUR = 6000000
      LEN = 6000002
    SV = PV(0x7ff0ae803e70) at 0x7ff0af007280
      REFCNT = 1
      FLAGS = ()
      PV = 0
    SV = PV(0x7ff0ae803e70) at 0x7ff0af007280
      REFCNT = 1
      FLAGS = (POK,IsCOW,pPOK)
      PV = 0x7ff0ae60b0f0 "Ginger"\0
      CUR = 6
      LEN = 10
      COW_REFCNT = 1
Things to remember
- Perl manages the memory for you and will keep big chunks of memory around just in case.
- Assigning a new value, including the empty string or undef, doesn’t necessarily release the memory that scalar uses.
- Using undef in the functional form releases the pointer from the scalar and you’ll get a new PV when you assign another value.
Here’s a complete example for anyone who wants to demo in one shot.
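A sketch of one such demo, under a couple of assumptions: it uses the core B module rather than Devel::Peek so that each step prints a single line, and the report helper is a made-up name. The exact LEN values vary with perl version and build.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: print and return the LEN field of a string scalar
sub report {
    my ( $label, $ref ) = @_;
    my $len = B::svref_2object($ref)->LEN;
    printf "%-18s LEN = %d\n", $label, $len;
    return $len;
}

my $buffer = 'Buster' x 1e6;
my $big = report( 'large string', \$buffer );

$buffer = 'Buster';    # shorter value, same scalar
my $short = report( 'short string', \$buffer );

$buffer = '';          # empty string
my $empty = report( 'empty string', \$buffer );

$buffer = undef;       # assignment form
my $assigned = report( '= undef', \$buffer );

undef $buffer;         # functional form
my $released = report( 'undef $buffer', \$buffer );

$buffer = 'Ginger';    # a new value gets a new buffer
my $fresh = report( 'new value', \$buffer );
```

If your perl behaves like the one that produced the dumps above, the first four lines report the same large LEN, the fifth reports 0, and the last reports a small allocation sized for the new string.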