Perl v5.36 new features

These items are in Perl New Features, a book from Perl School that you can buy on LeanPub or Amazon. Your support helps me to produce more content.


Slurp a file from the command line with -g

This is a chapter in Perl New Features, a book from Perl School that you can buy on LeanPub or Amazon. Your support helps me to produce more content.


Perl v5.36 adds the -g switch as a shortcut for -0777, which undefines the input record separator so you can read an entire file as a single string. This is often called “slurping”, and is useful when you need to process text that spans several lines.

The input record separator

The input record separator is the character (or characters) that Perl’s line-input operator uses to determine when a line has ended. By default, that’s a newline (U+0010), but you can use any string you like by setting $/ ($INPUT_RECORD_SEPARATOR). Sometimes the form feed is a useful separator for multiline records:

$/ = "\f";

On the command line, the -0 switch is a quick way to set the value for $/. Without a value, it uses the null byte, which is sometimes a useful as a separator:

% perl -MO=Deparse -0 -e 1
BEGIN { $/ = "\000"; $\ = undef; }
'???';
-e syntax OK

A number in octal or hexadecimal sets $/ to some other single character:

% perl -MO=Deparse -0014 -e 1
BEGIN { $/ = "\f"; $\ = undef; }
'???';
-e syntax OK
% perl -MO=Deparse -0xC -e 1
BEGIN { $/ = "\f"; $\ = undef; }
'???';
-e syntax OK

Any number above 0377 octal (more than 255 decimal) sets $/ to undef:

% perl -MO=Deparse -0377 -e 1
BEGIN { $/ = "\377"; $\ = undef; }
'???';
-e syntax OK
% perl -MO=Deparse -0400 -e 1
BEGIN { $/ = undef; $\ = undef; }
'???';
-e syntax OK

Conventionally, though, the Perl documentation has used 777 as the value to get undef probably since it’s easier to remember:

% perl -MO=Deparse -0777 -e 1
BEGIN { $/ = undef; $\ = undef; }
'???';
-e syntax OK

The new -g is a short synonym for -0777, so it does the same thing that :

% perl -MO=Deparse -e 1
'???';
-e syntax OK
% perl -MO=Deparse -g -e 1
BEGIN { $/ = undef; $\ = undef; }
'???';
-e syntax OK

Far more than you ever wanted to know

The -0 switch has some other interesting behavior, and has a few other interesting features. Since I’m already writing about this feature, I might as well keep going.

Single-character line ending

You can use an octal or hexadecimal number after the -0 to choose the single character that you want to use as the line ending. I’ve often used the form feed (U+000C) to separate multi-line records. The particular character doesn’t matter as long as it doesn’t appear in the data (so the null byte might be useful too):

% perl -le "print qq(one\ntwo\nthree\n\fA\nB\nC\n\f9\n10\n11\n)" > formfeed.txt

When you read F by lines with no change to the input record separator, you see the three records separated by “blank” lines, which are really the form feed:

% perl -ne 'print' formfeed.txt
one
two
three

A
B
C

9
10
11

You can see that easier when you replace the invisible characters with their ordinal values, which you do in octal here:

% perl -pe 's/(\P{Print})/sprintf(q(%03o),ord($1)) . "\n"/eg' formfeed.txt
one012
two012
three012
014
A012
B012
C012
014
9012
10012
11012
012

When you use the octal value of the form feed for the number after the -0 switch and output lines surrounded by angle brackets, you get three lines (with the newlines and line-ending form feed in tact):

% perl -014 -ne 'print qq(<$_>)' formfeed.txt
<one
two
three

><A
B
C

><9
10
11

>

You could have also specified this with three digits, -0014, or as hexadecimal with a leading x, like -0xC. The hexadecimal version is valuable when you need to specify a character past the largest single octet value you can get out of three octal digits, which is 0377.

There’s a catch though. If you want to set the input record separator to a wide character, you need to ensure that you read the input correctly. For the ☃ (U+2603 SNOWMAN) to be the separator, which takes up three octets in UTF-8, you need to read the input as UTF-8 too. The -C is one way to do that:

% perl -0x2603 -C -ne 'print qq(<$_>)' snowmen.txt >>

You aren’t able to specify multiple characters as a line separator since B thinks the extra characters are a file for input:

% perl -MO=Deparse -0x0100x2603 -e
No Perl script found in input

Slurping an entire file

If you specify an octal value 400 or higher, which is more than 8 bits, Perl sets the input record separator to undef. With no defined value for $/, Perl slurps the entire input. But, this is different than setting the empty string (a defined value), which I write about in the next section.

You’ve probably seen -0777, perhaps the most common use of -0:

% perl -0777 -ne 'print qq(<$_>)' dog.txt
<Newfoundland
Golden Retreiver
Boxer
>

That F is actually read through the ARGV filehandle, which does some trickery to make it look like all the input is coming from one source. However, the line input operator can’t read across the command line files; B figures out when one file is empty, closes it, then opens the next file. So, each file appears to be its own line:

% perl -0777 -ne 'print qq(<$_>)'⏎dog.txt cat.txt lizard.txt
<Newfoundland
Golden Retreiver
Boxer
><Tabby
Marmalade
Tiger
><Monitor
Iguana
Godzilla
>

If you wanted all the files to be one lines, route them through standard input before they get to B. This only looks like a useless use of B:

% cat dog.txt cat.txt lizard.txt |⏎perl -0 -ne 'print qq(<$_>)'

=head1 Paragraph mode

“Paragraph mode” is a special case. The -00 sets the input record separator to the empty string. That’s different than the undefined value even though both are false:

% perl -MO=Deparse -00 -e 1
BEGIN { $/ = ""; $\ = undef; }
'???';
-e syntax OK

When the input record separator is the empty string, B treats it as if it is multiple consecutive newlines. This has the same effect as if the input record separator were the pattern \n+ Not only that, put it collapses the multiple newlines to exactly two newlines:

% perl -00 -ne 'print qq(<$_>)' paras.txt
<First line first para
Second line first para
Third line first para

><After first blank line
Second line after first blank line
Third line after first blank line

><After 2nd blank line
2nd line after 2nd blank line
3rd line after 2nd blank line
>

Summary

Here’s a quick summary of the various incantations of the -0 switch:

Switch Input Record Separator Note
-0 \000 null byte
-00 empty string, but “\n+” paragraph mode
-0014 8-bit character, in octal form feed
-0xC 8-bit character, in hex form feed
-0400 undef, above 8-bit slurp
-0777 undef, idiomatic slurp
-g undef slurp, new in v5.36
-0x1FF \777 character, include -C actual \777
-0x2603 wide character, include -C snowman

From the Perl documentation

Automatically turn on warnings

This is a chapter in Perl New Features, a book from Perl School that you can buy on LeanPub or Amazon. Your support helps me to produce more content.



Perl v5.36 automatically turns on warnings when you specify the minimum Perl version with use:

use v5.36;  # use warnings for free

Since this form has been turning on strictures since v5.12 (Implicitly turn on strictures with Perl 5.12), you no longer have to specify warnings or strict at the top of my program.

Continue reading “Automatically turn on warnings”

Iterate over multiple elements at the same time

This is a chapter in Perl New Features, a book from Perl School that you can buy on LeanPub or Amazon. Your support helps me to produce more content.


This feature was promoted to a stable version in v5.40.

Perl v5.36 adds experimental support that allows a foreach (or for) to loop iterate over multiple values at the same time by specifying multiple control variables. This is incredibly cool:

use v5.36;
use experimental qw(for_list);

my @animals = qw( Buster Mimi Ginger Nikki );
foreach my( $s, $t ) ( @animals ) {
	say "$s ^^^ $t";
	}

The output shows two iterations of the loop, each which grabbed two values from the list:

Buster ^^^ Mimi
Ginger ^^^ Nikki

Add another parameter; the list now doesn’t divide evenly between the parameters, so any parameter that can’t match with a list item gets undef, just like normal list assignment:

use v5.36;
use experimental qw(for_list);

foreach my( $s, $t, $u ) ( @animals ) {
	say "$s ^^^ $t ^^^ $u";
	}

Since use v5.36 also turns on warnings, you get those “uninitialized” warnings for free when you use those undef values:

Buster ^^^ Mimi ^^^ Ginger
Nikki ^^^  ^^^
Use of uninitialized value ...
Use of uninitialized value ...

Another interesting use combines the new builtin::indexed feature that gets you the index and value at the same time:

use v5.36;
use experimental qw(for_list builtin);
use builtin qw(indexed);

my @animals = qw( Buster Mimi Ginger Nikki );
foreach my( $i, $value ) ( indexed(@animals) ) {
	say "$i: $value";
	}

That’s a bit nicer than going through the indices to access the value in an additional statement:

foreach my $i ( 0 .. $#animals ) {
	my $value = $animals[$i];
	say "$i: $value";
	}

No placeholders (yet)

So far, this new syntax doesn’t have a way to skip values. In a normal list assignment, you discard a value coming from the right hand list with a literal undef:

my( $s, undef, $t ) = @animals

Try that in the for list and you get a syntax error:

foreach my( $s, undef, $u ) ( @animals ) {  # ERROR!
	say "$s ^^^ $u";
	}

Hash keys and values

I’m tempted to use this for hashes, although each inside a while is still probably better since it doesn’t have to build the entire input list in one go:

use experimental qw(for_list);

my %animals = (
	cats => [ qw( Buster Mimi Ginger ) ],
	dogs => [ qw( Nikki ) ],
	);

foreach my( $k, $v ) ( %animals ) {
	say "$k ^^^ @$v";
	}

Since those hash values are array refs, it would be helpful if this feature could use the refaliasing and declared_refs features (Mix assignment and reference aliasing with declared_refs):

use experimental qw(for_list);
use experimental qw(refaliasing declared_refs);

my %animals = (
	cats => [ qw( Buster Mimi Ginger ) ],
	dogs => [ qw( Nikki ) ],
	);

foreach my( $k, \@v ) ( %animals ) {
	say "$k ^^^ @v";
	}

Sadly, the parser doesn’t expect the reference operator inside that for list:

syntax error ... near ", \"

Doing

Prior to builtin multiple iteration, the best way to do the same thing was probably the List::MoreUtils (not part of core) module. The natatime function, which I wished was named n_at_a_time, grabs the number of elements that you specify and returns them as a list. Since it returns a list instead of an array reference, it’s easier to use it with a while:

use List::MoreUtils qw(natatime);

my @x = ('a' .. 'g');
my $iterator = natatime 3, @x;

while( my @vals = $iterator->() ) {
	print "@vals\n";
	}

Another approach uses splice. The easiest thing might be to do it destructively since that requires no index fiddling:

my @x = 'a' .. 'g';
my @temp = @x;

while( my @vals = splice @temp, 0, 3, () ) {
	print "@vals\n";
	}

Here’s an example from the L documentation that does the same thing:

sub nary_print {
  my $n = shift;
  while (my @next_n = splice @_, 0, $n) {
	say join q{ -- }, @next_n;
  }
}

nary_print(3, qw(a b c d e f g h));
# prints:
#   a -- b -- c
#   d -- e -- f
#   g -- h

Playing with the array indices can get this done, but it comes with a lot of baggage. First, an array slice doesn’t return an empty list, so you can’t use that as a condition in the while as in the previous examples. Since it fills in the missing elements with undef, outputting the values possibly comes with warnings. Even if you want to accept those annoyances, you still have to manage the end of array condition ($#X) yourself:

my @x = 'a' .. 'g';

my $start = 0;
my $n     = 3;

while( $start <= $#x ) {
	no warnings qw(uninitialized);
	my @vals = @x[$start, $start + $n - 1];
	print "@vals\n";
	$start += $n;
	}

So yeah, having a multiple iterator feature built into Perl is a huge win.

Summary

The experimental for_list feature lets you take multiple elements of the list in each iteration. This doesn't yet handle many of the list assignment features that would make this as useful as people will want it to be.

From the Perl documentation

  1. perlsyn