Know the difference between regex and match operator flags
The match and substitution operators, as well as regex quoting with qr//, use flags to signal certain behavior of the match or interpretation of the pattern. The flags that change the interpretation of the pattern are listed in the documentation for qr// in perlop (and maybe in other places in earlier versions of the documentation).
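Here's a small illustration of that split using only core Perl. The /i flag changes how the pattern is interpreted, so a qr// object carries it along. The /g flag changes how the match operator runs, so it goes on the match where you use the pattern. The variable names are only for illustration.

```perl
use strict;
use warnings;

# /i changes how the pattern is interpreted, so qr// remembers it
my $cat = qr/cat/i;

# /g changes how the match operator behaves, so it goes on the match itself;
# in list context the match returns every occurrence
my @all = ( 'Cat cat CAT' =~ /$cat/g );

print scalar(@all), " matches: @all\n";    # 3 matches: Cat cat CAT
```

Putting the /g on the qr// itself doesn't work. perl refuses to compile qr/cat/g because /g isn't a pattern flag.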
Install scripts from distributions
Perl’s distribution system is quite powerful and supported by a variety of tools that can make life easier for you. Most people tend to think that “distributions” are synonymous with modules, but that’s only one of the uses for distributions.
Make deep copies
When you want to make a completely disconnected copy of a hash or an array, it’s not enough to merely assign it to a new variable name, at least in the general case.
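To see why, copy a hash that holds an array reference. The core Storable module's dclone is one way to get a deep copy. It's used here only as an illustration and may not be the approach the full Item recommends. Also note that dclone can't copy code references without extra configuration.

```perl
use strict;
use warnings;
use Storable qw(dclone);

my %original = ( name => 'Buster', toys => [ 'ball', 'mouse' ] );

# A plain assignment copies only the top level: both hashes share one array
my %shallow = %original;
push @{ $shallow{toys} }, 'string';    # this also changes $original{toys}

# dclone walks the whole structure and copies every nested reference
my $deep = dclone( \%original );
push @{ $deep->{toys} }, 'laser';      # $original{toys} is untouched

print "original: @{ $original{toys} }\n";    # original: ball mouse string
print "deep:     @{ $deep->{toys} }\n";      # deep:     ball mouse string laser
```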
Make links to per-version tools
In Item 110: Compile and install your own perls, we showed you how to compile and install several versions of perl so that they don’t conflict with each other and you can use them simultaneously. Since those installations don’t put their programs in a shared location, each perl’s programs stay in its own $prefix/bin directory. With several perls, each of which has its own module directories, using tools such as cpan and perldoc can get confusing. Which version of those tools are you using, and which perl are they trying to use?
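As a rough sketch of the idea, assume each perl lives under its own prefix as in Item 110. You could then give each tool a version-suffixed link in a directory on your PATH. The link_tools name and the tool-plus-version naming scheme (cpan5.10.1, perldoc5.10.1) are hypothetical choices for this example. The scriptdir and version entries in Config are real, and they describe whichever perl runs the code.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: for each named tool that exists in $bin_dir, create a
# link named "$tool$version" in $link_dir (e.g. cpan5.10.1 -> .../bin/cpan).
# Returns the list of links it created.
sub link_tools {
    my ( $bin_dir, $version, $link_dir, @tools ) = @_;
    my @made;

    foreach my $tool (@tools) {
        my $target = File::Spec->catfile( $bin_dir, $tool );
        next unless -e $target;    # this perl might not have every tool

        my $link = File::Spec->catfile( $link_dir, "$tool$version" );
        next if -e $link or -l $link;    # leave existing files and links alone

        symlink( $target, $link ) or die "Could not link $link -> $target: $!";
        push @made, $link;
    }

    return @made;
}

# Possible usage for the perl running this code (the link directory is an assumption):
#   use Config;
#   link_tools( $Config{scriptdir}, $Config{version}, "$ENV{HOME}/bin",
#       qw(cpan perldoc prove) );
```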
Avoid accidentally creating methods from module exports
Perl’s object system is fuzzy. Methods are really just subroutines and classes are just packages, which means that any subroutine in a package is also a method in that class. Your class might have subroutines that you’ve never even noticed, so you end up with methods that you didn’t want in your interface.
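For example, importing blessed from Scalar::Util into a class package quietly gives that class a blessed method. This sketch uses hypothetical Some::Class and Other::Class packages. Loading the module with an empty import list and calling the function by its full name is one way to keep it out of your interface.

```perl
use strict;
use warnings;

package Some::Class;
use Scalar::Util qw(blessed);    # blessed() now lives in Some::Class too
sub new { my $class = shift; return bless {}, $class }

package Other::Class;
use Scalar::Util ();             # load the module, but import nothing
sub new { my $class = shift; return bless {}, $class }
sub is_object {                  # call the function by its full name instead
    my ( $class, $thing ) = @_;
    my $blessed_into = Scalar::Util::blessed($thing);
    return defined $blessed_into;
}

package main;

my $obj = Some::Class->new;
print "Some::Class->can('blessed'):  ", ( Some::Class->can('blessed')  ? 'yes' : 'no' ), "\n";    # yes
print "Other::Class->can('blessed'): ", ( Other::Class->can('blessed') ? 'yes' : 'no' ), "\n";    # no
print "\$obj->blessed returns: ", $obj->blessed, "\n";    # Some::Class
```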
Know how Perl handles scientific notation in string to number conversions.
A recent question on Stack Overflow asked about the difference between the same floating-point numbers written out normally and in scientific notation. Why does 0.76178 come out differently than 7.6178E-01? When Perl stores them, they can come out as slightly different numbers. This is related to the perlfaq answer to “Why am I getting long decimals (eg, 19.9499999999999) instead of the numbers I should be getting (eg, 19.95)?”, but a bit more involved. You’ll see how to skip the whole mess at the end, but be patient.
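You can inspect the conversion yourself. This sketch numifies both strings and prints them with %.17g, which shows enough digits to tell any two doubles apart. Whether the two values compare exactly equal can depend on your perl and platform, so no output is claimed for those lines. Rounding with sprintf to the precision you actually need is one general way to sidestep tiny representation differences. It isn't necessarily the technique the full Item uses.

```perl
use strict;
use warnings;

my @inputs = ( '0.76178', '7.6178E-01' );

# Using each string as a number makes perl convert it to a double
foreach my $string (@inputs) {
    printf "%-12s -> %.17g\n", $string, $string + 0;    # 17 significant digits
}

my ( $plain, $sci ) = map { $_ + 0 } @inputs;

# Exact comparison of the two doubles: the answer may depend on your perl
print "exactly equal: ", ( $plain == $sci ? 'yes' : 'no' ), "\n";

# Comparing at the precision you care about sidesteps tiny differences
print "equal to 5 places: ",
    ( sprintf( '%.5f', $plain ) eq sprintf( '%.5f', $sci ) ? 'yes' : 'no' ), "\n";
```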
Slides for “Effective Perl: Unicode” at Frozen Perl 2010
At Frozen Perl I did a quick presentation about Unicode and Perl. I had to do some work on the slides before releasing them publicly, but here they are… Be sure to look at the author notes if you want more detailed information.
Watch out for disappearing strings when you decode
In the Effective Perl class I gave at Frozen Perl last week, I got a question I didn’t have the quick answer to. What happens to the strings when Encode's decode function only partially decodes the string?
By default, decode always decodes the entire string, although it puts the substitution character (U+FFFD, which may look like ? on the screen) anywhere that it finds an error in the encoding.
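Here is a short demonstration using a made-up three-octet input, where a lone \xCC lead octet is followed by an ordinary a. With no third argument, decode keeps going, puts U+FFFD where the bad octet was, and leaves $octets alone.

```perl
use strict;
use warnings;
use Encode qw(decode);

# "a", a lone \xCC lead octet, then "a": \xCC needs a continuation octet it never gets
my $octets  = "a\xCCa";
my $decoded = decode( 'utf8', $octets );    # no third argument, so FB_DEFAULT

# Show each decoded character as a hex code point
print join( ':', map { sprintf '%X', ord } split //, $decoded ), "\n";    # 61:FFFD:61
```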
You can change how decode handles problems by supplying a third argument to it, using one of the fallback constants such as FB_DEFAULT, FB_CROAK, FB_WARN, or FB_QUIET. This program tries each of those four on the same malformed input:
use 5.010;
use strict;
use warnings;
use Encode qw(decode :fallbacks);
binmode STDOUT, ":utf8";
foreach my $fallback ( qw( FB_DEFAULT FB_CROAK FB_WARN FB_QUIET ) ) {
    # look up the constant's value from its name
    my $fallback_value = do { no strict 'refs'; &{"$fallback"} };
    # a fresh copy of the malformed octets for each pass
    my $octets = do { use bytes; "\x41\x42\x43\x61\xCC\x61\x41\x42\x43" };
    # FB_CROAK dies, so trap it with eval
    my $decoded = eval { decode( 'utf8', $octets, $fallback_value ) };
    say "$fallback: ", show_chars( $decoded ), " [$octets]";
}
# show each character's code point in hex, separated by colons
sub show_chars {
    defined $_[0]
        ? join( ':', map { sprintf "%X", ord } split //, $_[0] )
        : 'undefined';
}
The string you’re using is "\x41\x42\x43\x61\xCC\x61\x41\x42\x43". It’s “ABCa.aABC” where that "\xCC" in the middle is an error: it’s the first octet of a two-byte sequence for a combining character, but the octet after it, "\x61", isn’t a valid continuation octet. When you print it, it looks a bit odd (ABCaÌaABC) because $octets holds raw octets, so the :utf8 output layer treats "\xCC" as the Latin-1 character Ì.
The output shows the fallback type, the characters (in hex, separated by colons), and, in the square brackets, the value of $octets after the operation:
FB_DEFAULT: 41:42:43:61:FFFD:61:41:42:43 [ABCaÌaABC]
FB_CROAK: undefined [ABCaÌaABC]
FB_WARN: 41:42:43:61 [ÌaABC]
FB_QUIET: 41:42:43:61 [ÌaABC]
utf8 "\xCC" does not map to Unicode at ...
In the FB_DEFAULT case, the \xCC turned into the substitution character, U+FFFD. Notice that the split // worked on characters, so the substitution character shows up as the single code point FFFD rather than as the three octets of its UTF-8 encoding.
In the FB_CROAK case, decode dies, so the eval returns undef, and $octets stays the same. decode doesn’t mess with the argument at all.
Both FB_WARN and FB_QUIET do the same thing, although FB_WARN whines about it. They each tell decode to handle as much of the string as it can. When it finds an error, it returns what it had so far (represented by 41:42:43:61, which is ABCa). However, it also removes that part from the input string, leaving only the part of the string from the error onward. This gives you a chance to examine the string where decode left off so you can decide what to do on your own. You might strip off the offending octets and start the processing again.
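That loop might look something like the following sketch. The helper name decode_dropping_bad_octets and the choice to simply discard each bad octet are assumptions for illustration. You might log the position or substitute something else instead.

```perl
use strict;
use warnings;
use Encode qw(decode FB_QUIET);

# Hypothetical helper: decode as much as possible, drop each octet that stops
# the decoder, and resume. Returns the text and the number of octets dropped.
sub decode_dropping_bad_octets {
    my ($octets) = @_;    # work on a copy, since FB_QUIET rewrites its input
    my $text    = '';
    my $dropped = 0;

    while ( length $octets ) {
        # FB_QUIET returns what it decoded and leaves the rest in $octets
        $text .= decode( 'utf8', $octets, FB_QUIET );
        last unless length $octets;    # everything decoded
        substr( $octets, 0, 1, '' );   # remove the offending octet
        $dropped++;
    }

    return ( $text, $dropped );
}

my ( $text, $dropped ) = decode_dropping_bad_octets("ABCa\xCCaABC");
print "$text (dropped $dropped)\n";    # ABCaaABC (dropped 1)
```

Because the helper copies its argument first, the caller's string survives, and you can also pass it a string literal.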
It’s documented that decode changes its input, but not right next to the main documentation for that function. You have to read the “Handling Malformed Data” section later in the Encode docs.
You might notice the problem if you try to decode a string literal:
use Encode qw(decode :fallbacks);
my $decoded = decode( 'utf8', "\x61\xCC\x61", FB_WARN );
You get the error about modifying a read-only value:
Modification of a read-only value attempted ...
If you don’t want decode to mess with your argument, you can use a bitmask to adjust the fallback value. decode looks for the LEAVE_SRC bit to be set (in these examples it only matters for FB_WARN and FB_QUIET, since FB_CROAK dies before it touches the argument and FB_DEFAULT never touches it), so just OR it in:
use Encode qw(decode :fallbacks LEAVE_SRC);
my $decoded = decode( 'utf8', "\x61\xCC\x61", FB_WARN | LEAVE_SRC );
If you want to keep the original octet sequence, save a copy before you pass it to decode.
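Here are both approaches side by side, as a small sketch on the same malformed input used earlier. LEAVE_SRC keeps decode from writing back to its argument. Handing decode a copy lets the copy absorb the change instead.

```perl
use strict;
use warnings;
use Encode qw(decode :fallbacks LEAVE_SRC);

my $octets = "ABCa\xCCaABC";

# Option 1: OR in LEAVE_SRC so decode never writes back to its argument
my $decoded = decode( 'utf8', $octets, FB_QUIET | LEAVE_SRC );

# Option 2: give decode a copy and let it consume the copy instead
my $copy    = $octets;
my $partial = decode( 'utf8', $copy, FB_QUIET );

print "decoded: $decoded, partial: $partial\n";    # decoded: ABCa, partial: ABCa
print "original intact: ", ( $octets eq "ABCa\xCCaABC" ? 'yes' : 'no' ), "\n";    # original intact: yes
printf "copy now holds %d octets\n", length $copy;    # copy now holds 5 octets
```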
Manage your Perl modules with git
In Item 110: Compile and install your own perls, you saw how to install multiple versions of perl and to maintain each of the installations separately. Doing something with one version of Perl doesn’t affect any of the other versions.
You can take that a step further. Within each installation, you can use a source control system to manage your Perl modules. In this post you’ll use git, which has the advantage that you don’t need a server.
First, install your perl into its own directory:
% ./Configure -des -Dprefix=/usr/local/perls/perl-5.10.1
% make test
% make install
Second, before you do anything else with your newly installed perl, put your new directory into source control:
% cd /usr/local/perls/perl-5.10.1
% git init
% git add .
% git commit -a -m "Initial installation of Perl 5.10.1"
You’re not quite done there, though. You’re on the master branch:
% git branch
* master
You want to keep at least one pristine branch that is the initial state of your perl installation. You can always come back to it:
% git checkout -b pristine
Switched to a new branch 'pristine'
Leave that branch alone and switch back to master:
% git checkout master
Switched to branch 'master'
From here you can do many things, but you probably want to consider the master branch your “stable” branch. You don’t want to commit anything to that branch until you know it works. When you install new modules, use a different branch until you know you want to keep them:
% git checkout -b unstable
Switched to a new branch 'unstable'
% cpan LWP::Simple
% git add .
% git commit -a -m "* Installed LWP::Simple"
After using your newly installed modules for a while and deciding that they’re stable, merge your unstable branch into master. Once merged, switch back to the unstable branch to repeat the process:
% git checkout master
Switched to branch 'master'
% git merge unstable
% git checkout unstable
Anytime that you want to start working with a clean installation, you start at the pristine branch and make a new branch from there:
% git checkout pristine
% git checkout -b newbranch
If you aren’t tracking your perl in source control already, just tracking a master and an unstable branch can give you an immediate benefit. However, you can take this idea a step further.
With just one perl installation, you can create multiple branches to try out different module installations. Instead of merging these branches, you keep them separate. When you want to test your application with a certain set of modules, you merely switch to that branch and run your tests. When you want to test against a different set, change branches again. That can be quite a bit simpler than managing multiple directories that you have to constantly add to or remove from @INC.
Know what creates a scope
Scopes can be confusing. Perl 5 introduced lexical, or my, variables that are only visible in the scope in which you define them. To properly scope your variables, you need to know what can define a scope and what doesn’t.
You commonly see lexical variables for subroutine arguments, for instance:
sub foo {
    my( $self, @args ) = @_;
    ...;
}
The variables $self and @args don’t exist outside of that subroutine (ignoring black magic with things such as PadWalker). Lexical variables have limited reach and can’t cause action at a distance, making them invaluable for robust programming. Not only that, but since the lexical variable names only matter in their scope, you don’t have to know about all of the variables that you have already defined to choose variable names in your scope.
Before Perl 5, all variables were package variables (so, global). Perl 5 couldn’t just ignore all of the existing Perl 4 programs, so it ended up supporting both the global package variables and lexical variables. That can make things confusing if you don’t understand the difference.
First, you need to know what makes a scope. Most people can give you at least one answer: a block creates a scope. Blocks show up in the syntax of many of Perl’s commonly used features:
# a subroutine definition block, perhaps anonymous
sub foo { ... }
my $foo = sub { ... };
# blocks for control structures
foreach ( @array ) { ... }
while( $condition ) { ... }
if( $condition ) { ... }
# blocks related to functions:
my $result = do { ... };
my @transformed = map { ... } @input;
my @filtered = grep { ... } @input;
# blocks in regular expressions
m/(?{...})/
Sometimes you can create the lexical variable outside of the block even though it’s scoped to the block. You can declare the lexical variable in the condition of a while or an if, or as the loop variable of a foreach:
foreach my $index ( 0 .. 5 ) {
    print "index: $index\n";
}
while( my $line = <DATA> ) {
    print "line: $line";
}
if( my $foo = 'abc' ) {
    print "foo is $foo\n";
}
You don’t need a control structure or operator to use a block to define the scope. You can use a bare block to create a scope:
# bare blocks
{
    my $cat = 'Buster';
    ...;
}
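You can check the boundary with strict. In this sketch, a string eval compiles the reference to $cat at run time, after the block has ended, so the failure shows up in $@ instead of stopping the whole program from compiling.

```perl
use strict;
use warnings;

{
    my $cat = 'Buster';
    print "inside the block: $cat\n";    # inside the block: Buster
}

# $cat no longer exists here, so strict rejects the name when the eval compiles it
my $ok = eval q{ use strict; print "outside: $cat\n"; 1 };
print "outside the block: ", ( $ok ? 'visible' : 'not visible' ), "\n";    # outside the block: not visible
```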
Most Perlers could identify blocks as scope definers, but there’s another scope definer that many people miss. File this away for your job interview trivia: a file is a scope too. You can’t see lexical variables outside of the file in which you define them, even if you don’t explicitly create the scope with a block. It’s as if there is a virtual block around the entire file.
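One way to see the file boundary is do FILE, which compiles another file in its own scope and, unlike a string eval, can’t see the caller’s lexicals. This sketch writes a throwaway file with File::Temp and runs it. The file’s contents are made up for the demo. The other file can’t see $secret, so that name falls back to the empty package variable $main::secret.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $secret = 'only visible in this file';    # file-scoped lexical

# Write a second file that tries to use $secret
my ( $fh, $filename ) = tempfile( UNLINK => 1 );
print {$fh} <<'CODE';
package main;
no strict 'vars';    # let $secret fall back to the package variable $main::secret
defined $secret ? 'saw the lexical' : 'no lexical here';
CODE
close $fh or die "Could not close $filename: $!";

# do FILE compiles the other file in its own file scope and returns its last value
my $result = do $filename;
die "Could not run $filename: $@" unless defined $result;
print "$result\n";    # no lexical here
```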
You can use the file scope to create private class variables. The methods you define in the same file can see the private variables, but code in other files, such as subclasses, can’t mess with them:
package Some::Class;
my $private = 0; # only visible in this file
sub some_method {
    ...; # can see $private
}
If you want other parts of the program to get or set the value in this private variable despite its scope, you can provide accessor methods. This gives you a chance to head off any shenanigans before you allow someone to change the value:
package Some::Class;
my $private = 0;
sub get_private { $private }
sub set_private { $private = $_[1] }
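Here is the accessor version in action, as a sketch using the same hypothetical Some::Class. Because $private is a lexical, there is no $Some::Class::private package variable that outside code could use to reach around the accessors.

```perl
use strict;
use warnings;

package Some::Class;
my $private = 0;                        # file-scoped lexical, not a package variable
sub get_private { $private }
sub set_private { $private = $_[1] }    # $_[0] is the class name or object

package main;

Some::Class->set_private(42);
print "through the accessor: ", Some::Class->get_private, "\n";    # through the accessor: 42

{
    no warnings 'once';    # this name is mentioned only to show it's empty
    print "package variable: ",
        ( defined $Some::Class::private ? $Some::Class::private : 'undef' ), "\n";    # package variable: undef
}
```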
Some people extend the idea of private class variables too far because they think that a package creates a scope. It doesn’t. A package statement merely sets the default package for names you don’t fully qualify. Since lexical variables aren’t connected to packages, they don’t care what the current package is. If you change the package, even inside another block, the lexical variable is still visible:
package Some::Class;
my $n = 'Can you see me?';
{
    package main;
    # $n still visible here
}
package Some::Class::Subclass;
# $n still visible
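Here is a runnable version of that example. It defines a subroutine after each package change to show that $n is still in view. The subroutine names are made up for the demo.

```perl
use strict;
use warnings;

package Some::Class;
my $n = 'Can you see me?';

{
    package main;
    sub from_main { $n }    # still sees $n
}

package Some::Class::Subclass;
sub from_subclass { $n }    # still sees $n

package main;
print main::from_main(), "\n";                          # Can you see me?
print Some::Class::Subclass::from_subclass(), "\n";     # Can you see me?
```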
There are some more tricks with scopes and what constitutes a scoped variable, but that’s a matter for a future Item.
Things to remember
- Lexical variables are only visible in their scope.
- A block defines a scope.
- A file defines a scope.
- A package does not define a scope.