book – Page 8 – The Effective Perler

The Perl 5.12 yada yada operator

Perl v5.12 adds a placeholder operator, ..., called the yada yada operator, after an episode of Seinfeld where the interesting parts of the story are replaced with “yada yada yada”.
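
Here’s a minimal sketch of how the placeholder behaves. The subroutine name not_done_yet is made up for illustration. The program compiles, and Perl complains only if the ... statement actually runs:

```perl
use v5.12;
use warnings;

# Hypothetical stub: the body is nothing but the yada yada operator.
sub not_done_yet { ... }

# Calling the stub throws an exception, so trap it with eval.
my $ok    = eval { not_done_yet(); 1 };
my $error = $ok ? '' : $@;

say "Died with: $error" unless $ok;
```

The message starts with “Unimplemented”, so a stub you forgot to fill in announces itself as soon as something calls it.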

Declare your Pod encoding

Pod::Simple 3.21 changed its behavior when it encounters a non-ASCII character in Pod without an encoding. Instead of handling it quietly, it now gives a warning. That’s not so bad, but Test::Pod uses Pod::Simple, and whenever it sees a warning, pod_ok fails, as it did in my Mac::Errors module.

Hide namespaces from PAUSE

The Perl Authors Upload Server (PAUSE) is responsible for analyzing distributions on their way to CPAN. PAUSE indexes the distributions to discover the package names that they contain so it can add them to the data files that many of the CPAN clients use to figure out what to download to install the module that you request. It also compares the package names that it finds to a list of permissions it maintains.

Don’t use auto-dereferencing with each or keys

[Update: Perl v5.24 removes this experimental feature, for the reasons I list, among others.]

Perl 5.14 added an auto-dereferencing feature to the hash and array operators, and I wrote about that in Use array references with the array operators. I’ve never particularly liked that feature, but I don’t have to like everything. Additionally, Perl 5.12 expanded the job of keys and values to also work on arrays.
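
As a quick sketch of the Perl 5.12 part of that (the array contents are made up), here are keys, values, and each working on a plain array rather than a reference:

```perl
use v5.12;
use warnings;

my @pets = qw(Buster Mimi Ella);

# On an array, keys returns the indices and values returns the elements.
my @indices  = keys @pets;
my @elements = values @pets;

say "Indices:  @indices";
say "Elements: @elements";

# each works on arrays too, returning an index and its element.
while ( my ( $index, $pet ) = each @pets ) {
    say "$index: $pet";
}
```

None of this needs the auto-dereferencing feature, so it keeps working on v5.24 and later.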

Look up Unicode properties with an inversion map

Perl comes with extracts of the Unicode character data, but it hasn’t been easy to look up all of the information Perl knows about a character. Perl v5.15.7 adds a way to create an inversion map based on the property that you want to access.
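
Here’s a rough sketch of that lookup, assuming the prop_invmap function from Unicode::UCD and the General_Category property. The category_of helper is hypothetical and written only for this example; it isn’t part of the module:

```perl
use v5.16;
use warnings;
use Unicode::UCD qw(prop_invmap);

# $starts holds the first code point of each range, in ascending order.
# $values holds the property value for the range at the same index.
my ( $starts, $values, $format, $default ) = prop_invmap('General_Category');

# Hypothetical helper: binary search for the last range that starts
# at or before the code point, then return that range's value.
sub category_of {
    my ($code_point) = @_;
    my ( $low, $high ) = ( 0, $#$starts );
    while ( $low < $high ) {
        my $mid = int( ( $low + $high + 1 ) / 2 );
        if   ( $starts->[$mid] <= $code_point ) { $low  = $mid }
        else                                    { $high = $mid - 1 }
    }
    return $values->[$low];
}

say "$_ => ", category_of( ord $_ ) for qw(A b 7);
```

The two arrays run in parallel, so once you find the right range in the first, the same index in the second gives you the answer.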


Fold cases properly

You might think that you know how to compare strings regardless of case, and you’re probably wrong. After you read this Item, you’ll be able to do it correctly and without doing any more work than you were doing before. Perl handles all the details for you.
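
As a small illustration, assuming the tool in question is the fc function that Perl v5.16 added (the excerpt doesn’t name it), compare a German word with its uppercase spelling:

```perl
use v5.16;    # enables the fc feature
use warnings;

my $german = "Stra\N{LATIN SMALL LETTER SHARP S}e";
my $upper  = 'STRASSE';

# lc leaves the sharp s alone, so the lowercased strings differ.
my $lc_result = lc($german) eq lc($upper) ? 'equal' : 'different';

# fc folds the sharp s to "ss", so the folded strings match.
my $fc_result = fc($german) eq fc($upper) ? 'equal' : 'different';

say "lc: $lc_result";    # lc: different
say "fc: $fc_result";    # fc: equal
```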

Use __SUB__ to get a reference to the current subroutine

What if you want to write a recursive subroutine but you don’t know the name of the current subroutine? Since Perl is a dynamic language and code references are first class objects, you might not know the name of the code reference, if it even has a name. Perl 5.16 introduces __SUB__ as a special sequence to return a reference to the current subroutine. You could do almost the same thing without the new feature, but each of those approaches has drawbacks you might want to avoid.
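
A minimal sketch of the idea, using a factorial as the stock recursive example (the variable name is arbitrary):

```perl
use v5.16;    # enables the current_sub feature that provides __SUB__
use warnings;

# An anonymous subroutine has no name to call, so it recurses through __SUB__.
my $factorial = sub {
    my ($n) = @_;
    return 1 if $n <= 1;
    return $n * __SUB__->( $n - 1 );
};

say $factorial->(5);    # 120
```

The subroutine never mentions $factorial inside its own body, so it doesn’t create a reference cycle by closing over the variable that holds it.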

Understand the order of operations in double quoted contexts

Perl’s powerful string manipulation tools include case-shifting operators that change the parts of a double-quoted string. There are many other things that happen in a double-quoted string too, so you need to know where these operators fit in with each other.
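
For a small taste, here’s a sketch with a made-up $name that combines interpolation with the \u, \L, and \E case shifters:

```perl
use v5.10;
use warnings;

my $name = 'bUSTER';

# \L lowercases everything up to \E; \u uppercases only the next character.
# Both apply to the interpolated value, not to the variable name.
my $greeting = "Hello, \u\L$name\E the CAT";

say $greeting;    # Hello, Buster the CAT
```

The text after \E is left alone, which is why “CAT” keeps its capitals.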

Define grammars in regular expressions

[ This is the 100th Item we’ve shared with you in the two years this blog has been around. We deserve a holiday and we’re taking it, so read us next year! Happy Holidays.]

Perl 5.10 added rudimentary grammar support in its regular expressions. You could define many subpatterns directly in your pattern, use them to define larger subpatterns, and, finally, when you have everything in place, let Perl do the work.
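
Here’s a toy sketch of that approach using a (?(DEFINE)...) block and named subpatterns. The grammar and the subpattern names are invented for this example:

```perl
use v5.10;
use warnings;

# A toy grammar: a sum is one or more numbers joined by plus signs.
my $sum_pattern = qr/
    \A \s* (?&sum) \s* \z

    (?(DEFINE)
        (?<number> \d+ )
        (?<sum>    (?&number) (?: \s* \+ \s* (?&number) )* )
    )
/x;

for my $input ( '1 + 22 + 333', '1 + + 2' ) {
    my $verdict = $input =~ $sum_pattern ? 'matches' : 'does not match';
    say "'$input' $verdict";
}
```

The DEFINE block never matches anything by itself. It only names the pieces, and the (?&sum) at the top is what puts them to work.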

Loose match Unicode character names

As of Perl v5.16, the charnames module can handle loose name matching, as outlined in Unicode Standard Annex #44. This accounts for the various ways people bend the official names when they type them, such as changing the case, the spacing, or the hyphens.

Consider the character 😻 (U+1F63B SMILING CAT FACE WITH HEART-SHAPED EYES). If you want to interpolate that into a string by name, by default you have to use the exact name:

use v5.16;
use open qw(:std :utf8);

say "\N{SMILING CAT FACE WITH HEART-SHAPED EYES}";

Starting with v5.16, the \N{} in a double-quoted string automatically imports :full and :short. There’s another option, :loose, that you can import yourself, but it’s a bit costly.

Some people don’t like all-uppercase strings, so they might want to type the name in title case or lowercase:

use v5.16;
use open qw(:std :utf8);

say "\N{Smiling Cat Face With Heart-Shaped Eyes}";

That doesn’t work and you get an error:

Unknown charname 'Smiling Cat Face With Heart-Shaped Eyes'

Import :loose from charnames and it works:

use v5.16;
use open qw(:std :utf8);
use charnames qw(:loose);

say "\N{Smiling Cat Face With Heart-Shaped Eyes}";

The loose matching rules do three things, and that extra work is what makes loose matching slow:

Ignore case
Ignore whitespace
Ignore “medial” hyphens (hyphens with a letter on either side)

So all of these work, even the one with consecutive hyphens:

use v5.16;
use open qw(:std :utf8);
use charnames qw(:loose);

say "\N{Smiling Cat Face With Heart Shaped Eyes}";
say "\N{SmilingCatFaceWithHeartShapedEyes}";
say "\N{Smiling-Cat-Face-With-Heart-Shaped-Eyes}";
say "\N{Smiling----Cat-Face-----With-Heart-----Shaped-Eyes}";
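
If you want to convince yourself that the loose spellings really name the same character, here’s a small sketch that prints the code point for the exact name and for two of the loose variants from above:

```perl
use v5.16;
use warnings;
use charnames qw(:loose);

# The exact name plus two looser spellings of it.
my @spellings = (
    "\N{SMILING CAT FACE WITH HEART-SHAPED EYES}",
    "\N{Smiling Cat Face With Heart Shaped Eyes}",
    "\N{SmilingCatFaceWithHeartShapedEyes}",
);

# Each spelling should resolve to the same code point, U+1F63B.
printf "U+%04X\n", ord($_) for @spellings;
```

Printing the code point instead of the character also means you don’t need to set up a UTF-8 output layer just to check.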

Some problematic names

This doesn’t work out well for some names, and Perl developer Karl Williamson raised the problem with the Unicode Consortium in 2010. Some names have a hyphen next to whitespace (so, not a medial hyphen), but if you remove the whitespace first, the hyphen ends up between two letters and looks medial.

Not only that, removing the hyphen can turn one character’s name into the name of a completely different character:

U+0F68 TIBETAN LETTER A
U+0F60 TIBETAN LETTER -A
U+0FB8 TIBETAN SUBJOINED LETTER A
U+0FB0 TIBETAN SUBJOINED LETTER -A
U+116C HANGUL JUNGSEONG OE
U+1180 HANGUL JUNGSEONG O-E
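
To see the ordering problem for yourself, here’s a rough sketch with two hypothetical normalizers. Neither is how charnames implements loose matching; they only apply the three rules in a different order:

```perl
use v5.10;
use warnings;

# Hypothetical helper: drop whitespace first, then drop any hyphen
# that has a letter or digit on both sides.
sub whitespace_first_key {
    my ($name) = @_;
    my $key = lc $name;
    $key =~ s/\s+//g;
    $key =~ s/(?<=[a-z0-9])-(?=[a-z0-9])//g;
    return $key;
}

# Hypothetical helper: drop medial hyphens while the whitespace is
# still in place, then drop the whitespace.
sub hyphens_first_key {
    my ($name) = @_;
    my $key = lc $name;
    $key =~ s/(?<=[a-z0-9])-(?=[a-z0-9])//g;
    $key =~ s/\s+//g;
    return $key;
}

my @pairs = (
    [ 'TIBETAN LETTER A',    'TIBETAN LETTER -A' ],
    [ 'HANGUL JUNGSEONG OE', 'HANGUL JUNGSEONG O-E' ],
);

for my $pair (@pairs) {
    my ( $plain, $hyphenated ) = @$pair;
    my $naive   = whitespace_first_key($plain) eq whitespace_first_key($hyphenated) ? 'collide' : 'stay distinct';
    my $careful = hyphens_first_key($plain) eq hyphens_first_key($hyphenated)       ? 'collide' : 'stay distinct';
    say "$plain and $hyphenated: whitespace first, they $naive; hyphens first, they $careful";
}
```

Handling the hyphens before the whitespace keeps the Tibetan names apart, but the Hangul pair collides either way. That’s why UAX #44 makes an explicit exception for the hyphen in U+1180 HANGUL JUNGSEONG O-E.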