Hide namespaces from PAUSE

The Perl Authors Upload Server (PAUSE) is responsible for analyzing distributions on their way to CPAN. PAUSE indexes the distributions to discover the package names that they contain so it can add them to the data files that many of the CPAN clients use to figure out what to download to install the module that you request. It also compares the package names that it finds to a list of permissions it maintains.

Don’t use auto-dereferencing with each or keys

[Update: Perl v5.24 removes this experimental feature, for the reasons I list, among others.]

Perl 5.14 added an auto-dereferencing feature to the hash and array operators, and I wrote about it in Use array references with the array operators. I’ve never particularly liked that feature, but I don’t have to like everything. Additionally, Perl 5.12 expanded the job of keys and values to also work on arrays.
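As a minimal sketch of the alternative, explicit dereferencing does the same job and works whether or not the experimental feature is available. The variable names and data here are made up for illustration:

```perl
use v5.12;

# Hypothetical sample data
my $hash_ref  = { cat => 'Buster', dog => 'Rover' };
my $array_ref = [ qw(a b c) ];

# Explicit dereference: the sigil says which kind of thing you expect
my @names   = sort keys %$hash_ref;
my @indices = keys @$array_ref;    # keys on an array needs v5.12 or later

say "Hash keys: @names";
say "Array indices: @indices";
```

With the explicit sigil, anyone reading the code can see whether you meant a hash or an array, which the auto-dereferenced form hides.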

Look up Unicode properties with an inversion map

Perl comes with extracts of the Unicode character data, but it hasn’t been easy to look up all of the information Perl knows about a character. Perl v5.15.7 adds a way to create an inversion map based on the property that you want to access.

Use __SUB__ to get a reference to the current subroutine

What if you want to write a recursive subroutine but you don’t know the name of the current subroutine? Since Perl is a dynamic language and code references are first class objects, you might not know the name of the code reference, if it even has a name. Perl 5.16 introduces __SUB__ as a special sequence to return a reference to the current subroutine. You could almost do the same thing without the new feature, but each of the alternatives has drawbacks you might want to avoid.
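Here’s a small sketch of the idea: an anonymous subroutine that calls itself through __SUB__. The factorial example and the $factorial variable are only for illustration:

```perl
use v5.16;    # enables the current_sub feature that provides __SUB__

# The anonymous sub has no name, so it recurses through __SUB__
my $factorial = sub {
    my ($n) = @_;
    return 1 if $n <= 1;
    return $n * __SUB__->( $n - 1 );
};

say $factorial->(5);    # 120
```

Since the subroutine never refers to $factorial inside its own body, it doesn’t create a circular reference to itself.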

Define grammars in regular expressions

[ This is the 100th Item we’ve shared with you in the two years this blog has been around. We deserve a holiday and we’re taking it, so read us next year! Happy Holidays.]

Perl 5.10 added rudimentary grammar support in its regular expressions. You can define many subpatterns directly in your pattern, use them to define larger subpatterns, and, finally, when you have everything in place, let Perl do the work.
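A minimal sketch of that approach uses a (?(DEFINE)...) block to declare named subpatterns and (?&name) to call them. The tiny grammar for sums of integers, and the names number, term, and sum, are invented for this example:

```perl
use v5.10;

my $sum_pattern = qr/
    (?(DEFINE)
        (?<number> [0-9]+ )
        (?<term>   \s* (?&number) \s* )
        (?<sum>    (?&term) (?: \+ (?&term) )* )
    )
    \A (?&sum) \z
/x;

# The first string is a valid sum; the second has two plus signs in a row
for my $input ( '1 + 22 + 333', '1 + + 2' ) {
    say "'$input' ", ( $input =~ $sum_pattern ? 'matches' : 'does not match' );
}
```

Nothing inside the DEFINE block matches on its own. The subpatterns only do work when the part of the pattern after the block calls them.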

Loose match Unicode character names

The charnames module can now handle loose name matching, as outlined in Unicode Standard Annex #44. This accounts for the various ways people are abusing things.

Consider the character 😻 (U+1F63B SMILING CAT FACE WITH HEART-SHAPED EYES). If you want to interpolate that into a string, you have to use the exact name:

use v5.16;
use open qw(:std :utf8);

say "\N{SMILING CAT FACE WITH HEART-SHAPED EYES}";

Starting with v5.16, the \N{} in a double-quoted string automatically imports :full and :short. There’s another option, :loose, that you can import yourself, but it’s a bit costly.
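As a quick sketch of what the automatic import gives you, the full form takes the complete character name, while the short form takes a script name, a colon, and the letter name. The Greek alpha here is just an example character; the program prints code points rather than the characters themselves so the output doesn’t depend on your terminal:

```perl
use v5.16;

my $full  = "\N{GREEK SMALL LETTER ALPHA}";    # full name
my $short = "\N{greek:alpha}";                 # short form, script:letter

printf "full:  U+%04X\n", ord($full);
printf "short: U+%04X\n", ord($short);
say $full eq $short ? 'Same character' : 'Different characters';
```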

Some people don’t like all uppercase strings, so they might want to type it out in title case or lowercase:

use v5.16;
use open qw(:std :utf8);

say "\N{Smiling Cat Face With Heart-Shaped Eyes}";

That doesn’t work and you get an error:

Unknown charname 'Smiling Cat Face With Heart-Shaped Eyes'

Import :loose from charnames and it works:

use v5.16;
use open qw(:std :utf8);
use charnames qw(:loose);

say "\N{Smiling Cat Face With Heart-Shaped Eyes}";

The loose naming rules involve three things, which makes the loose matching slow:

  • Ignore case
  • Ignore whitespace
  • Ignore “medial” hyphens (those with a letter on either side)

So all of these work, even the one with consecutive hyphens:

use v5.16;
use open qw(:std :utf8);
use charnames qw(:loose);

say "\N{Smiling Cat Face With Heart Shaped Eyes}";
say "\N{SmilingCatFaceWithHeartShapedEyes}";
say "\N{Smiling-Cat-Face-With-Heart-Shaped-Eyes}";
say "\N{Smiling----Cat-Face-----With-Heart-----Shaped-Eyes}";
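If you want to check which character a loose name resolved to without relying on your terminal’s font, here’s a small sketch that goes the other way and asks charnames::viacode for the official name of the result. The $cat variable is just for illustration:

```perl
use v5.16;
use charnames qw(:loose);

# A loose spelling: title case, and a space where the real name has a hyphen
my $cat = "\N{Smiling Cat Face With Heart Shaped Eyes}";

# viacode turns a code point back into its official name
printf "U+%04X %s\n", ord($cat), charnames::viacode( ord($cat) );
```

That prints the code point U+1F63B followed by the exact uppercase name, hyphen and all.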

Some problematic names

This doesn’t work out well for some names, and Perl developer Karl Williamson made some comments about this to the Unicode Consortium in 2010. There are some names that have hyphens next to whitespace (so, not medial hyphens), but if you ignore whitespace first, then the hyphen isn’t next to whitespace.

Not only that, removing the hyphen can turn one character’s name into the name of a completely different character:

  • U+0F68 TIBETAN LETTER A
  • U+0F60 TIBETAN LETTER -A
  • U+0FB8 TIBETAN SUBJOINED LETTER A
  • U+0FB0 TIBETAN SUBJOINED LETTER -A
  • U+116C HANGUL JUNGSEONG OE
  • U+1180 HANGUL JUNGSEONG O-E
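To see how the order of the rules changes the outcome for these names, here’s a rough sketch with two made-up normalizers. These helpers are hypothetical and are not how charnames implements loose matching. They treat a hyphen as medial only when it has an ASCII letter or digit on both sides, and they don’t try to handle everything Perl accepts, such as runs of hyphens:

```perl
use v5.16;

# Hypothetical helpers for illustration only
sub strip_medial_hyphens { return $_[0] =~ s/(?<=[A-Za-z0-9])-(?=[A-Za-z0-9])//gr }
sub strip_whitespace     { return $_[0] =~ s/\s+//gr }

# Decide which hyphens are medial while the whitespace is still there
sub key_hyphens_first {
    return lc( strip_whitespace( strip_medial_hyphens( $_[0] ) ) );
}

# Strip whitespace first, so the hyphen in "LETTER -A" becomes medial
sub key_whitespace_first {
    return lc( strip_medial_hyphens( strip_whitespace( $_[0] ) ) );
}

my @names = (
    'TIBETAN LETTER A',    'TIBETAN LETTER -A',
    'HANGUL JUNGSEONG OE', 'HANGUL JUNGSEONG O-E',
);

for my $name (@names) {
    printf "%-22s %-20s %s\n",
        $name, key_hyphens_first($name), key_whitespace_first($name);
}
```

With the hyphens handled first, the two Tibetan names stay distinct (tibetanlettera and tibetanletter-a). With whitespace handled first, both collapse to tibetanlettera. The Hangul pair collapses to hanguljungseongoe either way, because its hyphen really is medial.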