Look up Unicode properties with an inversion map

Perl comes with extracts of the Unicode character data, but it hasn’t been easy to look up all of the information Perl knows about a character. Perl v5.15.7 adds a way to create an inversion map based on the property that you want to access.

Continue reading “Look up Unicode properties with an inversion map”

Use __SUB__ to get a reference to the current subroutine

What if you want to write a recursive subroutine but you don’t know the name of the current subroutine? Since Perl is a dynamic language and code references are first class objects, you might not know the name of the code reference, if it even has a name. Perl 5.16 introduces __SUB__ as a special sequence to return a reference to the current subroutine. You could almost do the same thing without the new feature, but each of the alternatives has drawbacks you might want to avoid. Continue reading “Use __SUB__ to get a reference to the current subroutine”

Define grammars in regular expressions

[ This is the 100th Item we’ve shared with you in the two years this blog has been around. We deserve a holiday and we’re taking it, so read us next year! Happy Holidays.]

Perl 5.10 added rudimentary grammar support in its regular expressions. You could define many subpatterns directly in your pattern, use them to define larger subpatterns, and, finally, when you have everything in place, let Perl do the work. Continue reading “Define grammars in regular expressions”

Loose match Unicode character names

The charnames module can now handle loose name matching, as outlined in Unicode Standard Annex #44. This tolerates the common variations in how people write character names, such as differences in case, spacing, and hyphenation.

Consider the character 😻 (U+1F63B SMILING CAT FACE WITH HEART-SHAPED EYES). If you want to interpolate that into a string, you have to use the exact name:

use v5.16;
use open qw(:std :utf8);

say "\N{SMILING CAT FACE WITH HEART-SHAPED EYES}";

Starting with v5.16, a \N{} in a double-quoted string automatically loads charnames with :full and :short. There’s another option, :loose, that you can import yourself, but it’s a bit costly.

Some people don’t like all uppercase strings, so they might want to type it out as title or lowercase:

use v5.16;
use open qw(:std :utf8);

say "\N{Smiling Cat Face With Heart-Shaped Eyes}";

That doesn’t work and you get an error:

Unknown charname 'Smiling Cat Face With Heart-Shaped Eyes'

Import :loose from charnames and it works:

use v5.16;
use open qw(:std :utf8);
use charnames qw(:loose);

say "\N{Smiling Cat Face With Heart-Shaped Eyes}";

The loose matching rules do three things, and the extra work makes loose matching slower than an exact lookup:

  • Ignore case
  • Ignore whitespace
  • Ignore “medial” hyphens (letters on either side)

So all of these work, even the one with consecutive hyphens:

use v5.16;
use open qw(:std :utf8);
use charnames qw(:loose);

say "\N{Smiling Cat Face With Heart Shaped Eyes}";
say "\N{SmilingCatFaceWithHeartShapedEyes}";
say "\N{Smiling-Cat-Face-With-Heart-Shaped-Eyes}";
say "\N{Smiling----Cat-Face-----With-Heart-----Shaped-Eyes}";
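
The \N{} sequences above are resolved at compile time. For names that you only know at run time, the charnames documentation describes charnames::vianame, which honors the options of the enclosing use charnames, including :loose. Here is a minimal sketch of that; the loose_code_point helper is a made-up name for illustration, not part of charnames:

```perl
use v5.16;
use charnames qw(:loose);

# Hypothetical helper: return the code point for a loosely
# spelled character name, or undef if nothing matches.
# charnames::vianame picks up the :loose option from this scope.
sub loose_code_point {
    my ($name) = @_;
    return charnames::vianame($name);
}

my @spellings = (
    'SMILING CAT FACE WITH HEART-SHAPED EYES',
    'Smiling Cat Face With Heart Shaped Eyes',
    'smilingcatfacewithheartshapedeyes',
);

for my $name (@spellings) {
    my $code = loose_code_point($name);
    say defined $code
        ? sprintf('U+%04X  %s', $code, $name)
        : "not found: $name";
}
```

Each lookup pays the same loose matching cost as the compile-time version, so this is probably best kept out of hot loops.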

Some problematic names

This doesn’t work out well for some names, and Perl developer Karl Williamson made some comments about this to the Unicode Consortium in 2010. There are some names that have hyphens next to whitespace (so, not medial hyphens), but if you ignore whitespace first, then the hyphen isn’t next to whitespace.

Not only that, removing the hyphen can turn one character’s name into that of a completely different character:

  • U+0F68 TIBETAN LETTER A
  • U+0F60 TIBETAN LETTER -A
  • U+0FB8 TIBETAN SUBJOINED LETTER A
  • U+0FB0 TIBETAN SUBJOINED LETTER -A
  • U+116C HANGUL JUNGSEONG OE
  • U+1180 HANGUL JUNGSEONG O-E
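
To see the collisions for yourself, you can fetch the official names with charnames::viacode and compare them after deleting the hyphens. This is only a rough sketch of the problem, not how charnames implements loose matching, and the collides_without_hyphens helper is a name invented for this example:

```perl
use v5.16;
use charnames ();

# Hypothetical helper: true if two code points have different
# official names that become identical once hyphens are deleted.
sub collides_without_hyphens {
    my ($first, $second) = @_;
    my @names = map { charnames::viacode($_) } $first, $second;
    return 0 if grep { !defined } @names;   # unassigned code point
    return 0 if $names[0] eq $names[1];     # same name to begin with
    tr/-//d for @names;                     # strip every hyphen
    return $names[0] eq $names[1] ? 1 : 0;
}

# The pairs from the list above
my @pairs = ( [ 0x0F68, 0x0F60 ], [ 0x0FB8, 0x0FB0 ], [ 0x116C, 0x1180 ] );

for my $pair (@pairs) {
    my ($plain, $hyphenated) = @$pair;
    say sprintf(
        'U+%04X %s / U+%04X %s: %s',
        $plain,      charnames::viacode($plain),
        $hyphenated, charnames::viacode($hyphenated),
        collides_without_hyphens($plain, $hyphenated)
            ? 'same name without hyphens'
            : 'still distinct',
    );
}
```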

Create your own dualvars

Perl’s basic data type is the scalar, which takes its name from the mathematical term for “single item”. However, the scalar is really two things. You probably know that a scalar can be either a number or a string, or a number that looks the same as its string, or a string that can be a number. What you probably don’t know is that a scalar can be two separate and unrelated values at the same time, making it a dualvar. Continue reading “Create your own dualvars”

Make disposable web servers for testing

If your project depends on an interaction with a web server, especially a remote one, you have some challenges with testing that portion. Even if you can get it working for you, when you distribute your code, someone else might not be able to reach your server for testing. Instead of relying on an external server, you can use a local server that you write especially for your test suite. Continue reading “Make disposable web servers for testing”