A Unicode character has properties; it knows things about itself. Perl v5.10 introduced a way to match a character that has certain properties that v5.10 supports. In some cases you can match a particular property value. Now v5.12 allows you to match any Unicode property by its value. The newly supported ones include Numeric_Value and Age, for example:
\p{Numeric_Value: 1} \p{Nv=7} \p{Age: 3.0}
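To see those three forms at work, here is a minimal sketch that tries them against a few characters. The sample code points and the `matches` helper are my own picks for illustration, and the Age check assumes your perl's Unicode tables record KHMER LETTER KA as first appearing in Unicode 3.0:

```perl
use v5.12;
use open qw(:std :utf8);

# Hypothetical helper: does the character at this code point match?
sub matches {
    my( $code, $pattern ) = @_;
    return chr($code) =~ $pattern ? 1 : 0;
}

# Sample code points chosen for illustration only.
my @checks = (
    [ 0x0031, 'Numeric_Value: 1', qr/\p{Numeric_Value: 1}/ ],  # DIGIT ONE
    [ 0x2166, 'Nv=7',             qr/\p{Nv=7}/             ],  # ROMAN NUMERAL SEVEN
    [ 0x1780, 'Age: 3.0',         qr/\p{Age: 3.0}/         ],  # KHMER LETTER KA
    [ 0x0041, 'Age: 3.0',         qr/\p{Age: 3.0}/         ],  # LATIN CAPITAL LETTER A
);

foreach my $check ( @checks ) {
    my( $code, $label, $pattern ) = @$check;
    printf "U+%04X %s \\p{%s}\n",
        $code,
        matches( $code, $pattern ) ? 'matches' : 'does not match',
        $label;
}
```

The last check should not match, since the capital A has been in Unicode from the start and `Age` records the version in which a character first appeared.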
Here’s a program to match the numeric value you specify in the command line (interpolation happens first, then regex parsing):
use v5.12;
use open qw(:std :utf8);
use Unicode::UCD;

say "Unicode ", Unicode::UCD::UnicodeVersion();

foreach ( 1 .. 0x10fffd ) {
    next unless chr =~ m/ \p{Numeric_Value: $ARGV[0]} /x;
    printf "%s (U+%04X)\n", chr, $_;
}
You can find all the characters that have a numeric value of 3. There are 96 matches (in Unicode 5.2.0 at least):
$ perl pnv.pl 3
Unicode 5.2.0
3 (U+0033)
³ (U+00B3)
٣ (U+0663)
۳ (U+06F3)
߃ (U+07C3)
३ (U+0969)
...
You’re not limited to single decimal digits either. Some characters have numeric values greater than 10:
$ perl pnv.pl 11
Unicode 5.2.0
Ⅺ (U+216A)
ⅺ (U+217A)
⑪ (U+246A)
⑾ (U+247E)
⒒ (U+2492)
⓫ (U+24EB)
The highest value I found is 100,000:
$ perl pnv.pl 100000
Unicode 5.2.0
ↈ (U+2188)
If you use a value that isn’t known for that property, you get an error:
$ perl5.12.5 pnv.pl -3
Unicode 5.2.0
Can't find Unicode property definition "Numeric_Value: -3" at ...
You can pre-empt that by constructing the pattern ahead of time and noticing the problem before you go through the code points:
use v5.12;
use open qw(:std :utf8);
use Unicode::UCD;

say "Unicode ", Unicode::UCD::UnicodeVersion();

# Build the pattern once, up front; eval traps a failure to compile it
my $pattern = eval { qr| \p{Numeric_Value: $ARGV[0]} |x };
die "Invalid pattern for <$ARGV[0]>!\n" unless $pattern;

foreach ( 1 .. 0x10fffd ) {
    next unless chr =~ $pattern;
    printf "%s (U+%04X)\n", chr, $_;
}
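Since the command-line value is interpolated into the pattern before the regex is parsed, you might also want to check its shape first. This is only a sketch: `numeric_value_pattern` is a hypothetical helper, the accepted input shape (integer, decimal, or fraction such as 1/2) is my assumption, and the trial match is there in case your perl puts off the property lookup until the first match:

```perl
use v5.12;

# Hypothetical helper: return a compiled pattern, or nothing if the value
# does not look numeric or perl rejects it as a property value.
sub numeric_value_pattern {
    my( $value ) = @_;

    # Assumed input shape: integer, decimal, or fraction such as 1/2
    return unless defined $value
        and $value =~ m{ \A -? [0-9]+ (?: [./] [0-9]+ )? \z }x;

    return eval {
        my $pattern = qr/ \p{Numeric_Value: $value} /x;
        '1' =~ $pattern;    # trial match, in case the lookup is deferred
        $pattern;
    };
}

foreach my $value ( qw(3 -3 three) ) {
    say "$value: ", numeric_value_pattern( $value ) ? 'usable' : 'rejected';
}
```

Anything the helper rejects never reaches the regex compiler, so a stray brace or pipe in the argument cannot change the pattern you meant to build.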
In character classes
Use these property values in a character class to match one of several numeric values:
use v5.12;
use open qw(:std :utf8);
use Unicode::UCD;

say "Unicode ", Unicode::UCD::UnicodeVersion();

foreach ( 1 .. 0x10fffd ) {
    next unless chr =~ m/ [\p{nv=1}\p{nv=3}\p{nv=7}] /x;
    printf "%s (U+%04X)\n", chr, $_;
}
Now I match the 262 characters with one of those numeric values:
$ perl5.12.5 pnv.pl | more
Unicode 5.2.0
1 (U+0031)
3 (U+0033)
7 (U+0037)
³ (U+00B3)
¹ (U+00B9)
١ (U+0661)
٣ (U+0663)
...
Some of these characters have numeric values but aren’t “numbers” in the Perl sense. Try adding the superscript 3 and the superscript 1 and you don’t get superscript 4 (wouldn’t that be nice?):
$ perl -Mutf8 -le 'print "³" + "¹"'
0
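If you do want the arithmetic, you have to look up the numeric values yourself. Here is a minimal sketch using `charinfo` from Unicode::UCD, whose `numeric` field holds the value as a string; the `numeric_value` helper is a hypothetical name of mine:

```perl
use v5.12;
use utf8;
use open qw(:std :utf8);
use Unicode::UCD qw(charinfo);

# Hypothetical helper: the Unicode numeric value of a single character,
# or undef if it has none. Fractions come back as strings such as "1/2".
sub numeric_value {
    my( $char ) = @_;
    my $info = charinfo( ord $char ) or return;
    return length $info->{numeric} ? $info->{numeric} : undef;
}

say numeric_value( "³" ) + numeric_value( "¹" );    # prints 4
```

You get a plain 4 rather than a superscript one, and a fraction would need its own handling before you could add it.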