Perl v5.26 updates to Unicode 9. That's not normally exciting news, but people have been pretty enthusiastic about the 72 new emojis that come with it. As far as Perl cares, they are just valid code points like all the others.
I went to Emojipedia to see what I could do. I'd already used the 🦋 in Learning Perl 6, although ๐ฆ would be more appropriate here. I was curious where these new emojis are in the UCD (the Unicode Character Database).
First, how many new characters show up in Unicode 9? I could look this up, but it's easier and more fun to do it in Perl:
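```perl
use v5.26;   # enables say
use strict;
use warnings;

# On v5.26 (Unicode 9) this counts the Unicode 9 additions; a later perl
# would also count characters added in later Unicode versions.
my $count = 0;
foreach ( 0 .. 0x10FFFF ) {
    my $char = chr;                           # chr defaults to $_
    next if $char =~ /\p{Present_In: 8.0}/;   # already existed in Unicode 8.0
    next if $char =~ /\p{Unassigned}/;        # not an assigned character
    $count++;
}
say "There are $count new chars";
```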
This skips anything that was present in the previous version (8.0) and then any code points that are not assigned. This gives exactly 7,500:
There are 7500 new chars
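On v5.26 that loop counts exactly the Unicode 9 additions, but on a later perl, whose Unicode tables are newer, "not present in 8.0 and assigned" also sweeps in everything added after 9.0. A minimal sketch that pins the version by asking for characters whose Age is exactly 9.0 instead (it assumes a perl with Unicode 9.0 or later, which v5.26 provides; the sub name count_new_in_9 is just for illustration):

```perl
use v5.26;
use strict;
use warnings;

# Count code points whose Age is exactly 9.0, that is, characters first
# assigned in Unicode 9.0. Unlike "not Present_In 8.0 and not Unassigned",
# this doesn't also pick up characters from later Unicode versions.
sub count_new_in_9 {
    my $count = 0;
    foreach my $cp ( 0 .. 0x10FFFF ) {
        $count++ if chr($cp) =~ /\p{Age: 9.0}/;
    }
    return $count;
}

say 'Characters first assigned in Unicode 9.0: ', count_new_in_9();
```

Age records the version that first assigned a character, and that never changes, so the answer stays the same no matter how new your perl is.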
What are those other 7,428 characters? You could figure that out just as quickly. But who cares? They aren’t emojis.
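If you do care, a sketch like this one tallies the Unicode 9.0 additions by block using charblock from the core Unicode::UCD module. It is one way to slice the data, not the only one; swapping in charscript from the same module would group by script instead:

```perl
use v5.26;
use strict;
use warnings;
use Unicode::UCD qw(charblock);

# Tally the characters first assigned in Unicode 9.0 by Unicode block.
my %by_block;
foreach my $cp ( 0 .. 0x10FFFF ) {
    next unless chr($cp) =~ /\p{Age: 9.0}/;
    $by_block{ charblock($cp) }++;
}

# Print the blocks with the most new characters first.
foreach my $block (
    sort { $by_block{$b} <=> $by_block{$a} or $a cmp $b } keys %by_block
) {
    printf "%5d  %s\n", $by_block{$block}, $block;
}
```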
The 72 new emojis are listed on Emojipedia, so you can use Mojo::UserAgent to grab those. Since these are wide characters, you have to do some extra setup (see the “Unicode Primer” chapter in Learning Perl or the various perluni* docs):
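```perl
use v5.26;
use utf8;
use strict;
use warnings;
use open qw(:std :utf8);   # UTF-8 on STDIN, STDOUT, and STDERR
use charnames qw();        # load charnames for viacode, import nothing
use Mojo::UserAgent;

my $ua  = Mojo::UserAgent->new;
my $url = 'https://blog.emojipedia.org/new-unicode-9-emojis/';

# result dies on connection errors; is_success checks the HTTP status
my $res = $ua->get( $url )->result;
die "That didn't work!\n" unless $res->is_success;

say $res->dom
    ->find( 'ul:not( [class] ) li a' )    # the links in the page's emoji list
    ->map( 'text' )                       # each link's text starts with the emoji
    ->map( sub {
        my $c = substr $_, 0, 1;          # keep just the first character
        [ $c, ord($c), charnames::viacode( ord($c) ) ]
    } )
    ->sort( sub { $a->[1] <=> $b->[1] } ) # order by code point
    ->map( sub { sprintf '%s (U+%05X) %s', $_->@* } )
    ->join( "\n" );
```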
From this I can see that the new emojis are not contiguous:
```
🕺 (U+1F57A) MAN DANCING
🖤 (U+1F5A4) BLACK HEART
🛑 (U+1F6D1) OCTAGONAL SIGN
🛒 (U+1F6D2) SHOPPING TROLLEY
🛴 (U+1F6F4) SCOOTER
🛵 (U+1F6F5) MOTOR SCOOTER
🛶 (U+1F6F6) CANOE
🤙 (U+1F919) CALL ME HAND
🤚 (U+1F91A) RAISED BACK OF HAND
🤛 (U+1F91B) LEFT-FACING FIST
🤜 (U+1F91C) RIGHT-FACING FIST
🤝 (U+1F91D) HANDSHAKE
🤞 (U+1F91E) HAND WITH INDEX AND MIDDLE FINGERS CROSSED
🤠 (U+1F920) FACE WITH COWBOY HAT
🤡 (U+1F921) CLOWN FACE
🤢 (U+1F922) NAUSEATED FACE
🤣 (U+1F923) ROLLING ON THE FLOOR LAUGHING
🤤 (U+1F924) DROOLING FACE
🤥 (U+1F925) LYING FACE
🤦 (U+1F926) FACE PALM
🤧 (U+1F927) SNEEZING FACE
🤰 (U+1F930) PREGNANT WOMAN
🤳 (U+1F933) SELFIE
🤴 (U+1F934) PRINCE
🤵 (U+1F935) MAN IN TUXEDO
🤶 (U+1F936) MOTHER CHRISTMAS
🤷 (U+1F937) SHRUG
🤸 (U+1F938) PERSON DOING CARTWHEEL
🤹 (U+1F939) JUGGLING
🤺 (U+1F93A) FENCER
🤼 (U+1F93C) WRESTLERS
🤽 (U+1F93D) WATER POLO
🤾 (U+1F93E) HANDBALL
🥀 (U+1F940) WILTED FLOWER
🥁 (U+1F941) DRUM WITH DRUMSTICKS
🥂 (U+1F942) CLINKING GLASSES
🥃 (U+1F943) TUMBLER GLASS
🥄 (U+1F944) SPOON
🥅 (U+1F945) GOAL NET
🥇 (U+1F947) FIRST PLACE MEDAL
🥈 (U+1F948) SECOND PLACE MEDAL
🥉 (U+1F949) THIRD PLACE MEDAL
🥊 (U+1F94A) BOXING GLOVE
🥋 (U+1F94B) MARTIAL ARTS UNIFORM
🥐 (U+1F950) CROISSANT
🥑 (U+1F951) AVOCADO
🥒 (U+1F952) CUCUMBER
🥓 (U+1F953) BACON
🥔 (U+1F954) POTATO
🥕 (U+1F955) CARROT
🥖 (U+1F956) BAGUETTE BREAD
🥗 (U+1F957) GREEN SALAD
🥘 (U+1F958) SHALLOW PAN OF FOOD
🥙 (U+1F959) STUFFED FLATBREAD
🥚 (U+1F95A) EGG
🥛 (U+1F95B) GLASS OF MILK
🥜 (U+1F95C) PEANUTS
🥝 (U+1F95D) KIWIFRUIT
🥞 (U+1F95E) PANCAKES
🦅 (U+1F985) EAGLE
🦆 (U+1F986) DUCK
🦇 (U+1F987) BAT
🦈 (U+1F988) SHARK
🦉 (U+1F989) OWL
🦊 (U+1F98A) FOX FACE
🦋 (U+1F98B) BUTTERFLY
🦌 (U+1F98C) DEER
🦍 (U+1F98D) GORILLA
🦎 (U+1F98E) LIZARD
🦏 (U+1F98F) RHINOCEROS
🦐 (U+1F990) SHRIMP
🦑 (U+1F991) SQUID
```
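To see the gaps at a glance, you can collapse the sorted code points into runs. This is a small hypothetical helper, to_ranges, not part of any module; you could feed it the ord values from the Mojo pipeline, or, as here, a few of them typed in by hand:

```perl
use v5.26;
use strict;
use warnings;

# Collapse a list of code points into runs of consecutive values,
# formatted as "U+XXXXX" or "U+XXXXX..U+YYYYY".
sub to_ranges {
    my @cps = sort { $a <=> $b } @_;
    my @ranges;
    while (@cps) {
        my $start = my $end = shift @cps;
        # extend the run while the next code point follows directly
        $end = shift @cps while @cps && $cps[0] == $end + 1;
        push @ranges, $start == $end
            ? sprintf( 'U+%05X', $start )
            : sprintf( 'U+%05X..U+%05X', $start, $end );
    }
    return @ranges;
}

# A handful of the code points from the list above
my @sample = ( 0x1F57A, 0x1F919 .. 0x1F91E, 0x1F920 .. 0x1F927, 0x1F930 );
say for to_ranges(@sample);
# U+1F57A
# U+1F919..U+1F91E
# U+1F920..U+1F927
# U+1F930
```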
Things to remember
- v5.26 updates to Unicode 9 with 7,500 new characters.
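- You can check the Unicode version of a character with the `Present_In` (short name `In`) or `Age` properties: `Present_In` asks whether a character existed as of a given version, and `Age` tells you which version first assigned it.
- `ord` gets you the code point number, and `charnames::viacode` turns that number into the character's name.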
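Putting those pieces together for a single character, here is a small sketch using U+1F98B BUTTERFLY from the list above:

```perl
use v5.26;
use strict;
use warnings;
use open qw(:std :utf8);
use charnames ();   # load charnames for viacode, import nothing

my $char = "\N{U+1F98B}";             # BUTTERFLY
my $cp   = ord $char;                 # code point as a number
my $name = charnames::viacode($cp);   # code point to name

printf "%s U+%05X %s\n", $char, $cp, $name;

# Present_In asks "did this exist by version X?";
# Age asks "which version first assigned it?"
say 'Present in 8.0? ', ( $char =~ /\p{Present_In: 8.0}/ ? 'yes' : 'no' );
say 'Present in 9.0? ', ( $char =~ /\p{Present_In: 9.0}/ ? 'yes' : 'no' );
say 'Age 9.0?        ', ( $char =~ /\p{Age: 9.0}/        ? 'yes' : 'no' );
```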
I have also enjoyed exploring the universe of Unicode characters and emoji. I built this browser-based search tool (which does have a Perl component for generating the data file): https://www.mclean.net.nz/ucf/
Using this tool you can search for a character by matching keywords in the description; or paste in a character to find out more details. Then explore the other characters around that one in the code chart. You can bookmark a specific character or search term and store your favourites in the scratchpad. Use the ‘Help’ button to get started.
Question: Are there official translations for the Emoji names or is English the only namespace supported? Can a Spanish programmer specify \N{CORAZON_NEGRO} or is she stuck typing \N{BLACK HEART}?
Matthew, for Unicode itself, the names are only English. See sec. 4.8 of http://www.unicode.org/versions/Unicode11.0.0/ch04.pdf — “The character names in the Unicode Standard are identical to those of the English-language edition of ISO/IEC 10646.” I don’t know about other languages of 10646.
perldoc charnames gives instructions for creating new aliases for code points, so that a Spanish (or any other language's) programmer could create a file of whatever names they want.
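A minimal sketch of that, using the `:alias` option described in perldoc charnames. The CORAZON_NEGRO name comes from the question above and is a custom alias, visible only in the lexical scope of the `use charnames`; charnames::viacode still reports the official English name. The same option can also load aliases from a file, as the charnames docs describe.

```perl
use v5.26;
use strict;
use warnings;
use open qw(:std :utf8);

# Map a custom name onto U+1F5A4 (BLACK HEART) for use in \N{...}.
# The alias is lexically scoped; the official Unicode name is unchanged.
use charnames ':full', ':alias' => {
    CORAZON_NEGRO => 0x1F5A4,
};

my $heart = "\N{CORAZON_NEGRO}";
printf "%s U+%05X %s\n", $heart, ord($heart), charnames::viacode( ord($heart) );
```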