The perl interpreter is getting much better with its Unicode support, but that doesn’t mean everything just works, because most of the code you probably care about is in modules, which might not have kept up. Some of this becomes apparent when you give another module some Unicode strings for it to output.
For instance, the latest release of the Test::Builder module has a known issue with filehandle layers (Google Code or GitHub). Write a short Test::More program and use some wide characters (those with code points above 255). Since your source code has these literal wide characters, you need the utf8 pragma to let perl know that it should interpret your source as UTF-8 instead of Latin-1 (Item 72. Use Unicode in your source code):
use Test::More; # 0.98 or less
use utf8;

ok( 1, 'My ☃ is melting!' );

done_testing();
When you run this using Test::More 0.98 or lower, you get a wide character warning:
Wide character in print at /usr/local/perls/perl-5.14.1/lib/5.14.1/Test/Builder.pm line 1759.
ok 1 - My ☃ is melting!
1..1
This is a problem only with the Test::Builder that comes with Test-Simple-0.98 or lower; Test-Simple-2.00, a big rewrite of most of the underpinnings, and later versions solve this problem.
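The warning is about the filehandle, not the string. Here is a minimal sketch using only core Perl that prints the same text to two in-memory filehandles: one with no encoding layer, standing in for the handles that Test::Builder 0.98 prints to (an assumption for illustration), and one with an :encoding(UTF-8) layer. Only the first print should produce a "Wide character in print" warning.

```perl
use strict;
use warnings;

# Collect warnings instead of letting them go to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

my $text = "My \x{2603} is melting!\n";    # U+2603 SNOWMAN, above 255

# No encoding layer: perl writes its internal UTF-8 bytes and warns.
open my $raw, '>', \my $raw_buffer or die "Cannot open: $!";
print {$raw} $text;
close $raw;

# An explicit encoding layer: the characters are encoded quietly.
open my $encoded, '>:encoding(UTF-8)', \my $encoded_buffer
    or die "Cannot open: $!";
print {$encoded} $text;
close $encoded;

print 'Caught ', scalar @warnings, " warning(s)\n";
print "  $_" for @warnings;
```

Fixing the problem therefore means getting an encoding layer onto the handle that actually does the printing.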
You might try various things to turn off that warning, but you can’t use the warnings pragma: it only affects warnings in its own lexical scope, and this warning comes from another file:
use utf8;
use open IO => ':encoding(UTF-8)';
use Test::More; # 0.98 or less

no warnings; # won't work

ok( 1, "My ☃ is melting!" );

done_testing();
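The warnings pragma governs the code compiled inside its own block or file, not the code you call into. This sketch fakes the situation in a single file, with a hypothetical emit() sub standing in for Test::Builder’s printing code. Turning warnings off around the call changes nothing; turning them off around the print itself does.

```perl
use strict;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

open my $fh, '>', \my $buffer or die "Cannot open: $!";

# Hypothetical stand-in for Test::Builder's output code, which is
# compiled in its own file with warnings enabled.
sub emit {
    use warnings;
    print {$fh} @_;
}

{
    no warnings;    # covers only code compiled inside this block
    emit("My \x{2603} is melting!\n");    # still warns: the print is in emit()
}

{
    no warnings;
    print {$fh} "My \x{2603} is melting!\n";    # silent: the print is in this block
}

close $fh;
print 'Caught ', scalar @warnings, " warning(s)\n";
```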
If warnings can’t do the job, old-school Perlers might reach for $^W, the global state of warnings, but that doesn’t work either (and it’s poor practice except in the most extreme cases):
use utf8;
use open IO => ':encoding(UTF-8)';
use Test::More; # 0.98 or less

$^W = 0; # won't work either

ok( 1, "My ☃ is melting!" );

done_testing();
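There are two reasons $^W can’t help here. Code compiled under use warnings ignores $^W entirely, and "Wide character" is one of perl’s default warnings, which it emits even when $^W is false. In this sketch a hypothetical emit() sub plays the role of Test::Builder, which uses lexical warnings; both prints still warn after $^W = 0.

```perl
use strict;

my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

open my $fh, '>', \my $buffer or die "Cannot open: $!";

# Hypothetical stand-in for Test::Builder, which uses lexical warnings.
sub emit {
    use warnings;
    print {$fh} @_;
}

$^W = 0;    # the global switch: ignored inside emit()

emit("My \x{2603} is melting!\n");          # warns: lexical warnings win
print {$fh} "My \x{2603} is melting!\n";    # warns too: a default warning

close $fh;
print 'Caught ', scalar @warnings, " warning(s)\n";
```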
You might try to turn on UTF-8 encoding for all filehandles (Item 73. Tell Perl which encoding to use):
use Test::More; # 0.98 or less
use utf8;
use open IO => ':encoding(UTF-8)';

ok( 1, "My ☃ is melting!" );

done_testing();
That appears not to work. Maybe you think that it’s an ordering problem because you load Test::More first. That’s a common thing to do in test scripts because testing is the purpose of the code. So you switch the order, but it still doesn’t work:
use utf8;
use open IO => ':encoding(UTF-8)';
use Test::More; # 0.98 or less

ok( 1, "My ☃ is melting!" );

done_testing();
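The order doesn’t matter because the open pragma, like warnings, is lexically scoped: it sets the default layers only for open calls compiled in its own scope, and Test::Builder sets up its output handles in its own file. This sketch assumes a scratch file from the core File::Temp module and uses a hypothetical open_elsewhere() sub, compiled before the pragma, to stand in for code in another file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper compiled *before* the pragma below, standing in
# for code in another file such as Test/Builder.pm.
sub open_elsewhere {
    my ($path) = @_;
    open my $fh, '>', $path or die "Cannot open $path: $!";
    return $fh;
}

use open IO => ':encoding(UTF-8)';    # lexical: from here to end of file

my ( undef, $path ) = tempfile( UNLINK => 1 );

open my $here, '>', $path or die "Cannot open $path: $!";
my $there = open_elsewhere($path);

my @here_layers  = PerlIO::get_layers($here);
my @there_layers = PerlIO::get_layers($there);

print "opened here:  @here_layers\n";
print "opened there: @there_layers\n";
```

Only the handle opened after the pragma, in the same scope, picks up the encoding layer.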
You can back up a step. The problem disappears if you set up the filehandles with the -C command-line switch, using the S option to set the standard filehandles to UTF-8. However, this only works with at least Perl 5.14, so it won’t work with v5.10 or v5.12. Without the switch you get the warning; with it, you don’t:
$ perl5.14.1 snowman.t
Wide character in print at /usr/local/perls/perl-5.14.1/lib/5.14.1/Test/Builder.pm line 1759.
ok 1 - My ☃ is melting!
1..1
$ perl5.14.1 -CS snowman.t
ok 1 - My ☃ is melting!
1..1
You can also set the PERL_UNICODE environment variable to the empty string, which has the same effect as -CSDL. This is not one of the options that we showed in Item 73. Again, this works only in Perl 5.14:
$ env PERL_UNICODE='' perl5.14.1 snowman.t
ok 1 - My ☃ is melting!
1..1
That is, if perl sets up the filehandles right away, you don’t have a problem. If you wait until the program has started, you’re out of luck. And if you aren’t using Perl 5.14, you’re out of luck either way.
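If you want to confirm that an empty PERL_UNICODE really means -CSDL, perl records the -C settings it started with in the ${^UNICODE} variable. This sketch runs a child perl ($^X) under each setting and collects what it reports. It assumes a Unix-like system where the list form of a piped open works, and unicode_flags() is a hypothetical helper for illustration only.

```perl
use strict;
use warnings;

# Run a child perl with the given PERL_UNICODE value (undef means
# unset) and extra switches, and return its ${^UNICODE} value.
sub unicode_flags {
    my ( $perl_unicode, @switches ) = @_;

    # This changes our own %ENV too, which is fine for a demo script.
    if ( defined $perl_unicode ) { $ENV{PERL_UNICODE} = $perl_unicode }
    else                         { delete $ENV{PERL_UNICODE} }

    open my $pipe, '-|', $^X, @switches, '-e', 'print ${^UNICODE}'
        or die "Cannot run $^X: $!";
    my $flags = <$pipe>;
    close $pipe;
    return $flags;
}

my %flags = (
    'nothing'         => unicode_flags(undef),
    '-CS'             => unicode_flags( undef, '-CS' ),
    '-CSDL'           => unicode_flags( undef, '-CSDL' ),
    'PERL_UNICODE=""' => unicode_flags(''),
);

printf "%-16s %s\n", $_, $flags{$_} for sort keys %flags;
```

The S option is the sum of the STDIN, STDOUT, and STDERR bits (1 + 2 + 4 = 7), and D and L add 24 and 64, so -CSDL and the empty PERL_UNICODE should both report 95.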
Fortunately, there’s a very easy fix, because Test::Builder, the workhorse behind Test::More, is set up in a way that lets you easily fix these sorts of problems (Item 55. Make flexible output and Hide low-level details behind an interface). You can get the builder object that Test::More uses and change the layers on its filehandles:
use utf8;
use open IO => ':encoding(UTF-8)';
use Test::More; # 0.98 or less

foreach my $method ( qw(output failure_output) ) {
	binmode Test::More->builder->$method(), ':encoding(UTF-8)';
}

ok( 1, "My ☃ is melting!" );

done_testing();
Now you get no warnings across the two supported Perl versions and the latest unsupported version:
$ perl5.10.1 snowman.t
ok 1 - My ☃ is melting!
1..1
$ perl5.12.2 snowman.t
ok 1 - My ☃ is melting!
1..1
$ perl5.14.1 snowman.t
ok 1 - My ☃ is melting!
1..1
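The loop covers output and failure_output, but Test::Builder also has a todo_output handle that receives the diagnostics for failing TODO tests, and the Test::More documentation shows a similar workaround that sets the layer on all three. This sketch does the same and adds a TODO test with a snowman in its name to exercise the third handle; the @warnings collector is only there so you can check that nothing was emitted.

```perl
use strict;
use warnings;
use utf8;
use Test::More;

# Record warnings so you can confirm that none were emitted.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, @_ };

# Put an encoding layer on every handle Test::Builder prints to.
my $builder = Test::More->builder;
binmode $builder->output,         ':encoding(UTF-8)';
binmode $builder->failure_output, ':encoding(UTF-8)';
binmode $builder->todo_output,    ':encoding(UTF-8)';

ok( 1, 'My ☃ is melting!' );

TODO: {
    local $TODO = 'wait for winter';
    ok( 0, 'My ☃ is frozen' );    # failing TODO: diagnostics go to todo_output
}

done_testing();
```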
Once Test::Builder 2.0 is a stable release and most people are using it, you might not have to play these games. However, you can’t always control which versions other people use, so you might have to play some version games (similar to Item 83. Limit your distributions to the right platforms):
use utf8;
use open IO => ':encoding(UTF-8)';
use Test::More;

if( Test::Builder->VERSION < 2 ) {
	foreach my $method ( qw(output failure_output) ) {
		binmode Test::More->builder->$method(), ':encoding(UTF-8)';
	}
}

ok( 1, "My ☃ is melting!" );

done_testing();
That’s not pretty, but it gets the job done. Fortunately, Test::More gives you a way to get at its filehandles. For other modules, you might have to play more extreme games.
Things to remember
- Test::Builder 0.98 and lower has a problem with Perl’s IO layers.
- Test::Builder 2.0 already fixes this issue, but there isn’t a stable release yet.
- Access Test::More’s builder object and set the filehandle layers that you need.