Unicode character ranges have the same gotchas as the ASCII character ranges, although they become more apparent and more important. You’re probably used to creating a range for all the letters, like the character classes [A-Z] or [a-z], the range 'a' .. 'z', or the range in a transliteration, and not having a problem. If you look at the ASCII sequence, you see that there is an unbroken series of letters in those ranges. Continue reading “Be careful with Unicode character ranges”
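A short sketch of the gotcha, assuming any modern perl with Unicode property support: [a-z] silently skips accented letters, and a range of Greek capitals that looks tidy actually spans an unassigned code point. The Greek example is an illustration of my own, not taken from the article.

```perl
use strict;
use warnings;
use utf8;                          # the source contains literal non-ASCII text
binmode STDOUT, ':encoding(UTF-8)';

# [a-z] is an unbroken run in ASCII, but it knows nothing about é.
my $word = 'café';
my @ascii_letters = $word =~ /([a-z])/g;
printf "ASCII letters in %s: %d of %d\n",
    $word, scalar @ascii_letters, length $word;    # 3 of 4

# Α (U+0391) through Ω (U+03A9) looks like a tidy range of Greek capitals,
# but U+03A2 in the middle is unassigned, so it is not a letter at all.
my @not_letters = grep { chr($_) !~ /\p{L}/ } 0x0391 .. 0x03A9;
printf "Non-letters in that range: %s\n",
    join ', ', map { sprintf 'U+%04X', $_ } @not_letters;    # U+03A2

# A property asks the real question, "is this a letter?", in any script.
my $all_letters = $word =~ /\A\p{L}+\z/ ? 'yes' : 'no';
print "Is $word made only of letters? $all_letters\n";       # yes
```

Asking for a property such as \p{L} avoids depending on how code points happen to be laid out.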
Author: brian d foy
Hide low-level details behind an interface
Perl 5.16 makes the Perl special variable $$ writable, but with some magic. That’s the variable that holds the process ID. Why would you ever want to do that? There’s not much to write about with this new feature, but there’s plenty to write against it since it introduces more magic (see commit 9cdac2 on June 13, 2011). Continue reading “Hide low-level details behind an interface”
Perl 5.16 new features
The Perl 5 Porters are currently working on Perl 5.15, the development track that will end up as Perl 5.16. By reading the perldelta515* documentation, you can get a peek at what will most likely be in the next major release of Perl. You need to read each of the perldelta5* documents in a development series because each one documents only the changes since the previous development release, not the cumulative changes since the last major release.
We’ll cover these features, although we might have to stretch some of the posts a bit since some of these aren’t that radical:
- Perl v5.16 now sets proper magic on lexical $_
- __SUB__ returns a reference to the current subroutine
- Proper case folding with fc and \F
- Look up Unicode properties with an inversion map
- CORE:: works on all keywords
- continue no longer requires the “switch” feature
- $$ is no longer read-only
- \N{...} can now have Unicode loose name matching
- Features inside the debugger
- Breakpoints with file names
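A quick sketch of two of these features, assuming perl 5.16 or later: __SUB__ lets an anonymous subroutine recurse, and fc (with its interpolating form \F) does full Unicode case folding. The factorial and "Straße" examples are only illustrations.

```perl
use v5.16;                         # strict, plus say, fc, and __SUB__ (current_sub)
use warnings;
use utf8;
binmode STDOUT, ':encoding(UTF-8)';

# __SUB__ returns a reference to the currently running subroutine,
# so an anonymous sub can recurse without naming itself.
my $factorial = sub {
    my ($n) = @_;
    return $n <= 1 ? 1 : $n * __SUB__->( $n - 1 );
};
say $factorial->(5);                                      # 120

# fc does full Unicode case folding, so "ß" folds to "ss".
my $same = fc('STRASSE') eq fc('straße') ? 'match' : 'no match';
say $same;                                                # match

# \F is the interpolating version of fc, just as \L is of lc.
my $street = 'Straße';
say "\F$street";                                          # strasse
```

Comparing fc of both sides is the case-insensitive comparison that lc on both sides only approximates.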
Set default regular expression modifiers
Are you tired of adding the same modifiers to all of your regular expressions? For instance, you might always add the /u modifier to turn on Unicode semantics in all of your patterns, including qr//, m//, and s///. Instead of remembering to do that for every pattern, you can use the re pragma that ships with Perl 5.14, which now lets you set default modifiers for all patterns in the current lexical scope. You can also turn off a modifier for the rest of the scope. Continue reading “Set default regular expression modifiers”
Treat Unicode strings as grapheme clusters
If you need to work with Unicode strings, you probably don’t want to use Perl’s built-in string manipulation functions. This might seem a strange thing to say about a language whose main feature is string processing, but it’s a consequence of Perl’s ease in string processing.
Consider what a string is. Think of that for a moment. Write out your definition if you need to. Now, what is a string in Perl? Does it match your definition? Continue reading “Treat Unicode strings as grapheme clusters”
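One way to see the difference, assuming a decomposed "é" (an "e" followed by U+0301 COMBINING ACUTE ACCENT): length and reverse work on code points, while the \X regex escape matches one extended grapheme cluster at a time.

```perl
use strict;
use warnings;
binmode STDOUT, ':encoding(UTF-8)';

# "café" with the é built from "e" plus U+0301 COMBINING ACUTE ACCENT:
# a reader sees four characters, but the string holds five code points.
my $word = "cafe\x{301}";

my $code_points = length $word;           # length counts code points
my $graphemes   = () = $word =~ /\X/g;    # \X matches one grapheme cluster

print "code points: $code_points, graphemes: $graphemes\n";   # code points: 5, graphemes: 4

# reverse works on code points, so the accent is separated from its "e"...
my $by_code_point = reverse $word;

# ...while reversing whole grapheme clusters keeps the accent with its "e".
my $by_grapheme = join '', reverse $word =~ /\X/g;

print "$by_code_point\n$by_grapheme\n";
```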
Set the line number and filename of string evals
Errors from a string eval can be tricky to track down since perl doesn’t tell you where the eval was. It treats each string eval as a separate, virtual file because it doesn’t remember where the string argument came from. Since perl compiles that code during the run phase (see Know the phases of a Perl program’s execution), the information the compiler dragged along for filenames and line numbers is no longer around. Continue reading “Set the line number and filename of string evals”
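A sketch of the #line directive (described in perlsyn) applied to a string eval. The settings.conf name is made up; in real code you would use whatever describes where the string actually came from.

```perl
use strict;
use warnings;

# Pretend this code came from a configuration file (hypothetical).
my $code = <<'CODE';
my $limit = 10;
die "limit too small";
CODE

# With no directive, perl reports a virtual file such as "(eval 1)".
eval $code;
my $plain_error = $@;

# A '#line N "FILE"' directive makes the line after it line N of FILE.
eval qq{#line 1 "settings.conf"\n$code};
my $located_error = $@;

print "without #line: $plain_error";
print "with #line:    $located_error";   # limit too small at settings.conf line 2.
```

The directive has to start its own line, which is why the string begins with it and a newline before the code.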
The \R generic line ending
Perl v5.10 adds a regular expression shortcut \R that matches anything the Unicode specification thinks is a line ending. It looks similar to a character class shortcut, but it’s not. It can match the carriage-return line-feed sequence, but a character class only ever matches a single character, never a sequence. Continue reading “The \R generic line ending”
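A quick comparison, assuming text with mixed line endings: splitting on \R treats \r\n as one separator, while the character class [\r\n] treats it as two and leaves an empty field behind.

```perl
use strict;
use warnings;

# Unix (\n), Windows (\r\n), and classic Mac OS (\r) line endings mixed together.
my $text = "one\ntwo\r\nthree\rfour";

# \R matches any line ending, including the two-character \r\n sequence.
my @lines = split /\R/, $text;
print scalar(@lines), " lines: @lines\n";     # 4 lines: one two three four

# A character class matches exactly one character, so \r\n is seen as
# two separators and leaves an empty field between "two" and "three".
my @fields = split /[\r\n]/, $text;
print scalar(@fields), " fields\n";           # 5 fields
```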
Use for() instead of given()
[Lexical $_ was removed in v5.24]
Perl 5.10 introduced the given-when feature, a fancier version of C’s switch statement. However, it was poorly designed and tested, and it depended on two other dubious features, the lexical $_ and smart-matching. Parts of this feature are salvageable, but you should avoid the literal given (and probably the lexical $_ and the smart matching, but I’ll skip those for this Item). Continue reading “Use for() instead of given()”
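A minimal sketch of for() as a topicalizer: a one-element foreach aliases $_ just as given does, and plain conditions stand in for when so nothing depends on smart-matching. The classify subroutine is a hypothetical example, and this sketch deliberately avoids when entirely.

```perl
use strict;
use warnings;

# Hypothetical example: describe a value without given/when.
sub classify {
    my ($value) = @_;

    # A one-element foreach aliases $_ to $value, which is the part
    # of given() worth keeping, with no lexical $_ or smartmatch.
    for ($value) {
        return 'empty'  if !defined($_) || $_ eq '';
        return 'number' if /\A-?\d+\z/;
        return 'word'   if /\A\w+\z/;
        return 'other';
    }
}

print "$_: ", classify($_), "\n" for '42', 'perl', '', 'a b';
```

A return inside the loop leaves the whole subroutine, so the first matching condition wins, much like a chain of when blocks.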
Override die with END or CORE::GLOBAL::die
Perl lets you override the effects of warn and die by setting handlers for the __WARN__ and __DIE__ pseudo-signals in %SIG, which Perl triggers when you call those functions. You probably don’t want to use the __DIE__ handler, though, since it might mean a couple of different things: it runs both for fatal errors and for errors that an eval is about to catch. Continue reading “Override die with END or CORE::GLOBAL::die”
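This sketch covers the CORE::GLOBAL::die half of the title, assuming the override is installed in a BEGIN block so it is in place before any die calls are compiled. It also shows the __DIE__ handler firing for an error that eval then catches. The "[myapp]" prefix is arbitrary.

```perl
use strict;
use warnings;

BEGIN {
    no warnings 'once';
    # Every die compiled after this point calls this sub instead.
    *CORE::GLOBAL::die = sub {
        my $message = join '', @_;
        CORE::die("[myapp] $message");    # the real, built-in die
    };
}

# The __DIE__ handler also runs for errors that an eval is about to catch,
# so it has to check $^S to tell a trapped error from a fatal one.
my @handler_saw;
local $SIG{__DIE__} = sub { push @handler_saw, $_[0] };

eval { die "config missing\n" };
my $caught = $@;

print "caught: $caught";                                # caught: [myapp] config missing
print 'handler calls: ', scalar(@handler_saw), "\n";    # handler calls: 1
```

Code compiled before the BEGIN block, or in other files loaded earlier, keeps calling the built-in die.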
Understand the Test Anything Protocol
The Test Anything Protocol, or just TAP, is the formalization of Perl 5’s test output format from the Test::Harness module. Either Andreas König or Tim Bunce created the module, but they can’t remember who did what or when. The Changes file for Test-Harness only starts in earnest in 2006, around the time that people started working on the next generation of Perl’s testing backend, even though the module had existed for several years before that. Now TAP is semi-formalized (an IETF RFC is in the works) and has its own website at testanything.org. Continue reading “Understand the Test Anything Protocol”
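TAP is plain text, so a test script can produce it by hand. This sketch prints a plan line followed by one ok or not ok line per check; the checks themselves are arbitrary, and in practice Test::More writes these lines for you. The last check fails on purpose to show the not ok form.

```perl
use strict;
use warnings;

# Each check is a description and a true/false result (arbitrary examples).
my @checks = (
    [ 'addition works'     => 1 + 1 == 2 ],
    [ 'lc lowercases'      => lc('PERL') eq 'perl' ],
    [ 'strings repeat'     => 'ab' x 2 eq 'abab' ],
    [ 'deliberate failure' => 2 + 2 == 5 ],
);

# The plan line tells the harness how many tests to expect.
my @tap = ( '1..' . scalar @checks );

# Each result line is "ok N - description" or "not ok N - description".
my $number = 0;
for my $check (@checks) {
    my ( $description, $passed ) = @$check;
    $number++;
    push @tap, ( $passed ? '' : 'not ' ) . "ok $number - $description";
}

print "$_\n" for @tap;
```

A harness such as prove reads these lines and reports test 4 as failed.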