Before v5.18, the vertical tab wasn’t part of the `\s` character class shortcut for ASCII whitespace. No one really knows why. It was a curious bit of trivia that I pointed out in “Know your character classes under different semantics”: ASCII, POSIX, and Unicode each defined whitespace as a different set, and Perl’s whitespace differed from POSIX whitespace only by excluding the vertical tab. Now that little oversight is fixed.
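If you only care about the one character that changed, a much shorter check will do. This is a minimal sketch of my own, not part of the program that follows: it tests the vertical tab (`"\x0B"`) against `\s` and against POSIX `[[:space:]]`, and reports the running perl’s version from `$]`.

```perl
use strict;
use warnings;

# "\x0B" is the vertical tab, which Unicode names LINE TABULATION (U+000B)
my $vt = "\x0B";

# $] holds the running perl's version as a number, such as 5.018000
printf "perl %s: \\s %s the vertical tab\n",
    $], ( $vt =~ /\s/ ? 'matches' : 'does not match' );

# POSIX whitespace included the vertical tab even before v5.18
printf "perl %s: [[:space:]] %s the vertical tab\n",
    $], ( $vt =~ /[[:space:]]/ ? 'matches' : 'does not match' );
```

Run it under an older and a newer perl and only the first line of output should change.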
I had this program to mark which sets match which characters. I required v5.10 because that’s the first appearance of the `\h` and `\v` shortcuts for horizontal and vertical whitespace.
```perl
use 5.010;
use charnames qw(:full);   # for charnames::viacode

print <<"LEGEND";
s matches \\s, matches Perl whitespace
h matches \\h, horizontal whitespace
v matches \\v, vertical whitespace
p matches [[:space:]], POSIX whitespace
all characters match Unicode whitespace, \\p{Space}
LEGEND

printf qq(%s %s %s %s %-7s --> %s\n), qw( s h v p Ordinal Name );
print '-' x 50, "\n";

# Check every code point, but report only the Unicode whitespace ones
foreach my $ord ( 0 .. 0x10ffff ) {
    next unless chr($ord) =~ /\p{Space}/;

    # Mark an 'x' in each column whose pattern matches this character
    my( $s, $h, $v, $posix ) = map {
        chr($ord) =~ m/$_/ ? 'x' : ' '
        } ( qr/\s/, qr/\h/, qr/\v/, qr/[[:space:]]/ );

    printf qq(%s %s %s %s 0x%04X --> %s\n),
        $s, $h, $v, $posix, $ord, charnames::viacode($ord);
}
```
Under v5.10, the top of the output showed that `\s` did not include the vertical tab, which the UCS names LINE TABULATION.
```
$ perl5.10.1 spaces
s matches \s, matches Perl whitespace
h matches \h, horizontal whitespace
v matches \v, vertical whitespace
p matches [[:space:]], POSIX whitespace
all characters match Unicode whitespace, \p{Space}
s h v p Ordinal --> Name
--------------------------------------------------
x x   x 0x0009 --> CHARACTER TABULATION
x   x x 0x000A --> LINE FEED
    x x 0x000B --> LINE TABULATION
x   x x 0x000C --> FORM FEED
x   x x 0x000D --> CARRIAGE RETURN
x x   x 0x0020 --> SPACE
```
Run under v5.18, the output changes slightly: the LINE TABULATION row, the third row of data, gets another `x` in the `s` column.
```
$ perl5.18.0 spaces
s matches \s, matches Perl whitespace
h matches \h, horizontal whitespace
v matches \v, vertical whitespace
p matches [[:space:]], POSIX whitespace
all characters match Unicode whitespace, \p{Space}
s h v p Ordinal --> Name
--------------------------------------------------
x x   x 0x0009 --> CHARACTER TABULATION
x   x x 0x000A --> LINE FEED
x   x x 0x000B --> LINE TABULATION
x   x x 0x000C --> FORM FEED
x   x x 0x000D --> CARRIAGE RETURN
x x   x 0x0020 --> SPACE
```
I don’t foresee this breaking much, since the vertical tab seems to be a rare character, although in ETL work I liked using it as a field separator because I figured no one else would be using it.
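Here is a small sketch of that separator trick, with made-up field values. Splitting on the literal vertical tab keeps each field whole, while splitting on `\s+` breaks on the spaces inside fields and, from v5.18 on, on the vertical tab as well. If the change does break anything, code that uses `\s` to split or trim such records is where I would look first.

```perl
use strict;
use warnings;

# Hypothetical record: three fields joined with vertical tabs (U+000B)
my @input  = ( 'Alice', 'New York', 'notes with spaces' );
my $record = join "\x0B", @input;

# Splitting on the literal vertical tab keeps every field intact
my @fields = split /\x0B/, $record;
print scalar(@fields), " fields\n";

# Splitting on \s+ also breaks on the spaces inside fields, and since
# v5.18 it breaks on the vertical tab too
my @pieces = split /\s+/, $record;
print scalar(@pieces), " pieces\n";
```

Naming the separator explicitly, as in the first `split`, behaves the same under any perl version.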