In Ignore part of a substitution’s match, I showed you the match-resetting \K, which is basically a variable-width positive lookbehind assertion. It's a special feature to work around Perl's lack of variable-width lookbehinds. However, v5.30 adds an experimental feature to allow a limited version of a variable-width lookbehind.
In that post, I used the example of matching variable whitespace at the beginning of a line but not replacing it:
    $_ = " buster";   # lowercase!
    s/\A\s+\K\S+\z/\u$&/;
    print "Line |$_|";   # Line | Buster|
Before v5.30, a variable-width positive lookbehind fails to compile:
    $_ = " buster";   # lowercase!
    s/(?<=\A\s+)\S+\z/\u$&/;
    print "Line |$_|";   # wanted: Line | Buster|
Running this on v5.28 fails:
    $ perl5.28.0 vlb.pl
    Variable length lookbehind not implemented in regex m/(?<=\A\s+)\S+\z/ at ...
Running this on v5.30 gives a different error:
    $ perl5.30.0 vlb.pl
    Lookbehind longer than 255 not implemented in regex m/(?<=\A\s+)\S+\z/ at ...
That's the limitation. The regex engine needs to know in advance that the lookbehind subpattern can't match more than 255 characters, and the + quantifier has no upper bound. Instead of the +, use the generalized quantifier so you can specify a maximum number. 255 levels of indent should be enough for anyone:
    use v5.30;
    $_ = " buster";   # lowercase!
    s/(?<=\A\s{1,255})\S+\z/\u$&/;
    print "Line |$_|";   # Line | Buster|
This works because static analysis can tell that the pattern cannot match more than 255 characters. You do get a warning:
    Line | Buster|
    Variable length lookbehind is experimental in regex; marked by <-- HERE in m/(?<=\A\s{1,255})\S+\z <-- HERE / at ...
Turn off that experimental warning like you would other experimental warnings:
    use v5.30;
    no warnings qw(experimental::vlb);
    $_ = " buster";   # lowercase!
    s/(?<=\A\s{1,255})\S+\z/\u$&/;
    print "Line |$_|";   # Line | Buster|
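To see where the limit falls, you could compile the same lookbehind with different upper bounds at run time and trap the failure. This is only a sketch: the helper name lookbehind_compiles is made up for illustration, and the pattern is the one from the examples here.

```perl
use v5.30;
no warnings qw(experimental::vlb);

# Hypothetical helper: true if a lookbehind with the given upper
# bound on \s compiles, false if perl rejects the pattern.
sub lookbehind_compiles {
    my ($max) = @_;
    # $max is interpolated, so the pattern compiles at run time
    # and eval can trap a compilation error.
    my $re = eval { qr/(?<=\A\s{1,$max})\S+\z/ };
    return defined $re ? 1 : 0;
}

for my $max ( 1, 255, 256 ) {
    my $status = lookbehind_compiles($max) ? 'compiles' : 'fails';
    say "{1,$max}: $status";
}
```

Given the 255-character limit, you should expect the first two bounds to compile and the last one to fail.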
There's one more thing to consider though. Some characters turn into multiple characters with case folding, as you read in Fold cases properly. For example, ß (U+00DF LATIN SMALL LETTER SHARP S) turns into ss. If you use /i for case insensitivity, Perl knows that this happens and counts the folded number of characters toward the 255 limit:
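    use v5.30;
    use utf8;   # the pattern contains a literal ß
    no warnings qw(experimental::vlb);
    $_ = " buster";   # lowercase!
    s/(?<=\A\s{1,253}ß)\S+\z/\u$&/i;   # compiles: 253 + 2 (ß folds to ss) = 255
    print "Line |$_|";   # Line | buster| (no ß or ss after the indent, so no change)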
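If you want to see the folding that the limit has to account for, fc (enabled by use v5.16 or later) shows the folded form. This is a small illustrative sketch; the variable names are my own.

```perl
use v5.30;
use utf8;    # the source contains a literal ß

my $sharp_s = 'ß';
my $folded  = fc($sharp_s);    # full case fold

say "folded form:   $folded";
say "length before: ", length($sharp_s);
say "length after:  ", length($folded);

# Under /i, the one-character pattern can match a two-character string,
# which is why the regex engine has to count it as two.
say "ss matches /ß/i" if 'ss' =~ /\Aß\z/i;
```

The folded form is one character longer than the original, so a lookbehind that ends in ß under /i has room for only 253 other characters.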
All of this works for either positive or negative lookbehinds.
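As a sketch of the negative form, this hypothetical helper capitalizes the last word of a line unless that word is preceded only by leading whitespace. The function name and the test strings are illustrative and not from the examples above.

```perl
use v5.30;
no warnings qw(experimental::vlb);

# Hypothetical helper: uppercase the first letter of the final word
# unless nothing but indentation comes before it.
sub cap_unless_indented {
    my ($line) = @_;
    # \b stops the match from restarting in the middle of the word,
    # where the negative lookbehind would otherwise succeed.
    $line =~ s/(?<!\A\s{1,255})\b\w+\z/\u$&/;
    return $line;
}

say "|", cap_unless_indented($_), "|" for "buster", " buster", "hi buster";
# |Buster|
# | buster|
# |hi Buster|
```

The \b matters here: without it, the match could start at the second letter of an indented word, because only whitespace followed by a letter does not satisfy the lookbehind's subpattern.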
Things to remember
- Before v5.30, you could not have variable-width lookbehinds
- v5.30 adds limited support for variable-width lookbehinds
- The lookbehind subpattern must not be able to match more than 255 characters
- If you can't put an upper bound on the length of the subpattern match, you can still use \K