[This is a mid-week bonus item, and it’s a bit of a departure from much of what you have already seen on this blog. This is just some code that I had to write this week and I thought you’d like to see it.]
I had to find some dates inside a big string, and the problem with dates is that there are so many ways to write them. Even if I get the format right, some of the machines might use another locale. My string comes from an ls that I run as a remote command, which might show the date in one of two formats. For files changed in the last six months, ls replaces the year with the time:
    $ ls -l
    total 7400
    -rw-r--r--@ 1 brian  staff      433 Jun 22  2010 Makefile
    -rw-r--r--@ 1 brian  staff   107721 Jan 19 09:08 appa.xml
    -rw-rw-r--@ 1 brian  staff    76873 Jan 19 00:18 appb.xml
    -rw-rw-r--  1 brian  staff     1802 Jan 14 21:17 book.xml
    -rw-rw-r--  1 brian  staff  2457812 Jul 21  2010 book.xml.pdf
    -rw-rw-r--  1 brian  staff     4360 Jul 21  2010 bookinfo.xml
    -rw-r--r--@ 1 brian  staff    25626 Jan 19 09:07 ch00.xml
Here’s the program I wrote to figure out which part of each line is the date, using Regexp::Common (Item 42. Don’t reinvent the regex):
    use Regexp::Common qw(time);

    my @lines = `ls -l`;

    # The date column looks like one of these:
    #   May  3  2010
    #   Jan 17 18:21
    my $date_re = $RE{time}{strftime}{ -pat => '%b\s+%_d\s+(?:%Y|%_H:%M)' };

    foreach my $line ( @lines ) {
        next unless $line =~ /$date_re/;

        # offsets of the start and end of the whole match
        my $start = $-[0];
        my $stop  = $+[0];

        my $underline = ( ' ' x $start ) . ( '^' x ($stop - $start) );

        print $line;
        print $underline, "\n\n";
    }
That regex is more sophisticated than it looks. I didn’t have to do anything to deal with month names and abbreviations because the module figures them out for me based on the locale of the machine on which I run the command. The regex changes depending on the language that I decide to use:
    $ LC_ALL=tr_TR perl -MRegexp::Common=time -le 'print $RE{time}{strftime}{-pat=>"%b"}'
    (?:(?=[AOTNKEHM\Å])(?>Oca|\Å\ub|Mar|Nis|May|Haz|Tem|A\Ä\u|Eyl|Eki|Kas|Ara))

    $ LC_ALL=en_US.UTF-8 perl -MRegexp::Common=time -le 'print $RE{time}{strftime}{-pat=>"%b"}'
    (?:(?=[SAFOJNMD])(?>Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec))

    $ LC_ALL=es_ES.UTF-8 perl -MRegexp::Common=time -le 'print $RE{time}{strftime}{-pat=>"%b"}'
    (?:(?=[enadmsjfo])(?>ene|feb|mar|abr|may|jun|jul|ago|sep|oct|nov|dic))
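To get a feel for where those alternations come from, here is a minimal sketch that builds a similar month pattern from the current locale with the core POSIX module. I'm not claiming this is how Regexp::Common builds its pattern; the month_abbr_regex name is made up for illustration, and the sketch ignores character encoding entirely:

```perl
use strict;
use warnings;
use POSIX qw(strftime setlocale LC_TIME);

# Hypothetical helper: build an alternation of the abbreviated month
# names that strftime reports for the locale in the environment.
sub month_abbr_regex {
    setlocale( LC_TIME, '' );    # honor LC_ALL / LC_TIME from the environment

    # strftime( fmt, sec, min, hour, mday, mon, year ):
    # mon is 0-based and year counts from 1900
    my @names = map { strftime( '%b', 0, 0, 0, 1, $_, 110 ) } 0 .. 11;

    my $alternation = join '|', map { quotemeta } @names;
    return qr/(?:$alternation)/;
}

my $month_re = month_abbr_regex();
print "$month_re\n";
```

Running that under different LC_ALL settings should print different alternations, provided the locale is installed on the machine.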
Part of my code shows that it found the date by underlining the portion of the string that it matched. That’s what all the fooling around with the @- and @+ special variables is for. Those hold the string offsets for the start and end of the various capture buffers. The numbers in index 0 apply to $&, the numbers in index 1 apply to $1, and so on:
    -rw-r--r--@ 1 brian  staff      433 Jun 22  2010 Makefile
                                        ^^^^^^^^^^^^

    -rw-r--r--@ 1 brian  staff   107721 Jan 19 09:08 appa.xml
                                        ^^^^^^^^^^^^

    -rw-rw-r--@ 1 brian  staff    76873 Jan 19 00:18 appb.xml
                                        ^^^^^^^^^^^^

    -rw-rw-r--  1 brian  staff     1802 Jan 14 21:17 book.xml
                                        ^^^^^^^^^^^^

    -rw-rw-r--  1 brian  staff  2457812 Jul 21  2010 book.xml.pdf
                                        ^^^^^^^^^^^^

    -rw-rw-r--  1 brian  staff     4360 Jul 21  2010 bookinfo.xml
                                        ^^^^^^^^^^^^

    -rw-r--r--@ 1 brian  staff    25626 Jan 19 09:07 ch00.xml
                                        ^^^^^^^^^^^^
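The same offsets are available for each capture group, not just the whole match. Here is a small sketch using only core Perl; the capture_offsets name and the sample string are mine, and it assumes every group in the pattern takes part in the match:

```perl
use strict;
use warnings;

# Hypothetical helper: return [ start, end, text ] for the whole match
# (index 0) and for each capture group (index 1 and up).
sub capture_offsets {
    my( $string, $regex ) = @_;
    return unless $string =~ $regex;

    # Read @- and @+ right away; the next successful match overwrites them.
    return map {
        [ $-[$_], $+[$_], substr( $string, $-[$_], $+[$_] - $-[$_] ) ]
    } 0 .. $#-;
}

my @groups = capture_offsets(
    'staff 433 Jun 22 2010 Makefile',
    qr/([A-Z][a-z]{2}) (\d+) (\d{4})/,
);

foreach my $i ( 0 .. $#groups ) {
    printf "group %d: %2d to %2d  %s\n", $i, @{ $groups[$i] };
}
```

Group 0 covers Jun 22 2010 from offset 10 to offset 21, and groups 1 through 3 cover the month, day, and year inside that span. The end offset in @+ is one past the last matched character, which is why subtracting the two gives the length.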
This code also has to work on systems with very ancient versions of ls. There are some switches that could have made this code much easier, especially if I could make the date column the epoch time instead, so it’s not a combination of whitespace-separated fields itself.
- The -T switch on Mac OS X and FreeBSD displays all dates in the same format, even for the recently changed ones.
- Linux versions might have the --time-style switch.
- FreeBSD has the -D switch to specify the date format.
I’d much rather use perl, but the equivalent is much uglier even though I can choose my field separator. Perl is ultra-portable and available in most places, but I have to do more work in a one-liner:
$ perl -le 'for(glob(q|*|)){print join qq|\t|, stat(), $_}'
However, this causes headaches later when I need to run this as a remote command, and I still have to process the results to turn the data into human-readable output. The ls -l output is much nicer and doesn’t require more work than I’d normally do.
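For comparison, here is roughly what the stat approach looks like once it grows past a one-liner. This is only a sketch under my own assumptions: the epoch_listing name is invented, and I keep just the modification time rather than every stat field:

```perl
use strict;
use warnings;

# Hypothetical helper: return "mtime<TAB>name" lines for the given files.
sub epoch_listing {
    my @files = @_;

    my @lines;
    foreach my $file ( @files ) {
        my $mtime = ( stat $file )[9];    # element 9 is the mtime in epoch seconds
        next unless defined $mtime;       # skip anything stat could not read
        push @lines, join "\t", $mtime, $file;
    }

    return @lines;
}

print "$_\n" for epoch_listing( glob '*' );
```

The epoch seconds sort correctly as plain numbers, but something on the receiving end still has to turn them back into dates that people can read.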
And, as a bonus to this bonus, I discovered that Date::Parse is smart enough to deal with a date like Dec 31 12:34. It realizes that the date is from last December, not from December of the current year. I can feed both formats into that module and still have the dates sort correctly.
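The idea behind that year guessing is easy to sketch with the core Time::Local module. This is not Date::Parse's code, just one plausible way to do it: the guess_epoch name is hypothetical, the month names are English only, and a Feb 29 stamp would make timelocal croak in a non-leap year:

```perl
use strict;
use warnings;
use Time::Local qw(timelocal);

my %month_index;
@month_index{ qw(Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec) } = 0 .. 11;

# Hypothetical helper: turn a yearless "Mon DD HH:MM" stamp into epoch
# seconds, assuming it names the most recent such moment before $now.
sub guess_epoch {
    my( $stamp, $now ) = @_;
    $now = time unless defined $now;

    my( $mon, $mday, $hour, $min ) =
        $stamp =~ /\A([A-Z][a-z]{2})\s+(\d{1,2})\s+(\d{1,2}):(\d{2})\z/
        or return;
    return unless exists $month_index{$mon};

    # First try the current year (four-digit years are taken literally)
    my $year  = ( localtime $now )[5] + 1900;
    my $epoch = timelocal( 0, $min, $hour, $mday, $month_index{$mon}, $year );

    # A modification time can't be in the future, so fall back to last year
    $epoch = timelocal( 0, $min, $hour, $mday, $month_index{$mon}, $year - 1 )
        if $epoch > $now;

    return $epoch;
}

my $now = timelocal( 0, 0, 12, 19, 0, 2011 );    # pretend it is Jan 19, 2011
print scalar( localtime guess_epoch( 'Dec 31 12:34', $now ) ), "\n";
```

With the pretend date of January 19, 2011, the December stamp lands in 2010 while a stamp such as Jan 14 21:17 stays in 2011, so the resulting epoch values sort in the right order.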