Perl – Meta Interchange

The current stable version of Perl is 5.18.0 … but for very good reasons, Koha doesn’t require the latest and greatest. For a very long time, Koha required a minimum version of 5.8.8. It wasn’t until October 2011, nearly four years after Perl 5.10.0 was released, that a patch was pushed setting 5.10.0 as Koha’s minimum required version.

Why so long? Since Perl is used by a ton of core system scripts and utilities, OS packagers are reluctant to push ahead too quickly. Debian oldstable has 5.10.1 and Debian stable ships with 5.14.2. Ubuntu tracks Debian in this respect. RHEL5 ships with Perl 5.8 and won’t hit EOL until 2017.

RHEL5 takes it too far in my opinion, unless you really need that degree of stasis — and I’m personally not convinced that staying that far behind the cutting edge necessarily gives one much more in the way of the security. Then again, I don’t work for a bank. Suffice it to say, if you must run a recent version of Koha on RHEL5, you have your work cut out for you — compiling Perl from tarball or using something like Perlbrew to at least get 5.10 is a good idea. That will still leave you with rather a lot of modules to install from CPAN.

But since we, as Koha hackers, can count on having Perl 5.10, we can make the most of it. Here are a few constructs that were added in 5.10 that I find particularly useful for hacking on Koha.

Defined-OR operator

The defined-or operator, //, returns its left operand unless its value is undefined, in which case it returns the right operand. It lets you write:

my $a = get_a_possibly_undefined_value();
$a //= '';
print "Label: $a\n"; # won't throw a warning if the original value was undefined

or

my $a = get_a_possibly_undefined_value() // '';

rather than

my $a = get_a_possibly_undefined_value();
$a = '' unless defined($a);

or (horrors!)

my $a = get_a_possibly_undefined_value();
$a ||= ''; # if $a started out as 0...

Is this just syntactical sugar? Sure, but since Koha is a database-driven application whose schema has a lot of nullable columns, and since use of the Perl warnings pragma is mandated, it’s a handy one.

Named capture buffers

This lets you give a name to a regular expression capture group, allowing you to using the name rather than (say) $1, $2, etc. For example, you can write

if ($str =~ /tag="(?[0-9]{3})"/ ){
    print $+{tag}, "\n"; # %- is a magic hash that contains the named capture groups' contents
}

rather than

if ($str =~ /tag="([0-9]{3})"/ ){
    print $1, "\n";
}

There’s a bit of a trade-off with this because the regular expression is now a little more difficult to read. However, since the code that uses the results can avoid declaring unnecessary temporary variables and is more robust in the face of changes to the number of capture groups in the regex, that trade-off can be worth it.

UNITCHECK blocks

The UNITCHECK block joins BEGIN, END, INIT and CHECK as ways of designating blocks of code to execute during specific points during the compilation process for a Perl module. UNITCHECK code is executed right after the module has been compiled. In the patch I’m proposing for bug 10503, I found this handy to allow module initialization code to make use of functions defined in that same module.

Warning, warning!

There are some constructs that were added in Perl 5.10, including the given/when keywords and the smart match operator ~~, that are deprecated as of Perl 5.18. Consequently, I will say no more about them other than this: don’t use them! Maybe the RHEL5 adherents have a point after all.

I noticed that a patch recently submitted for Koha adds the following line to one of Koha’s Perl modules:
use utf8;
utf8 is a perfectly fine and proper Perl pragma, right? Indeed it is. The problem is that the purpose of the patch is to try to fix an issue with reading and displaying UTF-8 characters from a file. So what does use utf8; contribute to that patch?

Nothing.

The only thing that the utf8 pragma does is signal to the Perl interpreter that the Perl source code is in UTF-8. If you’re using non-ASCII characters in your source code, you’ll need the pragma. For example,
use utf8; my $str = "La lluvia en España se mantiene principalmente en el llano!";
or even
use utf8; my $str_en_inglés = "The rain in Spain falls mostly on the plain!";
If what you’re actually trying to do is ensure that the script is handling UTF-8 input and output correctly, use utf8; won’t help. This tutorial will.

Not paying close attention to Perl’s definition of truth can sometimes lead to subtle bugs. Consider a simple scalar $x that should contain a string exactly one character wide. If the original value of $x can be undefined and you want to make sure it has a default value of a single space, do not do the following:
$x ||= ' ';

Why not? If $x starts off as ‘0’, a permitted value, this line will change it to ' '. Instead, do this
$x = ' ' unless defined $x;

Remember, 0, '0', '', and undef all evaluate to Perl’s notion of false.

Category: Perl

Tutorial: Perl 5.10 and Koha

Defined-OR operator

Named capture buffers

UNITCHECK blocks

Warning, warning!

A pragma that does less than you might think

What is truth?

Defined-OR operator

Named capture buffers

UNITCHECK blocks

Warning, warning!

Share this:

Share this:

Share this: