This morning I reviewed and pushed the patch for Koha bug 11174. The patch, by Zeno Tajoli, removes one character each from two files.

One character? That should be easy to eyeball, right?

Not quite — the character in question was part of a parameter name in a very long URL. I don’t know about you, but it can take me a while to spot such a difference.

Here is an example. Can you spot the exact difference in less than 2 seconds?

$ git diff --color

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
  sudo make upgrade
 
 Koha 3.4.x or later  no longer stores items in biblio records.
-If you are upgrading from an older version ou will need to do the
+If you are upgrading from an older version you will need to do the
 following two steps, they can take a long time (several hours) to
 complete for large databases

Now imagine doing this if the change occurs in the 100th character of a line that is 150 characters long.

Fortunately, git diff, as well as other commands like git show that display diffs, accepts several switches that let you display the differences in terms of words, not lines. These switches include --word-diff and --color-words. For example:

$ git diff --color-words

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
  sudo make upgrade
 
Koha 3.4.x or later  no longer stores items in biblio records.
If you are upgrading from an older version ouyou will need to do the
following two steps, they can take a long time (several hours) to
complete for large databases

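The difference is much easier to see now, at least if you’re not red-green color-blind: in a terminal, the removed “ou” is shown in red and the added “you” in green (the colors are lost in the plain-text listing above, which is why the two run together as “ouyou”). You can change the colors, or not use colors at all: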

$ git diff --word-diff

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
 sudo make upgrade

Koha 3.4.x or later  no longer stores items in biblio records.
If you are upgrading from an older version [-ou-]{+you+} will need to do the
following two steps, they can take a long time (several hours) to
complete for large databases

Going back to the bug I mentioned, --word-diff wasn’t quite enough, though. By default, Git considers words to be delimited by whitespace, but the patch in question removed a character from the middle of a very long URL. To make the change pop out, I had to tell Git to highlight single-character changes. One way to do this is with the --word-diff-regex switch, or by passing a regex directly to --color-words. A regex of . treats every character as its own word. Here’s the final example:

$ git diff --color-words=.

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
  sudo make upgrade
 
Koha 3.4.x or later  no longer stores items in biblio records.
If you are upgrading from an older version you will need to do the
following two steps, they can take a long time (several hours) to
complete for large databases

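And there we have it: the difference, pinpointed. In a terminal, only the added “y” is highlighted; the coloring is lost in the plain-text listing above.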
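To see why the choice of word regex matters, here is a minimal Perl sketch. It is not Git's diff algorithm: it splits two lines into "words" with a given regex and trims whatever the lines share at the start and end. The changed_region function and the example URL are made up for illustration; the URL is not the one from the actual patch.

```perl
use strict;
use warnings;

# Split both lines into "words" using the given regex, then drop the words
# they share at the start and at the end. What is left is the changed
# region, roughly what a word diff would highlight for a single change.
# Note: whitespace between the remaining words is not preserved.
sub changed_region {
    my ($old, $new, $word_re) = @_;
    my @old = $old =~ /$word_re/g;
    my @new = $new =~ /$word_re/g;

    # Drop the common prefix.
    while (@old && @new && $old[0] eq $new[0]) {
        shift @old;
        shift @new;
    }
    # Drop the common suffix.
    while (@old && @new && $old[-1] eq $new[-1]) {
        pop @old;
        pop @new;
    }
    return (join('', @old), join('', @new));
}

# Hypothetical lines containing a long URL with one stray character.
my $old = 'see http://example.org/search?idxx=kw&q=koha for details';
my $new = 'see http://example.org/search?idx=kw&q=koha for details';

# Whitespace-delimited words: the whole URL counts as changed.
my ($del, $add) = changed_region($old, $new, qr/\S+/);
print "whitespace words:  [-$del-]{+$add+}\n";

# One character per word: only the stray character counts as changed.
($del, $add) = changed_region($old, $new, qr/./);
print "single characters: [-$del-]{+$add+}\n";
```

With whitespace-delimited words the entire URL is reported as removed and re-added, which is no easier to read than the line diff. With one character per word, only the stray character remains.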

Space doesn’t matter, except when it does.

The other day Koha bug 11308 was filed reporting a problem with the public catalog search RSS feed. This affected just the new Bootstrap theme.

The bug report noted that when clicking on the RSS feed icon, the page rendered “not like an rss feed should”. That means different things to different web browsers, but we can use an RSS feed validation service like validator.w3.org to see what feed parsers are likely to think.

Before the bug was fixed, the W3C validator reported this:

This feed does not validate.
line 2, column 0: XML parsing error: :2:0: XML or text declaration not at start of entity
<?xml version='1.0' encoding='utf-8' ?>

Of course, at first glance, the XML declaration looks just fine — the key bit is that it is starting at the second line of the response.

Space matters — XML requires that if an XML declaration is present, it must be the very first thing in the document.

Let’s take a look at the patch, written by Chris Cormack, that fixes the bug:

--- a/koha-tmpl/opac-tmpl/bootstrap/en/modules/opac-opensearch.tt
+++ b/koha-tmpl/opac-tmpl/bootstrap/en/modules/opac-opensearch.tt
@@ -1,5 +1,5 @@
-[% USE Koha %]
 <?xml version='1.0' encoding='utf-8' ?>
+[% USE Koha %]
 [% IF ( opensearchdescription ) %]
 
    [% LibraryName |html %] Search

This patch moves the [% USE Koha %] Template Toolkit directive from before the XML declaration to after it. [% USE Koha %] loads a custom Template Toolkit module called “Koha”; further down in the template there is a use of Koha.Preference() to check the value of a system preference.

But why should importing a TT module add a blank line? By default, Template Toolkit will include all of the whitespace present in the template. Since there is a newline after the [% USE Koha %] directive, that newline is included in the response.

Awkward, when spaces matter.

However, Template Toolkit does have a way to chomp whitespace before or after template directives.

This means that an alternative fix could be something like this:

--- a/koha-tmpl/opac-tmpl/bootstrap/en/modules/opac-opensearch.tt
+++ b/koha-tmpl/opac-tmpl/bootstrap/en/modules/opac-opensearch.tt
@@ -1,4 +1,4 @@
-[% USE Koha %]
+[% USE Koha -%]

Adding a single hyphen here means that whitespace after the TT directive should be chomped — in other words, not included in the response.
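The effect of the hyphen can be checked outside Koha with a few lines of Perl. This is a sketch that assumes the Template module (Template Toolkit) is installed; it loads TT's bundled date plugin as a stand-in for the Koha plugin, and the render helper is just a name chosen for this example.

```perl
use strict;
use warnings;
use Template;    # Template Toolkit; not a core module

my $xml_decl = q{<?xml version='1.0' encoding='utf-8' ?>};

# Process a template held in a string and return the rendered output.
sub render {
    my ($template) = @_;
    my $tt = Template->new() or die Template->error();
    my $output = '';
    $tt->process(\$template, {}, \$output) or die $tt->error();
    return $output;
}

# "USE date" stands in for "USE Koha"; neither directive emits any text itself.
my $plain   = render("[% USE date %]\n$xml_decl\n");
my $chomped = render("[% USE date -%]\n$xml_decl\n");

print "without hyphen: declaration first? ", ($plain   =~ /\A<\?xml/ ? 'yes' : 'no'), "\n";
print "with hyphen:    declaration first? ", ($chomped =~ /\A<\?xml/ ? 'yes' : 'no'), "\n";
```

In the first case the output starts with the newline that followed the directive; in the second, that newline is chomped and the XML declaration comes first.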

Most of the time, extra whitespace doesn’t matter for the HTML emitted by Koha. But when space matters… you can use TT to control it.

The current stable version of Perl is 5.18.0 … but for very good reasons, Koha doesn’t require the latest and greatest. For a very long time, Koha required a minimum version of 5.8.8. It wasn’t until October 2011, nearly four years after Perl 5.10.0 was released, that a patch was pushed setting 5.10.0 as Koha’s minimum required version.

Why so long? Since Perl is used by a ton of core system scripts and utilities, OS packagers are reluctant to push ahead too quickly. Debian oldstable has 5.10.1 and Debian stable ships with 5.14.2. Ubuntu tracks Debian in this respect. RHEL5 ships with Perl 5.8 and won’t hit EOL until 2017.

RHEL5 takes it too far in my opinion, unless you really need that degree of stasis, and I’m personally not convinced that staying that far behind the cutting edge necessarily gives one much more in the way of security. Then again, I don’t work for a bank. Suffice it to say, if you must run a recent version of Koha on RHEL5, you have your work cut out for you: compiling Perl from a tarball or using something like Perlbrew to at least get 5.10 is a good idea. That will still leave you with rather a lot of modules to install from CPAN.

But since we, as Koha hackers, can count on having Perl 5.10, we can make the most of it. Here are a few constructs that were added in 5.10 that I find particularly useful for hacking on Koha.

Defined-OR operator

The defined-or operator, //, returns its left operand unless its value is undefined, in which case it returns the right operand. It lets you write:

my $a = get_a_possibly_undefined_value();
$a //= '';
print "Label: $a\n"; # won't throw a warning if the original value was undefined

or

my $a = get_a_possibly_undefined_value() // '';

rather than

my $a = get_a_possibly_undefined_value();
$a = '' unless defined($a);

or (horrors!)

my $a = get_a_possibly_undefined_value();
$a ||= ''; # if $a started out as 0...

Is this just syntactical sugar? Sure, but since Koha is a database-driven application whose schema has a lot of nullable columns, and since use of the Perl warnings pragma is mandated, it’s a handy one.
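Here is a small sketch of how this plays out with nullable columns. The $row hash is a made-up stand-in for the kind of row a database fetch might return, and the two helper functions exist only to compare the operators side by side.

```perl
use strict;
use warnings;

# Hypothetical row from a table with nullable columns: "copies" is a
# legitimate zero, "notes" is NULL in the database (undef in Perl).
my $row = { title => 'Koha manual', copies => 0, notes => undef };

sub with_defined_or { my ($v) = @_; return $v // 'n/a' }
sub with_logical_or { my ($v) = @_; return $v || 'n/a' }

for my $col (sort keys %$row) {
    printf "%-6s  //: %-12s  ||: %s\n", $col,
        with_defined_or($row->{$col}),
        with_logical_or($row->{$col});
}
```

Both operators replace the undefined notes value, but only || also throws away the perfectly valid 0 in copies.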

Named capture buffers

This lets you give a name to a regular expression capture group, allowing you to use the name rather than (say) $1, $2, etc. For example, you can write

if ($str =~ /tag="(?<tag>[0-9]{3})"/ ){
    print $+{tag}, "\n"; # %+ is a magic hash that contains the named capture groups' contents
}

rather than

if ($str =~ /tag="([0-9]{3})"/ ){
    print $1, "\n";
}

There’s a bit of a trade-off with this because the regular expression is now a little more difficult to read. However, since the code that uses the results can avoid declaring unnecessary temporary variables and is more robust in the face of changes to the number of capture groups in the regex, that trade-off can be worth it.
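The robustness point can be illustrated with a short sketch. The input string and the tag_v1 and tag_v2 functions are invented for this example; they stand for the same regex before and after someone adds a capture group in front.

```perl
use strict;
use warnings;

my $str = '<datafield tag="245" ind1="1" ind2="0">';

# Version 1: the tag number is the only capture group.
sub tag_v1 {
    my ($s) = @_;
    return $s =~ /tag="(?<tag>[0-9]{3})"/ ? ($1, $+{tag}) : ();
}

# Version 2: a group for the element name is added in front. $1 now holds
# the element name, while $+{tag} still holds the tag number.
sub tag_v2 {
    my ($s) = @_;
    return $s =~ /<(?<element>\w+) tag="(?<tag>[0-9]{3})"/ ? ($1, $+{tag}) : ();
}

printf 'v1: $1 = %s, $+{tag} = %s' . "\n", tag_v1($str);
printf 'v2: $1 = %s, $+{tag} = %s' . "\n", tag_v2($str);
```

Code that relied on $1 silently changes meaning between the two versions; code that used $+{tag} keeps working.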

UNITCHECK blocks

The UNITCHECK block joins BEGIN, END, INIT and CHECK as ways of designating blocks of code to execute at specific points while a Perl program is compiled and run. UNITCHECK code is executed right after the compilation unit that contains it (for a module, the module's file) has been compiled. In the patch I’m proposing for bug 10503, I found this handy to allow module initialization code to make use of functions defined in that same module.
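A minimal sketch of the idea follows. My::Registry, register and handle_marc21 are hypothetical names for illustration; this is not the code from the bug 10503 patch.

```perl
use strict;
use warnings;

package My::Registry;    # hypothetical module name, for illustration only

our %HANDLERS;

# Runs as soon as this compilation unit has finished compiling, so the
# subs defined further down already exist when it executes. The same call
# inside a BEGIN block would die, because register() would not have been
# compiled yet.
UNITCHECK {
    register('marc21', \&handle_marc21);
}

sub register {
    my ($name, $code) = @_;
    $HANDLERS{$name} = $code;
}

sub handle_marc21 { return "handled $_[0] as MARC21" }

package main;

print $My::Registry::HANDLERS{marc21}->('record 1'), "\n";
```

Note that %HANDLERS is declared without an initializer: an assignment such as our %HANDLERS = (); would run later, at run time, and wipe out what the UNITCHECK block registered.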

Warning, warning!

There are some constructs that were added in Perl 5.10, including the given/when keywords and the smart match operator ~~, that were marked as experimental in Perl 5.18 and now emit warnings when used. Consequently, I will say no more about them other than this: don’t use them! Maybe the RHEL5 adherents have a point after all.

Both Koha and Evergreen use memcached to cache user sessions and data that would be expensive to continually fetch and refetch from the database. For example, Koha uses memcached to cache MARC frameworks, while Evergreen caches search results, bibliographic added content, search suggestions, and other data.

Even though the data that gets cached is transitory, at times it can be useful to look at it. For example, you may need to check to see if some stale data is present in the cache, or you may want to capture some statistics about user sessions that would otherwise be lost when the cache expires.

The libMemcached library includes several command-line utilities for interrogating a memcached server. We’ll look at memcdump and memccat.

memcdump prints a list of keys that are (or were, since the data may have expired) stored in a memcached server. Here’s an example of the sorts of keys you might see in an Evergreen system:

memcdump --servers 127.0.0.1:11211
oils_AS_21a5dc5cd2aa42ee7c0ecc239dcb25b5
ac.toc.html.0531301990
open-ils.search_9fd0c6c3553e6979fc63aa634a78b362_facets
open-ils.search_9fd0c6c3553e6979fc63aa634a78b362
oils_auth_8682b1017b7b27035576fecbfc7715c4

The --servers 127.0.0.1:11211 bit tells memcdump to check memcached running on the local server.

A list of keys, however, doesn’t tell you much. To see the value that’s stored under that key, use memccat. Here’s an example of looking at a user session record in Koha (assuming you’ve set the SessionStorage system preference to use memcached):

memccat --servers 127.0.0.1:11211 KOHA78c879b9942dee326710ce8e046acede
---
_SESSION_ATIME: '1363060711'
_SESSION_CTIME: '1363060711'
_SESSION_ID: 78c879b9942dee326710ce8e046acede
_SESSION_REMOTE_ADDR: 192.168.1.16
branch: CPL
branchname: Centerville
cardnumber: cat
emailaddress: ''
firstname: ''
flags: 1
id: cat
ip: 192.168.1.16
lasttime: '1363060711'
number: 51
surname: cat

And here’s an example of an Evergreen user session cached object:

memccat --servers 127.0.0.1:11211 oils_auth_8682b1017b7b27035576fecbfc7715c4
{"authtime":420,"userobj":{"__c":"au","__p":[null,null,null,null,null,null,null,null,null,"186",null,"t",null,"f",119284,38997,0,0,"2011-05-31T11:17:16-0400","0.00","1-888-555-1234","1923-01-01T00:00:00-0500","user@example.org",null,"2015-10-29T00:00:00-0400","User","Test",186,654440,3,null,null,null,"1358890660.7173220299.6945940294",119284,"f",1,null,"",null,null,10,null,1,null,"t",654440,"user",null,"f","2013-01-22T16:37:40-0500",null,"f"]}}

We’ll let the YAMLites and JSONistas square off outside, and take a look at a final example. This is an excerpt of a cached catalog search result in Evergreen:

memccat --servers 127.0.0.1:11211 open-ils.search_4b81a8a59544e8c7e9fdcda357d7b05f
{"0":{"summary":{"checked":630,"visible":"546","excluded":84,"deleted":0,"total":630,"complex_query":1},"results":[["74093"],["130197"], ...., ["880940"],["574457"]]}}
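Once you have the cached value in hand, it is easy to pick apart. Here is a sketch that decodes a search result like the one above with the core JSON::PP module. The $cached string is a shortened stand-in that keeps only the four record IDs visible in the excerpt, and summarize_search is a name made up for this example.

```perl
use strict;
use warnings;
use JSON::PP;    # in the Perl core since 5.14

# Shortened stand-in for the memccat output above; the real cached value
# holds many more result IDs.
my $cached = '{"0":{"summary":{"checked":630,"visible":"546","excluded":84,'
           . '"deleted":0,"total":630,"complex_query":1},'
           . '"results":[["74093"],["130197"],["880940"],["574457"]]}}';

# Pull a few fields of interest out of the cached search result.
sub summarize_search {
    my ($json) = @_;
    my $page = decode_json($json)->{0};
    return {
        total   => $page->{summary}{total},
        visible => $page->{summary}{visible},
        ids     => [ map { $_->[0] } @{ $page->{results} } ],
    };
}

my $summary = summarize_search($cached);
printf "%d of %d records visible; IDs in this sample: %s\n",
    $summary->{visible}, $summary->{total}, join(', ', @{ $summary->{ids} });
```

In practice the JSON would come from capturing the output of memccat rather than from a string literal.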

There are other tools that let you manipulate the cache, including memcrm to remove keys and memccp to load key/value pairs into memcached.

For a complete list of the command-line tools provided by libMemcached, check out its documentation. To install them on Debian or Ubuntu, run apt-get install libmemcached-tools. Note that the Debian package renames the tools from ‘memdump’ to ‘memcdump’, ‘memcat’ to ‘memccat’, etc., to avoid a naming conflict with another package.

I noticed that a patch recently submitted for Koha adds the following line to one of Koha’s Perl modules:

use utf8;

utf8 is a perfectly fine and proper Perl pragma, right? Indeed it is. The problem is that the purpose of the patch is to try to fix an issue with reading and displaying UTF-8 characters from a file. So what does use utf8; contribute to that patch?

Nothing.

The only thing that the utf8 pragma does is signal to the Perl interpreter that the Perl source code is in UTF-8. If you’re using non-ASCII characters in your source code, you’ll need the pragma. For example,

use utf8;
my $str = "La lluvia en España se mantiene principalmente en el llano!";

or even

use utf8;
my $str_en_inglés = "The rain in Spain falls mostly on the plain!";

If what you’re actually trying to do is ensure that the script is handling UTF-8 input and output correctly, use utf8; won’t help. This tutorial will.
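As a rough sketch of what does help: tell Perl how the data is encoded at the point where it enters and leaves the program. The example below uses only core modules, writes a scratch file so that it is self-contained, and reads it back with and without a decoding layer. The read_line helper is just a name chosen for this example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the UTF-8 bytes for the Spanish name of Spain to a scratch file.
# The source code here is pure ASCII, so "use utf8" is not needed.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "Espa\xC3\xB1a\n";
close $out;

# Read the first line of a file through the given I/O layer.
sub read_line {
    my ($file, $layer) = @_;
    open my $fh, "<$layer", $file or die "Cannot open $file: $!";
    my $line = <$fh>;
    close $fh;
    chomp $line;
    return $line;
}

my $raw     = read_line($path, ':raw');                # undecoded bytes
my $decoded = read_line($path, ':encoding(UTF-8)');    # decoded characters

print "raw length: ", length($raw), ", decoded length: ", length($decoded), "\n";

# Encode characters back to UTF-8 on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print "$decoded\n";
```

Read raw, the word is seven bytes long; read through the :encoding(UTF-8) layer, it is the expected six characters. None of this depends on the utf8 pragma.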