A picture is worth a thousand words:

Downloads of Koha Debian packages in past 52 weeks
Click to get larger image.

This represents the approximate geographic distribution of downloads of the Koha Debian packages over the past year. Data was taken from the Apache logs from debian.koha-community.org, which MPOW hosts. I counted only completed downloads of the koha-common package, of which there were over 25,000.

Making the map turned out to be an opportunity for me to learn some Python. I first adapted a Python script I found on Stack Overflow to query freegeoip.net and get the latitude and longitude corresponding to each of the 9,432 distinct IP addresses that had downloaded the package.

I then fed the results to OpenHeatMap. While that service is easy to use and is written with GPL3 code, I didn’t quite like the fact that the result is delivered via an Adobe Flash embed.  Consequently, I turned my attention to Plotly, and after some work, was able to write a Python script that does the following:

  1. Fetch the CSV file containing the coordinates and number of downloads.
  2. Exclude as outliers rows where a given IP address made more than 100 downloads of the package during the past year — there were seven of these.
  3. Truncate the latitude and longitude to one decimal place — we need not pester corn farmers in Kansas for bugfixes.
  4. Submit the dataset to Plotly with which to generate a bubble map.

Here’s the code:

An interactive version of the bubble map is also available on Plotly.

There’s often more than way to search a library catalog; or to put it another way, not all users come in via the front door.  For example, ensuring that your public catalog supports HTTPS can help prevent bad actors from snooping on patron’s searches — but if one of your users happens to use a tool that searches your catalog over Z39.50, by default they have less protection.

Consider this extract from a tcpdump of a Z39.50 session:

No, MARC is not a cipher; it just isn’t.

How to improve this state of affairs? There was some discussion back in 2000 of bundling SSL or TLS into the Z39.50 protocol, although it doesn’t seem like it went anywhere. Of course, SSH tunnels and stunnel are options, but it turns out that there can be an easier way.

As is usually the case with anything involving Z39.50, we can thank the folks at IndexData for being on top of things: it turns out that TLS support is easily enabled in YAZ. Here’s how this can be applied to Evergreen and Koha.

The first step is to create an SSL certificate; a self-signed one probably suffices. The certificate and its private key should be concatenated into a single PEM file, like this:

-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
-----BEGIN PRIVATE KEY-----
...
-----END PRIVATE KEY-----

Evergreen’s Z39.50 server can be told to require SSL via a <listen> element in /openils/conf/oils_yaz.xml, like this:

To supply the path to the certificate, a change to oils_ctl.sh will do the trick:

For Koha, a <listen> element should be added to koha-conf.xml, e.g.,

zebrasrv will also need to know how to find the SSL certificate:

And with that, we can test: yaz-client ssl:localhost:4210/CONS or yaz-client ssl:localhost:4210/biblios. Et voila!

Of course, not every Z39.50 client will know how to use TLS… but lots will, as YAZ is the basis for many of them.

What do ogres, hippogriffs, and authorized Koha service providers have in common?

Each of them is an imaginary creature.

20070522 Madrid: hippogriff -- image by Larry Wentzel on Flickr (CC-BY)
20070522 Madrid: hippogriff — image by Larry Wentzel on Flickr (CC-BY)

Am I saying that Koha service providers are imaginary creatures? Not at all — at the moment, there are 54 paid support providers listed on the Koha project’s website.

But not a one of them is “authorized”.

I bring this up because a friend of mine in India (full disclosure: who himself offers Koha consulting services) ran across this flyer by Avior Technologies:

Avior information sheet

The bit that I’ve highlighted is puffery at best, misleading at worst. The Koha website’s directory of paid support providers is one thing, and one thing only: a directory. The Koha project does not endorse any vendors listed there — and neither the project nor the Horowhenua Library Trust in New Zealand (which holds various Koha trademarks) authorizes any firm to offer Koha services.

If you want your firm to get included in the directory, you need only do a few things:

  1. Have a website that contains an offer of services for Koha.
  2. Ensure that your page that offers services links back to koha-community.org.
  3. Make a public request to be added to the directory.

That’s it.

Not included on this list of criteria:

  • Being good at offering services for Koha libraries.
  • Contributing code, documentation, or anything else to the Koha project.
  • Having any current customers who are willing to vouch for you.
  • Being alive at present (although eventually, your listing will get pulled for lack of response to inquiries from Koha’s webmasters).

What does this mean for folks interested in getting paid support services?  There is no shortcut to doing your due diligence — it is on you to evaluate whether a provider you might hire is competent and able to keep their customers reasonably happy. The directory on the Koha website exists as a convenience for folks starting a search for a provider, but beyond that: caveat emptor.

I know nothing about Avior Technologies. They may be good at what they do; they may be terrible — I make no representation either way.

But I do know this: while there are some open source projects where the notion of an “authorized” or “preferred” support provider may make some degree of sense, Koha isn’t such a project.

And that’s generally to the good of all: if you have Koha expertise or can gain it, you don’t need to ask anybody’s permission to start helping libraries run Koha — and get paid for it.  You can fill niches in the market that other Koha support providers cannot or do not fill.

You can in time become the best Koha vendor in your niche, however you choose to define it.

But authority? It will never be bestowed upon you. It is up to you to earn it by how well you support your customers, and by how much you contribute to the global Koha project.

 

One can, in fact, have too many holidays.

Koha uses the DateTime::Set Perl module when (among other things) calculating the next day the library is open. Unfortunately, the more special holidays you have in a Koha database, the more time DateTime::Set takes to initialize itself — and the time appears to grow faster than linearly with the number of holidays.

Jonathan Druart partially addressed this with his patch for bug 11112 by implementing some lazy initialization and caching for Koha::Calendar, but that doesn’t make DateTime::Set‘s constructor itself any faster.

Today I happened to be working on a Koha database that turned out to have duplicate rows in the special_holidays table. In other words, for a given library, there might be four rows all expressing that the library is closed on 15 August 2014. That database contains hundreds of duplicates, which results in an extra 1-3 seconds per circulation operation.

The duplication is not apparent in the calendar editor, alas.

So here’s my first question: has anybody else seen this in their Koha database? The following query will turn up duplicates:

And my second question: assuming that this somehow came about during normal operation of Koha (as opposed to duplicate rows getting directly loaded into the database), does anybody have any ideas how this happened?

One of the tightropes I must walk on as the current release manager for Koha is held taut by the tension between the necessity of maintaining boundaries with the code and the necessity of acknowledging that the code is not the first concern.

Boundaries matter. Not all code is equal: some is better, some is worse, none is perfect. Some code belongs in Koha. Some code belongs in Koha for lack of a better alternative at the time. Some code does not belong in Koha. Some code will stand the test of time; some code will test our time and energy for years.

The code is not primary. It is no great insight to point out that the code does not write itself; it certainly does not document itself nor pay its own way. Nor does it get to partake in that moment of fleeting joy when things just work, when the code gets out of the way of the librarian and the patron.

What is primary? People and their energy.

Enthusiasm is boundless. It has kept some folks working on Koha for years, beyond the impetus of mere paycheck or even approbation.

Enthusiasm is limited. Anybody volunteering passion for a free software project has a question to answer: is there something better to do with my time? If the answer turns into “no”… well, there are many ways in this world to contribute to happiness, personal or shared.

Caviling can be costly — possibly, beyond measure. One “RTFM” can eliminate an entire manual’s worth of help down the road.

On the other hand, the impulse to tweak, to provide feedback, to tune a new idea, can come from the best of intentions. Passion is not enough by itself; experience matters, can guide new effort.

It’s a tightrope we all walk. But the people must come first.

My meditation: what ways of interacting among ourselves conserves enthusiasm, and thereby grows it? And how do we avoid destroying it needlessly?

This morning I reviewed and pushed the patch for Koha bug 11174. The patch, by Zeno Tajoli, removes one character each from two files.

One character? That should be easy to eyeball, right?

Not quite — the character in question was part of a parameter name in a very long URL. I don’t know about you, but it can take me a while to spot such a difference.

Here is an example. Can you spot the exact difference in less than 2 seconds?

$ git diff --color

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
  sudo make upgrade
 
 Koha 3.4.x or later  no longer stores items in biblio records.
-If you are upgrading from an older version ou will need to do the
+If you are upgrading from an older version you will need to do the
 following two steps, they can take a long time (several hours) to
 complete for large databases

Now imagine doing this if the change occurs in the 100th character of a line that is 150 characters long.

Fortunately, git diff, as well as other commands like git show that display diffs, accepts several switches that let you display the differences in terms of words, not lines. These switches include --word-diff and --color-words. For example:

$ git diff --color-words

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
  sudo make upgrade
 
Koha 3.4.x or later  no longer stores items in biblio records.
If you are upgrading from an older version ouyou will need to do the
following two steps, they can take a long time (several hours) to
complete for large databases

The difference is much easier to see now — at least if you’re not red-green color-blind. You can change the colors or not use colors at all:

$ git diff --word-diff

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
 sudo make upgrade

Koha 3.4.x or later  no longer stores items in biblio records.
If you are upgrading from an older version [-ou-]{+you+} will need to do the
following two steps, they can take a long time (several hours) to
complete for large databases

Going back to the bug I mentioned, --word-diff wasn’t quite enough, though. By default, Git considers words to be delimited by whitespace, but the patch in question removed a character from the middle of a very long URL. To make the change pop out, I had to tell Git to highlight single-character changes. One way to do this is the --word-diff-regex or by passing the regex to --color-words. Here’s the final example:

$ git diff --color-words=.

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
  sudo make upgrade
 
Koha 3.4.x or later  no longer stores items in biblio records.
If you are upgrading from an older version you will need to do the
following two steps, they can take a long time (several hours) to
complete for large databases

And there we have it — the difference, pinpointed.

Space doesn’t matter, except when it does.

The other day Koha bug 11308 was filed reporting a problem with the public catalog search RSS feed. This affected just the new Bootstrap theme.

The bug report noted that when clicking on the RSS feed icon, the page rendered “not like an rss feed should”. That means different things to different web browsers, but we can use an RSS feed validation service like validator.w3.org to see what feed parsers are likely to think.

Before the bug was fixed, the W3C validator reported this:

This feed does not validate.
line 2, column 0: XML parsing error: :2:0: XML or text declaration not at start of entity [help]
<?xml version=’1.0′ encoding=’utf-8′ ?>

Of course, at first glance, the XML declaration looks just fine — the key bit is that it is starting at the second line of the response.

Space matters — XML requires that if an XML declaration is present, it must be the very first thing in the document.

Let’s take a look at the patch, written by Chris Cormack, that fixes the bug:

This patch moves the [% USE Koha %] Template Toolkit directive from before the XML declaration to after it. [% USE Koha %] loads a custom Template Toolkit module called “Koha”; further down in the template there is a use of Koha.Preference() to check the value of a system preference.

But why should importing a TT module add a blank line? By default, Template Toolkit will include all of the whitespace present in the template. Since there is a newline after the [% USE Koha %] directive, that newline is included in the response.

Awkward, when spaces matter.

However, Template Toolkit does have a way to chomp whitespace before or after template directives.

This means that an alternative fix could be something like this:

Adding a single hyphen here means that whitespace after the TT directive should be chomped — in other words, not included in the response.

Most of the time, extra whitespace doesn’t matter for the HTML emitted by Koha. But when space matters… you can use TT to control it.

The next few days will be pretty intense for me, as I’ll be joining friends old and new for the hackfest of the 2013 Koha Conference. Hackfest will be an opportunity for folks to learn things, including how to work with Koha’s code, how and why librarians do the things they do — and how and why developers do the things they do. Stuff will be broken, stuff will be built up again, new features will be added, bugs will be fixed, and along the way, I will be cutting another alpha release of Koha 3.14.

Unfortunately, not everybody will be able to be sitting inside the conference room in Reno for the next three days. How can one participate from afar? Lots of ways:

  • Read the koha-devel mailing list and join the conversation. I will, at minimum, post a summary to koha-devel each day.
  • Follow the #kohacon13 hashtag on Twitter. Tweet to us using that hashtag if you have a question or request.
  • Look for blog posts from hackfest.
  • Join the #koha IRC channel.
  • Keep an eye on changes on the Koha wiki, particularly the roundtable notes and hackfest wishlist pages. If you’ve got additions, corrections, or clarifications to offer, please feel free to let us know or to edit the wiki pages directly.
  • Watch the Koha dashboard for patches to test and to see the progress made during hackfest.
  • Test and sign off on patches. BibLibre’s sandboxes make that super-duper simple.

Hackfest isn’t just for folks who know their way around the code — if you know about library practice, or have time to test things, or can write up documentation, you can help too!

We may also try setting up a Google hangout. Because Google Hangout has a limit on the number of simultaneous users, if you’re interested in joining one, please let me know. If you have suggestions for other ways that folks can participate remotely, please let us know that as well.

Happy hacking!

Sometimes an idea that’s been staring you in the face has to jump up and down and wave its hands to get attention.

I was working with Katrin Fischer, Koha’s QA manager, who had just finished putting together a fresh Koha testing environment on her laptop so that she can do patch review during KohaCon’s hackfest. She mentioned wishing that something like MarcEdit were on her laptop so that she could quickly edit some records for testing. While MarcEdit could be run under WINE or Mono or in a Windows virtual machine, inspiration struck me: with a little help, vim makes a perfectly good basic MARC editor.

Here’s how — if you start with a file of MARC records, you can convert them to a text file using yaz-marcdump:

The resulting text file will look something like this:

To edit the records on the command line, you can use vim (or whatever your favorite text editor is). When you’re done, to convert them back to MARC, use

To avoid mangling special characters, it’s helpful to use UTF8 as the character encoding. yaz-marcdump can also be used to convert a MARC file to UTF8. For example, if the original MARC file uses the MARC-8 encoding, you could do:

Not particularly profound, perhaps — and the title of this post is a bit tongue-in-cheek — but I know that this technique will save me a bit of time.

One number I quite like today is 99. That’s the difference between the count of explicitly enumerated tests in Koha’s master branch as of 19 May (1,837) and the count today (1,936)1. So far in the 3.14 cycle, eleven people have contributed patches that touch t/.

In particular, there’s been quite a bit of work on the database-dependent test suite that has increased both its coverage and its usability. Database-dependent test cases are useful for several reasons. First, a good bit of Koha’s code simply cannot be tested under realistic conditions if a Koha database isn’t available to talk to; while DBD::Mock can be use to mock query responses, it can be tedious to write the mocks. Second, test scripts that can use a database can readily exercise not just individual routines, but higher-level workflows. For example, it would be feasible to write a set of tests that creates a loan, renews it, simulates it becoming overdue, charges overdue fines, then returns the loan. In turn, being able to test larger sequences of actions can make it easier to avoid cases where a seemingly innocuous change to one core routine has an unanticipated effect elsewhere. This consideration particularly matters for Koha’s circulation code.

The automated buildbot has been running the DB-dependent tests for some time, but it’s historically been a dicier proposition for the average Koha hacker to run them on their own development databases. On the one hand, you probably don’t want to risk letting a test case mess up your database. On the other hand, some of the test cases make assumptions about the initial state of the database that may be unwarranted.

Although letting the buildbot do its thing is good, test cases are most useful if developers are able and willing to run them at any time. Worrying about damage to your development DB or having to figure out fiddly preconditions both decrease the probability that the tests will be run.

Recently, a simple “trick” has been adopted to deal with the first concern: make each DB-dependent test script operate in a transaction that gets rolled back. This is simple to set up:

The trick lies in setting AutoCommit on the database handle to 0. Setting RaiseError will cause the test script to abort if a fatal SQL error is raised. The $dbh->rollback() at the end is optional; if you let the script fall through to the end, or if the script terminates unexpectedly, the transaction will get rolled back regardless.

Doing all of the tests inside of a transaction grants you … freedom. Testing circulation policies? You can empty out issuingrules, set up a set of test policies, run through the variations, then end the test script confident that your original loan rules will be back in place.

It also grants you ease. Although it’s a good idea for Koha to let you easily run the tests in a completely fresh database, test cases that can run in your main development database are even better.

This ties into the second concern, which is being addressed by an ongoing project which Jonathan Druart and others have been working on to make each test script create the test data it needs. For example, if a test script needs a patron record, it will add it rather than assume that the database contains one. The DB-dependent tests currently do make a broader assumption that some of the English-language sample data has been loaded (most notably the sample libraries), but I’m confident that that will be resolved by the time 3.14 is released.

I’m seeing a virtuous cycle starting to develop: the safer it gets for Koha devs to run the tests, the more that they will be run — and the more that will get written. In turn, the more test coverage we achieve, the more confidently we can do necessary refactoring. In addition, the more tests we have, the more documentation — executable documentation! — we’ll have of Koha’s internals.


[1] From the top of Koha’s source tree, egrep -ro 'tests => [0-9]+' t |awk '{print $3}'|paste -d+ -s |bc