Duplicate holidays, and a question

One can, in fact, have too many holidays.

Koha uses the DateTime::Set Perl module when (among other things) calculating the next day the library is open. Unfortunately, the more special holidays you have in a Koha database, the more time DateTime::Set takes to initialize itself — and the time appears to grow faster than linearly with the number of holidays.

Jonathan Druart partially addressed this with his patch for bug 11112 by implementing some lazy initialization and caching for Koha::Calendar, but that doesn’t make DateTime::Set‘s constructor itself any faster.

Today I happened to be working on a Koha database that turned out to have duplicate rows in the special_holidays table. In other words, for a given library, there might be four rows all expressing that the library is closed on 15 August 2014. That database contains hundreds of duplicates, which results in an extra 1-3 seconds per circulation operation.

The duplication is not apparent in the calendar editor, alas.

So here’s my first question: has anybody else seen this in their Koha database? The following query will turn up duplicates:

And my second question: assuming that this somehow came about during normal operation of Koha (as opposed to duplicate rows getting directly loaded into the database), does anybody have any ideas how this happened?

Conservation of enthusiasm

One of the tightropes I must walk on as the current release manager for Koha is held taut by the tension between the necessity of maintaining boundaries with the code and the necessity of acknowledging that the code is not the first concern.

Boundaries matter. Not all code is equal: some is better, some is worse, none is perfect. Some code belongs in Koha. Some code belongs in Koha for lack of a better alternative at the time. Some code does not belong in Koha. Some code will stand the test of time; some code will test our time and energy for years.

The code is not primary. It is no great insight to point out that the code does not write itself; it certainly does not document itself nor pay its own way. Nor does it get to partake in that moment of fleeting joy when things just work, when the code gets out of the way of the librarian and the patron.

What is primary? People and their energy.

Enthusiasm is boundless. It has kept some folks working on Koha for years, beyond the impetus of mere paycheck or even approbation.

Enthusiasm is limited. Anybody volunteering passion for a free software project has a question to answer: is there something better to do with my time? If the answer turns into “no”… well, there are many ways in this world to contribute to happiness, personal or shared.

Caviling can be costly — possibly, beyond measure. One “RTFM” can eliminate an entire manual’s worth of help down the road.

On the other hand, the impulse to tweak, to provide feedback, to tune a new idea, can come from the best of intentions. Passion is not enough by itself; experience matters, can guide new effort.

It’s a tightrope we all walk. But the people must come first.

My meditation: what ways of interacting among ourselves conserves enthusiasm, and thereby grows it? And how do we avoid destroying it needlessly?

Koha Dev Tidbit #2: Pinpoint the difference

This morning I reviewed and pushed the patch for Koha bug 11174. The patch, by Zeno Tajoli, removes one character each from two files.

One character? That should be easy to eyeball, right?

Not quite — the character in question was part of a parameter name in a very long URL. I don’t know about you, but it can take me a while to spot such a difference.

Here is an example. Can you spot the exact difference in less than 2 seconds?

$ git diff --color

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
  sudo make upgrade
 
 Koha 3.4.x or later  no longer stores items in biblio records.
-If you are upgrading from an older version ou will need to do the
+If you are upgrading from an older version you will need to do the
 following two steps, they can take a long time (several hours) to
 complete for large databases

Now imagine doing this if the change occurs in the 100th character of a line that is 150 characters long.

Fortunately, git diff, as well as other commands like git show that display diffs, accepts several switches that let you display the differences in terms of words, not lines. These switches include --word-diff and --color-words. For example:

$ git diff --color-words

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
  sudo make upgrade
 
Koha 3.4.x or later  no longer stores items in biblio records.
If you are upgrading from an older version ouyou will need to do the
following two steps, they can take a long time (several hours) to
complete for large databases

The difference is much easier to see now — at least if you’re not red-green color-blind. You can change the colors or not use colors at all:

$ git diff --word-diff

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
 sudo make upgrade

Koha 3.4.x or later  no longer stores items in biblio records.
If you are upgrading from an older version [-ou-]{+you+} will need to do the
following two steps, they can take a long time (several hours) to
complete for large databases

Going back to the bug I mentioned, --word-diff wasn’t quite enough, though. By default, Git considers words to be delimited by whitespace, but the patch in question removed a character from the middle of a very long URL. To make the change pop out, I had to tell Git to highlight single-character changes. One way to do this is the --word-diff-regex or by passing the regex to --color-words. Here’s the final example:

$ git diff --color-words=.

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
  sudo make upgrade
 
Koha 3.4.x or later  no longer stores items in biblio records.
If you are upgrading from an older version you will need to do the
following two steps, they can take a long time (several hours) to
complete for large databases

And there we have it — the difference, pinpointed.

Notes from Code4Lib BC: gluing together an ugly string to emit RDF

This afternoon I’m sitting in the new bibliographic environment breakout session at Code4Lib BC. After taking a look at Mark Jordan’s easyLOD, I decided to play around with putting together a web service for Koha that emits RDF when fed a bib ID. Unlike Magnus Enger’s semantikoha prototype, which uses a Ruby library to convert MARC to RDF, I was trying for an approach that used only Perl (plus XS).

There were are of building blocks available. Putting them together turned out to be a tick more convoluted than I expected.

The Library of Congress has published an XSL stylesheet for converting MODS to RDF. Converting MARC(XML) to MODS is readily done using other stylesheets, also published by LC.

The path seemed clear for a quick-and-dirty prototype — make a copy of svc/bib, copy it to opac/svc/bib and take out the bits for doing updates (we’re not quite ready to make cataloging that collaborative!), and write a few lines to apply two XSLT transformations.

The code was quickly written — but it didn’t work. XML::LibXSLT, which Koha uses to handle XSLT, complained about the modsrdf.xsl stylesheet. Too new! That stylesheet is written in XSLT 2.0, but libxslt, the C library that XML::LibXSLT is based on, only supports XSLT 1.

As it turns out, Perl modules that can handle XSLT are rather thin on the ground. What I ended up doing was:

Installing XML::Saxon::XSLT2, which required…

Installing Saxon-HE, a Java XML and XSLT processor that supports XSLT 2.0, which required…

Installing Inline::Java, which required…

Installing a JDK (I happened to choose OpenJDK).

After all that (and a quick tweak to the modsrdf.xsl stylesheet, I ended up with the following code that did the trick:

This works… but is not satisfying. Making Koha require a JDK just for XSLT 2.0 support is a bit much, for one thing, and it would likely be rather slow if used in production. It’s a pity that there’s still no broad support for XSLT 2.0.

A dead end, most likely, but instructive nonetheless.

Notes from Code4Lib BC: Accessibility

Peach Arch.  Photo by Daniel Means.  Licensed under CC-BY-SA and available at http://www.flickr.com/photos/supa_pedro/389603266.

Peach Arch. Photo by Daniel Means. Licensed under CC-BY-SA and available at Flickr.

There is nothing quite like the sense of sheer glee you get when you’re waiting at the border… and have been waiting at the border for a while… and then a new customs inspection lane is opened up. Zoom!

Marlene and I left Seattle this morning to go to the Code4Lib BC conference in Vancouver. Leaving in the morning meant that we missed the lightning talks, and arrived after the breakout sessions had started. Fortunately, folks were quick to welcome us, and I soon fell into the accessibility session.

Accessibility has been on my mind lately, but it’s an area that I’m starting mostly from ground zero with. I knew that designing accessible systems is a Good Idea, I knew about the existence some of the jargon and standards, and I knew that I didn’t know much else — certainly none of the specifics.

Cynthia Ng very kindly shared some pointers with me. For example, it is helpful to know that the Section 508 guidelines is essentially a subset of WCAG 1.0. This is exactly the sort of shortcut (through an apparently intimidating forest) that an expert can effortlessly give to a newbie — and having opportunities to learn from the experts is one of the reasons why I like going to conferences.

The accessibility breakout session charged itself with putting together a list of resources and best practices for accessibility and universal design. As I mentioned above, we arrived in the middle of the breakout session time, but a couple hours was more than enough time to get initial exposure to a lot of ideas and resources. It was exhilarating.

In no particular order, here is a list of various things that I’ll be following up on:

  • The Accessibility Project
  • Guerilla testing
  • The 5 second test
  • Swim lane diagrams
  • The Paciello Group Blog
  • Be careful about putting things in the right sidebar of a three-column layout — a lot of users have been trained by web advertising to completely ignore that region.  Similarly, a graphic with moving parts can get ignored if it looks too much like an ad.
  • The Code4Lib BC accessibility group’s notes
  • Having consistency of branding and look and feel can improve usability — but that can be a challenge when integrating a lot of separate systems (particularly if a library and a vendor have different ideas about whose branding should be foremost).
  • Integrating one’s content strategy with one’s accessibility strategy.  To paraphrase a point that Cynthia made a few times, putting out too much text is a problem for any user.
  • As with so much of software design, iterate early and often. The time to start thinking about accessibility is when you’re 20% of the way through a project, not when you’re 80% done.
  • Standards can help, but only up to a point.  A website could pass an automated WCAG compliance test with flying colors but not actually be usable by anyone.

And there’s another day of conference yet!  I’m quite happy we made the drive up.

Here’s a general question to the world: what reading material do you recommend for folks like me who want to learn more about writing accessible web software?

Koha Dev Tidbit #1: control of space

Space doesn’t matter, except when it does.

The other day Koha bug 11308 was filed reporting a problem with the public catalog search RSS feed. This affected just the new Bootstrap theme.

The bug report noted that when clicking on the RSS feed icon, the page rendered “not like an rss feed should”. That means different things to different web browsers, but we can use an RSS feed validation service like validator.w3.org to see what feed parsers are likely to think.

Before the bug was fixed, the W3C validator reported this:

This feed does not validate.
line 2, column 0: XML parsing error: :2:0: XML or text declaration not at start of entity [help]
<?xml version=’1.0′ encoding=’utf-8′ ?>

Of course, at first glance, the XML declaration looks just fine -- the key bit is that it is starting at the second line of the response.

Space matters -- XML requires that if an XML declaration is present, it must be the very first thing in the document.

Let's take a look at the patch, written by Chris Cormack, that fixes the bug:

This patch moves the [% USE Koha %] Template Toolkit directive from before the XML declaration to after it. [% USE Koha %] loads a custom Template Toolkit module called "Koha"; further down in the template there is a use of Koha.Preference() to check the value of a system preference.

But why should importing a TT module add a blank line? By default, Template Toolkit will include all of the whitespace present in the template. Since there is a newline after the [% USE Koha %] directive, that newline is included in the response.

Awkward, when spaces matter.

However, Template Toolkit does have a way to chomp whitespace before or after template directives.

This means that an alternative fix could be something like this:

Adding a single hyphen here means that whitespace after the TT directive should be chomped -- in other words, not included in the response.

Most of the time, extra whitespace doesn't matter for the HTML emitted by Koha. But when space matters... you can use TT to control it.

How to participate in the KohaCon 2013 hackfest from afar

The next few days will be pretty intense for me, as I’ll be joining friends old and new for the hackfest of the 2013 Koha Conference. Hackfest will be an opportunity for folks to learn things, including how to work with Koha’s code, how and why librarians do the things they do — and how and why developers do the things they do. Stuff will be broken, stuff will be built up again, new features will be added, bugs will be fixed, and along the way, I will be cutting another alpha release of Koha 3.14.

Unfortunately, not everybody will be able to be sitting inside the conference room in Reno for the next three days. How can one participate from afar? Lots of ways:

  • Read the koha-devel mailing list and join the conversation. I will, at minimum, post a summary to koha-devel each day.
  • Follow the #kohacon13 hashtag on Twitter. Tweet to us using that hashtag if you have a question or request.
  • Look for blog posts from hackfest.
  • Join the #koha IRC channel.
  • Keep an eye on changes on the Koha wiki, particularly the roundtable notes and hackfest wishlist pages. If you’ve got additions, corrections, or clarifications to offer, please feel free to let us know or to edit the wiki pages directly.
  • Watch the Koha dashboard for patches to test and to see the progress made during hackfest.
  • Test and sign off on patches. BibLibre’s sandboxes make that super-duper simple.

Hackfest isn’t just for folks who know their way around the code — if you know about library practice, or have time to test things, or can write up documentation, you can help too!

We may also try setting up a Google hangout. Because Google Hangout has a limit on the number of simultaneous users, if you’re interested in joining one, please let me know. If you have suggestions for other ways that folks can participate remotely, please let us know that as well.

Happy hacking!

Introducing the next new MARC editor, vim

Sometimes an idea that’s been staring you in the face has to jump up and down and wave its hands to get attention.

I was working with Katrin Fischer, Koha’s QA manager, who had just finished putting together a fresh Koha testing environment on her laptop so that she can do patch review during KohaCon’s hackfest. She mentioned wishing that something like MarcEdit were on her laptop so that she could quickly edit some records for testing. While MarcEdit could be run under WINE or Mono or in a Windows virtual machine, inspiration struck me: with a little help, vim makes a perfectly good basic MARC editor.

Here’s how — if you start with a file of MARC records, you can convert them to a text file using yaz-marcdump:

The resulting text file will look something like this:

To edit the records on the command line, you can use vim (or whatever your favorite text editor is). When you’re done, to convert them back to MARC, use

To avoid mangling special characters, it’s helpful to use UTF8 as the character encoding. yaz-marcdump can also be used to convert a MARC file to UTF8. For example, if the original MARC file uses the MARC-8 encoding, you could do:

Not particularly profound, perhaps — and the title of this post is a bit tongue-in-cheek — but I know that this technique will save me a bit of time.

Mini Review: The Alchemist by Paolo Coelho

The Alchemist by Paolo Coelho

The Alchemist details the journey of a young Andalusian shepherd boy named Santiago. Santiago, believing a recurring dream to be prophetic, decides to travel to the pyramids of Egypt to find treasure. On the way, he encounters love, danger, opportunity and disaster. One of the significant characters that he meets is an old king named Melchizedek who tells him that “When you want something, all the universe conspires in helping you to achieve it.” This is the core philosophy and motif of the book.

This book clearly aims to be profoundly simple, couched in the language of a fable or extended parable. The attempt doesn’t work for me, but I am not a theist. Nor am I inclined towards a tale of a protagonist pursuing (or more likely, having handed to him) his “personal legend” that doesn’t make at least a nod to the situation of those whose legend is forever cut off from them through no fault of their own. It is a quick read, however, and I did like the imagery of the desert that Coelho evokes.

Rating: 2/5

Worldcon 2013 wrap-up

worldcon_dalekA few thoughts on the flight back from the World Science Fiction Convention, in no particular order:

  • It’s been too long since the last time we went to a Worldcon.
  • Paul Cornell kept the toast very well in hand — it was a real treat watching him be master of ceremonies at the masquerade and the Hugos.
  • Kudos to the program committee for the Spanish language track — a U.S. Worldcon that acknowledges the good stuff that’s not written in English is all the better for it.
  • Tarnation on the program committee for the Spanish language track — I need more books like I need a hole in my head (or a hole in the apartment’s floor, more likely), yet here I am, ordering Three Messages and a Warning.
  • The panel discussing harassment and con harassment policies could easily have gone another few hours. I look forward to the dialogue — and action! — continuing. One thing I heard and a couple things I saw made it clear that the need to work for more inclusivity continues. For example, the choice of which Rostler’s Rules were on the slideshow prior to the masquerade included weight-shaming; we are better than that.
  • To whoever put up the “SMOF zone” sign on the door to the room holding the WSFS business meeting: I get the joke, but if you want broader participation, make it triply clear that all con members can participate in WSFS business. If you don’t want broader participation, I rather suspect a fannish corollary to Gilmore’s Law will be drafted — but it doesn’t need to come to that. For more background, I recommend reading this post by aiglet12.
  • An opportunity was missed in regards to the People en Español event that shared the convention center with Worldcon. I hope that future cons will consider opening the dealer’s room and exhibits to the general public. I also hope that they will consider opportunities for cross-programming with other events should they present themselves.
  • I now know that I can blame the Daleks for con time. WAIT! WAIT! OBEY! WAIT!
  • Despite a few rough edges, I’m glad we went. I’m looking forward to LonCon next year.