Tips and tricks for leaking patron information

Here is a partial list of various ways I can think of to expose information about library patrons and their search and reading history by use (and misuse) of software used or recommended by libraries.

  • Send a patron’s ebook reading history to a commercial website…
    • … in the clear, for anybody to intercept.
  • Send patron information to a third party…
    • … that does not have an adequate privacy policy.
    • … that has an adequate privacy policy but does not implement it well.
    • … that is sufficiently remote that libraries lack any leverage to punish it for egregious mishandling of patron data.
  • Use an unencrypted protocol to enable a third-party service provider to authenticate patrons or look them up…
    • … such as SIP2.
    • … such as SIP2, with the patron information response message configured to include full contact information for the patron.
    • … or many configurations of NCIP.
    • … or web services accessible over HTTP (as opposed to HTTPS).
  • Store patron PINs and passwords without encryption…
    • … or using weak hashing.
  • Store the patron’s Social Security Number in the ILS patron record.
  • Don’t require HTTPS for a patron to access her account with the library…
    • … or if you do, don’t keep up to date with the various SSL and TLS flaws announced over the years.
  • Make session cookies used by your ILS or discovery layer easy to snoop.
  • Use HTTP at all in your ILS or discovery layer – as oddly enough, many patrons will borrow the items that they search for.
  • Send an unencrypted email…
    • … containing a patron’s checkouts today (i.e., an email checkout receipt).
    • … reminding a patron of his overdue books – and listing them.
    • … listing the titles of the patron’s available hold requests.
  • Don’t encrypt connections between an ILS client program and its application server.
  • Don’t encrypt connections between an ILS application server and its database server.
  • Don’t notice that a rootkit has been running on your ILS server for the past six months.
  • Don’t notice that a keylogger has been running on one of your circulation PCs for the past three months.
  • Fail to keep up with installing operating system security patches.
  • Use the same password for the circulator account used by twenty circulation staff (and 50 former circulation staff) – and never change it.
  • Don’t encrypt your backups.
  • Don’t use the feature in your ILS to enable severing the link between the record of a past loan and the specific patron who took the item out…
    • … sever the links, but retain database backups for months or years.
  • Don’t give your patrons the ability to opt out of keeping track of their past loans.
  • Don’t give your patrons the ability to opt in to keeping track of their past loans.
  • Don’t give the patron any control or ability to completely sever the link between her record and her past circulation history whenever she chooses to.
  • When a patron calls up asking “what books do I have checked out?” … answer the question without verifying that the patron is actually who she says she is.
  • When a parent calls up asking “what books does my teenager have checked out?”… answer the question.
  • Set up your ILS to print out hold slips… that include the full name of the patron. For bonus points, do this while maintaining an open holds shelf.
  • Don’t shred any circulation receipts that patrons leave behind.
  • Don’t train your non-MLS staff on the importance of keeping patron information confidential.
  • Don’t give your MLS staff refreshers on professional ethics.
  • Don’t shut down library staff gossiping about a patron’s reading preferences.
  • Don’t immediately sack a library staff member caught misusing confidential patron information.
  • Have your ILS or discovery interface hosted by a service provider that makes one or more of the mistakes listed above.
  • Join a committee writing a technical standard for library software… and don’t insist that it take patron privacy into account.

Do you have any additions to the list? Please let me know!

Of course, I am not actually advocating disclosing confidential information. Stay tuned for a follow-up post.

Verifying our tools; a role for ALA?

It came to light on Monday that the latest version of Adobe Digital Editions is sending metadata on ebooks that are read through the application to an Adobe server — in clear text.

I’ve personally verified the claim that this is happening, as have lots of other people. I particularly like Andromeda Yelton’s screencast, as it shows some of the steps that others can take to see this for themselves.

In particular, it looks like any ebook that has been opened in Digital Editions or added to a “library” there gets reported on. The original report by Nate Hoffelder at The Digital Reader also said that ebooks that were not known to Digital Editions were being reported, though I and others haven’t seen that. At the moment, since nobody is saying that they’ve decompiled the program and analyzed exactly when Digital Editions sends its reports, it’s possible that Nate simply fell into a rare execution path.

UPDATE 10 October 2014: Yesterday I was able to confirm that if an ereader device is attached to a PC and is recognized by ADE, metadata from the books on that device can also be sent in the clear.

This move by Adobe, whether or not they’re permanently storing the ebook reading history, and whether or not they think they have good intentions, is bad for a number of reasons:

  • By sending the information in the clear, anybody can intercept it and choose to act on somebody’s choice of reading material.  This applies to governments, corporations, and unenlightened but technically adept parents.  And as far as state actors are concerned – it actually doesn’t matter that Digital Editions isn’t sending information like name and email addresses in the clear; the user’s IP address and the unique ID assigned by Digital Editions will often be sufficient for somebody to, with effort, link a reading history to an individual.
  • The release notes from Adobe gave no hint that Digital Editions was going to start doing this. While Amazon’s Kindle platform also keeps track of reading history, at least Amazon has been relatively forthright about it.
  • The privacy policy and license agreement similarly did not explicitly mention this. There has been some discussion to the effect that if one looks at those documents closely enough, there is an implied suggestion that Adobe can capture and log anything one chooses to do with their software. But even if that’s the case – and I’m not sure that this argument would fly in countries with stronger data privacy protection than the U.S. – sending this information in the clear is completely inconsistent with modern security practices.
  • Digital Editions is part of the toolchain that a number of library ebook lending platforms use.

The last point is key. Everybody should be concerned about an app that spouts reading history in the clear, but librarians in particular have a professional responsibility to protect our users’ reading history.

What does it mean in the here and now? Some specific immediate steps I suggest for libraries are to:

  • Publicize the problem to their patrons.
  • Officially warn their patrons against using Digital Editions 4.0, and point to workarounds like pointing “adelogs.adobe.com” to “127.0.0.1” in hosts files.
  • If patrons must use Digital Editions to borrow ebooks, recommend the use of earlier versions, which do not appear to be spying on users.
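For anyone who wants to check that the hosts-file workaround is actually in place, here is a minimal sketch in Python. The function name `is_blocked` and the sample hosts text are illustrative assumptions, not part of any Adobe or library tool, and the sketch assumes that the first matching entry in the file is the one that takes effect.

```python
import ipaddress


def is_blocked(hosts_text: str, hostname: str) -> bool:
    """Return True if hosts_text maps hostname to a loopback address.

    Hypothetical helper for illustration; assumes the first matching
    entry in the hosts file is the one that wins.
    """
    wanted = hostname.lower()
    for line in hosts_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        address, *names = entry.split()
        if wanted not in (name.lower() for name in names):
            continue
        try:
            return ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # first field is not an IP address; skip the line
    return False


# Sample hosts-file content containing the workaround line.
sample = """\
127.0.0.1   localhost
# keep Digital Editions from phoning home
127.0.0.1   adelogs.adobe.com
"""

print(is_blocked(sample, "adelogs.adobe.com"))
```

On a real machine, the text would come from the system hosts file (for example `/etc/hosts` on Linux and macOS) rather than from a sample string.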

However, there are things that also need to be done in the long term.

Accepting DRM has been a terrible dilemma for libraries – enabling and supporting, no matter how passively, tools for limiting access to information flies against our professional values.  On the other hand, without some degree of acquiescence to it, libraries would be even more limited in their ability to offer current books to their patrons.

But as the Electronic Frontier Foundation points out,  DRM as practiced today is fundamentally inimical to privacy. If, following Andromeda Yelton’s post this morning, we value our professional soul, something has to give.

In other words, we have to have a serious discussion about whether we can responsibly support any level of DRM in the ebooks that we offer to our patrons.

But there’s a more immediate step that we can take. This whole thing came to light because a “hacker acquaintance” of Nate’s decided to see what Digital Editions is sending home. And a key point? Once the testing started, it probably didn’t take that hacker more than half an hour to see what was going on, and it may well have taken only five.

While the library profession probably doesn’t count very many professional security researchers among its ranks, this sort of testing is not black magic.  Lots of systems librarians, sysadmins, and developers working for libraries already know how to use tcpdump and Wireshark and the like.

So what do we need to do? We need to stop blindly trusting our tools.  We need to be suspicious, in other words, and put anything that we would recommend to our patrons to the test to verify that it is not leaking patron information.

This is where organizations like ALA can play an important role.  Some things that ALA could do include:

  • Establish a clearinghouse for reports of security and privacy violations in library software.
  • Distribute information on ways to perform security audits.
  • Test library software in-house and hire security researchers as needed.
  • Provide institutional and legal support for these efforts.

That last point is key, and is why I’m calling on ALA in particular. There have been plenty of cases where software vendors have sued, or threatened to sue, folks who have pointed out security flaws. Rather than permitting that sort of chilling effect to be tolerated in the realm of library software, ALA can provide cover for individuals and libraries engaged in the testing that is necessary to protect our users.

Banned books and the library of Morpheus

A notion that haunts me is found in Neil Gaiman’s The Sandman: the library of the Dreaming, wherein can be found books that no earth-bound librarian can collect.  Books that caught existence only in the dreams – or passing thoughts – of their authors. The Great American Novel. Every Great American Novel, by all of the frustrated middle managers, farmers, and factory workers who had their heart attack too soon. Every Great Nepalese Novel.  The conclusion of the Wheel of Time, as written by Robert Jordan himself.

That library has a section containing every book whose physical embodiment was stolen.  All of the poems of Sappho. Every Mayan and Olmec text – including the ones that, in the real world, did not survive the fires of the invaders.

Books can be like cockroaches. Text thought long-lost can turn up unexpectedly, sometimes just by virtue of having been left lying around until someone thinks to take a closer look. It is not an impossible hope that one day, another Mayan codex may make its reappearance, thumbing its nose at the colonizers and censors who despised it and the culture and people it came from.

Books are also fragile. Sometimes the censors do succeed in utterly destroying every last trace of a book. Always, entropy threatens all.  Active measures against these threats are required; therefore, it is appropriate that librarians fight the suppression, banning, and challenges of books.

Banned Books Week is part of that fight, and it is important that folks be aware of their freedom to read what they choose – and to be aware that it is a continual struggle to protect that freedom.  Indeed, perhaps “Freedom to Read Week” better expresses the proper emphasis on preserving intellectual freedom.

But it’s not enough.

I am also haunted by the books that are not to be found in the Library of the Dreaming – because not even the shadow of their genesis crossed the mind of those who could have written them.

Because their authors were shot for having the wrong skin color.

Because their authors were cheated of an education.

Because their authors were sued into submission for daring to challenge the status quo.  Even within the profession of librarianship.

Because their authors made the decision to not pursue a profession in the certain knowledge that the people who dominated it would challenge their every step.

Because their authors were convinced that nobody would care to listen to them.

Librarianship as a profession must consider and protect both sides of intellectual freedom. Not just consumption – the freedom to read and explore – but also the freedom to write and speak.

The best way to ban a book is to ensure that it never gets written. Justice demands that we struggle against those who would not just ban books, but destroy the will of those who would write them.

Libraries, the Ada Initiative, and a challenge

I am a firm believer in the power of open source to help libraries build the tools we need to help our patrons and our communities.

Our tools focus our effort. Our effort, of course, does not spring out of thin air; it’s rooted in people.

One of the many currencies that motivates people to contribute to free and open source projects is acknowledgment.

Here are some of the women I’d like to acknowledge for their contributions, direct or indirect, to projects I have been part of. Some of them I know personally, others I admire from afar.

  • Henriette Avram – Love it or hate it, where would we be without the MARC format? For all that we’ve learned about new and better ways to manage metadata, Avram’s work at the LC started the profession’s proud tradition of sharing its metadata in electronic format.
  • Ruth Bavousett – Ruth has been involved in Koha for years and served as QA team member and translation manager. She is also one of the most courageous women I have the privilege of knowing.
  • Karen Coyle – Along with Diane Hillmann, I look to Karen for leadership in revamping our metadata practices.
  • Nicole Engard – Nicole has also been involved in Koha for years as documentation manager. Besides writing most of Koha’s manual, she is consistently helpful to new users.
  • Katrin Fischer – Katrin is Koha’s current QA manager, and has performed and continues to perform a very difficult job with grace and less thanks than she deserves.
  • Ruth Frasur – Ruth is director of the Hagerstown Jefferson Township Public Library in Indiana, which is a member of Evergreen Indiana. Ruth is one of the very few library administrators I know who not only understands open source, but actively contributes to some of the nitty-gritty work of keeping the software documented.
  • Diane Hillmann – Another leader in library metadata.
  • Kathy Lussier – As the Evergreen project coordinator at MassLNC, Kathy has helped to guide that consortium’s many development contributions to Evergreen.  As a participant in the project and a member of the Evergreen Oversight Board, Kathy has also supplied much-needed organizational help – and a fierce determination to help more women succeed in open source.
  • Liz Rea – Liz has been running Koha systems for years, writing patches, maintaining the project’s website, and injecting humor when most needed – a true jill of all trades.

However, there are unknowns that haunt me. Who has tried to contribute to Koha or Evergreen, only to be turned away by a knee-jerk “RTFM” or simply silence? Who might have been interested, only to rightly judge that they didn’t have time for the flak they’d get? Who never got a chance to go to a Code4Lib conference while her male colleague’s funding request got approved three years in a row?

What have we lost? How many lines of code, pages of documentation, hours of help have not gone into the tools that help us help our patrons?

The ideals of free and open source software projects are necessary, but they’re not sufficient to ensure equal access and participation.

The Ada Initiative can help. It was formed to support women in open technology and culture, and runs workshops, assists communities in setting up and enforcing codes of conduct, and promotes ensuring that women have access to positions of influence in open culture projects.

Why is the Ada Initiative’s work important to me? For many reasons, but I’ll mention three. First, because making sure that everybody who wants to work and play in the field of open technology has a real choice to do so is only fair. Second, because open source projects that are truly welcoming to women are much more likely to be welcoming to everybody – and happier, because of the effort spent on taking care of the community. Third, because I know that I don’t know everything – or all that much, really – and I need exposure to multiple points of view to be effective building tools for libraries.

Right now, folks in the library and archives communities are banding together to raise money for the Ada Initiative. I’ve donated, and I encourage others to do the same. Even better, several folks, including Bess Sadler, Andromeda Yelton, Chris Bourg, and Mark Matienzo are providing matching donations up to a total of $5,120.

Go ahead, make a donation by clicking below, then come back. I’ll wait.

Donate to the Ada Initiative

Money talks – but whether any given open source community is welcoming, both of new people and of new ideas, depends on its current members.

Therefore, I would also like to extend a challenge to men (including myself — accountability matters!) working in open source software projects in libraries. It’s a simple challenge, summarized in a few words: “listen, look, lift up, and learn.”

Listen. Listening is hard.  A coder in a library open source project has to listen to other coders, to librarians, to users – and it is all too easy to ignore or dismiss approaches that are unfamiliar.  It can be very difficult to learn that something you’ve poured a lot of effort into may not work well for librarians – and it can be even harder to hear that you are stepping on somebody’s toes or thoughtlessly stomping on their ideas.

What to do? Pay attention to how you communicate while handling bugs and project correspondence. Do you prioritize bugs filed by men? Do you have a subtle tendency to think to yourself, “oh, she’s just not seeing the obvious thing right in front of her!” if a woman asks a question on the mailing list about functionality she’s having trouble with? If so, make an effort to be even-handed.

Are you receiving criticism? Count to ten, let your hackles down, and try to look at it from your critic’s point of view.

Be careful about nitpicking.  Many a good idea has died after too much bikeshedding – and while that happens to everybody, I have a gut feeling that it’s more likely to happen if the idea is proposed by a woman.

Is a woman colleague confiding in you about concerns she has with community or workplace dynamics? Listen.

Look. Look around you – around your office, the forums, the IRC channels, and Stack Exchanges you frequent. Do you mostly see men who look like yourself?  If so, do what you can to broaden your perspective and your employer’s perspective. Do you have hiring authority? Do you participate in interview panels? You can help shape who surrounds you.

Remember that I’m talking about library technology here – even if 70% of the employees of the library you work for are women, if the systems department only employs men, you’re missing other points of view.

Do you have no hiring authority whatsoever? Look around the open source communities you participate in. Are there proportionally far more men participating openly than the gender ratio in librarianship as a whole?  If so, you can help change that by how you choose to participate in the community.

Lift up. This can take many forms.  In some cases, you can help lift up women in library technology by getting out of the way – in other words, by removing or not supporting barriers to participation such as sexist language on the mailing list or by calling out exclusionary behavior by other men (or yourself!).

Sometimes, you can offer active assistance – but ask first! Perhaps a woman is ready to assume a project leadership role or is ready to grow into it. Encourage her – and be ready to support her publicly. Or perhaps you may have an opportunity to mentor a student – go for it, but know that mentoring is hard work.

But note — I’m not an authority on ways to support women in technology.  One of the things that the Ada Initiative does is run Ally Skills workshops that teach simple techniques for supporting women in the workplace and online.  In fact, if you’re coming to Atlanta this October for the DLF Forum, one is being offered there.

Learn. Something I’m still learning is just the sheer amount of crap that women in technology put up with. Have you ever gotten a death threat or a rape threat for something you said online about the software industry? If you’re a guy, probably not. If you’re Anita Sarkeesian or Quinn Norton, it’s a different story entirely.

If you’re thinking to yourself that “we’re librarians, not gamers, and nobody has ever gotten a death threat during a professional dispute with the possible exception of the MARC format” – that’s not good enough. Or if you think that no librarian has ever harassed another over gender – that’s simply not true. It doesn’t take a death threat to convince a woman that library technology is too hostile for her; a long string of micro-aggressions can suffice. Do you think that librarians are too progressive or simply too darn nice for harassment to be an issue? Read Ingrid Henny Abrams’ posts about the results of her survey on code of conduct violations at ALA.

This is why the Ada Initiative’s anti-harassment work is so important – and to learn more, including links to sample policies, a good starting point is their own conference policies page. (Which, by the way, was quite useful when the Evergreen Project adopted its own code of conduct). Another good starting point is the Geek Feminism wiki.

And, of course, you could do worse than to go to one of the ally skills workshops.

If you choose to take up the challenge, make a note to come back in a year and write down what you’ve learned, what you’ve listened to and seen, and how you’ve helped to lift up others. It doesn’t have to be public – though that would be nice – but the important thing is to be mindful.

Finally, don’t just take my word for it – remember that I’m not an authority on supporting women in technology. Listen to the women who are.

Update: other #libs4ada posts

Taking the ALA Statement of Appropriate Conduct seriously

After I got back from this year’s ALA Annual Conference (held in the City of It’s Just a Dry Heat), I saw some feedback regarding B.J. Novak’s presentation at the closing session, where he reportedly marred a talk that by many accounts was quite inspiring with a tired joke alluding to a sexual (and sexist) stereotype about librarians.

Let’s suppose a non-invited speaker, panel participant, or committee member had made a similar joke. Depending on the circumstances, it may or may not have constituted “unwelcome sexual attention” per the ALA Statement of Appropriate Conduct at ALA Conferences, but, regardless, it certainly would not have been in the spirit of the statement’s request that “[s]peakers … frame discussions as openly and inclusively as possible and to be aware of how language or images may be perceived by others.” Any audience member would have been entitled to call out such behavior on the spot or raise the issue with ALA conference services.

The statement of appropriate conduct is for the benefit of all participants: “… members and other attendees, speakers, exhibitors, staff and volunteers…”. It does not explicitly exclude any group from being aware of it and governing their behavior accordingly.

Where does Novak fit in? I had the following exchange with @alaannual on Twitter:

A key aspect of many of the anti-harassment policies and codes of conduct that have been adopted by conferences and conventions recently is that the policy applies to all event participants. There is no reason to expect that an invited keynote speaker or celebrity will automatically not cross lines — and there have been several incidents where conference headliners have erred (or worse).

I am disappointed that when the Statement of Appropriate Conduct was adopted in late 2013, it apparently was not accompanied by changes to conference procedures to ensure that invited speakers would be made aware of it. There’s always room for process improvement, however, so for what it’s worth, here are my suggestions to ALA for improving the implementation of the Statement of Appropriate Conduct:

  • Update procedures to ensure that all conference speakers are made aware of the Statement.
  • Update speaker agreements for invited speakers to require that they read and abide by the Statement.
  • Make available a point of contact for all speakers who can answer questions regarding the Statement and how it applies to presentations. This should not be construed as a request that ALA review the content of presentations beforehand, just that ALA provide an individual who can help speakers interpret the Statement in case of doubt.
  • Ensure that the exhibitors’ manual and the exhibitors’ portal (exhibitors.ala.org) prominently link to the Statement. This does not appear to be the case at the moment, although this may be due to the exhibitors’ portal switching over to the upcoming Midwinter.
  • If this is not already the case, ensure that exhibitor agreements incorporate the Statement.
  • Discuss the Statement periodically and adjust it based on feedback from conference attendees and emerging best practices from other conferences. Speaking of feedback, the Magpie Librarian is conducting a survey (that closes today) to gather information about code of conduct violations at past ALA conferences.

I invite feedback, either here or directly to ALA.

Duplicate holidays, and a question

One can, in fact, have too many holidays.

Koha uses the DateTime::Set Perl module when (among other things) calculating the next day the library is open. Unfortunately, the more special holidays you have in a Koha database, the more time DateTime::Set takes to initialize itself — and the time appears to grow faster than linearly with the number of holidays.

Jonathan Druart partially addressed this with his patch for bug 11112 by implementing some lazy initialization and caching for Koha::Calendar, but that doesn’t make DateTime::Set’s constructor itself any faster.

Today I happened to be working on a Koha database that turned out to have duplicate rows in the special_holidays table. In other words, for a given library, there might be four rows all expressing that the library is closed on 15 August 2014. That database contains hundreds of duplicates, which results in an extra 1-3 seconds per circulation operation.

The duplication is not apparent in the calendar editor, alas.

So here’s my first question: has anybody else seen this in their Koha database? The following query will turn up duplicates:
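A minimal sketch of one way to look for such duplicates follows; it is not necessarily the exact query used against the database described above. It assumes that special_holidays has branchcode, year, month, and day columns, and it builds a throwaway in-memory SQLite table with made-up sample rows purely so that the example is self-contained. Against a real Koha database, the same GROUP BY … HAVING statement would be run in MySQL instead.

```python
import sqlite3

# Group rows that describe the same closure and keep only the groups
# that occur more than once. Column names are assumptions about the
# special_holidays table.
DUPLICATE_QUERY = """
    SELECT branchcode, year, month, day, COUNT(*) AS copies
    FROM special_holidays
    GROUP BY branchcode, year, month, day
    HAVING COUNT(*) > 1
    ORDER BY branchcode, year, month, day
"""


def find_duplicate_holidays(conn):
    """Return (branchcode, year, month, day, copies) for duplicated holidays."""
    return conn.execute(DUPLICATE_QUERY).fetchall()


# Stand-in table with sample data: four copies of 15 August 2014 for one
# branch, plus two closures that appear only once.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE special_holidays (
           id INTEGER PRIMARY KEY,
           branchcode TEXT,
           day INTEGER,
           month INTEGER,
           year INTEGER,
           title TEXT
       )"""
)
rows = [("CPL", 15, 8, 2014, "Closed")] * 4 + [
    ("CPL", 25, 12, 2014, "Closed"),
    ("MPL", 15, 8, 2014, "Closed"),
]
conn.executemany(
    "INSERT INTO special_holidays (branchcode, day, month, year, title) "
    "VALUES (?, ?, ?, ?, ?)",
    rows,
)

duplicates = find_duplicate_holidays(conn)
for branchcode, year, month, day, copies in duplicates:
    print(f"{branchcode}: {year}-{month:02d}-{day:02d} appears {copies} times")
```

Grouping on the library and the date, rather than on every column, means that rows differing only in title or description still count as duplicates of the same closure.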

And my second question: assuming that this somehow came about during normal operation of Koha (as opposed to duplicate rows getting directly loaded into the database), does anybody have any ideas how this happened?

Conservation of enthusiasm

One of the tightropes I must walk on as the current release manager for Koha is held taut by the tension between the necessity of maintaining boundaries with the code and the necessity of acknowledging that the code is not the first concern.

Boundaries matter. Not all code is equal: some is better, some is worse, none is perfect. Some code belongs in Koha. Some code belongs in Koha for lack of a better alternative at the time. Some code does not belong in Koha. Some code will stand the test of time; some code will test our time and energy for years.

The code is not primary. It is no great insight to point out that the code does not write itself; it certainly does not document itself nor pay its own way. Nor does it get to partake in that moment of fleeting joy when things just work, when the code gets out of the way of the librarian and the patron.

What is primary? People and their energy.

Enthusiasm is boundless. It has kept some folks working on Koha for years, beyond the impetus of mere paycheck or even approbation.

Enthusiasm is limited. Anybody volunteering passion for a free software project has a question to answer: is there something better to do with my time? If the answer turns into “no”… well, there are many ways in this world to contribute to happiness, personal or shared.

Caviling can be costly — possibly, beyond measure. One “RTFM” can eliminate an entire manual’s worth of help down the road.

On the other hand, the impulse to tweak, to provide feedback, to tune a new idea, can come from the best of intentions. Passion is not enough by itself; experience matters, and can guide new effort.

It’s a tightrope we all walk. But the people must come first.

My meditation: what ways of interacting among ourselves conserve enthusiasm, and thereby grow it? And how do we avoid destroying it needlessly?

Koha Dev Tidbit #2: Pinpoint the difference

This morning I reviewed and pushed the patch for Koha bug 11174. The patch, by Zeno Tajoli, removes one character each from two files.

One character? That should be easy to eyeball, right?

Not quite — the character in question was part of a parameter name in a very long URL. I don’t know about you, but it can take me a while to spot such a difference.

Here is an example. Can you spot the exact difference in less than 2 seconds?

$ git diff --color

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
  sudo make upgrade
 
 Koha 3.4.x or later  no longer stores items in biblio records.
-If you are upgrading from an older version ou will need to do the
+If you are upgrading from an older version you will need to do the
 following two steps, they can take a long time (several hours) to
 complete for large databases

Now imagine doing this if the change occurs in the 100th character of a line that is 150 characters long.

Fortunately, git diff, as well as other commands like git show that display diffs, accepts several switches that let you display the differences in terms of words, not lines. These switches include --word-diff and --color-words. For example:

$ git diff --color-words

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
  sudo make upgrade
 
Koha 3.4.x or later  no longer stores items in biblio records.
If you are upgrading from an older version ouyou will need to do the
following two steps, they can take a long time (several hours) to
complete for large databases

The difference is much easier to see now — at least if you’re not red-green color-blind. You can change the colors or not use colors at all:

$ git diff --word-diff

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
 sudo make upgrade

Koha 3.4.x or later  no longer stores items in biblio records.
If you are upgrading from an older version [-ou-]{+you+} will need to do the
following two steps, they can take a long time (several hours) to
complete for large databases

Going back to the bug I mentioned, --word-diff wasn’t quite enough, though. By default, Git considers words to be delimited by whitespace, but the patch in question removed a character from the middle of a very long URL. To make the change pop out, I had to tell Git to highlight single-character changes. One way to do this is with the --word-diff-regex switch; another is to pass the regex directly to --color-words. Here’s the final example:

$ git diff --color-words=.

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
  sudo make upgrade
 
Koha 3.4.x or later  no longer stores items in biblio records.
If you are upgrading from an older version you will need to do the
following two steps, they can take a long time (several hours) to
complete for large databases

And there we have it — the difference, pinpointed.
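To try the character-level diff without touching a real repository, here is a minimal sketch that builds a throwaway repository and removes one character from the middle of a URL. The file name, URL, and committer identity are made up for illustration, and it uses `--word-diff=plain` with `--word-diff-regex=.` rather than colors so the result is readable in any terminal.

```shell
#!/bin/sh
# Sketch: pinpoint a one-character change inside a long, whitespace-free token.
# The file name, URL, and committer identity below are hypothetical.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Commit a line containing a URL.
echo 'http://example.org/abcdefg' > notes.txt
git add notes.txt
git -c user.name='Example' -c user.email='example@example.org' \
    commit -q -m 'Add notes'

# Remove a single character from the middle of the URL.
echo 'http://example.org/abcefg' > notes.txt

# Treat every character as a "word" so the change is marked on its own.
output=$(git --no-pager diff --no-color --word-diff=plain \
    --word-diff-regex=. -- notes.txt)
printf '%s\n' "$output"
```

The changed line in the output should mark only the removed character with `[-…-]`, instead of flagging the whole URL as one changed word.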

Notes from Code4Lib BC: gluing together an ugly string to emit RDF

This afternoon I’m sitting in the new bibliographic environment breakout session at Code4Lib BC. After taking a look at Mark Jordan’s easyLOD, I decided to play around with putting together a web service for Koha that emits RDF when fed a bib ID. Unlike Magnus Enger’s semantikoha prototype, which uses a Ruby library to convert MARC to RDF, I was trying for an approach that used only Perl (plus XS).

There were a lot of building blocks available. Putting them together turned out to be a tick more convoluted than I expected.

The Library of Congress has published an XSL stylesheet for converting MODS to RDF. Converting MARC(XML) to MODS is readily done using other stylesheets, also published by LC.

The path seemed clear for a quick-and-dirty prototype: copy svc/bib to opac/svc/bib, take out the bits for doing updates (we’re not quite ready to make cataloging that collaborative!), and write a few lines to apply two XSLT transformations.

The code was quickly written, but it didn’t work. XML::LibXSLT, which Koha uses to handle XSLT, complained about the modsrdf.xsl stylesheet. Too new! That stylesheet is written in XSLT 2.0, but libxslt, the C library that XML::LibXSLT is based on, only supports XSLT 1.0.

As it turns out, Perl modules that can handle XSLT 2.0 are rather thin on the ground. What I ended up doing was:

Installing XML::Saxon::XSLT2, which required…

Installing Saxon-HE, a Java XML and XSLT processor that supports XSLT 2.0, which required…

Installing Inline::Java, which required…

Installing a JDK (I happened to choose OpenJDK).

After all that (and a quick tweak to the modsrdf.xsl stylesheet), I ended up with the following code that did the trick:

This works… but is not satisfying. Making Koha require a JDK just for XSLT 2.0 support is a bit much, for one thing, and it would likely be rather slow if used in production. It’s a pity that there’s still no broad support for XSLT 2.0.

A dead end, most likely, but instructive nonetheless.

Notes from Code4Lib BC: Accessibility

Peace Arch. Photo by Daniel Means. Licensed under CC-BY-SA and available at http://www.flickr.com/photos/supa_pedro/389603266.

There is nothing quite like the sense of sheer glee you get when you’re waiting at the border… and have been waiting at the border for a while… and then a new customs inspection lane is opened up. Zoom!

Marlene and I left Seattle this morning to go to the Code4Lib BC conference in Vancouver. Leaving in the morning meant that we missed the lightning talks, and arrived after the breakout sessions had started. Fortunately, folks were quick to welcome us, and I soon fell into the accessibility session.

Accessibility has been on my mind lately, but it’s an area where I’m starting mostly from ground zero. I knew that designing accessible systems is a Good Idea, I knew about the existence of some of the jargon and standards, and I knew that I didn’t know much else, certainly none of the specifics.

Cynthia Ng very kindly shared some pointers with me. For example, it is helpful to know that the Section 508 guidelines are essentially a subset of WCAG 1.0. This is exactly the sort of shortcut (through an apparently intimidating forest) that an expert can effortlessly give to a newbie, and having opportunities to learn from the experts is one of the reasons why I like going to conferences.

The accessibility breakout session charged itself with putting together a list of resources and best practices for accessibility and universal design. As I mentioned above, we arrived in the middle of the breakout session time, but a couple hours was more than enough time to get initial exposure to a lot of ideas and resources. It was exhilarating.

In no particular order, here is a list of various things that I’ll be following up on:

  • The Accessibility Project
  • Guerilla testing
  • The 5 second test
  • Swim lane diagrams
  • The Paciello Group Blog
  • Be careful about putting things in the right sidebar of a three-column layout — a lot of users have been trained by web advertising to completely ignore that region.  Similarly, a graphic with moving parts can get ignored if it looks too much like an ad.
  • The Code4Lib BC accessibility group’s notes
  • Having consistency of branding and look and feel can improve usability — but that can be a challenge when integrating a lot of separate systems (particularly if a library and a vendor have different ideas about whose branding should be foremost).
  • Integrating one’s content strategy with one’s accessibility strategy.  To paraphrase a point that Cynthia made a few times, putting out too much text is a problem for any user.
  • As with so much of software design, iterate early and often. The time to start thinking about accessibility is when you’re 20% of the way through a project, not when you’re 80% done.
  • Standards can help, but only up to a point.  A website could pass an automated WCAG compliance test with flying colors but not actually be usable by anyone.

And there’s another day of conference yet!  I’m quite happy we made the drive up.

Here’s a general question to the world: what reading material do you recommend for folks like me who want to learn more about writing accessible web software?