
Testing Adobe Digital Editions 4.0.1, round 2

Yesterday I did some testing of version 4.0.1 of Adobe Digital Editions and verified that it is now using HTTPS when sending ebook usage data to Adobe’s server adelogs.adobe.com.

Of course, because the HTTPS protocol encrypts the datastream to that server, I couldn’t immediately verify that ADE was sending only the information that the privacy statement says it is.

Emphasis is on the word “immediately”.  If you want to find out what a program is sending via HTTPS to a remote server, there are ways to get in the middle.  Here’s how I did this for ADE:

  1. I edited the hosts file to refer “adelogs.adobe.com” to the address of a server under my control.
  2. I used the CA.pl script from openssl to create a certificate authority of my very own, then generated an SSL certificate for “adelogs.adobe.com” signed by that CA.
  3. I put the certificate for my new certificate authority into the trusted root certificates store on my Windows 7 desktop.
  4. I put the certificate in place on my webserver and wrote a couple of simple CGI scripts to emulate the ADE logging data collector and capture what got sent to them.
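For readers who want to try something similar, here is a minimal sketch of a capturing endpoint. It is not the CGI scripts I actually used: it swaps in Python's built-in `http.server`, and the names `CaptureHandler`, `make_server`, and `CAPTURED` are made up for illustration. The empty JSON reply is also an assumption, since I don't know what response ADE expects from the real collector.

```python
import ssl
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-memory log of what clients send; a real test would write to disk.
CAPTURED = []


class CaptureHandler(BaseHTTPRequestHandler):
    """Record every POST body, then answer with an empty JSON object."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        CAPTURED.append({"path": self.path, "body": body})
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b"{}")  # assumed reply; the real collector's response is unknown

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(host="127.0.0.1", port=0, certfile=None, keyfile=None):
    """Build the capture server; wrap it in TLS when a certificate is supplied."""
    server = HTTPServer((host, port), CaptureHandler)
    if certfile:
        # The certificate would be the one signed by the private CA from step 2.
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        context.load_cert_chain(certfile, keyfile)
        server.socket = context.wrap_socket(server.socket, server_side=True)
    return server
```

Calling `make_server("0.0.0.0", 443, certfile=..., keyfile=...).serve_forever()` on the machine that the hosts file points to would then collect whatever the client posts, provided the client trusts the CA that signed the certificate.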

I then started up ADE and flipped through a few pages of an ebook purchased from Kobo.  Here’s an example of what is now getting sent by ADE (reformatted a bit for readability):

In other words, it’s sending JSON containing… I’m not sure.

The values of the various keys in that structure are obviously Base64-encoded, but when run through a decoder, the result is just binary data, presumably the result of another layer of encryption.
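The kind of check described above can be sketched roughly as follows. This is an illustration rather than the exact procedure I followed: the function names are hypothetical, and it assumes only that the payload is a flat JSON object whose string values may be Base64. Decoded bytes that are mostly unprintable and close to 8 bits of entropy per byte are consistent with encryption, though compression would look much the same, and short values will always measure lower.

```python
import base64
import json
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Entropy in bits per byte; 8.0 is the maximum for byte data."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII or common whitespace."""
    if not data:
        return 1.0
    printable = sum(1 for b in data if 32 <= b < 127 or b in (9, 10, 13))
    return printable / len(data)


def inspect_payload(payload: str) -> dict:
    """Decode every Base64-looking string value and summarize the raw bytes."""
    report = {}
    for key, value in json.loads(payload).items():
        if not isinstance(value, str):
            continue
        try:
            raw = base64.b64decode(value, validate=True)
        except ValueError:  # binascii.Error is a subclass of ValueError
            continue  # not valid Base64, so skip it
        report[key] = {
            "length": len(raw),
            "printable_ratio": printable_ratio(raw),
            "entropy_bits_per_byte": shannon_entropy(raw),
        }
    return report
```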

Thus, we haven’t actually gotten much further towards verifying that ADE is sending only the data they claim to.  That packet of data could be describing my progress reading that book purchased from Kobo… or it could be sending something else.

That extra layer of encryption might be done as protection against a real man-in-the-middle attack targeted at Adobe’s log server — or it might be obfuscating something else.

Either way, the result remains the same: reader privacy is not guaranteed. I think Adobe is now doing things a bit better than they were when they released ADE 4.0, but I could be wrong.

If we as library workers are serious about protecting patron privacy, I think we need more than assurances – we need to be able to verify things for ourselves. ADE necessarily remains in the “unverified” column for now.

Testing Adobe Digital Editions 4.0.1

A couple hours ago, I saw reports from Library Journal and The Digital Reader that Adobe has released version 4.0.1 of Adobe Digital Editions.  This was something I had been waiting for, given the revelation that ADE 4.0 had been sending ebook reading data in the clear.

ADE 4.0.1 comes with a special addendum to Adobe’s privacy statement that makes the following assertions:

  • It enumerates the types of information that it is collecting.
  • It states that information is sent via HTTPS, which means that it is encrypted.
  • It states that no information is sent to Adobe on ebooks that do not have DRM applied to them.
  • It may collect and send information about ebooks that do have DRM.

It’s good to test such claims, so I upgraded to ADE 4.0.1 on my Windows 7 machine and my OS X laptop.

First, I did a quick check of strings in the ADE program itself — and found that it contained an instance of “https://adelogs.adobe.com/” rather than “http://adelogs.adobe.com/”.  That was a good indication that ADE 4.0.1 was in fact going to use HTTPS to send ebook reading data to that server.
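A rough equivalent of that strings check, for anyone without the `strings` utility at hand, might look like the sketch below. The function name and pattern are my own illustration. Note that it only finds plain ASCII strings, so a URL stored as UTF-16 (common in Windows binaries) would be missed.

```python
import re

# Runs of URL-safe ASCII characters following an http:// or https:// scheme.
URL_PATTERN = re.compile(rb"https?://[A-Za-z0-9./_-]{4,}")


def find_urls(blob: bytes, host: bytes = b"adelogs.adobe.com") -> list:
    """Return the distinct URLs in a binary blob that mention the given host."""
    matches = {m.group() for m in URL_PATTERN.finditer(blob) if host in m.group()}
    return sorted(url.decode("ascii") for url in matches)
```

To use it, read the program file in binary mode and pass the bytes in, for example `find_urls(open(path, "rb").read())`, where `path` is wherever the ADE executable lives on your system.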

Next, I fired up Wireshark and started ADE.  Each time it started, it contacted a server called adeactivate.adobe.com, presumably to verify that the DRM authorization was in good shape.  I then opened and flipped through several ebooks that were already present in the ADE library, including one DRM ebook I had checked out from my local library.

So far, it didn’t send anything to adelogs.adobe.com.  I then checked out another DRM ebook from the library (in this case, Seattle Public Library and its OverDrive subscription) and flipped through it.  As it happens, it still didn’t send anything to Adobe’s logging server.

Finally, I used ADE to fulfill a DRM ePub download from Kobo.  This time, after flipping through the book, it did send data to the logging server.  I can confirm that it was sent using HTTPS, meaning that the contents of the message were encrypted.

To sum up, ADE 4.0.1’s behavior is consistent with Adobe’s claims – the data is no longer sent in the clear and a message was sent to the logging server only when I opened a new commercial DRM ePub.  However, without decrypting the contents of that message, I cannot verify that it contains only information about that ebook from Kobo.

But even then… why should Adobe be logging that information about the Kobo book? I’m not aware that Kobo is doing anything fancy that requires knowledge of how many pages I read from a book I purchased from them but did not open in the Kobo native app.  Have they actually asked Adobe to collect that information for them?

Another open question: why did opening the library ebook in ADE not trigger a message to the logging server?  Is it because the fulfillmentType specified in the .acsm file was “loan” rather than “buy”? More clarity on exactly when ADE sends reading progress to its logging server would be good.

Finally, if we take the privacy statement at its word, ADE is not implementing a page synchronization feature as some, including myself, have speculated – at least not yet.  Instead, Adobe is gathering this data to “share anonymous aggregated information with eBook providers to enable billing under the applicable pricing model”.  However, another sentence in the statement is… interesting:

While some publishers and distributors may charge libraries and resellers for 30 days from the date of the download, others may follow a metered pricing model and charge them for the actual time you read the eBook.

In other words, if any libraries are using an ebook lending service that does have such a metered pricing model, and if ADE is sending reading progress information to an Adobe server for such ebooks, that seems like a violation of reader privacy. Even though the data is now encrypted, if an Adobe ID is used to authorize ADE, Adobe itself has personally identifying information about the library patron and what they’re reading.

Adobe appears to have closed a hole – but there are still important questions left open. Librarians need to continue pushing on this.

Tips and tricks for leaking patron information

Here is a partial list of various ways I can think of to expose information about library patrons and their search and reading history by use (and misuse) of software used or recommended by libraries.

  • Send a patron’s ebook reading history to a commercial website…
    • … in the clear, for anybody to intercept.
  • Send patron information to a third party…
    • … that does not have an adequate privacy policy.
    • … that has an adequate privacy policy but does not implement it well.
    • … that is sufficiently remote that libraries lack any leverage to punish it for egregious mishandling of patron data.
  • Use an unencrypted protocol to enable a third-party service provider to authenticate patrons or look them up…
    • … such as SIP2.
    • … such as SIP2, with the patron information response message configured to include full contact information for the patron.
    • … or many configurations of NCIP.
    • … or web services accessible over HTTP (as opposed to HTTPS).
  • Store patron PINs and passwords without encryption…
    • … or using weak hashing.
  • Store the patron’s Social Security Number in the ILS patron record.
  • Don’t require HTTPS for a patron to access her account with the library…
    • … or if you do, don’t keep up to date with the various SSL and TLS flaws announced over the years.
  • Make session cookies used by your ILS or discovery layer easy to snoop.
  • Use HTTP at all in your ILS or discovery layer – as oddly enough, many patrons will borrow the items that they search for.
  • Send an unencrypted email…
    • … containing a patron’s checkouts today (i.e., an email checkout receipt).
    • … reminding a patron of his overdue books – and listing them.
    • … listing the titles of the patron’s available hold requests.
  • Don’t encrypt connections between an ILS client program and its application server.
  • Don’t encrypt connections between an ILS application server and its database server.
  • Don’t notice that a rootkit has been running on your ILS server for the past six months.
  • Don’t notice that a keylogger has been running on one of your circulation PCs for the past three months.
  • Fail to keep up with installing operating system security patches.
  • Use the same password for the circulator account used by twenty circulation staff (and 50 former circulation staff) – and never change it.
  • Don’t encrypt your backups.
  • Don’t use the feature in your ILS to enable severing the link between the record of a past loan and the specific patron who took the item out…
    • … or sever the links, but retain database backups for months or years.
  • Don’t give your patrons the ability to opt out of keeping track of their past loans.
  • Don’t give your patrons the ability to opt in to keeping track of their past loans.
  • Don’t give the patron any control or ability to completely sever the link between her record and her past circulation history whenever she chooses to.
  • When a patron calls up asking “what books do I have checked out?” … answer the question without verifying that the patron is actually who she says she is.
  • When a parent calls up asking “what books does my teenager have checked out?”… answer the question.
  • Set up your ILS to print out hold slips… that include the full name of the patron. For bonus points, do this while maintaining an open holds shelf.
  • Don’t shred any circulation receipts that patrons leave behind.
  • Don’t train your non-MLS staff on the importance of keeping patron information confidential.
  • Don’t give your MLS staff refreshers on professional ethics.
  • Don’t shut down library staff gossiping about a patron’s reading preferences.
  • Don’t immediately sack a library staff member caught misusing confidential patron information.
  • Have your ILS or discovery interface hosted by a service provider that makes one or more of the mistakes listed above.
  • Join a committee writing a technical standard for library software… and don’t insist that it take patron privacy into account.

Do you have any additions to the list? Please let me know!

Of course, I am not actually advocating disclosing confidential information. Stay tuned for a follow-up post.

Verifying our tools; a role for ALA?

It came to light on Monday that the latest version of Adobe Digital Editions is sending metadata on ebooks that are read through the application to an Adobe server — in clear text.

I’ve personally verified the claim that this is happening, as have lots of other people. I particularly like Andromeda Yelton’s screencast, as it shows some of the steps that others can take to see this for themselves.

In particular, it looks like any ebook that has been opened in Digital Editions or added to a “library” there gets reported on. The original report by Nate Hoffelder at The Digital Reader also said that ebooks that were not known to Digital Editions were being reported, though I and others haven’t seen that – but at the moment, since nobody is saying that they’ve decompiled the program and analyzed exactly when Digital Editions sends its reports, it’s possible that Nate simply fell into a rare execution path.

UPDATE 10 October 2014: Yesterday I was able to confirm that if an ereader device is attached to a PC and is recognized by ADE, metadata from the books on that device can also be sent in the clear.

This move by Adobe, whether or not they’re permanently storing the ebook reading history, and whether or not they think they have good intentions, is bad for a number of reasons:

  • By sending the information in the clear, anybody can intercept it and choose to act on somebody’s choice of reading material.  This applies to governments, corporations, and unenlightened but technically adept parents.  And as far as state actors are concerned – it actually doesn’t matter that Digital Editions isn’t sending information like name and email addresses in the clear; the user’s IP address and the unique ID assigned by Digital Editions will often be sufficient for somebody to, with effort, link a reading history to an individual.
  • The release notes from Adobe gave no hint that Digital Editions was going to start doing this. While Amazon’s Kindle platform also keeps track of reading history, at least Amazon has been relatively forthright about it.
  • The privacy policy and license agreement similarly did not explicitly mention this. There has been some discussion to the effect that if one looks at those documents closely enough, there is an implied suggestion that Adobe can capture and log anything one chooses to do with their software. But even if that’s the case – and I’m not sure that this argument would fly in countries with stronger data privacy protection than the U.S. – sending this information in the clear is completely inconsistent with modern security practices.
  • Digital Editions is part of the toolchain that a number of library ebook lending platforms use.

The last point is key. Everybody should be concerned about an app that spouts reading history in the clear, but librarians in particular have a professional responsibility to protect our users’ reading history.

What does it mean in the here and now? Some specific immediate steps I suggest for libraries are to:

  • Publicize the problem to their patrons.
  • Officially warn their patrons against using Digital Editions 4.0, and point to workarounds like pointing “adelogs.adobe.com” to “127.0.0.1” in hosts files.
  • If patrons must use Digital Editions to borrow ebooks, recommend the use of earlier versions, which do not appear to be spying on users.
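The hosts-file workaround mentioned in the list above amounts to adding a single line that maps the logging server's name to the local machine. Here is a small sketch of doing that without adding the line twice. The helper name is hypothetical, it works on the text of the file rather than the file itself, and if the name is already mapped to some other address it leaves that entry alone.

```python
def add_hosts_block(hosts_text: str,
                    hostname: str = "adelogs.adobe.com",
                    address: str = "127.0.0.1") -> str:
    """Return hosts-file text with `hostname` mapped to `address`.

    If the hostname already appears in an active (uncommented) entry,
    the text is returned unchanged.
    """
    for line in hosts_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split on whitespace
        if len(fields) >= 2 and hostname in fields[1:]:
            return hosts_text
    if hosts_text and not hosts_text.endswith("\n"):
        hosts_text += "\n"
    return hosts_text + f"{address}\t{hostname}\n"
```

The resulting text would be written back to the system hosts file, which typically requires administrator rights.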

However, there are things that also need to be done in the long term.

Accepting DRM has been a terrible dilemma for libraries – enabling and supporting, no matter how passively, tools for limiting access to information flies against our professional values.  On the other hand, without some degree of acquiescence to it, libraries would be even more limited in their ability to offer current books to their patrons.

But as the Electronic Frontier Foundation points out,  DRM as practiced today is fundamentally inimical to privacy. If, following Andromeda Yelton’s post this morning, we value our professional soul, something has to give.

In other words, we have to have a serious discussion about whether we can responsibly support any level of DRM in the ebooks that we offer to our patrons.

But there’s a more immediate step that we can take. This whole thing came to light because a “hacker acquaintance” of Nate’s decided to see what Digital Editions is sending home. And a key point? Once the testing started, it probably didn’t take that hacker more than half an hour to see what was going on, and it may well have taken only five.

While the library profession probably doesn’t count very many professional security researchers among its ranks, this sort of testing is not black magic.  Lots of systems librarians, sysadmins, and developers working for libraries already know how to use tcpdump and Wireshark and the like.

So what do we need to do? We need to stop blindly trusting our tools.  We need to be suspicious, in other words, and put anything that we would recommend to our patrons to the test to verify that it is not leaking patron information.

This is where organizations like ALA can play an important role.  Some things that ALA could do include:

  • Establishing a clearinghouse for reports of security and privacy violations in library software.
  • Distributing information on ways to perform security audits.
  • Testing library software in house and hiring security researchers as needed.
  • Providing institutional and legal support for these efforts.

That last point is key, and is why I’m calling on ALA in particular. There have been plenty of cases where software vendors have sued, or threatened to sue, folks who have pointed out security flaws. Rather than permitting that sort of chilling effect to be tolerated in the realm of library software, ALA can provide cover for individuals and libraries engaged in the testing that is necessary to protect our users.

Banned books and the library of Morpheus

A notion that haunts me is found in Neil Gaiman’s The Sandman: the library of the Dreaming, wherein can be found books that no earth-bound librarian can collect.  Books that caught existence only in the dreams – or passing thoughts – of their authors. The Great American Novel. Every Great American Novel, by all of the frustrated middle managers, farmers, and factory workers who had their heart attack too soon. Every Great Nepalese Novel.  The conclusion of the Wheel of Time, as written by Robert Jordan himself.

That library has a section containing every book whose physical embodiment was stolen.  All of the poems of Sappho. Every Mayan and Olmec text – including the ones that, in the real world, did not survive the fires of the invaders.

Books can be like cockroaches. Text thought long-lost can turn up unexpectedly, sometimes just by virtue of having been left lying around until someone thinks to take a closer look. It is not an impossible hope that one day, another Mayan codex may make its reappearance, thumbing its nose at the colonizers and censors who despised it and the culture and people it came from.

Books are also fragile. Sometimes the censors do succeed in utterly destroying every last trace of a book. Always, entropy threatens all.  Active measures against these threats are required; therefore, it is appropriate that librarians fight the suppression, banning, and challenges of books.

Banned Books Week is part of that fight, and it is important that folks be aware of their freedom to read what they choose – and to be aware that it is a continual struggle to protect that freedom.  Indeed, perhaps “Freedom to Read Week” better expresses the proper emphasis on preserving intellectual freedom.

But it’s not enough.

I am also haunted by the books that are not to be found in the Library of the Dreaming – because not even the shadow of their genesis crossed the mind of those who could have written them.

Because their authors were shot for having the wrong skin color.

Because their authors were cheated of an education.

Because their authors were sued into submission for daring to challenge the status quo.  Even within the profession of librarianship.

Because their authors made the decision to not pursue a profession in the certain knowledge that the people who dominated it would challenge their every step.

Because their authors were convinced that nobody would care to listen to them.

Librarianship as a profession must consider and protect both sides of intellectual freedom. Not just consumption – the freedom to read and explore – but also the freedom to write and speak.

The best way to ban a book is to ensure that it never gets written. Justice demands that we struggle against those who would not just ban books, but destroy the will of those who would write them.

Libraries, the Ada Initiative, and a challenge

I am a firm believer in the power of open source to help libraries build the tools we need to help our patrons and our communities.

Our tools focus our effort. Our effort, of course, does not spring out of thin air; it’s rooted in people.

One of the many currencies that motivates people to contribute to free and open source projects is acknowledgment.

Here are some of the women I’d like to acknowledge for their contributions, direct or indirect, to projects I have been part of. Some of them I know personally, others I admire from afar.

  • Henriette Avram – Love it or hate it, where would we be without the MARC format? For all that we’ve learned about new and better ways to manage metadata, Avram’s work at the LC started the profession’s proud tradition of sharing its metadata in electronic format.
  • Ruth Bavousett – Ruth has been involved in Koha for years and served as QA team member and translation manager. She is also one of the most courageous women I have the privilege of knowing.
  • Karen Coyle – Along with Diane Hillmann, I look to Karen for leadership in revamping our metadata practices.
  • Nicole Engard – Nicole has also been involved in Koha for years as documentation manager. Besides writing most of Koha’s manual, she is consistently helpful to new users.
  • Katrin Fischer – Katrin is Koha’s current QA manager, and has and continues to perform a very difficult job with grace and less thanks than she deserves.
  • Ruth Frasur – Ruth is director of the Hagerstown Jefferson Township Public Library in Indiana, which is a member of Evergreen Indiana. Ruth is one of the very few library administrators I know who not only understands open source, but actively contributes to some of the nitty-gritty work of keeping the software documented.
  • Diane Hillmann – Another leader in library metadata.
  • Kathy Lussier – As the Evergreen project coordinator at MassLNC, Kathy has helped to guide that consortium’s many development contributions to Evergreen.  As a participant in the project and a member of the Evergreen Oversight Board, Kathy has also supplied much-needed organizational help – and a fierce determination to help more women succeed in open source.
  • Liz Rea – Liz has been running Koha systems for years, writing patches, maintaining the project’s website, and injecting humor when most needed – a true jill of all trades.

However, there are unknowns that haunt me. Who has tried to contribute to Koha or Evergreen, only to be turned away by a knee-jerk “RTFM” or simply silence? Who might have been interested, only to rightly judge that they didn’t have time for the flak they’d get? Who never got a chance to go to a Code4Lib conference while her male colleague’s funding request got approved three years in a row?

What have we lost? How many lines of code, pages of documentation, hours of help have not gone into the tools that help us help our patrons?

The ideals of free and open source software projects are necessary, but they’re not sufficient to ensure equal access and participation.

The Ada Initiative can help. It was formed to support women in open technology and culture, and runs workshops, assists communities in setting up and enforcing codes of conduct, and promotes ensuring that women have access to positions of influence in open culture projects.

Why is the Ada Initiative’s work important to me? For many reasons, but I’ll mention three. First, because making sure that everybody who wants to work and play in the field of open technology has a real choice to do so is only fair. Second, because open source projects that are truly welcoming to women are much more likely to be welcoming to everybody – and happier, because of the effort spent on taking care of the community. Third, because I know that I don’t know everything – or all that much, really – and I need exposure to multiple points of view to be effective building tools for libraries.

Right now, folks in the library and archives communities are banding together to raise money for the Ada Initiative. I’ve donated, and I encourage others to do the same. Even better, several folks, including Bess Sadler, Andromeda Yelton, Chris Bourg, and Mark Matienzo are providing matching donations up to a total of $5,120.

Go ahead, make a donation by clicking below, then come back. I’ll wait.

Donate to the Ada Initiative

Money talks – but whether any given open source community is welcoming, both of new people and of new ideas, depends on its current members.

Therefore, I would also like to extend a challenge to men (including myself — accountability matters!) working in open source software projects in libraries. It’s a simple challenge, summarized in a few words: “listen, look, lift up, and learn.”

Listen. Listening is hard.  A coder in a library open source project has to listen to other coders, to librarians, to users – and it is all too easy to ignore or dismiss approaches that are unfamiliar.  It can be very difficult to learn that something you’ve poured a lot of effort into may not work well for librarians – and it can be even harder to hear that you are stepping on somebody’s toes or thoughtlessly stomping on their ideas.

What to do? Pay attention to how you communicate while handling bugs and project correspondence. Do you prioritize bugs filed by men? Do you have a subtle tendency to think to yourself, “oh, she’s just not seeing the obvious thing right in front of her!” if a woman asks a question on the mailing list about functionality she’s having trouble with? If so, make an effort to be even-handed.

Are you receiving criticism? Count to ten, let your hackles down, and try to look at it from your critic’s point of view.

Be careful about nitpicking.  Many a good idea has died after too much bikeshedding – and while that happens to everybody, I have a gut feeling that it’s more likely to happen if the idea is proposed by a woman.

Is a woman colleague confiding in you about concerns she has with community or workplace dynamics? Listen.

Look. Look around you – around your office, the forums, the IRC channels, and Stack Exchanges you frequent. Do you mostly see men who look like yourself?  If so, do what you can to broaden your perspective and your employer’s perspective. Do you have hiring authority? Do you participate in interview panels? You can help shape who surrounds you.

Remember that I’m talking about library technology here – even if 70% of the employees of the library you work for are women, if the systems department only employs men, you’re missing other points of view.

Do you have no hiring authority whatsoever? Look around the open source communities you participate in. Are there proportionally far more men participating openly than the gender ratio in librarianship as a whole?  If so, you can help change that by how you choose to participate in the community.

Lift up. This can take many forms.  In some cases, you can help lift up women in library technology by getting out of the way – in other words, by removing or not supporting barriers to participation such as sexist language on the mailing list or by calling out exclusionary behavior by other men (or yourself!).

Sometimes, you can offer active assistance – but ask first! Perhaps a woman is ready to assume a project leadership role or is ready to grow into it. Encourage her – and be ready to support her publicly. Or perhaps you may have an opportunity to mentor a student – go for it, but know that mentoring is hard work.

But note — I’m not an authority on ways to support women in technology.  One of the things that the Ada Initiative does is run Ally Skills workshops that teach simple techniques for supporting women in the workplace and online.  In fact, if you’re coming to Atlanta this October for the DLF Forum, one is being offered there.

Learn. Something I’m still learning is just the sheer amount of crap that women in technology put up with. Have you ever gotten a death threat or a rape threat for something you said online about the software industry? If you’re a guy, probably not. If you’re Anita Sarkeesian or Quinn Norton, it’s a different story entirely.

If you’re thinking to yourself that “we’re librarians, not gamers, and nobody has ever gotten a death threat during a professional dispute with the possible exception of the MARC format” – that’s not good enough. Or if you think that no librarian has ever harassed another over gender – that’s simply not true. It doesn’t take a death threat to convince a woman that library technology is too hostile for her; a long string of micro-aggressions can suffice. Do you think that librarians are too progressive or simply too darn nice for harassment to be an issue? Read Ingrid Henny Abrams’ posts about the results of her survey on code of conduct violations at ALA.

This is why the Ada Initiative’s anti-harassment work is so important – and to learn more, including links to sample policies, a good starting point is their own conference policies page. (Which, by the way, was quite useful when the Evergreen Project adopted its own code of conduct). Another good starting point is the Geek Feminism wiki.

And, of course, you could do worse than to go to one of the ally skills workshops.

If you choose to take up the challenge, make a note to come back in a year and write down what you’ve learned, what you’ve listened to and seen, and how you’ve helped to lift up others. It doesn’t have to be public – though that would be nice – but the important thing is to be mindful.

Finally, don’t just take my word for it – remember that I’m not an authority on supporting women in technology. Listen to the women who are.

Update: other #libs4ada posts

Taking the ALA Statement of Appropriate Conduct seriously

After I got back from this year’s ALA Annual Conference (held in the City of It’s Just a Dry Heat), I saw some feedback regarding B.J. Novak’s presentation at the closing session, where he reportedly marred a talk that by many accounts was quite inspiring with a tired joke alluding to a sexual (and sexist) stereotype about librarians.

Let’s suppose a non-invited speaker, panel participant, or committee member had made a similar joke. Depending on the circumstances, it may or may not have constituted “unwelcome sexual attention” per the ALA Statement of Appropriate Conduct at ALA Conferences, but, regardless, it certainly would not have been in the spirit of the statement’s request that “[s]peakers … frame discussions as openly and inclusively as possible and to be aware of how language or images may be perceived by others.” Any audience member would have been entitled to call out such behavior on the spot or raise the issue with ALA conference services.

The statement of appropriate conduct is for the benefit of all participants: “… members and other attendees, speakers, exhibitors, staff and volunteers…”. It does not explicitly exclude any group from being aware of it and governing their behavior accordingly.

Where does Novak fit in? I had the following exchange with @alaannual on Twitter:

A key aspect of many of the anti-harassment policies and codes of conduct that have been adopted by conferences and conventions recently is that the policy applies to all event participants. There is no reason to expect that an invited keynote speaker or celebrity will automatically not cross lines — and there have been several incidents where conference headliners have erred (or worse).

I am disappointed that when the Statement of Appropriate Conduct was adopted in late 2013, it apparently was not accompanied by changes to conference procedures to ensure that invited speakers would be made aware of it. There’s always room for process improvement, however, so for what it’s worth, here are my suggestions to ALA for improving the implementation of the Statement of Appropriate Conduct:

  • Update procedures to ensure that all conference speakers are made aware of the Statement.
  • Update speaker agreements for invited speakers to require that they read and abide by the Statement.
  • Make available a point of contact for all speakers who can answer questions regarding the Statement and how it applies to presentations. This should not be construed as a request that ALA review the content of presentations beforehand, just that ALA provide an individual who can help speakers interpret the Statement in case of doubt.
  • Ensure that the exhibitors’ manual and the exhibitors’ portal (exhibitors.ala.org) prominently link to the Statement. This does not appear to be the case at the moment, although this may be due to the exhibitors’ portal switching over to the upcoming Midwinter.
  • If this is not already the case, ensure that exhibitor agreements incorporate the Statement.
  • Discuss the Statement periodically and adjust it based on feedback from conference attendees and emerging best practices from other conferences. Speaking of feedback, the Magpie Librarian is conducting a survey (that closes today) to gather information about code of conduct violations at past ALA conferences.

I invite feedback, either here or directly to ALA.

Duplicate holidays, and a question

One can, in fact, have too many holidays.

Koha uses the DateTime::Set Perl module when (among other things) calculating the next day the library is open. Unfortunately, the more special holidays you have in a Koha database, the more time DateTime::Set takes to initialize itself — and the time appears to grow faster than linearly with the number of holidays.

Jonathan Druart partially addressed this with his patch for bug 11112 by implementing some lazy initialization and caching for Koha::Calendar, but that doesn’t make DateTime::Set’s constructor itself any faster.

Today I happened to be working on a Koha database that turned out to have duplicate rows in the special_holidays table. In other words, for a given library, there might be four rows all expressing that the library is closed on 15 August 2014. That database contains hundreds of duplicates, which results in an extra 1-3 seconds per circulation operation.

The duplication is not apparent in the calendar editor, alas.

So here’s my first question: has anybody else seen this in their Koha database? The following query will turn up duplicates:
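A query of this kind can be sketched as a GROUP BY over the columns that identify a holiday. This is a minimal sketch, not a tested Koha query: it assumes special_holidays has branchcode, day, month, and year columns, and it runs against a throwaway in-memory SQLite table with made-up rows rather than a real Koha (MySQL) database.

```python
import sqlite3

# Hypothetical, cut-down stand-in for Koha's special_holidays table.
# The column names are assumptions; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE special_holidays ("
    " id INTEGER PRIMARY KEY, branchcode TEXT,"
    " day INTEGER, month INTEGER, year INTEGER)"
)

# Made-up sample data: one holiday entered four times for one library.
rows = [
    ("MAIN", 15, 8, 2014),
    ("MAIN", 15, 8, 2014),
    ("MAIN", 15, 8, 2014),
    ("MAIN", 15, 8, 2014),
    ("MAIN", 25, 12, 2014),
    ("EAST", 15, 8, 2014),
]
conn.executemany(
    "INSERT INTO special_holidays (branchcode, day, month, year)"
    " VALUES (?, ?, ?, ?)",
    rows,
)

# Group by library and date; any group with more than one row is a duplicate.
DUPLICATES_SQL = """
    SELECT branchcode, year, month, day, COUNT(*) AS copies
    FROM special_holidays
    GROUP BY branchcode, year, month, day
    HAVING COUNT(*) > 1
"""

duplicates = conn.execute(DUPLICATES_SQL).fetchall()
for branchcode, year, month, day, copies in duplicates:
    print(f"{branchcode}: {year}-{month:02d}-{day:02d} appears {copies} times")
```

The SELECT statement itself is plain SQL, so the same GROUP BY ... HAVING shape should carry over to MySQL once the column names are checked against the actual schema.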

And my second question: assuming that this somehow came about during normal operation of Koha (as opposed to duplicate rows getting directly loaded into the database), does anybody have any ideas how this happened?

Conservation of enthusiasm

One of the tightropes I must walk on as the current release manager for Koha is held taut by the tension between the necessity of maintaining boundaries with the code and the necessity of acknowledging that the code is not the first concern.

Boundaries matter. Not all code is equal: some is better, some is worse, none is perfect. Some code belongs in Koha. Some code belongs in Koha for lack of a better alternative at the time. Some code does not belong in Koha. Some code will stand the test of time; some code will test our time and energy for years.

The code is not primary. It is no great insight to point out that the code does not write itself; it certainly does not document itself nor pay its own way. Nor does it get to partake in that moment of fleeting joy when things just work, when the code gets out of the way of the librarian and the patron.

What is primary? People and their energy.

Enthusiasm is boundless. It has kept some folks working on Koha for years, beyond the impetus of mere paycheck or even approbation.

Enthusiasm is limited. Anybody volunteering passion for a free software project has a question to answer: is there something better to do with my time? If the answer turns into “yes”… well, there are many ways in this world to contribute to happiness, personal or shared.

Caviling can be costly — possibly, beyond measure. One “RTFM” can eliminate an entire manual’s worth of help down the road.

On the other hand, the impulse to tweak, to provide feedback, to tune a new idea, can come from the best of intentions. Passion is not enough by itself; experience matters, and can guide new effort.

It’s a tightrope we all walk. But the people must come first.

My meditation: what ways of interacting among ourselves conserve enthusiasm, and thereby grow it? And how do we avoid destroying it needlessly?

Koha Dev Tidbit #2: Pinpoint the difference

This morning I reviewed and pushed the patch for Koha bug 11174. The patch, by Zeno Tajoli, removes one character each from two files.

One character? That should be easy to eyeball, right?

Not quite — the character in question was part of a parameter name in a very long URL. I don’t know about you, but it can take me a while to spot such a difference.

Here is an example. Can you spot the exact difference in less than 2 seconds?

$ git diff --color

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
  sudo make upgrade
 
 Koha 3.4.x or later  no longer stores items in biblio records.
-If you are upgrading from an older version ou will need to do the
+If you are upgrading from an older version you will need to do the
 following two steps, they can take a long time (several hours) to
 complete for large databases

Now imagine doing this if the change occurs in the 100th character of a line that is 150 characters long.

Fortunately, git diff, as well as other commands like git show that display diffs, accepts several switches that let you display the differences in terms of words, not lines. These switches include --word-diff and --color-words. For example:

$ git diff --color-words

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
  sudo make upgrade
 
Koha 3.4.x or later  no longer stores items in biblio records.
If you are upgrading from an older version ouyou will need to do the
following two steps, they can take a long time (several hours) to
complete for large databases

The difference is much easier to see now, with the removed “ou” shown in red and the added “you” in green, at least if you’re not red-green color-blind. You can change the colors or not use colors at all:

$ git diff --word-diff

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
 sudo make upgrade

Koha 3.4.x or later  no longer stores items in biblio records.
If you are upgrading from an older version [-ou-]{+you+} will need to do the
following two steps, they can take a long time (several hours) to
complete for large databases

Going back to the bug I mentioned: --word-diff wasn’t quite enough there. By default, Git considers words to be delimited by whitespace, but the patch in question removed a character from the middle of a very long URL. To make the change pop out, I had to tell Git to treat every character as a word, so that single-character changes get highlighted. One way to do this is to pass a regex via --word-diff-regex (for example, --word-diff-regex=.); another is to pass the regex directly to --color-words. Here’s the final example:

$ git diff --color-words=.

diff --git a/INSTALL b/INSTALL
index ffe69ae..e92b1a3 100644
--- a/INSTALL
+++ b/INSTALL
@@ -94,7 +94,7 @@ Use the packaged version or install from CPAN
  sudo make upgrade
 
Koha 3.4.x or later  no longer stores items in biblio records.
If you are upgrading from an older version you will need to do the
following two steps, they can take a long time (several hours) to
complete for large databases

And there we have it: the difference, pinpointed. Only the added “y” is highlighted.
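For readers who want to see the character-level idea outside of Git, here is a minimal sketch using Python's difflib. It is not how Git computes its word diffs; it just treats every character as a unit, much as the `.` regex does, and marks changes in the same [-removed-]{+added+} notation that --word-diff prints. The char_diff helper is a name made up for this illustration.

```python
from difflib import SequenceMatcher


def char_diff(old: str, new: str) -> str:
    """Mark character-level changes as [-removed-] and {+added+}."""
    pieces = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run of characters: copy it through as-is.
            pieces.append(old[i1:i2])
            continue
        if tag in ("delete", "replace"):
            pieces.append("[-" + old[i1:i2] + "-]")
        if tag in ("insert", "replace"):
            pieces.append("{+" + new[j1:j2] + "+}")
    return "".join(pieces)


# The two versions of the INSTALL line from the diff above.
old_line = "If you are upgrading from an older version ou will need to do the"
new_line = "If you are upgrading from an older version you will need to do the"
print(char_diff(old_line, new_line))
```

For the INSTALL line this prints "If you are upgrading from an older version {+y+}ou will need to do the", which isolates the single inserted character.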