Author Archives: Galen Charlton

How to build an evil library catalog

Consider a catalog for a small public library that features a way to sort search results by popularity. There are several ways to measure “popularity” of a book: circulations, hold requests, click-throughs in the catalog, downloads, patron-supplied ratings, place on bestseller lists, and so forth.

But let’s do a little thought experiment: let’s use a random number generator to calculate popularity.

However, the results will need to be plausible. It won’t do to have the catalog assert that the latest J.D. Robb book is gathering dust in the stacks. Conversely, the copy of 1959 edition of The geology and paleontology of the Elk Mountain and Tabernacle Butte area, Wyoming that was given to the library right after the last weeding is never going to be a doorbuster.

So let’s be clever and ensure that the 500 most circulated titles in the collection retain their expected popularity rating. Let’s also leave books that have never circulated alone in their dark corners, as well as those that have no cover images available. The rest, we leave to the tender mercies of the RNG.

What will happen? If patrons use the catalog’s popularity rankings, if they trust them — or at least are more likely to look at whatever shows up near the top of search results — we might expect that the titles with an artificial bump from the random number generator will circulate just a bit more often.

Of course, testing that hypothesis by letting a RNG skew search results in a real library catalog would be unethical.

But if one were clever enough to be subtle in one’s use of the RNG, the patrons would have a hard time figuring out that something was amiss.  From the user’s point of view, a sufficiently advanced search engine is indistinguishable from a black box.

This suggests some interesting possibilities for the Evil Librarian of Evil:

  • Some manual tweaks: after all, everybody really ought to read $BESTBOOK. (We won’t mention that it was written by the ELE’s nephew.)
  • Automatic personalization of search results. Does geolocation show that the patron’s IP address is on the wrong side of the tracks? Titles with a lower reading level just got more popular!
  • Has the patron logged in to the catalog? Personalization just got better! Let’s check the patron’s gender and tune accordingly!

Don’t be the ELE.

But as you work to improve library catalogs… take care not to become the ELE by accident.

A small thought on library and tech unions in light of a lockout

I’ve never been a member of a union. Computer programmers — and IT workers in general — in the U.S. are mostly unorganized. Not only that, they tend to resist unions, even though banding together would be a good idea.

It’s not necessarily a matter of pay, at least not at the moment: many IT workers have decent to excellent salaries. Of course not all do, and there are an increasing number of IT job categories that are becoming commoditized. Working conditions at a lot of IT shops are another matter: the very long hours that many programmers and sysadmins work are not healthy, but it can be very hard to be first person in the office to leave at a reasonable quitting time day.

There are other reasons to be part of a union as an IT worker. Consider one of the points in the ACM code of ethics: “Respect the privacy of others.” Do you have a qualm about writing a web tracker? It can be hard to push back all by yourself against a management imperative to do so. A union can provide power and cover: what you can’t resist singly, a union might help forestall.

The various library software firms I’ve worked for have not been exceptions: no unions. At the moment, I’m also distinctly on the management side of the table.

Assuming good health, I can reasonably expect to spend another few decades working, and may well switch from management to labor and back again — IT work is squishy like that. Either way, I’ll benefit from the work — and blood, and lives — of union workers and organizers past and future. (Hello, upcoming weekend! You are literally the least of the good things that unions have given me!)

I may well find myself (or more likely, people representing me) bargaining hard with or against a union. And that’s fine.

However, if I find myself sitting, figuratively or literally, on the management side of a negotiation table, I hope that I never lose sight of this: the union has a right to exist.

Unfortunately, the U.S. has a long history of management and owners rejecting that premise, and doing their level best to break unions or prevent them from forming.

The Long Island University Faculty Federation, which represents the full time and adjunct faculty at the Brooklyn campus of LIU, holds a distinction: it was the first union to negotiate a collective bargaining agreement for faculty at a private university in the U.S.

Forty-four years later, the administration of LIU Brooklyn seems determined to break LIUFF, and have locked out the faculty. Worse, LIU has elected not to continue the health insurance of the LIUFF members. I have only one word for that tactic: it is an obscenity.

As an aside, this came to my attention last week largely because I follow LIU librarian and LIUFF secretary Emily Drabinski on Twitter. If you want to know what’s going on with the lockout, follow her blog and Twitter account as well as the #LIUlockout hashtag.

I don’t pretend that I have a full command of all of the issues under discussion between the university and the union, but I’ve read enough to be rather dubious that the university is presently acting in good faith. There’s plenty of precedent for university faculty unions to work without contracts while negotiations continue; LIU could do the same.

Remember, the union has a right to exist. Applies to LIUFF, to libraries, and hopefully in time, to more IT shops.

If you agree with me that lockouts are wrong, please consider joining me in donating to the solidarity fund for the benefit of LIUFF members run the by American Federation of Teachers.

The blossoming of the Mellie-cat

A cat who has decided to take up more space in the world.

A cat who has decided to take up more space in the world.

Sixteen years is long enough, surely, to get to know a cat.

Nope.

Amelia had always been her mother’s child. She had father and sister too, but LaZorra was the one Mellie always cuddled up to and followed around. Humans were of dubious purpose, save for our feet: from the scent we trod back home Mellie seemed to learn all she needed of the outside world.

Her father, Erasmus, left us several years ago; while Mellie’s sister mourned, I’m not sure Rasi’s absence made much of an impression on our clown princess — after all, LaZorra remained, to provide orders and guidance and a mattress.

Where Zorri went, Mellie followed — and thus a cat who had little use for humans slept on our bed anyway.

Recently, we lost both LaZorra and Sophia, and we were afraid: afraid that Amelia’s world would close in on her. We were afraid that she would become a lost cat, waiting alone for comfort that would never return.

The first couple days after LaZorra’s passing seemed to bear our fears out. Amelia kept to her routine and food, but was isolated. Then, some things became evident.

Our bed was, in fact, hers. Hers to stretch out in, space for my legs be damned.

Our feet turned out not to suffice; our hands were required too. For that matter, for the first time in her life, she started letting us brush her.

And she enjoyed it!

Then she decided that we needed correction — so she began vocalizing, loudly and often.

And now we have a cat anew: talkative and demanding of our time and attention, confident in our love.

Sixteen years is not long enough to get to know a cat.

Visualizing the global distribution of Koha installations from Debian packages

A picture is worth a thousand words:

Downloads of Koha Debian packages in past 52 weeks

Click to get larger image.

This represents the approximate geographic distribution of downloads of the Koha Debian packages over the past year. Data was taken from the Apache logs from debian.koha-community.org, which MPOW hosts. I counted only completed downloads of the koha-common package, of which there were over 25,000.

Making the map turned out to be an opportunity for me to learn some Python. I first adapted a Python script I found on Stack Overflow to query freegeoip.net and get the latitude and longitude corresponding to each of the 9,432 distinct IP addresses that had downloaded the package.

I then fed the results to OpenHeatMap. While that service is easy to use and is written with GPL3 code, I didn’t quite like the fact that the result is delivered via an Adobe Flash embed.  Consequently, I turned my attention to Plotly, and after some work, was able to write a Python script that does the following:

  1. Fetch the CSV file containing the coordinates and number of downloads.
  2. Exclude as outliers rows where a given IP address made more than 100 downloads of the package during the past year — there were seven of these.
  3. Truncate the latitude and longitude to one decimal place — we need not pester corn farmers in Kansas for bugfixes.
  4. Submit the dataset to Plotly with which to generate a bubble map.

Here’s the code:

An interactive version of the bubble map is also available on Plotly.

Cats who reside in story

The tragedy of keeping house with cats is that their lives are so short in comparison to our own.

On Friday, Marlene and I put Sophie to rest; today, LaZorra. Four years ago, we lost Erasmus; before that, Scheherazade and Jennyfur. At the moment, we have just one, Amelia. It was a relief that she got a clean bill of health on Saturday… but she is nonetheless sixteen years old. The inexorability of time weighs heavily on me today.

I have no belief that there is any continuation of thought or spirt or soul after the cessation of life; the only persistence I know of for our cats is in the realm of story. And it is not enough: I am not good enough with words to capture and pin down the moment of a cat sleeping and purring on my chest or how the limbs of our little feline family would knot and jumble together.

Words are not nothing, however, so I shall share some stories about the latest to depart.

2016-03-11 07.54.55

LaZorra was named after the white “Z” on her back, as if some bravo had decided to mark her before she entered this world. LaZorra was a cat of great brain, while her brother Erasmus was not. We would joke that LaZorra had claimed not only her brain cells, but those of her daughters Sophia and Amelia. (Who were also Erasmsus’ children; suffice it to say that I thought I had more time to spay LaZorra than was actually the case).

Although she was a young mother, LaZorra was a good one. Scheherazade was alive at the time and also proved to be a good auntie-cat.

Very early on, a pattern was set: Sophie would cuddle with her father Rasi; Mellie with her mother Zorrie. LaZorra would cuddle with me; as would Erasmus; per the transitive property, I ended up squished.

But really, it took only one cat to train me. For a while LaZorra had a toy that she would drag to me when she wanted me to play with her. I always did; morning, afternoon, evening, at 2 in the morning…

“NO!”

Well, that was Marlene reminding me that once I taught a cat that I could be trained to play with her at two a.m. that there would be no end of it—nor any rest for us—so I did not end up being perfectly accommodating.

But I came close. LaZorra knew that she was due love and affection; that her remit included unlimited interference with keyboards and screens. And in the end, assistance when she could no longer make even the slight jump to the kitchen chair.

Sophia

When we lost Erasmus to cancer, Marlene and I were afraid that Sophie would inevitably follow. For her, Rasi was her sun, moon, and stars. We had Erasmus euthanized at home so that the others would know that unlike the many trips for chemo, that this time he was not coming back. Nonetheless, Sophie would often sit at the door, waiting for her daddy to come back home.

She never stopped doing that until we moved.

It was by brush and comb, little by little as she camped out on the back of the couch, that I showed her that humans might just possibly be good for something (though not as a substitute for her daddy-cat). It is such a little thing, but I hold it as one of my personal accomplishments that I helped her look outward again.

Eventually those little scritches on the back of the couch became her expected due: we learned that we were to pay the Sophie-toll every time we passed by her.

Both LaZorra and Sophie were full of personality—and thus, they were often the subjects of my “Dear Cat” tweets. I’ll close with a few of them.

Butter to LaZorra was as mushrooms to hobbits:

At times, she was a little too clever for her own good:

Sophie was the only cat I’ve known to like popping bubblewrap:

Sophie apparently enjoyed the taste of cables:

LaZorra was the suitcase-inspector-in-chief:

And, of course, they could be counted on to help with computation:

They both departed this world with pieces of our hearts in their claws.

What makes the annual Code4Lib conference special?

There’s now a group of people taking a look at whether and how to set up some sort of ongoing fiscal entity for the annual Code4Lib conference.  Of course, one question that comes to mind is why go to the effort? What makes the annual Code4Lib conference so special?

There are lot of narratives out there about how the Code4Lib conference and the general Code4Lib community has helped people, but for this post I want to focus on the conference itself. What does the conference do that is unique or uncommon? Is there anything that it does that would be hard to replicate under another banner? Or to put it another way, what makes Code4Lib a good bet for a potential fiscal host — or something worth going to the effort of forming a new non-profit organization?

A few things that stand out to me as distinctive practices:

  • The majority of presentations are directly voted upon by the people who plan to attend (or who are at least invested enough in Code4Lib as a concept to go to the trouble of voting).
  • Similarly, keynote speakers are nominated and voted upon by the potential attendees.
  • Each year potential attendees vote on bids by one or more local groups for the privilege of hosting the conference.
  • In principle, most any aspect of the structure of the conference is open to discussion by the broader Code4Lib community — at any time.
  • Historically, any surplus from a conference has been given to the following year’s host.
  • Any group of people wanting to go to the effort can convene a local or regional Code4Lib meetup — and need not ask permission of anybody to do so.

Some practices are not unique to Code4Lib, but are highly valued:

  • The process for proposing a presentation or a preconference is intentionally light-weight.
  • The conference is single-track; for the most part, participants are expected to spend most of each day in the same room.
  • Preconferences are inexpensive.

Of course, some aspects of Code4Lib aren’t unique. The topic area certainly isn’t; library technology is not suffering any particular lack of conferences. While I believe that Code4Lib was one of the first libtech conferences to carve out time for lightning talks, many conferences do that nowadays. Code4Lib’s dependence on volunteer labor certainly isn’t unique, although putting aside keynote speakers) Code4Lib may be unique in having zero paid staff.

Code4Lib’s practice of requiring local hosts to bootstrap their fiscal operations from ground zero might be unique, as is the fact that its planning window does not extend much past 18 months. Of course, those are both arguably misfeatures that having fiscal continuity could alleviate.

Overall, the result has been a success by many measures. Code4Lib can reliably attract at least 400 or 500 attendees. Given the notorious registration rush each fall, it could very likely be larger. With its growth, however, come substantially higher expectations placed on the local hosts, and rather larger budgets — which circles us right back to the question of fiscal continuity.

I’ll close with a question: what have I missed? What makes Code4Lib qua annual conference special?

Update 2016-06-29: While at ALA Annual, I spoke with someone who mentioned another distinctive aspect of the conference: the local host is afforded broad latitude to run things as they see fit; while there is a set of lore about running the event and several people who have been involved in multiple conferences, there is no central group that dictates arrangements.  For example, while a couple recent conferences have employed a professional conference organizer, there’s nothing stopping a motivated group from doing all of the work on their own.

Naming and responding to hate — YAPC::NA and ALA Annual in Orlando

Tomorrow we will drive to Orlando, as next week I’m attending two conferences: the Perl Conference (YAPC::NA) and the American Library Association’s Annual 2016 conference.

A professional concern shared by my colleagues in software development and libraries is the difficult problem of naming. Naming things, naming concepts, naming people (or better yet, using the names they tell us to use).

Names have power; names can be misused.

In light of what happened in Orlando on 12 June, the very least we can do is to choose what names we use carefully. What did happen? That morning, a man chose to kill 49 people and injure 53 others at a gay bar called the Pulse. A gay bar that was holding a Latin Night. Most of those killed were Latinx; queer people of color, killed in a spot that for many felt like home. The dead have names.

Names are not magic spells, however. There is no one word we can utter that will undo what happened at the Pulse nor immediately construct a perfect bulwark against the tide of hate. The software and library professions may be able to help reduce hate in the long run… but I offer no platitudes today.

Sometimes what is called for is blood, or cold hard cash. If you are attending YAPC:NA or ALA Annual and want to help via some means identified by those conferences, here are options:

I will close with this: many of our LGBT colleagues will feel pain from the shooting at a level more visceral than those of us who are not LGBT — or Latinx — or people of color. Don’t be silent about the atrocity, but first, listen to them; listen to the folks in Orlando who know what specifically will help the most.

Code4Lib and the “open source way”

The question of what Code4Lib wants to be when it grows up seems to be perennial, and the latest iteration of the discussion is upon us. Quoting Christina Salazar:

… I really do think it’s time to reopen the question of formalizing Code4Lib IF ONLY FOR THE PURPOSES OF BEING THE FIDUCIARY AGENT for the annual conference.

I agree — we need to discuss this. The annual main conference has grown from a hundred or so in 2006 to 440 in 2016. Given the notorious rush of folks racing to register to attend each fall, it is not unreasonable to think that a conference in the right location that offered 750 seats — or even 1,000 — would still sell out. There are also over a dozen regional Code4Lib groups that have held events over the years.

With more attendees comes greater responsibilities — and greater financial commitments. Furthermore, over the years the bar has (appropriately) been raised on what is counted as the minimum responsibilities of the conference organizers. It is no longer enough to arrange to keep the bandwidth high, the latency low, and the beer flowing. A conference host that does not consider accessibility and representation is not living up to what Code4Lib qua group of thoughtful GLAM tech people should be; a host that does not take attendee safety and the code of conduct seriously is being dangerously irresponsible.

Running a conference or meetup that’s larger than what can fit in your employer’s conference room takes money — and the costs scale faster than linearly.  For recent Code4Lib conferences, the budgets have been in the low- to-middle- six figures.

That’s a lot of a money — and a lot of antacids consumed until the hotel and/or convention center minimums are met. The Code4Lib community has been incredibly lucky that a number of people have voluntarily chosen to take this stress on — and that a number of institutions have chosen to act as fiscal hosts and incur the risk of large payouts if a conference were to collapse.

To disclose: I am a member of the committee that worked on the erstwhile bid to host the 2017 conference in Chattanooga. I think we made the right decision to suspend our work; circumstances are such that many attendees would be faced with the prospect of traveling to a state whose legislature is actively trying to make it more dangerous to be there.

However, the question of building or finding a long-term fiscal host for the annual Code4Lib conference must be considered separately from the fate of the 2017 Chattanooga bid. Indeed, it should have been discussed before conference hosts found themselves transferring five-figure sums to the next year’s host.

Of course, one option is to scale back and cease attempting to organize a big international conference unless some big-enough institution happens to have the itch to backstop one. There is a lot of life in the regional meetings, and, of course, many, many people who will never get funding to attend a national conference but who could attend a regional one.

But I find stepping back like that unsatisfying. Collectively, the Code4Lib community has built an annual tradition of excellent conferences. Furthermore, those conference have gotten better (and bigger) over the years without losing one of the essences of Code4Lib: that any person who cares to share something neat about GLAM technology can have the respectful attention of their peers. In fact, the Code4Lib community has gotten better — by doing a lot of hard work — about truly meaning “any person.”

Is Code4Lib a “do-ocracy”? Loaded question, that. But this go around, there seems to be a number of people who are interested in doing something to keep the conference going in the long run. I feel we should not let vague concerns about “too much formality” or (gasp! horrors!) “too much library organization” stop the folks who are interested from making a serious go of it.

We may find out that forming a new non-profit is too much uncompensated effort. We may find out that we can’t find a suitable umbrella organization to join. Or we may find out that we can keep the conference going on a sounder fiscal basis by doing the leg-work — and thereby free up some people’s time to hack on cool stuff without having to pop a bunch of Maalox every winter.

But there’s one in argument against “formalizing” in particular that I object to. Quoting Eric Lease Morgan:

In the spirit of open source software and open access publishing, I suggest we
earnestly try to practice DIY — do it yourself — before other types of
formalization be put into place.

In the spirit of open source? OK, clearly that means that we should immediately form a non-profit foundation that can sustain nearly USD 16 million in annual expenses. Too ambitious?  Let’s settle for just about a million in annual expenses.

I’m not, of course, seriously suggesting that Code4Lib aim to form a foundation that’s remotely in the same league as the Apache Software Foundation or the Mozilla Foundation. Nor do I think Code4Lib needs to become another LITA — we’ve already got one of those (though I am proud, and privileged, to count myself a member of both).  For that matter, I do think it is possible for a project or group effort to prematurely spend too much time adopting the trappings of formal organizational structure and thus forget to actually do something.

But the sort of “DIY” (and have fun unpacking that!) mode that Morgan is suggesting is not the only viable method of “open source” organization. Sometimes open source projects get bigger. When that happens, the organizational structure always changes; it’s better if that change is done openly.

The Code4Lib community doesn’t have to grow larger; it doesn’t have to keep running a big annual conference. But if we do choose to do that — let’s do it right.

Cataloging and coding as applied empathy: a Mashcat discussion prompt

Consider the phrase “Cataloging and coding as applied empathy”.  Here are some implications of those six words:

  • Catalogers and coders share something: what we build is mainly for use by other people, not ourselves. (Yes, programmers often try to eat our own dogfood, and catalogers tend to be library users, but that’s mostly not what we’re paid for.)
  • Consideration of the needs of our users is needed to do our jobs well, and to do right by our users.
  • However: we cannot rely on our users to always tell us what to do:
    • sometimes they don’t know what it is possible to want;
    • sometimes they can’t articulate what they want in a way that lends itself to direct translation to code or taxonomy;
    • it is rarely their paid job to tell us what they want, and how to build it.
  • Waiting for users to tell exactly us what to do can be a decision… to do nothing. Sometimes doing nothing is the best thing to do; often it’s not.
  • Therefore, catalogers and coders need to develop empathy.
  • Applied empathy: our catalogs and our software in some sense embody our empathy (or lack thereof).
  • Applied empathy: empathy can be a learned skill.

Is “applied empathy” a useful framework for discussing how to serve our users? I don’t know, so I’d like to chat about it.  I will be moderating a Mashcat Twitter chat on Thursday, 12 May 2016, at 20:30 UTC (time converter). Do you have questions to suggest? Please add them to the Google doc for this week’s chat.

Natural and unnatural problems in the domain of library software

I offer up two tendentious lists. First, some problems in the domain of library software that are natural to work on, and in the hopeful future, solve:

  • Helping people find stuff. On the one hand, this surely comes off as simplistic; on the other hand, it is the core problem we face, and has been the core problem of library technology from the very moment that a library’s catalog grew too large to stay in the head of one librarian.  There are of course a number of interesting sub-problems under this heading:
    • Helping people produce and maintain useful metadata.
    • Usefully aggregating metadata.
    • Helping robots find stuff (presumably with the ultimate purpose of helping people to find stuff).
    • Artificial intelligence. By this I’m not suggesting that library coders should be aiming to have an ILS kick off the Singularity, but there’s plenty of room for (e.g.) natural language processing to assist in the overall task of helping people find stuff.
  • Helping people evaluate stuff. “Too much information, little knowledge, less wisdom” is one way of describing the glut of bits infesting the Information Age. Libraries can help and should help—even though pitfalls abound.
  • Helping people navigate software and information resources. This includes UX for library software, but also a lot of other software that librarians, like it or not, find themselves helping patrons use. There are some areas of software engineering where the programmer can assume that the user is expert in the task that the software assists with; library software isn’t one of them.
  • Sharing stuff. What is Evergreen if not a decade-long project in figuring out ways to better share library materials among more users? Sharing stuff is not a solved problem even for digital stuff.
  • Keeping stuff around. This is an increasingly difficult problem. Time was, you could leave a pile of books sitting around and reasonably expect that at least a few would still exist five hundred years hence. Digital stuff never rewards that sort of carelessness.
  • Protecting patron privacy. This nearly ended up in the unnatural list—a problem can be unnatural but nonetheless crucial to work on. However, since there’s no reason to expect that people will stop being nosy about what other people are reading—and for that nosiness to sometimes turn into persecution—here we are.
  • Authentication. If the library keeps any transaction information on behalf of a patron so that they can get to it later, the software had better be trying to make sure that only the correct patron can see it. Of course, one could argue that library software should never store such information in the first place (after, say, a loan is returned), but I think there can be an honest conflict with patrons’ desires to keep track of what they used in the past.

Second, some distinctly unnatural problems that library technologists all too often must work on:

  • Digital rights management. If Ambrose Bierce were alive, I would like to think that he might define DRM in a library context thus: “Something that is ineffective in its stated purpose—and cannot possible be effective—but which serves to compromise libraries’ commitment to patron privacy in the pursuit of a misunderstanding about what will keep libraries relevant.”
  • Walled garden maintenance. Consider EZproxy. It takes the back of a very small envelope to realize that hundreds of thousands of person-hours have been expended fiddling with EZproxy configuration files for the sake of bolstering the balance sheets of Big Journal. Is this characterization unfair? Perhaps. Then consider this alternative formulation: the opportunity cost imposed by time spent maintaining or working around barriers to the free exchange of academic publications is huge—and unlike DRM for public library ebooks, there isn’t even a case (good, bad, or indifferent) to be made that the effort results in any concrete financial compensation to the academics who wrote the journal articles that are being so carefully protected.
  • Authorization. It’s one thing to authenticate a patron so that they can get at whatever information the library is storing on their behalf. It’s another thing to spend time coding authentication and authorization systems as part of maintaining the walled gardens.

The common element among the problems I’m calling unnatural? Copyright; in the particular, the current copyright regime that enforces the erection of barriers to sharing—and which we can imagine, if perhaps wistfully, changing to the point where DRM and walled garden maintenance need not occupy the attention of the library programmer, who then might find more time to work on some of the natural problems.

Why is this on my mind? I would like to give a shout-out to (and blow a raspberry at) an anonymous publisher who had this to say in a recent article about Sci-Hub:

And for all the researchers at Western universities who use Sci-Hub instead, the anonymous publisher lays the blame on librarians for not making their online systems easier to use and educating their researchers. “I don’t think the issue is access—it’s the perception that access is difficult,” he says.

I know lots of library technologists who would love to have more time to make library software easier to use. Want to help, Dear Anonymous Publisher? Tell your bosses to stop building walls.