
IMLS support for free and open source software

The Institute of Museum and Library Services is the U.S. government’s primary vehicle for direct federal support of libraries, museums, and archives across the entire country. It should come as no surprise that the Trump administration’s “budget blueprint” proposes to wipe it out, along with the NEA, NEH, Meals on Wheels, and dozens of other programs.

While there is reason for hope that Congress will ignore at least some of the cuts that Trump proposes, the IMLS in particular has been in the sights of House Speaker Paul Ryan before. We cannot afford to be complacent.

Loss of the IMLS and the funding it delivers would be a disaster for many reasons, but I’ll focus on just one: the IMLS has played a significant role in funding the creation and use of free and open source software for libraries, museums, and archives. Besides the direct benefit to the institutions that were awarded grants to build or use F/LOSS, such grants are a smart investment on the part of the IMLS: a dollar spent on producing software that anybody can freely use can rebound to the benefit of many more libraries.

For example, here is a list of some of the software projects whose creation or enhancement was funded by an IMLS grant:

This is only a partial list; it does not include LSTA funding that libraries may have used to either implement or enhance F/LOSS systems or money that libraries contributed to F/LOSS development as part of a broader grant project.

IMLS has also funded some open source projects that ultimately… went nowhere. But that’s OK; IMLS funding is one way that libraries can afford to experiment.

Do you or your institution use any of this software? Would you miss it if it were gone, had never existed, or were only available in some proprietary form? If so… write to your members of Congress today.

TIMTOWTDI

The Internet Archive had this to say earlier today:

This was in response to the MacArthur Foundation announcing that the IA is a semifinalist for a USD $100 million grant; they propose to digitize 4 million books and make them freely available.

Well and good, if they can pull it off, though I would love to see the detailed proposal, along with some assurance that this whole endeavor is not tied to the fortunes of a single entity, no matter how large.

But for now, I want to focus on the rather big bus that the IA is throwing “physical libraries” under. On the one hand, their statement is true: access to libraries is neither completely universal nor completely equitable. Academic libraries are, for obvious reasons, focused on the needs of their host schools; the independent researcher or simply the citizen who wishes to be better informed will always be a second-class user. Public libraries are neither evenly distributed nor evenly funded. Both public and academic libraries struggle with increasing demands on their budgets, particularly with respect to digital collections. Despite the best efforts of librarians, underserved populations abound.

Increasing access to digital books will help — no question about it.

But it won’t fundamentally solve the problem of universal and equitable service. What use is the Open Library to somebody who has no computer, no decent smartphone, an inadequate data plan, or uncertain knowledge of how to use the technology? (Of course, a lot of physical libraries offer technology training.)

I will answer the IA’s overreach into technical messianism with another bit of technical lore: TIMTOWTDI.

There Is More Than One Way To Do It.

I program in Perl, and I happen to like TIMTOWTDI—but as a principle guiding the design of programming languages, it’s a matter of taste and debate: sometimes there can be too many options.

However, I think TIMTOWTDI can be applied as a rule of thumb for increasing social justice:

There Is More Than One Way To Do It… and we need to try all of them.

Local communities have local needs. Place matters. Physical libraries matter—both in themselves and as a way of reinforcing technological efforts.

Technology is not universally available. It is not available equitably. The Internet can route around certain kinds of damage… but big, centralized projects are still vulnerable. Libraries can help mitigate some of those risks.

I hope the Internet Archive realizes that they are better off working with libraries — and not just acting as a bestower of technological solutions that may help, but will not by themselves solve the problem of universal, equitable access to information and entertainment.

A small thought on library and tech unions in light of a lockout

I’ve never been a member of a union. Computer programmers — and IT workers in general — in the U.S. are mostly unorganized. Not only that, they tend to resist unions, even though banding together would be a good idea.

It’s not necessarily a matter of pay, at least not at the moment: many IT workers have decent to excellent salaries. Of course not all do, and an increasing number of IT job categories are becoming commoditized. Working conditions at a lot of IT shops are another matter: the very long hours that many programmers and sysadmins work are not healthy, but it can be very hard to be the first person in the office to leave at a reasonable hour.

There are other reasons to be part of a union as an IT worker. Consider one of the points in the ACM code of ethics: “Respect the privacy of others.” Do you have a qualm about writing a web tracker? It can be hard to push back all by yourself against a management imperative to do so. A union can provide power and cover: what you can’t resist singly, a union might help forestall.

The various library software firms I’ve worked for have not been exceptions: no unions. At the moment, I’m also distinctly on the management side of the table.

Assuming good health, I can reasonably expect to spend another few decades working, and may well switch from management to labor and back again — IT work is squishy like that. Either way, I’ll benefit from the work — and blood, and lives — of union workers and organizers past and future. (Hello, upcoming weekend! You are literally the least of the good things that unions have given me!)

I may well find myself (or more likely, people representing me) bargaining hard with or against a union. And that’s fine.

However, if I find myself sitting, figuratively or literally, on the management side of a negotiation table, I hope that I never lose sight of this: the union has a right to exist.

Unfortunately, the U.S. has a long history of management and owners rejecting that premise, and doing their level best to break unions or prevent them from forming.

The Long Island University Faculty Federation, which represents the full time and adjunct faculty at the Brooklyn campus of LIU, holds a distinction: it was the first union to negotiate a collective bargaining agreement for faculty at a private university in the U.S.

Forty-four years later, the administration of LIU Brooklyn seems determined to break LIUFF, and has locked out the faculty. Worse, LIU has elected not to continue the health insurance of the LIUFF members. I have only one word for that tactic: obscenity.

As an aside, this came to my attention last week largely because I follow LIU librarian and LIUFF secretary Emily Drabinski on Twitter. If you want to know what’s going on with the lockout, follow her blog and Twitter account as well as the #LIUlockout hashtag.

I don’t pretend that I have a full command of all of the issues under discussion between the university and the union, but I’ve read enough to be rather dubious that the university is presently acting in good faith. There’s plenty of precedent for university faculty unions to work without contracts while negotiations continue; LIU could do the same.

Remember, the union has a right to exist. That applies to LIUFF, to libraries, and, I hope, in time to more IT shops.

If you agree with me that lockouts are wrong, please consider joining me in donating to the solidarity fund for the benefit of LIUFF members run by the American Federation of Teachers.

Naming and responding to hate — YAPC::NA and ALA Annual in Orlando

Tomorrow we will drive to Orlando, as next week I’m attending two conferences: the Perl Conference (YAPC::NA) and the American Library Association’s Annual 2016 conference.

A professional concern shared by my colleagues in software development and libraries is the difficult problem of naming. Naming things, naming concepts, naming people (or better yet, using the names they tell us to use).

Names have power; names can be misused.

In light of what happened in Orlando on 12 June, the very least we can do is to choose what names we use carefully. What did happen? That morning, a man chose to kill 49 people and injure 53 others at a gay bar called the Pulse. A gay bar that was holding a Latin Night. Most of those killed were Latinx; queer people of color, killed in a spot that for many felt like home. The dead have names.

Names are not magic spells, however. There is no one word we can utter that will undo what happened at the Pulse nor immediately construct a perfect bulwark against the tide of hate. The software and library professions may be able to help reduce hate in the long run… but I offer no platitudes today.

Sometimes what is called for is blood, or cold hard cash. If you are attending YAPC::NA or ALA Annual and want to help via some means identified by those conferences, here are options:

I will close with this: many of our LGBT colleagues will feel pain from the shooting at a level more visceral than those of us who are not LGBT — or Latinx — or people of color. Don’t be silent about the atrocity, but first, listen to them; listen to the folks in Orlando who know what specifically will help the most.

Securing Z39.50 traffic from Koha and Evergreen Z39.50 servers using YAZ and TLS

There’s often more than one way to search a library catalog; or to put it another way, not all users come in via the front door. For example, ensuring that your public catalog supports HTTPS can help prevent bad actors from snooping on patrons’ searches, but if one of your users happens to use a tool that searches your catalog over Z39.50, by default they have less protection.

Consider this extract from a tcpdump of a Z39.50 session:

No, MARC is not a cipher; it just isn’t.

How to improve this state of affairs? There was some discussion back in 2000 of bundling SSL or TLS into the Z39.50 protocol, although it doesn’t seem like it went anywhere. Of course, SSH tunnels and stunnel are options, but it turns out that there can be an easier way.

As is usually the case with anything involving Z39.50, we can thank the folks at Index Data for being on top of things: it turns out that TLS support is easily enabled in YAZ. Here’s how this can be applied to Evergreen and Koha.

The first step is to create an SSL certificate; a self-signed one probably suffices. The certificate and its private key should be concatenated into a single PEM file, like this:

-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
-----BEGIN PRIVATE KEY-----
...
-----END PRIVATE KEY-----
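The concatenation step is easy to script. Here is a minimal Python sketch. It assumes that the certificate and key already exist as separate PEM files. The file names in the usage note are hypothetical. Before writing anything, the sketch checks that the result has the layout shown above: a CERTIFICATE block first, followed by a private key block.

```python
import os
import re
from pathlib import Path

# Matches one PEM block and captures its label, e.g. "CERTIFICATE" or "PRIVATE KEY".
PEM_BLOCK = re.compile(r"-----BEGIN ([A-Z ]+)-----.*?-----END \1-----", re.S)


def combine_pem(cert_pem: str, key_pem: str) -> str:
    """Return certificate + key as one PEM string, certificate first."""
    combined = cert_pem.strip() + "\n" + key_pem.strip() + "\n"
    labels = [m.group(1) for m in PEM_BLOCK.finditer(combined)]
    if not labels or labels[0] != "CERTIFICATE":
        raise ValueError("combined file must begin with a CERTIFICATE block")
    # Accept "PRIVATE KEY" as well as variants such as "RSA PRIVATE KEY".
    if not any(label.endswith("PRIVATE KEY") for label in labels[1:]):
        raise ValueError("no private key block follows the certificate")
    return combined


def write_combined(cert_path: str, key_path: str, out_path: str) -> None:
    """Write the combined PEM file, readable only by its owner (it holds a key)."""
    combined = combine_pem(Path(cert_path).read_text(), Path(key_path).read_text())
    Path(out_path).write_text(combined)
    os.chmod(out_path, 0o600)
```

For example, `write_combined("z3950-cert.pem", "z3950-key.pem", "z3950.pem")` writes the combined file. Because the file is restricted to its owner, it should be owned by the user that the Z39.50 server runs as.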

Evergreen’s Z39.50 server can be told to require SSL via a <listen> element in /openils/conf/oils_yaz.xml, like this:
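If you manage these files with scripts, the same kind of edit can be made programmatically. Here is a minimal Python sketch, which makes the following assumptions:

* The file is a YAZ frontend-server configuration with a root `<yazgfs>` element.
* Its `<listen>` elements hold a YAZ address such as `tcp:@:210`, where `@` means all local interfaces.
* The `id` and port values in the sketch are hypothetical.
* A `<server>` element still has to reference the listener through its `listenref` attribute.

Back up the file first. The sketch preserves comments, but ElementTree does not preserve every detail of the original formatting.

```python
import xml.etree.ElementTree as ET


def require_tls(config_xml: str, listen_id: str, port: int) -> str:
    """Make the <listen> element with this id accept only ssl: connections."""
    # Use a tree builder that keeps comments, so they survive the rewrite.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(config_xml, parser=parser)
    listen = root.find(f"listen[@id='{listen_id}']")
    if listen is None:
        # Add a new listener ahead of the first <server> element, if any.
        children = list(root)
        index = next(
            (i for i, child in enumerate(children) if child.tag == "server"),
            len(children),
        )
        listen = ET.Element("listen", id=listen_id)
        root.insert(index, listen)
    listen.text = f"ssl:@:{port}"  # "@" = every local interface
    return ET.tostring(root, encoding="unicode")
```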

To supply the path to the certificate, a change to oils_ctl.sh will do the trick:

For Koha, a <listen> element should be added to koha-conf.xml, e.g.,

zebrasrv will also need to know how to find the SSL certificate:

And with that, we can test: yaz-client ssl:localhost:4210/CONS or yaz-client ssl:localhost:4210/biblios. Et voila!
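yaz-client is the real test, because it actually speaks Z39.50. For a quick complementary check from a script, here is a minimal Python sketch. It splits a target written in the `[comstack:]host[:port][/database]` form used above, then completes a TLS handshake with the server. The sketch makes these simplifications:

* It does not speak Z39.50 at all. It only confirms that the port answers with TLS.
* It skips certificate verification, on the assumption that the certificate is self-signed.
* It falls back to port 210, the standard Z39.50 port, when the target names none.

```python
import socket
import ssl
from typing import NamedTuple, Optional


class Target(NamedTuple):
    comstack: str
    host: str
    port: int
    database: Optional[str]


def parse_target(target: str) -> Target:
    """Split a target such as 'ssl:localhost:4210/CONS' into its parts."""
    comstack = "tcp"  # assumed default when no comstack prefix is given
    for prefix in ("tcp:", "ssl:"):
        if target.startswith(prefix):
            comstack = prefix[:-1]
            target = target[len(prefix):]
            break
    address, _, database = target.partition("/")
    host, _, port = address.partition(":")
    return Target(comstack, host, int(port) if port else 210, database or None)


def probe_tls(target: Target, timeout: float = 5.0) -> Optional[str]:
    """Complete a TLS handshake and return the negotiated protocol version.

    Certificate checks are disabled because the server is assumed to use a
    self-signed certificate; use this only as a smoke test.
    """
    context = ssl.create_default_context()
    context.check_hostname = False  # must be turned off before CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((target.host, target.port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=target.host) as tls:
            return tls.version()
```

For example, `probe_tls(parse_target("ssl:localhost:4210/CONS"))` returns the protocol version string reported by Python's `ssl` module. If the server is still listening on plain `tcp:`, it raises an `ssl.SSLError` instead.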

Of course, not every Z39.50 client will know how to use TLS… but lots will, as YAZ is the basis for many of them.

Exercises involving a MARC record for an imaginary book, inspired by a recent AUTOCAT thread

Consider the following record, inspired by the discussion on AUTOCAT that was kicked off by this query:

100 1_ ‡a Smith, June, ‡d 1977-
245 00 ‡a Regarding events in Ferguson / ‡c June Smith.
260 _1 ‡a New York : ‡b Hope Press, ‡c 2017.
300 __ ‡a 371 p. : ‡b ill. ; ‡c 10 x 27 cm
336 __ ‡a text ‡2 rdacontent
337 __ ‡a unmediated ‡2 rdamedia
338 __ ‡a volume ‡2 rdacarrier
650 _0 ‡a United States ‡x History ‡y Civil War, 1861-1865.
650 _0 ‡a Police brutality ‡z Missouri ‡z Ferguson ‡y 2014.
650 _0 ‡a Ferguson (Mo.) Riot, 2015.
650 _0 ‡a Reconstruction (U.S. history, 1865-).
650 _0 ‡a Race riots ‡z Missouri ‡z Ferguson ‡y 2014.
650 _0 ‡a Demonstrations ‡z Missouri ‡z Ferguson ‡y 2014.
653 20 ‡a #BlackLivesMatter
651 _0 ‡a Ferguson (Mo.) ‡x History ‡y 21st century.
650 _0 ‡a Political violence ‡z Missouri ‡z Ferguson ‡x History 
       ‡y 21st century.
650 _0 ‡a Social conflict ‡z Missouri ‡z Ferguson.
650 _0 ‡a Civil rights demonstrations ‡z Missouri ‡z Saint Louis County.
650 _0 ‡a Social conflict ‡z Missouri ‡z Saint Louis.
650 _0 ‡a Protest movements ‡z Missouri ‡z Ferguson.
650 _0 ‡a Protest movements ‡z Missouri ‡z Saint Louis.
650 _0 ‡a Militarization of police ‡z Missouri ‡z Ferguson.
650 _0 ‡a Militarization of police ‡z United States.
651 _0 ‡a Ferguson (Mo.) ‡z Race relations.
653 _0 ‡a 2014 Ferguson unrest
650 _0 ‡a African Americans ‡x Civil rights ‡x History.
650 _0 ‡a African Americans ‡x Crimes against ‡x History.
650 _0 ‡a Police brutality ‡z United States.
650 _0 ‡a Police ‡x Complaints against ‡z Missouri ‡z Ferguson.
650 _0 ‡a Police-community relations ‡z Missouri ‡z Ferguson.
650 _0 ‡a Discrimination in criminal justice administration ‡z Missouri
       ‡z Ferguson.
650 _0 ‡a United States ‡x Race relations ‡x History.
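For the exercises, it can help to see the subject access points as plain heading strings. Here is a minimal Python sketch. It assumes the record is pasted in exactly the layout above:

* a three-digit tag;
* two indicator characters, with `_` for blank;
* subfields delimited by `‡`;
* wrapped lines indented.

The sketch also accepts `ǂ`, which shows up as a subfield delimiter in some pasted records.

The vocabulary labels follow MARC 21 conventions. A second indicator of 0 in 650/651 means Library of Congress Subject Headings, and 653 is an uncontrolled index term. The sketch does not judge neutrality or accuracy; that part remains the exercise.

```python
import re
from typing import Iterator, List, Tuple

FIELD_START = re.compile(r"^(\d{3}) (\S\S) (.*)$")  # tag, indicators, subfields


def split_subfields(tag: str, indicators: str, body: str):
    """Turn '‡a X ‡z Y' into [('a', 'X'), ('z', 'Y')]."""
    parts = re.split(r"[‡ǂ]", body)
    subfields = [(part[0], part[1:].strip()) for part in parts if part.strip()]
    return tag, indicators, subfields


def parse_fields(text: str) -> Iterator[Tuple[str, str, List[Tuple[str, str]]]]:
    """Yield (tag, indicators, subfields) per field, joining wrapped lines."""
    current = None
    for line in text.splitlines():
        match = FIELD_START.match(line)
        if match:
            if current:
                yield split_subfields(*current)
            current = [match.group(1), match.group(2), match.group(3)]
        elif current and line.strip():
            current[2] += " " + line.strip()  # continuation of the previous field
    if current:
        yield split_subfields(*current)


def subject_headings(text: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (tag, vocabulary label, heading string) for each 6XX field."""
    for tag, indicators, subfields in parse_fields(text):
        if not tag.startswith("6"):
            continue
        if tag == "653":
            vocabulary = "uncontrolled"
        elif indicators[1] == "0":
            vocabulary = "LCSH"
        else:
            vocabulary = "other"
        yield tag, vocabulary, " -- ".join(value for _, value in subfields)
```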

Some exercises for the reader:

  1. Identify the subject headings that detract from the neutrality of this record. Show your work.
  2. Identify the subject headings whose absence lessens the accuracy or neutrality of this record. Show your work.
  3. Of the headings that detract from the neutrality of this record, identify the ones that are inaccurate. Show your work.
  4. Adopt the perspective of someone born in 1841 and repeat exercises 1-3. Show your work.
  5. Adopt the perspective of someone born in 2245 and repeat exercises 1-3. Show your work.
  6. Repeat exercises 1-3 in the form of a video broadcast over the Internet.
  7. Repeat exercises 1-3 as a presentation to your local library board.

I acknowledge with gratitude the participants in the AUTOCAT thread who grappled with the question; many of them suggested subject headings used in this record.

How long does it take to change the data, part I: confidence

A few days ago, I asked the following question in the Mashcat Slack: “if you’re a library data person, what questions do you have to ask of library systems people and library programmers?”

Here is a question that Alison Hitchens asked based on that prompt:

I’m not sure it is a question, but a need for understanding what types of data manipulations etc. are easy peasy and would take under hour of developer time and what types of things are tricky — I guess an understanding of the resourcing scope of the things we are asking for, if that makes sense

That’s an excellent question, and one whose answer heavily depends on the particulars of the data change needed, the people requesting it, the people who are to implement it, and the tools that are available. I cannot offer a magic box that, when fed specifics and given a few turns of its crank, spits out a reliable time estimate.

However, I can offer up a point of view: asking somebody how long it takes to change some data is asking them to take the measure of their confidence and of their constraints.

In this post I’ll focus on the matter of confidence. If you, a library data person, are asking me, a library systems person (or team, or department, or service provider), to change a pile of data, I may be perfectly confident in my ability to do so. Perhaps it’s a routine record load that for whatever reason cannot be run directly by the catalogers but for which tools and procedures already exist. In that case, answering the question of how long it would take to do it might be easy (ignoring, for the moment, the matter of fitting the work onto the calendar).

But when asked to do something new, my confidence could start out being quite low.  Here are some of the questions I might be asking myself:

Am I confident that I’m getting the request from the right person?  Am I confident that the requester has done their homework?

Ideally, the requester has the authority to ask for the change, knows why the change is wanted, has consulted with the right data experts within the organization to verify that the request makes sense, and has ensured that all of the relevant stakeholders have signed off on the request.

If not, then it will take me time to either get the requester to line up the political ducks or to do so myself.

Am I confident that I understand the reason for the change?

If I know the reason for the change – which presumably is rooted in some expected benefit to the library’s users or staff – I may be able to suggest better approaches.  After all, sometimes the best way to do a data change is to change no data at all, and instead change displays or software configuration options.  If data does need to be changed, knowing why can make it easier for me to suss out some of the details or ask smarter questions.

If the reason for the change isn’t apparent, it will take me time to work with the requester and other experts and stakeholders until I have enough understanding of the big picture to proceed (or to be told to do it because the requester said so – but that has its own problems).

Am I confident that I understand the details of the requested change?

Computers are stupid and precise, so ultimately any process and program I write or use to effect the change has to be stupid and precise.

Humans are smart and fuzzy, so to bring a request down to the level of the computer, I have to analyze the problem until I’m confident that I’ve broken it down enough. Whatever design and development process I follow to do the analysis – waterfall, agile, or otherwise – it will take time.

Am I confident in the data that I am to change?

Is the data to be changed nice, clean and consistent?  Great! It’s easier to move a clean data set from one consistent state to another consistent state than it is to clean up a messy batch of data.

The messier the data, the more edge cases there are to consider, the more possible exceptions to worry about – the longer the data change will take.
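You can gauge how messy the data is early on, before changing anything, by sorting the existing values into three groups:

* exact matches for what the request describes;
* near misses that a loose comparison catches;
* everything else.

Here is a minimal Python sketch of that sorting. Its "loose" rule (ignore case, runs of whitespace, and periods) is an illustrative assumption; the real rules would come out of the analysis described above. The size of the near-miss pile is a rough proxy for the edge cases that will eat time.

```python
import re
from typing import Iterable, List, Tuple


def loose(value: str) -> str:
    """Illustrative fuzzy key: collapse whitespace and periods, ignore case."""
    return re.sub(r"[\s.]+", " ", value).strip().casefold()


def triage(values: Iterable[str], target: str) -> Tuple[List[str], List[str], List[str]]:
    """Split values into exact matches, near misses needing review, and the rest."""
    exact, review, other = [], [], []
    for value in values:
        if value == target:
            exact.append(value)
        elif loose(value) == loose(target):
            review.append(value)  # probably meant the same thing; a human should check
        else:
            other.append(value)
    return exact, review, other
```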

Am I confident that I have the technical knowledge to implement the change?

Relevant technical knowledge can include knowledge of any update tools provided by the software, knowledge of programming languages that can use system APIs, knowledge of data manipulation and access languages such as SQL and XSLT, knowledge of the underlying DBMS, and so forth.

If I’m confident in my knowledge of the tools, I’ll need less time to figure out how to put them together to deal with the data change.  If not, I’ll need time to teach myself, enlist the aid of colleagues who do have the relevant knowledge, or find contractors to do the work.

Am I confident in my ability to predict any side-effects of the change?

Library data lives in complicated silos. Sometimes, a seemingly small change can have unexpected consequences.  As a very small example, Evergreen actually cares about the values of indicators in the MARC21 856 field; get them wrong, and your electronic resource URLs disappear from public catalog display.

If I’m familiar with the systems that store and use the data to be changed and am confident that side-effects of the change will be minimal, great! If not, it may take me some time to investigate the possible consequences of the change.
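One inexpensive way to investigate a class of side-effects is to compare what would be displayed before and after the change. The sketch below does that for 856 links. Its display rule is an assumption for illustration, not Evergreen's actual logic. The rule treats a link as shown only when the first indicator is 4 (HTTP) and the second is 0 (resource) or 1 (version of resource), which are the MARC 21 meanings of those values. Substitute the real rule for your system.

```python
from typing import Iterable, List, Tuple

Link = Tuple[str, str, str]  # (first indicator, second indicator, URL)


def displayed(ind1: str, ind2: str) -> bool:
    """Hypothetical display rule, for illustration only."""
    return ind1 == "4" and ind2 in ("0", "1")


def lost_links(before: Iterable[Link], after: Iterable[Link]) -> List[str]:
    """URLs that would be displayed before the change but not after it."""
    shown_before = {url for ind1, ind2, url in before if displayed(ind1, ind2)}
    shown_after = {url for ind1, ind2, url in after if displayed(ind1, ind2)}
    return sorted(shown_before - shown_after)
```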

Am I confident in my ability to back out of the change if something goes wrong?

Is the data change difficult or awkward to undo if something is amiss?  If so, it presents an operational risk, one whose mitigation is taking more time for planning and test runs.
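When the data lives in a relational database, a transaction is one inexpensive way to get both test runs and an escape hatch. Here is a minimal Python sketch. It uses the standard library's `sqlite3` module as a stand-in for whatever DBMS the system actually uses. The sketch:

1. runs the statement;
2. refuses to keep the result if more rows were touched than expected;
3. rolls back unless explicitly told to commit.

This only protects the change while it is in flight. Once the change is committed, backing it out still requires a backup or an inverse change.

```python
import sqlite3
from typing import Optional, Sequence


def run_change(conn: sqlite3.Connection, sql: str, params: Sequence = (),
               expected_max: Optional[int] = None, dry_run: bool = True) -> int:
    """Run one data-changing statement in a transaction; return rows touched.

    By default this is a dry run: the change is made, counted, and rolled back.
    """
    cur = conn.cursor()
    try:
        cur.execute(sql, params)  # sqlite3 opens a transaction implicitly here
        touched = cur.rowcount
        if expected_max is not None and touched > expected_max:
            raise RuntimeError(f"{touched} rows touched; expected at most {expected_max}")
    except Exception:
        conn.rollback()
        raise
    if dry_run:
        conn.rollback()
    else:
        conn.commit()
    return touched
```

For example, using a hypothetical `record` table, `run_change(conn, "UPDATE record SET subject = ? WHERE subject = ?", ("Protests", "Riots"))` reports how many rows would change without keeping the change. Passing `dry_run=False` keeps it.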

Am I confident that I know how often requests for similar data changes will be made in the future?

If the request is a one-off, great! If the request is the harbinger of many more like it – or looks that way – I may be better off writing a tool that I can use to make the data change repeatedly.  I may be even better off writing a tool that the requester can use.

It may take more time to write such a tool than it would to just handle the request as a one-off, in which case it will take time to decide which direction to take.

Am I confident in the organization?

Do I work for a library that can handle mistakes well, one that can roll with the punches if the data change turns out to be misguided? Or do I work for an unhealthy organization where a mistake means months of recriminations? Or where the catalog is just one of the fronts in a war between the public and technical services departments?

Can I expect to get compensated for performing the data change successfully? Or am I effectively being treated as if I were the stupid, over-precise computer?

If the organization is unhealthy, I may need to spend more time than ought to be necessary to protect my back – or I may end up spending a lot of time not just implementing data changes, but data oscillations.

The pattern should be clear: part of the process of estimating how long it might take to effect a data change is estimating how much confidence I have about the change.  Generally speaking, higher confidence means less time would be needed to make the change – but of course, confidence is a quality that cannot be separated from the people and organizations who might work on the change.

In the extreme – but common – case, if I start from a state of very low confidence, it will take me time to reach a sufficient degree of confidence to make any time estimate at all.  This is why I like a comment that Owen Stephens made in the Slack:

Perhaps this is part of the answer to [Alison]: Q: Always ask how long it will take to investigate and get an idea of how difficult it is.

In the next post, I discuss how various constraints can affect time estimates.

Playing around with Coce

In the course of looking at the patch for Koha bug 9580 today, I ended up playing around with Coce.

Coce is a piece of software written by Frédéric Demians and licensed under the GPL that implements a cache for URLs of book cover images. It arose during a discussion of cover images on the Koha development mailing list.

The idea behind Coce is that rather than having the ILS either directly link to cover images by plugging the normalized ISBN into a URL pattern (as is done for Amazon, Baker & Taylor and Syndetics) or call a web service to get the image’s URL (as is done for Google and Open Library), Coce queries the cover image providers on the ILS’s behalf and returns the image URLs. Furthermore, Coce caches the URLs, meaning that once it determines that the Open Library cover image for ISBN 9780563533191 can be found at http://covers.openlibrary.org/b/id/2520432-L.jpg, it need not ask again, at least for a while.
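"Normalized ISBN" usually means an ISBN reduced to one canonical form, so that it can be plugged into a provider's URL pattern or used as a cache key. Here is a minimal Python sketch that treats ISBN-13 as the canonical form; that choice is an assumption, not something stated above. The sketch strips hyphens and spaces and validates check digits. It converts an ISBN-10 by prefixing 978 and recomputing the check digit.

```python
def isbn13_check_digit(first12: str) -> int:
    """ISBN-13 check digit: weights alternate 1, 3, 1, 3, ... over 12 digits."""
    total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(first12))
    return (10 - total % 10) % 10


def normalize_isbn(raw: str) -> str:
    """Return the ISBN-13 form of an ISBN-10 or ISBN-13; raise ValueError if invalid."""
    s = raw.replace("-", "").replace(" ", "").upper()
    if len(s) == 10 and s[:9].isdigit() and (s[9].isdigit() or s[9] == "X"):
        # ISBN-10 rule: sum of digit * weight (10 down to 1) must be divisible by 11.
        values = [int(d) for d in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        if sum(v * w for v, w in zip(values, range(10, 0, -1))) % 11 != 0:
            raise ValueError(f"bad ISBN-10 check digit: {raw}")
        first12 = "978" + s[:9]
    elif len(s) == 13 and s.isdigit():
        first12 = s[:12]
        if isbn13_check_digit(first12) != int(s[12]):
            raise ValueError(f"bad ISBN-13 check digit: {raw}")
    else:
        raise ValueError(f"not an ISBN: {raw}")
    return first12 + str(isbn13_check_digit(first12))
```

For example, both `normalize_isbn("0-563-53319-6")` and `normalize_isbn("978-0-563-53319-1")` yield `9780563533191`, the ISBN used in the example above.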

Having a cache like this provides some advantages:

  • Caching the result of web service calls reduces the load on the providers. That’s nice for the likes of the Open Library, and while even the most ambitious ILS is not likely to discomfit Amazon or Google, it doesn’t hurt to reduce the risk of getting rate-limited during summer reading.
  • Since Coce queries each provider for valid image URLs, users are less likely to see broken cover images in the catalog.
  • Since Coce can query multiple providers (it currently has support for the Open Library, Google Books, and Amazon’s Product Advertising API), more records can have cover images displayed as compared to using just one source.
  • It lends itself to using one Coce instance to service multiple Koha instances.
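To make the first and third points concrete, here is a minimal Python sketch of a lookup cache that tries providers in order and remembers each answer for a fixed time. It is not Coce's code; Coce runs on Node.js and keeps its cache in Redis. In the sketch:

* an in-memory dict stands in for Redis;
* the provider functions are hypothetical stand-ins for real web-service calls;
* the one-day default lifetime is an arbitrary choice.

Negative answers are cached too, so a title that no provider has a cover for does not trigger a fresh round of lookups on every page view. The lifetime is also the knob for the staleness concern listed among the disadvantages below.

```python
import time
from typing import Callable, Dict, Optional, Sequence, Tuple

Provider = Callable[[str], Optional[str]]  # ISBN -> image URL, or None if no cover


class CoverCache:
    """Try providers in order, remembering each answer (even "no cover") for ttl seconds."""

    def __init__(self, providers: Dict[str, Provider], ttl: float = 86400.0,
                 clock: Callable[[], float] = time.monotonic):
        self.providers = providers
        self.ttl = ttl
        self.clock = clock
        self.calls = 0  # lookups that actually reached a provider
        self._entries: Dict[Tuple[str, str], Tuple[float, Optional[str]]] = {}

    def _lookup(self, provider: str, isbn: str) -> Optional[str]:
        now = self.clock()
        entry = self._entries.get((provider, isbn))
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # fresh cache hit; no provider call
        self.calls += 1
        url = self.providers[provider](isbn)
        self._entries[(provider, isbn)] = (now, url)
        return url

    def cover_url(self, isbn: str, order: Sequence[str]) -> Optional[str]:
        """Return the first URL found, trying providers in the given order."""
        for provider in order:
            url = self._lookup(provider, isbn)
            if url:
                return url
        return None
```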

There are also some disadvantages:

  • It would be yet another service to maintain.
  • It would be another point of failure. On the other hand, it looks like it would be easy to set up multiple, load-balanced instances of Coce.
  • There is the possibility that image URLs might get cached for too long — although I don’t think any of the cover image services are in the habit of changing the static image URLs just for fun, they don’t necessarily guarantee that they will work forever.

I set up Coce on a Debian Wheezy VM. It was relatively simple to install; for posterity here is the procedure I used. First, I installed Redis, which Coce uses as its cache:

Next, I installed Node.js by building a Debian package, then installing it:

When I got to the point where checkinstall asked me to confirm the metadata for the package, I made sure to remove the “v” from the version number.

Next, I checked out Coce and installed the Node.js packages it needs:

I then copied “config.json-sample” to “config.json” and customized it. The only change I made, though, was to remove Amazon from the list of providers.
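The same edit can be scripted. Here is a minimal Python sketch. It assumes that config.json has a top-level `providers` array in which Amazon appears as `aws`. Both names are assumptions, so check config.json-sample for the actual keys. The sketch edits only the provider list and leaves any other settings alone.

```python
import copy
import json
from pathlib import Path


def without_provider(config: dict, provider: str = "aws") -> dict:
    """Return a copy of the config with one provider removed from "providers"."""
    updated = copy.deepcopy(config)
    updated["providers"] = [p for p in updated.get("providers", []) if p != provider]
    return updated


def drop_provider_in_file(path: str, provider: str = "aws") -> None:
    """Rewrite a config.json file in place, minus the given provider."""
    file = Path(path)
    config = json.loads(file.read_text())
    file.write_text(json.dumps(without_provider(config, provider), indent=2) + "\n")
```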

Finally, I started the service:

On my test Koha system, I installed the patch for bug 9580 and set the two system preferences it introduces to appropriate values to point to my Coce instance with the set of cover providers I wanted to use for the test.
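For testing outside Koha, it can help to query Coce directly. Here is a minimal Python sketch. It assumes that Coce answers GET requests at `/cover` with comma-separated `id` (ISBN) and `provider` parameters, and that it returns a JSON object mapping each ISBN to an image URL. Treat the endpoint and parameter names as assumptions to check against Coce's README. The host and port in the usage note are hypothetical.

```python
import json
from typing import Dict, Sequence
from urllib.parse import urlencode
from urllib.request import urlopen


def coce_url(base: str, isbns: Sequence[str], providers: Sequence[str]) -> str:
    """Build the (assumed) Coce request URL for a batch of ISBNs."""
    query = urlencode({"id": ",".join(isbns), "provider": ",".join(providers)})
    return f"{base.rstrip('/')}/cover?{query}"


def fetch_covers(base: str, isbns: Sequence[str], providers: Sequence[str],
                 timeout: float = 5.0) -> Dict[str, str]:
    """Ask Coce for cover URLs; the response is assumed to be ISBN -> URL JSON."""
    with urlopen(coce_url(base, isbns, providers), timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_covers("http://localhost:8080", ["9780563533191"], ["gb", "ol"])` asks for Google Books first and falls back to the Open Library.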

The result? It worked: I did an OPAC search, and some of the titles that got displayed had their cover image provided by Google Books, while others were provided by the Open Library.

There are a few rough edges to work out. For example, the desired cover image size should probably be part of the client request to Coce, not part of Coce’s central configuration, and I suspect a bit more work is needed to get it to work properly if the OPAC is run under HTTPS. That said, this looks promising, and I enjoyed the chance to start playing a bit with Redis and Node.js.

Pronounceable acronyms, or why you should come to the LITA Open Source Systems Interest Group meeting at ALA Midwinter

I just realized something — I’ve never heard anybody refer to the LITA OSS IG as the “awww-zig”. Probably just as well.

Anyway, here’s my pitch for the meeting, which is today, 9 January 2011, at the San Diego Convention Center in room 31C. By design, the IGs in LITA are perhaps the wildest and wooliest part of ALA, being forums for like-minded librarians and library techies to discuss and work on cool toys and share ideas for the benefit of their libraries. The OSS IG is no exception; one of the best parts of our meetings is going around the table and having everybody present discuss what they’ve been up to with F/OSS. Think of it as Code4Lib where ties and jackets are allowed!

There is also a business meeting, which co-chair Daniel Lovins and I will keep as short as possible; fortunately, there’s little in the way of administrivia to talk about. The main agenda item for the business meeting: what does the OSS IG want to do next? To toss some ideas out:

  • Find good speakers for a managed discussion for the meeting at the next Annual.
  • Do a workshop or preconference. The migrating to open source systems preconference at ALA Annual last year was a success, so we know we can do it again.
  • Organize a LITA webinar or web class.
  • Post to the LITA blog.
  • Do something neat on the LITA sandbox server.
  • Or do something else entirely.

I look forward to seeing friends of F/OSS, the curious, and even the dubious for what will be a great discussion.

Forum on careers for public librarians in technical services

Catalogers and technical services managers in public libraries are needed more than ever, but for a variety of reasons their numbers have been declining over the years. ALCTS CRG has convened a forum on careers in technical services in public libraries, but as the forum was not listed in the program guide, here’s the description. Full disclosure: I highly recommend this because, among other reasons, my wife, Marlene Harris, is one of the participants.


ALCTS CRG Forum in Anaheim Focused on Careers for Public Librarians in Technical Services

Sunday, June 29, 8:00-9:30 a.m., Disney Paradise Pier Hotel, Pacific C/D

Want to get the scoop on the advantages and disadvantages of a technical services career in public libraries? Be sure to catch the CRG forum, Technical Services Careers in Public Libraries: Getting Started, Building Your Career, or Making the Switch, on Sunday, June 29, 2008 from 8 to 9:30 a.m., in Room Pacific C/D of the Disney Paradise Pier Hotel, when Carolyn Goolsby, Technical Services Manager at the Tacoma Public Library, and Marlene A. Harris, Division Chief, Technical Services at the Chicago Public Library, will offer advice and describe from personal experience the ups and downs, ins and outs, of a career in technical services within the public library setting. Ample time will be provided for questions and answers after presentations by both panelists.

The moderator is Elaine Yontz from the faculty of the Library and Information Science program at Valdosta State University.

Sponsored by ALCTS CRG (Council of Regional Groups)
