A few days ago, I asked the following question in the Mashcat Slack: “if you’re a library data person, what questions do you have to ask of library systems people and library programmers?”

Here is a question that Alison Hitchens asked based on that prompt:

I’m not sure it is a question, but a need for understanding what types of data manipulations etc. are easy peasy and would take under hour of developer time and what types of things are tricky — I guess an understanding of the resourcing scope of the things we are asking for, if that makes sense

That’s an excellent question – and one whose answer heavily depends on the particulars of the data change needed, the people requesting it, the people who are to implement it, and tools that are available.  I cannot offer a magic box that, when fed specifics and given a few turns of its crank, spits out a reliable time estimate.

However, I can offer up a point of view: asking somebody how long it takes to change some data is asking them to take the measure of their confidence and of their constraints.

In this post I’ll focus on the matter of confidence.  If you, a library data person, are asking me, a library systems person (or team, or department, or service provider), to change a pile of data, I may be perfectly confident in my ability to do so.  Perhaps it’s a routine record load that for whatever reason cannot be run directly by the catalogers but for which tools and procedures already exist.  In that case, answering the question of how long it would take to do it might be easy (ignoring, for the moment, the matter of fitting the work onto the calendar).

But when asked to do something new, my confidence could start out being quite low.  Here are some of the questions I might be asking myself:

Am I confident that I’m getting the request from the right person?  Am I confident that the requester has done their homework?

Ideally, the requester has the authority to ask for the change, knows why the change is wanted, has consulted with the right data experts within the organization to verify that the request makes sense, and has ensured that all of the relevant stakeholders have signed off on the request.

If not, then it will take me time to either get the requester to line up the political ducks or to do so myself.

Am I confident that I understand the reason for the change?

If I know the reason for the change – which presumably is rooted in some expected benefit to the library’s users or staff – I may be able to suggest better approaches.  After all, sometimes the best way to do a data change is to change no data at all, and instead change displays or software configuration options.  If data does need to be changed, knowing why can make it easier for me to suss out some of the details or ask smarter questions.

If the reason for the change isn’t apparent, it will take me time to work with the requester and other experts and stakeholders until I have enough understanding of the big picture to proceed (or to be told to do it because the requester said so – but that has its own problems).

Am I confident that I understand the details of the requested change?

Computers are stupid and precise, so ultimately any process and program I write or use to effect the change has to be stupid and precise.

Humans are smart and fuzzy, so to bring a request down to the level of the computer, I have to analyze the problem until I’m confident that I’ve broken it down enough. Whatever design and development process I follow to do the analysis – waterfall, agile, or otherwise – it will take time.

Am I confident in the data that I am to change?

Is the data to be changed nice, clean and consistent?  Great! It’s easier to move a clean data set from one consistent state to another consistent state than it is to clean up a messy batch of data.

The messier the data, the more edge cases there are to consider, the more possible exceptions to worry about – the longer the data change will take.

Am I confident that I have the technical knowledge to implement the change?

Relevant technical knowledge can include knowledge of any update tools provided by the software, knowledge of programming languages that can use system APIs, knowledge of data manipulation and access languages such as SQL and XSLT, knowledge of the underlying DBMS, and so forth.

If I’m confident in my knowledge of the tools, I’ll need less time to figure out how to put them together to deal with the data change.  If not, I’ll need time to teach myself, enlist the aid of colleagues who do have the relevant knowledge, or find contractors to do the work.

Am I confident in my ability to predict any side-effects of the change?

Library data lives in complicated silos. Sometimes, a seemingly small change can have unexpected consequences.  As a very small example, Evergreen actually cares about the values of indicators in the MARC21 856 field; get them wrong, and your electronic resource URLs disappear from public catalog display.

If I’m familiar with the systems that store and use the data to be changed and am confident that side-effects of the change will be minimal, great! If not, it may take me some time to investigate the possible consequences of the change.

Am I confident in my ability to back out of the change if something goes wrong?

Is the data change difficult or awkward to undo if something is amiss?  If so, it presents an operational risk, one whose mitigation is taking more time for planning and test runs.

Am I confident that I know how often requests for similar data changes will be made in the future?

If the request is a one-off, great! If the request is the harbinger of many more like it – or looks that way – I may be better off writing a tool that I can use to make the data change repeatedly.  I may be even better off writing a tool that the requester can use.

It may take more time to write such a tool than it would to just handle the request as a one-off, in which case it will take time to decide which direction to take.

Am I confident in the organization?

Do I work for a library that can handle mistakes well?  One that, if the data change turns out to be misguided, is able to roll with the punches?  Or do I work for an unhealthy organization where a mistake means months of recriminations? Or where the catalog is just one of the fronts in a war between the public and technical services departments?

Can I expect to get compensated for performing the data change successfully? Or am I effectively being treated as if I were the stupid, over-precise computer?

If the organization is unhealthy, I may need to spend more time than ought to be necessary to protect my back – or I may end up spending a lot of time not just implementing data changes, but data oscillations.

The pattern should be clear: part of the process of estimating how long it might take to effect a data change is estimating how much confidence I have about the change.  Generally speaking, higher confidence means less time would be needed to make the change – but of course, confidence is a quality that cannot be separated from the people and organizations who might work on the change.

In the extreme – but common – case, if I start from a state of very low confidence, it will take me time to reach a sufficient degree of confidence to make any time estimate at all.  This is why I like a comment that Owen Stephens made in the Slack:

Perhaps this is part of the answer to [Alison]: Q: Always ask how long it will take to investigate and get an idea of how difficult it is.

In the next post, I discuss how various constraints can affect time estimates.

The Hugo Awards have been awarded by the World Science Fiction Convention for decades, and serve to recognize the works of authors, editors, directors – fans and professionals – in the genres of science fiction and fantasy.  The Hugos are unique in being a fan-driven award that has as much process – if not more – as juried awards.

That process has two main steps.  First, there’s a nomination period where members of Worldcon select works to appear on the final ballot. Second, members of the upcoming Worldcon vote on the final ballot and the awards are given out at the convention.

Typically, rather more folks vote on the final ballot than nominate – and that means that small, organized groups of people can unduly influence the nominations.  However, there have been surprisingly few attempts to actually do that.

Until this year.

Many of the nominations this year match the slates of two groups, the “Sad Puppies” and the “Rabid Puppies.”  Not only that, some of the categories contain nothing but Puppy nominations.

The s.f. news site File 770 has a comprehensive collection of back-and-forth about the matter, but suffice it to say that the Puppy slates have a primarily political motivation – and one, in the interests of full disclosure, that I personally despise.

There are a lot of people saying smart things about the situation, so I’ll content myself with the following observation:

Slate nominations and voting destroy the utility of the Hugo Award lists for librarians who select science fiction and fantasy.

Why? Ideally, the Hugo process ascertains the preferences of thousands of Worldcon members to arrive at a general consensus of science fiction and fantasy that is both good and generally appealing.  As it happens, that’s a pretty useful starting point for librarians trying to round out collections or find new authors that their patrons might like – particularly for those librarians who are not themselves fans of the genre.

However, should slate voting become a successful tactic, the Hugo Awards are in danger of ending up simply reflecting which factions in fandom are best able to game the system.  The results of that… are unlikely to be all that useful for librarians.

Here’s my suggestion for librarians who are fans of science fiction and fantasy and who want to help preserve a collection development tool: get involved.  In particular:

  1. Join Worldcon. A $40 supporting membership suffices to get voting privileges.
  2. Vote on the Hugos this year. I won’t tell you who to vote for, but if you agree with me that slate nominations are a problem, consider voting accordingly.
  3. Next year, participate in the nomination process. Don’t participate in nomination slates; instead, nominate those works that you think are worthy of a Hugo – full stop.

Discussions on Twitter today – see the timelines of @cm_harlow and @erinaleach for entry points – got me thinking.

In 1991, the Library of Congress had 745 staff in its Cataloging Directorate. By the end of FY 2004, the LC Bibliographic Access Divisions had between 506 [1] and 561 [2] staff.

What about now? As of 2014, the Acquisitions and Bibliographic Access unit has 238 staff [3].

While I’m sure one could quibble about the details (counting FTE vs. counting humans, accounting for the reorganizations, and so forth), the trend is clear: there has been a precipitous drop in the number of cataloging staff employed by the Library of Congress.

I’ll blithely ignore factors such as shifts in the political climate in the U.S. and how they affect civil service. Instead, I’ll focus on library technology, and spin three tales.

The tale of the library technologists

The decrease in the number of cataloging staff is one consequence of a triumph of library automation. The tools that we library technologists have written allow catalogers to work more efficiently. Sure, there are fewer of them, but that’s mostly been due to retirements. Not only that, the ones who are left are now free to work on more intellectually interesting tasks.

If we, the library technologists, can but slip the bonds of legacy cruft like the MARC record, we can make further gains in the expressiveness of our tools and the efficiencies they can achieve. We will be able to take advantage of metadata produced by other institutions and people for their own ends, enabling library metadata specialists to concern themselves with larger-scale issues.

Moreover, once our data is out there – who knows what others, including our patrons, can achieve with it?

This will of course be pretty disruptive, but as traditional library catalogers retire, we’ll reach buy-in. The library administrators have been pushing us to make more efficient systems, though we wish that they would invest more money in the systems departments.

We find that the catalogers are quite nice to work with one-on-one, but we don’t understand why they seem so attached to an ancient format that was only meant for record interchange.

The tale of the catalogers

The decrease in the number of cataloging staff reflects a success of library administration in their efforts to save money – but why is it always at our expense? We firmly believe that our work with the library catalog/metadata services counts as a public service, and we wish more of our public services colleagues knew how to use the catalog better.  We know for a fact that what doesn’t get catalogued may as well not exist in the library.

We also know that what gets catalogued badly or inconsistently can cause real problems for patrons trying to use the library’s collection.  We’ve seen what vendor cataloging can be like – and while sometimes it’s very good, often it’s terrible.

We are not just a cost center. We desperately want better tools, but we also don’t think that it’s possible to completely remove humans from the process of building and improving our metadata. 

We find that the library technologists are quite nice to work with one-on-one – but it is quite rare that we get to actually speak with a programmer.  We wish that the ILS vendors would listen to us more.

The tale of the library directors

The decrease in the number of cataloging staff at the Library of Congress is only partially relevant to the libraries we run, but hopefully somebody has figured out how to do cataloging more cheaply. We’re trying to make do with the money we’re allocated. Sometimes we’re fortunate enough to get a library funding initiative passed, but more often we’re trying to make do with less: sometimes to the point where flu season makes us super-nervous about our ability to keep all of the branches open.

We’re concerned not only with how much of our budgets are going into electronic resources, but with how nigh-impossible it is to predict increases in fees for ejournal subscriptions and ebook services.

We find that the catalogers and the library technologists are pleasant enough to talk to, but we’re not sure how well they see the big picture – and we dearly wish they could clearly articulate how yet another cataloging standard / yet another systems migration will make our budgets any more manageable.

Each of these tales is true. Each of these tales is a lie. Many other tales could be told. Fuzziness abounds.

However, there is one thing that seems clear: conversations about the future of library data and library systems involve people with radically different points of view. These differences do not mean that any of the people engaged in the conversations are villains, or do not care about library users, or are unwilling to learn new things.

The differences do mean that it can be all too easy for conversations to fall apart or get derailed.

We need to practice listening.

1. From testimony by the president of the Library of Congress Professional Guild to Congress on 6 March 2015.
2. From the BA FY 2004 report. This includes 32 staff from the Cataloging Distribution Service, which had been merged into BA and had not been part of the Cataloging Directorate.
3. From testimony by the president of the Library of Congress Professional Guild to Congress on 6 March 2015.

I saw a lot of pain yesterday. I will see more pain today.

Pain from women saying that it’s back to the whisper network for them. Pain from women acknowledging the many faults of whisper networks.

Pain from women who do not want to be chilled — and who yet find themselves in the far north, with the wolves circling.

Pain from women who have seen their colleagues fail them before, and before, and before — and who have less hope now that the future of libraries will be any better.

Pain from women who fear that licenses were issued yesterday — licenses to maintain the status quo, licenses to grind away the hopes and dreams of those women in libraries who want to change the world (or who simply want to catalog books in peace and go home at the end of the day).

Above all, pain from women whose words are now constrained by the full force of the law — and who are now the target of every passerby who has much time and little empathy.

I will speak plainly: Lisa Rabey and nina de jesus did a brave thing, a thing that could never have redounded to their personal advantage no matter the outcome of the lawsuit. I respect them, and I wish them whatever peace they can find after this.

I will speak bluntly to men in the library profession: regardless of what you think of the case that ended yesterday — regardless of what you think of Joe Murphy’s actions or of the actions of Team Harpy — sexual harassment in our profession is real; the pain our colleagues experience due to it is real.

It remains an unsolved problem.

It remains our unsolved problem.

We must do our part to fix it.

Not sure how? Neither am I. But at least as librarians and library workers, we have access to plenty of tools to learn, to listen.

Time to roll up our sleeves.

Just now I read a comment on a blog to the effect that although the commenter was mildly interested in something, she wasn’t going to Google it.

The something in question was a racist cultural practice that used to be common, but nowadays is condemned by most.

Why not search to learn more about it? She wasn’t particularly concerned that somebody would go through her web search history and think she endorsed the practice. Rather, she didn’t want to contribute even so much as a click to making the practice have higher visibility on the web.

Google searches ≠ endorsement, of course – but I find it interesting that nowadays some find it necessary to say so explicitly.

Is the COBOL programming language capable of processing MARC records?

A computer programmer in 2015 could be excused for thinking to herself, what kind of question is that!?! Surely it’s obvious that any programming language capable of receiving input can parse a simple, antique record format?

In 1968, it apparently wasn’t so obvious. I turned up an article by Henriette Avram and a colleague, MARC II and COBOL, that was evidently written in response to a review article by a Hillis Griffin where he stated

Users will require programmers skilled in languages other than FORTRAN or COBOL to take advantage of MARC records.

Avram responded to Griffin’s concern in the most direct way possible: by describing COBOL programs developed by the Library of Congress to process MARC records and generate printed catalogs. Her article even includes source code, in case there were any remaining doubts!
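To make the modern programmer’s reaction concrete, here is a minimal sketch in Python (not COBOL, and in no way a reconstruction of the code in Avram’s article) of walking the directory of a MARC record in its ISO 2709 exchange layout. The function names and the tiny sample record are made up for illustration, and the sketch ignores character encodings and malformed input.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"
LEADER_LENGTH = 24
DIRECTORY_ENTRY_LENGTH = 12  # 3-byte tag, 4-byte field length, 5-byte offset


def build_marc(fields):
    """Assemble a toy record from (tag, data) pairs, for demonstration only."""
    directory = b""
    body = b""
    for tag, data in fields:
        chunk = data + FIELD_TERMINATOR
        directory += f"{tag}{len(chunk):04d}{len(body):05d}".encode("ascii")
        body += chunk
    directory += FIELD_TERMINATOR
    base_address = LEADER_LENGTH + len(directory)
    total_length = base_address + len(body) + len(RECORD_TERMINATOR)
    # Leader bytes 0-4 hold the record length, bytes 12-16 the base address.
    leader = f"{total_length:05d}nam a22{base_address:05d} a 4500".encode("ascii")
    return leader + directory + body + RECORD_TERMINATOR


def parse_marc(record):
    """Return (leader, [(tag, field_bytes), ...]) for one record."""
    leader = record[:LEADER_LENGTH]
    base_address = int(leader[12:17].decode("ascii"))
    # The directory sits between the leader and the data, minus its terminator.
    directory = record[LEADER_LENGTH:base_address - 1]
    fields = []
    for i in range(0, len(directory), DIRECTORY_ENTRY_LENGTH):
        entry = directory[i:i + DIRECTORY_ENTRY_LENGTH].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        begin = base_address + start
        # Drop the trailing field terminator from each field.
        fields.append((tag, record[begin:begin + length - 1]))
    return leader, fields


sample = build_marc([
    ("001", b"12345"),
    ("245", b"10" + SUBFIELD_DELIMITER + b"aMARC II and COBOL"),
])
for tag, data in parse_marc(sample)[1]:
    print(tag, data)
```

Everything here is fixed-width arithmetic and byte slicing, which is the sort of work COBOL was built for.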

I haven’t yet turned up any evidence that Henriette Avram and Grace Hopper ever met, but it was nice to find a close, albeit indirect connection between the two of them via COBOL.

Is the debate between Avram and Griffin in 1968 regarding COBOL and MARC anything more than a curiosity? I think it is – many of the discussions she participated in are reminiscent of debates that are taking place now. To be fair to Griffin, I don’t know enough about the computing environment of the late sixties to be able to say definitively that his statement was patently ill-informed at the time – but given that by 1962 IBM had announced that they were standardizing on COBOL, it seems hardly surprising that Avram and her group would be writing MARC processing code in COBOL on an IBM/360 by 1968. To me, the concerns that Griffin raised seem on par with objections to Library Linked Data that assume that each library catalog request would necessarily mean firing off a dozen requests to RDF providers – objections that have rejoinders that are obvious to programmers, but perhaps not so obvious to others.
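One such rejoinder, offered here as an assumption about what a programmer would reach for first, is caching: look each remote label up once and reuse the answer for later catalog requests. The sketch below is in Python; fetch_remote_label is a hypothetical stand-in for an HTTP request to an RDF provider, and the URIs are invented.

```python
from functools import lru_cache

fetch_log = []  # records every simulated trip to a remote provider


def fetch_remote_label(uri):
    """Hypothetical stand-in for an HTTP request to an RDF provider."""
    fetch_log.append(uri)
    return f"label for {uri}"


@lru_cache(maxsize=1024)
def label_for(uri):
    """Only the first lookup of a URI reaches the provider; repeats hit the cache."""
    return fetch_remote_label(uri)


def render_record(uris):
    """Resolve the labels needed to display one catalog record."""
    return [label_for(uri) for uri in uris]


record_uris = ["http://example.org/subject/1", "http://example.org/name/2"]
render_record(record_uris)  # first display: two remote fetches
render_record(record_uris)  # second display: served from the cache
print(len(fetch_log), "remote fetches for two displays of the record")
```

A production system would also need cache expiry and a fallback for when a provider is down, but the catalog request itself need not wait on a dozen remote servers.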

Plus ça change, plus c’est la même chose?

The other day I made this blog, galencharlton.com/blog/, HTTPS-only.  In other words, if Eve wants to sniff what Bob is reading on my blog, she’ll need to do more than just capture packets between my blog and Bob’s computer to do so.

This is not bulletproof: perhaps Eve is in possession of truly spectacular computing capabilities or a breakthrough in cryptography and can break the ciphers. Perhaps she works for any of the sites that host external images, fonts, or analytics for my blog and has access to their server logs containing referrer header information.  Currently these sites are Flickr (images), Gravatar (more images), Google (fonts), and WordPress (site stats – I will be changing this soon, however). Or perhaps she’s installed a keylogger on Bob’s computer, in which case anything I do to protect Bob is moot.

Or perhaps I am Eve and I’ve set up a dastardly plan to entrap people by recording when they read about MARC records, then showing up at Linked Data conferences and disclosing that activity.  Or vice versa. (Note: I will not actually do this.)

So, yes – protecting the privacy of one’s website visitors is hard; often the best we can do is be better at it than we were yesterday.

To that end, here are some notes on how I made my blog require HTTPS.

Certificates

I got my SSL certificate from Gandi.net. Why them?  Their price was OK, I already register my domains through them, and I like their corporate philosophy: they support a number of free and open source software projects; they’re not annoying about up-selling, and they have never (to my knowledge) run sexist advertising, unlike some of their larger and more well-known competitors. But there are, of course, plenty of options for getting SSL certificates, and once Let’s Encrypt is in production, it should be both cheaper and easier for me to replace the certs next year.

I have three subdomains of galencharlton.com that I wanted a certificate for, so I decided to get a multi-domain certificate.  I consulted this tutorial by rtCamp to generate the CSR.

After following the tutorial to create a modified version of openssl.conf specifying the subjectAltName values I needed, I generated a new private key and a certificate-signing request as follows:

openssl req -new -key galencharlton.com.key \
  -out galencharlton.com.csr \
  -config galencharlton.com.cnf \
  -sha256

The openssl command asked me a few questions, the most important of which was the value to set for the common name (CN) field; I used “galencharlton.com” for that, as that’s the primary domain that the certificate protects.

I then entered the text of the CSR into a form and paid the cost of the certificate.  Since I am a library techie, not a bank, I purchased a domain-validated certificate.  That means that all I had to prove to the certificate’s issuer was that I had control of the three domains that the cert should cover.  That validation could have been done via email to an address at galencharlton.com or by adding a special TXT record to the DNS zone file for galencharlton.com. I ended up choosing to go the route of placing a file on the web server whose contents and location were specified by the issuer; once they (or rather, their software) downloaded the test files, they had some assurance that I had control of the domain.

In due course, I got the certificate.  I put it and the intermediate cert specified by Gandi in the /etc/ssl/certs directory on my server and the private key in /etc/ssl/private/.

Operating System and Apache configuration

Various vulnerabilities in the OpenSSL library or in HTTPS itself have been identified and mitigated over the years: suffice it to say that it is a BEASTly CRIME to make a POODLE suffer a HeartBleed — or something like that.

To avoid the known problems, I wanted to ensure that I had a recent enough version of OpenSSL on the web server and had configured Apache to disable insecure protocols (e.g., SSLv3) and eschew bad ciphers.

The server in question is running Debian Squeeze LTS, but since OpenSSL 1.0.x is not currently packaged for that release, I ended up adding Wheezy to the APT repositories list and upgrading the openssl and apache2 packages.

For the latter, after some Googling I ended up adapting the recommended Apache SSL virtualhost configuration from this blog post by Tim Janik.  Here’s what I ended up with:

<VirtualHost _default_:443>
    ServerAdmin gmc@galencharlton.com
    DocumentRoot /var/www/galencharlton.com
    ServerName galencharlton.com
    ServerAlias www.galencharlton.com

    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/galencharlton.com.crt
    SSLCertificateChainFile /etc/ssl/certs/GandiStandardSSLCA2.pem
    SSLCertificateKeyFile /etc/ssl/private/galencharlton.com.key
    Header add Strict-Transport-Security "max-age=15552000"

    # No POODLE
    SSLProtocol all -SSLv2 -SSLv3 +TLSv1.1 +TLSv1.2
    SSLHonorCipherOrder on
    SSLCipherSuite "EECDH+ECDSA+AESGCM EECDH+aRSA+AESGCM EECDH+ECDSA+SHA384 EECDH+ECDSA+SHA256 EECDH+aRSA+SHA384 EECDH+aRSA+SHA256 EECDH+AESGCM EECDH EDH+AESGCM EDH+aRSA HIGH !MEDIUM !LOW !aNULL !eNULL !LOW !RC4 !MD5 !EXP !PSK !SRP !DSS"

</VirtualHost>
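A side note on the Strict-Transport-Security line in that configuration: max-age is a number of seconds, so it is easy to misread by a factor of ten. This small Python sketch, with a helper name invented for illustration, converts the value into days.

```python
SECONDS_PER_DAY = 86400


def hsts_max_age_days(header_value):
    """Return the max-age of a Strict-Transport-Security value, in days."""
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            # The value may legally be wrapped in double quotes.
            return int(value.strip().strip('"')) / SECONDS_PER_DAY
    raise ValueError("no max-age directive found")


print(hsts_max_age_days("max-age=15552000"))  # 180.0
```

So browsers that have seen the header will insist on HTTPS for this site for roughly six months after their most recent visit.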

I also wanted to make sure that folks coming in via old HTTP links would get permanently redirected to the HTTPS site:

<VirtualHost *:80>
    ServerName galencharlton.com
    Redirect 301 / https://galencharlton.com/
</VirtualHost>

<VirtualHost *:80>
    ServerName www.galencharlton.com
    Redirect 301 / https://www.galencharlton.com/
</VirtualHost>

Checking my work

I’m a big fan of the Qualys SSL Labs server test tool, which checks several things to test how well a given website implements HTTPS:

  • Whether there are issues with the certificate chain
  • Whether it supports vulnerable protocol versions such as SSLv3
  • Whether it supports – and requests – use of sufficiently strong ciphers
  • Whether it is vulnerable to common attacks

Suffice it to say that I required a couple iterations to get the Apache configuration just right.

WordPress

To be fully protected, all of the content embedded on a web page served via HTTPS must also be served via HTTPS.  In other words, this means that image URLs should require HTTPS – and the redirects in the Apache config are not enough.  Here is the sledgehammer I used to update image links in the blog posts:

create table bkp_posts as select * from wp_posts;

begin;
update wp_posts set post_content = replace(post_content, 'http://galen', 'https://galen') where post_content like '%http://galen%';
commit;

Whee!

I also needed to tweak a couple plugins to use HTTPS rather than HTTP to embed their icons or fetch JavaScript.
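Before swinging a sledgehammer like the UPDATE above at a live database, it can be reassuring to rehearse it on a scratch copy. The sketch below uses Python’s built-in sqlite3 module and a made-up three-row wp_posts table; both are assumptions for illustration, since a real WordPress database runs on MySQL and has many more columns.

```python
import sqlite3


def make_scratch_db():
    """Build an in-memory stand-in for wp_posts with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wp_posts (id INTEGER PRIMARY KEY, post_content TEXT)"
    )
    conn.executemany(
        "INSERT INTO wp_posts (post_content) VALUES (?)",
        [
            ('<img src="http://galencharlton.com/a.png">',),
            ('<img src="https://galencharlton.com/b.png">',),
            ('<a href="http://example.org/">elsewhere</a>',),
        ],
    )
    conn.commit()
    return conn


def rewrite_links(conn, old="http://galen", new="https://galen"):
    """Back up wp_posts, rewrite matching URLs, and return the rows changed."""
    conn.execute("CREATE TABLE bkp_posts AS SELECT * FROM wp_posts")
    with conn:  # commits on success, rolls back if the UPDATE raises
        cur = conn.execute(
            "UPDATE wp_posts SET post_content = replace(post_content, ?, ?) "
            "WHERE post_content LIKE ?",
            (old, new, f"%{old}%"),
        )
        return cur.rowcount


conn = make_scratch_db()
print("rows changed:", rewrite_links(conn))
for (content,) in conn.execute("SELECT post_content FROM wp_posts ORDER BY id"):
    print(content)
```

Comparing the reported row count against what a plain SELECT with the same WHERE clause finds is a cheap way to catch a pattern that matches more, or less, than intended.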

Finishing touches

In the course of testing, I discovered a couple more things to tweak:

  • The web server had been using Apache’s mod_php5filter – I no longer remember why – and that was causing some issues when attempting to load the WordPress dashboard.  Switching to mod_php5 resolved that.
  • My domain ownership proof on keybase.io failed after the switch to HTTPS.  I eventually tracked that down to the fact that keybase.io doesn’t have a bunch of intermediate certificates in its certificate store that many browsers do. I resolved this by adding a cross-signed intermediate certificate to the file referenced by SSLCertificateChainFile in the Apache config above.

My blog now has an A+ score from SSL Labs. Yay!  Of course, it’s important to remember that this is not a static state of affairs – another big OpenSSL or HTTPS protocol vulnerability could turn that grade to an F.  In other words, it’s a good idea to test one’s website periodically.

At the first face-to-face meeting of the LITA Patron Privacy Technologies Interest Group at Midwinter, one of the attendees mentioned that they had sent out an RFP last year for library databases. One of the questions on the RFP asked how user passwords were stored — and a number of vendors responded that their systems stored passwords in plain text.

Here’s what I tweeted about that, and here is Dorothea Salo’s reply:

https://twitter.com/LibSkrat/status/561605951656976384

This is a repeatable response, by the way — much like the way a hammer strike to the patellar ligament instigates a reflexive kick, mention of plain-text password storage will trigger an instinctual wail from programmers, sysadmins, and privacy and security geeks of all stripes.

Call it the Vanilla Password Reflex?

I’m not suggesting that you should whisper “plain text passwords” into the ear of your favorite system designer, but if you are the sort to indulge in low and base amusements…

A recent blog post by Eric Hellman discusses the problems with storing passwords in plain text in detail. The upshot is that it’s bad practice — if a system’s password list is somehow leaked, and if the passwords are stored in plain text, it’s trivially easy for a cracker to use those passwords to get into all sorts of mischief.

This matters, even “just” for library reference databases. If we take the right to reader privacy seriously, it has to extend to the databases offered by the library — particularly since many of them have features to store citations and search results in a user’s account.

As Eric mentions, the common solution is to use a one-way cryptographic hash function to transform the user’s password into a bunch of gobbledegook.

For example, “p@ssw05d” might be stored as the following hash:

d242b6313f32c8821bb75fb0660c3b354c487b36b648dde2f09123cdf44973fc

To make it more secure, I might add some random salt and end up with the following salted hash:

$2355445aber$76b62e9b096257ac4032250511057ac4d146146cdbfdd8dd90097ce4f170758a

To log in, the user has to prove that they know the password by supplying it, but rather than compare the password directly, the result of the one-way function applied to the password is compared with the stored hash.

How is this more secure? If a hacker gets the list of password hashes, they won’t be able to deduce the passwords, assuming that the hash function is good enough. What counts as good enough? Well, relatively few programmers are experts in cryptography, but suffice it to say that there does exist a consensus on techniques for managing passwords and authentication.
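As a rough sketch of the salt, hash, and compare steps described above, here is how they might look in Python using only the standard library. PBKDF2-HMAC-SHA256, the 16-byte salt, and the iteration count are illustrative assumptions rather than a recommendation; a production system should follow current guidance or use a vetted password-hashing library.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor, not a recommendation


def hash_password(password, salt=None):
    """Return (salt, digest); a fresh random salt is drawn unless one is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(candidate, salt, stored_digest):
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, candidate_digest = hash_password(candidate, salt)
    return hmac.compare_digest(candidate_digest, stored_digest)


salt, stored = hash_password("p@ssw05d")
print(verify_password("p@ssw05d", salt, stored))  # True
print(verify_password("password", salt, stored))  # False
```

Only the salt and the digest are ever written to the database, so nothing the system stores can be read back as the password.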

The idea of using one-way functions to protect stored passwords is not new; in fact, it dates back to the 1960s. Nowadays, any programmer who wants to be considered a professional really has no excuse for writing a system that stores passwords in plain text.

Back to the “Vanilla Password Reflex”. It is, of course, not actually a reflex in the sense of an instinctual response to a stimulus — programmers and the like get taught, one way or another, about why storing plain text passwords is a bad idea.

Where does this put the public services librarian? Particularly the one who has no particular reason to be well versed in security issues?

At one level, it just changes the script. In a well-designed system, when a user asks what their password is, it should be impossible to get an answer to the question. How to respond to a patron who informs you that they’ve forgotten their password? Let them know that you can change it for them. If they respond by wondering why you can’t just tell them, and they’re actually interested in the answer, tell them about one-way functions – or just blame the computer; that’s fine too if time is short.

However, libraries and librarians can have a broader role in educating patrons about online security and privacy practices: leading by example. If we insist that the online services we recommend follow good security design; if we use HTTPS appropriately; if we show that we’re serious about protecting reader privacy, it can only buttress programming that the library may offer about (say) using password managers or avoiding phishing and other scams.

There’s also a direct practical benefit: human nature being what it is, many people use the same password for everything. If you crack an ILS’s password list, you’ve undoubtedly obtained a non-negligible set of people’s online banking passwords.

I’ll end this with a few questions. Many public services librarians have found themselves, like it or not, in the role of providing technical support for e-readers, smartphones, and laptops. How often does online security come up during such interactions? How often do patrons come to the library seeking help against the online bestiary of spammers, phishers, and worse? What works in discussing online security with patrons, who of course can be found at all levels of computer savvy? And what doesn’t?

I invite discussion — not just in the comments section, but also on the mailing list of the Patron Privacy IG.

What do ogres, hippogriffs, and authorized Koha service providers have in common?

Each of them is an imaginary creature.

20070522 Madrid: hippogriff — image by Larry Wentzel on Flickr (CC-BY)

Am I saying that Koha service providers are imaginary creatures? Not at all — at the moment, there are 54 paid support providers listed on the Koha project’s website.

But not a one of them is “authorized”.

I bring this up because a friend of mine in India (full disclosure: who himself offers Koha consulting services) ran across this flyer by Avior Technologies:

Avior information sheet

The bit that I’ve highlighted is puffery at best, misleading at worst. The Koha website’s directory of paid support providers is one thing, and one thing only: a directory. The Koha project does not endorse any vendors listed there — and neither the project nor the Horowhenua Library Trust in New Zealand (which holds various Koha trademarks) authorizes any firm to offer Koha services.

If you want your firm to get included in the directory, you need only do a few things:

  1. Have a website that contains an offer of services for Koha.
  2. Ensure that your page that offers services links back to koha-community.org.
  3. Make a public request to be added to the directory.

That’s it.

Not included on this list of criteria:

  • Being good at offering services for Koha libraries.
  • Contributing code, documentation, or anything else to the Koha project.
  • Having any current customers who are willing to vouch for you.
  • Being alive at present (although eventually, your listing will get pulled for lack of response to inquiries from Koha’s webmasters).

What does this mean for folks interested in getting paid support services?  There is no shortcut to doing your due diligence — it is on you to evaluate whether a provider you might hire is competent and able to keep their customers reasonably happy. The directory on the Koha website exists as a convenience for folks starting a search for a provider, but beyond that: caveat emptor.

I know nothing about Avior Technologies. They may be good at what they do; they may be terrible — I make no representation either way.

But I do know this: while there are some open source projects where the notion of an “authorized” or “preferred” support provider may make some degree of sense, Koha isn’t such a project.

And that’s generally to the good of all: if you have Koha expertise or can gain it, you don’t need to ask anybody’s permission to start helping libraries run Koha — and get paid for it.  You can fill niches in the market that other Koha support providers cannot or do not fill.

You can in time become the best Koha vendor in your niche, however you choose to define it.

But authority? It will never be bestowed upon you. It is up to you to earn it by how well you support your customers, and by how much you contribute to the global Koha project.

 

Shortly after it came to light that Adobe Digital Editions was transmitting information about ebook reading activity in the clear, for anybody to snoop upon, I asked a loaded question: does ALA have a role in helping to verify that the software libraries use protects the privacy of readers?

As with any loaded question, I had an answer in mind: I do think that ALA and LITA, by virtue of their institutional heft and influence with librarians, can provide significant assistance in securing library software.

I waited a bit, wondering how the powers that be at ALA would respond. Then I remembered something: an institution like ALA is not, in fact, a faceless, inscrutable organism. Like Soylent Green, ALA is people!

Well, maybe not so much like Soylent Green. My point is that despite ALA’s reputation for being a heavily bureaucratic, procedure-bound organization, it does offer ways for members to take up an idea and run with it.

And that’s what I did — I floated a petition to form a new interest group within LITA, the Patron Privacy Technologies IG. Quite a few people signed it… and it now lives!

Here’s the charge of the IG:

The LITA Patron Privacy Technologies Interest Group will promote the design and implementation of library software and hardware that protects the privacy of library users and maximizes user ability to make informed decisions about the use of personally identifiable information by the library and its vendors.

Under this remit, activities of the Interest Group would include, but are not necessarily limited to:

  1. Publishing recommendations on data security practices for library software.
  2. Publishing tutorials on tools for libraries to use to check that library software is handling patron information responsibly.
  3. Organizing efforts to test commercially available software that handles patron information.
  4. Providing a conduit for responsible disclosure of defects in software that could lead to exposure of library patron information.
  5. Providing sample publicity materials for libraries to use with their patrons in explaining the library’s privacy practices.

I am fortunate to have two great co-chairs, Emily Morton-Owens of the Seattle Public Library and Matt Beckstrom of the Lewis and Clark Library, and I’m happy to announce that the IG’s first face-to-face meeting will be at ALA Midwinter 2015: specifically, tomorrow at 8:30 a.m. Central Time in Ballroom 1 of the Sheraton in Chicago.

We have two great speakers lined up — Alison Macrina of the Library Freedom Project and Gary Price of INFODocket, and I’m very much looking forward to it.

But I’m also looking forward to the rest of the meeting: this is when the IG will, as a whole, decide how far to reach.  We have a lot of interest and the ability to do things that will teach library staff and our patrons how to better protect privacy, teach library programmers how to design and code for privacy, and verify that our tools match our ideals.

Despite the title of this blog post… it’s by no means my effort alone that will get us anywhere. Many people are already engaging with issues of privacy and technology in libraries, but I do hope that the IG will provide one more point of focus for our efforts.

I look forward to the conversation tomorrow.