Let’s search a Koha catalog for something that isn’t at all controversial:

Screenshot of results from a catalog search of a Koha system for "anarchist"

What you search for in a library catalog ought to be only between you and the library — and that, only briefly, as the library should quickly forget. Of course, between “ought” and “is” lies the Devil and his details. Let’s poke around with Chrome’s DevTools:

  1. Hit Control-Shift-I (on Windows)
  2. Switch to the Network tab.
  3. Hit Control-R to reload the page and get a list of the HTTP requests that the browser makes.

We get something like this:

Screenshot of Chrome DevTool's Network tab showing requests made when doing the "anarchist" Koha catalog search.

There’s a lot to like here: every request was made using HTTPS rather than HTTP, and almost all of the requests were made to the Koha server. (If you can’t trust the library catalog, who can you trust? Well… that doesn’t have an answer as clear as we would like, but I won’t tackle that question here.)

However, the two cover images on the results page come from Amazon:

https://images-na.ssl-images-amazon.com/images/P/0974458902.01.TZZZZZZZ.jpg
https://images-na.ssl-images-amazon.com/images/P/1849350949.01.TZZZZZZZ.jpg

What did I trade in exchange for those two cover images? Let’s click on the request and see:

:authority: images-na.ssl-images-amazon.com
:method: GET
:path: /images/P/0974458902.01.TZZZZZZZ.jpg
:scheme: https
accept: image/webp,image/apng,image/*,*/*;q=0.8
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cache-control: no-cache
dnt: 1
pragma: no-cache
referer: https://catalog.libraryguardians.com/cgi-bin/koha/opac-search.pl?q=anarchist
sec-fetch-dest: image
sec-fetch-mode: no-cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36

Here’s what was sent when I used Firefox:

Host: images-na.ssl-images-amazon.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0
Accept: image/webp,*/*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Referer: https://catalog.libraryguardians.com/cgi-bin/koha/opac-search.pl?q=anarchist
DNT: 1
Pragma: no-cache

Amazon also knows what my IP address is. With that, it doesn’t take much to figure out that I am in Georgia and am clearly up to no good; after all, one look at the Referer header tells all.

Let’s switch over to using Google Books cover images:

https://books.google.com/books/content?id=phzFwAEACAAJ&printsec=frontcover&img=1&zoom=5
https://books.google.com/books/content?id=wdgrJQAACAAJ&printsec=frontcover&img=1&zoom=5

This time, the request headers in Chrome are:

:authority: books.google.com
:method: GET
:path: /books/content?id=phzFwAEACAAJ&printsec=frontcover&img=1&zoom=5
:scheme: https
accept: image/webp,image/apng,image/*,*/*;q=0.8
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
cache-control: no-cache
dnt: 1
pragma: no-cache
referer: https://catalog.libraryguardians.com/
sec-fetch-dest: image
sec-fetch-mode: no-cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36
x-client-data: CKO1yQEIiLbJAQimtskBCMG2yQEIqZ3KAQi3qsoBCMuuygEIz6/KAQi8sMoBCJe1ygEI7bXKAQiNusoBGKukygEYvrrKAQ==

and in Firefox:

Host: books.google.com
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:73.0) Gecko/20100101 Firefox/73.0
Accept: image/webp,*/*
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate, br
Connection: keep-alive
Referer: https://catalog.libraryguardians.com/
DNT: 1
Pragma: no-cache
Cache-Control: no-cache

On the one hand… the Referer now contains only the base URL of the catalog. I believe this is due to a difference in how Koha figures out the correct image URL. When using Amazon for cover images, the ISBN of the title is normalized and used to construct a URL for an <img> tag. Koha doesn’t currently set a Referrer-Policy, so the default of no-referrer-when-downgrade is used and the full referrer is sent. Google Books cover image URLs cannot be directly constructed like that, so a bit of JavaScript queries a web service and gets back the image URLs, and, for reasons that are unclear to me at the moment, doesn’t send the full URL as the referrer. (Cover images from OpenLibrary are fetched in a similar way, but the full Referer header is sent.)
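
For illustration, the Amazon case might look like the following sketch. The exact normalization Koha performs is an assumption on my part, but the URL pattern matches the requests captured above:

```python
def amazon_cover_url(isbn):
    """Build an Amazon cover-image URL from an ISBN-10.

    The normalization here (strip hyphens and spaces, uppercase a
    trailing 'x' check digit) is a simplified guess at what Koha does.
    """
    normalized = isbn.replace("-", "").replace(" ", "").upper()
    return ("https://images-na.ssl-images-amazon.com/images/P/"
            f"{normalized}.01.TZZZZZZZ.jpg")
```

Since the URL is built client-side and dropped into an <img> tag, the browser fetches it directly from Amazon, Referer and all.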

As a side note, the x-client-data header sent by Chrome to books.google.com is… concerning.

There are some relatively simple things that can be done to limit leaking the full referring URL to the likes of Google and Amazon, including

  • Setting the Referrer-Policy header via web server configuration or a meta tag to something like origin or origin-when-cross-origin.
  • Setting referrerpolicy for <script> and <img> tags involved in fetching book jackets.
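
Concretely, either mechanism is a one-liner. Here is a sketch of both; the image URL is one of the Amazon URLs from the example above, and the choice of policy value is up to you:

```html
<!-- page-wide policy via a meta tag -->
<meta name="referrer" content="origin-when-cross-origin">

<!-- or per-element, on the tags that fetch book jackets -->
<img src="https://images-na.ssl-images-amazon.com/images/P/0974458902.01.TZZZZZZZ.jpg"
     referrerpolicy="origin">
```

With either in place, third-party image hosts see at most the catalog’s origin rather than the full search URL.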

This would help, but only up to a point: fetching https://books.google.com/books/content?id=wdgrJQAACAAJ&printsec=frontcover&img=1&zoom=5 still tells Google that a web browser at your IP address has done something to fetch the book jacket image for The Anarchist Cookbook. Suspicious!

What to do? Ultimately, if we’re going to use free third-party services to provide cover images for library catalogs, our options to do so in a way that preserves patron privacy boil down to:

  • Only use sources that we trust to not broadcast or misuse the information that gets sent in the course of requesting the images. The Open Library might qualify, but ultimately isn’t beholden to any particular library that uses its data.
  • Proxy image requests through the library catalog server. Evergreen does this in some cases, and it wouldn’t be much work to have Koha do something similar. It should be noted that Coce does not help in the case of Koha, as all it does is proxy image URLs, meaning that it’s still the user’s web browser fetching the actual images.
  • Figure out a way to obtain local copies of the cover images and serve them from the library’s web server. Sometimes this is necessary anyway for libraries that collect stuff that wasn’t commercially sold in the past couple decades, but otherwise this is a lot of work.
  • Do nothing and figure that Amazon and Google aren’t trawling through their logs to correlate cover image retrieval with potential reading interests. I actually have a tiny bit of sympathy for that approach — it’s not beyond the realm of possibility that cover image access logs are simply getting ignored, unlike, say, direct usage data from Kindle or Google Books — but ostriches sticking their heads in the sand are not known as a good model for due diligence.
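
To sketch the proxying option above: the catalog can rewrite cover URLs so that the patron’s browser asks the catalog server, which fetches the upstream image itself. The /cover-proxy endpoint and the host allow-list below are hypothetical:

```python
from urllib.parse import quote, urlparse

# Hosts we are willing to proxy cover images from (illustrative list)
ALLOWED_COVER_HOSTS = {"images-na.ssl-images-amazon.com", "books.google.com"}

def proxied_cover_url(upstream_url):
    """Rewrite an upstream cover-image URL so the patron's browser
    requests it from the catalog server instead of the third party.

    A hypothetical /cover-proxy handler on the catalog server would
    then fetch the upstream image and relay it back.
    """
    host = urlparse(upstream_url).netloc
    if host not in ALLOWED_COVER_HOSTS:
        raise ValueError("refusing to proxy unknown host: " + host)
    return "/cover-proxy?url=" + quote(upstream_url, safe="")
```

The third party then sees only the catalog server’s IP address making image requests, not each patron’s.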

Non-free book jacket and added content services are also an option, of course — and at least unlike Google and Amazon, it’s plausible that libraries could insist on contracts (with teeth) that forbid misuse of patron information.

My thanks to Callan Bignoli for the tweet that inspired this ramble.

It almost doesn’t need to be said that old-fashioned library checkout cards were terrible for patron privacy. Want to know who had checked out a book? Just take the card out of its pocket and read.

It’s also a trivial observation that there’s a mini-genre of news articles and social media posts telling the tales of prodigal books, returning to their library after years or decades away, usually having gathered nothing but dust.

Put these two together on a slow news day? Without care, you can end up not protecting a library user’s right to privacy and confidentiality with respect to resources borrowed, to borrow some words from the ALA Code of Ethics.

Faced with this, one’s sense of proportion may ask, “so what?” The borrower of a book returned sixty years late is quite likely dead, and if alive, not likely to suffer any social opprobrium or even sixty years of accumulated overdue fines.  Even if the book in question was a copy of The Anarchist Cookbook, due back on Tuesday, 11 May 1976, the FBI no doubt has lost interest in the matter.

Of course, an immediate objection to that attitude is that personal harm to the patron remains possible, even if not probable. Sometimes the borrower wants to keep a secret to the grave. They may simply not care to be the subject of a local news story.

The potential for personal harm to the borrower is of course clearer if we consider more recent loans. It’s not the job of a librarian to out somebody who wishes to remain in the closet; it remains the case that somebody who does not care to have another snoop on their reading should be entitled to read, and think, in peace.

At this point, the sense of proportion that has somehow embodied itself in this post may rejoin, “you’re catastrophizing here, Charlton,” and not be entirely wrong. Inadvertent disclosure of patron information at the “retail” level does risk causing harm, but is not guaranteed to. After all, lots of people have no problem sharing (some of) their reading history. Otherwise, LibraryThing and Goodreads would just sit there gathering tumbleweeds.

I’d still bid that sense of proportion to shuffle off with this: it’s mostly not the librarians bearing the risk of harm.

However, there’s a larger point: libraries nowadays run much higher risks of violating patron privacy at the “wholesale” level than they used to.

Remember those old checkout cards? Back in the day, an outsider trying to get a borrower’s complete reading history might have to turn out every book in the library to do so. Today, it can be much easier: find a way in, and you can have everything (including driver’s license numbers, addresses, and, if the patrons are really ill-served by their library, SSNs).

That brings me to my point: we should care about nondisclosure (and better yet, non-collection of data we don’t need) at the retail level to help bolster a habit of caring about it at the wholesale level.

Imagine a library where people at every level can feel free to point out and correct patron privacy violations — and know that they should. Where the social media manager — whose degree may not be an MLS — redacts patron names and/or asks for permission every time.  Where, and more to my point, the director and the head of IT make technology choices that protect patron privacy — because they are in the habit of thinking about patron privacy in the first place.

This is why it’s worth it to sweat the small disclosures, to be better prepared against large ones.

The other day, school librarian and author Jennifer Iacopelli tweeted about her experience helping a student whose English paper had been vandalized by some boys. The student had left the Google Doc open in the library computer lab when she went home, and the boys had inserted some “inappropriate” stuff. When the student and her mom sat down to work on the paper later that evening, mom saw the insertions, was appalled, and grounded her daughter. Iacopelli, using security camera footage from the library’s computer lab, was able to demonstrate that the boys were responsible, with the result that the grounding was lifted and the boys suspended.

This story has gotten retweeted 1,300 times as of this writing and earned Iacopelli a mention as a “badass librarian” in HuffPo.

Before I continue, I want to acknowledge that there isn’t much to complain about regarding the outcome: justice was served, and mayhap the boys in question will think thrice before attacking the reputation of another or vandalizing their work.

Nonetheless, I do not count this as an unqualified feel-good story.

I have questions.

Was there no session management software running on the lab computers that would have closed off access to the document when she left at the end of the class period? If not, the school should consider installing some. On the other hand, I don’t want to hang too much on this pin; it’s possible that some was running but that a timeout hadn’t been reached before the boys got to the computer.

How long is security camera footage from the library computer lab retained? Based on the story, it sounds like it is kept at least 24 hours. Who, besides Iacopelli, can access it? Are there procedures in place to control access to it?

More fundamentally: is there a limit to how far student use of computers in that lab is monitored? Again, I do not fault the outcome in this case—but neither am I comfortable with Iacopelli’s embrace of surveillance.

Let’s consider some of the lessons learned. The victim learned that adults in a position of authority can go to bat for her and seek and acquire justice; maybe she will be inspired to help others in a similar position in the future. She may have learned a bit about version control.

She also learned that surveillance can protect her.

And well, yes. It can.

But I hope that the teaching continues—and not the hard way. Because there are other lessons to learn.

Surveillance can harm her. It can cause injustice, against her and others. Security camera footage sometimes doesn’t catch the truth. Logs can be falsified. Innocent actions can be misconstrued.

Her thoughts are her own.

And truly badass librarians will protect that.

From a security alert [1] from Langara College:

Langara was recently notified of a cyber security risk with Pearson online learning which you may be using in your classes. Pearson does not encrypt user names or passwords for the services we use, which puts you at risk. Please note that they are an external vendor; therefore, this security flaw has no direct impact on Langara systems.

This has been a problem since at least 2011 [2]; it is cold comfort that at least one Pearson service has a password recovery page that outright says that the user’s password will be emailed to them in clear text [3].

There have been numerous tweets, blog posts, and forum posts about this issue over the years. In at least one case [4], somebody complained to Pearson and ended up getting what reads like a canned email stating:

Pearson must strike a reasonable balance between support methods that are accessible to all users, and the risk of unauthorized access to information in our learning applications. Allowing customers to retrieve passwords via email was an industry standard for non-financial applications.

In response to the changing landscape, we are developing new user rights management protocols as part of a broader commitment to tighten security and safeguard customer accounts, information, and product access. Passwords will no longer be retrievable; customers will be able to reset passwords through secure processes.

This is a risible response for many reasons; I can only hope that they actually follow through with their plan to improve the situation in a timely fashion. Achieving the industry standard for password storage as of 1968 might be a good start [5].

In the meantime, I’m curious whether there are any libraries who are directly involved in the acquisition of Pearson services on behalf of their school or college. If so, might you have a word with your Pearson rep?

Adapted from an email I sent to the LITA Patron Privacy Interest Group’s mailing list. I encourage folks interested in library patron privacy to subscribe; you do not have to be a member of ALA to do so.

Footnotes

1. Pearson Cyber Security Risk
2. Report on Plain Text Offenders
3. Pearson account recovery page
4. Pearson On Password Security
5. Wilkes, M. V. Time-sharing Computer Systems. New York: American Elsevier Pub. Co., 1968. Print. It was in this book that Roger Needham first proposed hashing passwords.

There’s often more than one way to search a library catalog; or to put it another way, not all users come in via the front door.  For example, ensuring that your public catalog supports HTTPS can help prevent bad actors from snooping on patrons’ searches — but if one of your users happens to use a tool that searches your catalog over Z39.50, by default they have less protection.

Consider this extract from a tcpdump of a Z39.50 session:

02:32:34.657140 IP (tos 0x0, ttl 64, id 26189, offset 0, flags [DF], proto TCP (6), length 1492)
    localhost.9999 > localhost.36545: Flags [P.], cksum 0x03c9 (incorrect -> 0x00cc), seq 10051:11491, ack 235, win 256, options [nop,nop,TS val 2278124301 ecr 2278124301], length 1440
E...fM@.@...........'.....x.KEt>...........
.............0.......(...*.H...
...p01392pam a2200361 a 4500001000500000003000500005005001700010008004100027035002100068852004900089852004900138852004900187906004500236955012300281010001700404020002800421020002800449040001800477050002300495082001600518245014300534260003500677300002400712440002900736504005100765650004300816700001800859700002800877700002800905991006200933905001000995901002501005.1445.CONS.19931221140705.2.930721s1993    mau      b    001 0 eng  .  .9(DLC)   93030748.4 .aStacks.bBR1.cACQ3164.dBR1.gACQ3202.nOn order.4 .aStacks.bBR1.cACQ3164.dBR1.gACQ3165.nOn order.4 .aStacks.bBR1.cACQ3164.dBR1.gACQ3164.nOn order.  .a7.bcbc.corignew.d1.eocip.f19.gy-gencatlg.  .apc03 to ja00 07-21-93; je39 07-22-93; je08 07-22-93; je05 to DDC 07-23-93; aa21 07-26-93; CIP ver. jf05 to sl 12/21/93.  .a   93030748 .  .a3764336242 (alk. paper).  .a0817636242 (alk. paper).  .aDLC.cDLC.dDLC.00.aQC173.6.b.A85 1993.00.a530.1/1.220.04.aThe Attraction of gravitation :.bnew studies in the history of general relativity /.cJohn Earman, Michel Janssen, John D. Norton, editord..  .aBoston :.bBirkh..user,.cc1993..  .ax, 432 p. ;.c24 cm.. 0.aEinstein studies ;.vv. 5.  .aIncludes bibliographical references and index.. 0.aGeneral relativity (Physics).xHistory..1 .aEarman, John..1 .aJanssen, Michel,.d1953-.1 .aNorton, John D.,.d1960-.  .bc-GenColl.hQC173.6.i.A85 1993.p00018915972.tCopy 1.wBOOKS.  .ugalen.  .a1445.b.c1445.tbiblio..............

No, MARC is not a cipher; it just isn’t.

How to improve this state of affairs? There was some discussion back in 2000 of bundling SSL or TLS into the Z39.50 protocol, although it doesn’t seem like it went anywhere. Of course, SSH tunnels and stunnel are options, but it turns out that there can be an easier way.

As is usually the case with anything involving Z39.50, we can thank the folks at IndexData for being on top of things: it turns out that TLS support is easily enabled in YAZ. Here’s how this can be applied to Evergreen and Koha.

The first step is to create an SSL certificate; a self-signed one probably suffices. The certificate and its private key should be concatenated into a single PEM file, like this:

-----BEGIN CERTIFICATE-----
...
-----END CERTIFICATE-----
-----BEGIN PRIVATE KEY-----
...
-----END PRIVATE KEY-----
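
If you don’t already have a certificate, a self-signed one and the concatenated PEM file can be produced along these lines (the CN and the /tmp paths are placeholders; put the real file somewhere your server config can reference):

```shell
# Generate a self-signed certificate (2048-bit RSA, valid one year)
# and its passphrase-less private key.
openssl req -x509 -newkey rsa:2048 -nodes -days 365 \
    -subj "/CN=catalog.example.org" \
    -keyout /tmp/yaz_key.pem -out /tmp/yaz_cert.pem 2>/dev/null

# Concatenate certificate and key into the single PEM file YAZ expects.
cat /tmp/yaz_cert.pem /tmp/yaz_key.pem > /tmp/yaz_ssl.pem
```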

Evergreen’s Z39.50 server can be told to require SSL via a <listen> element in /openils/conf/oils_yaz.xml, like this:

<listen>ssl:@:4210</listen>
...

To supply the path to the certificate, a change to oils_ctl.sh will do the trick:

diff --git a/Open-ILS/examples/oils_ctl.sh b/Open-ILS/examples/oils_ctl.sh
index dde70cb..692ec00 100755
--- a/Open-ILS/examples/oils_ctl.sh
+++ b/Open-ILS/examples/oils_ctl.sh
@@ -6,6 +6,7 @@ OPT_PID_DIR="LOCALSTATEDIR/run"
 OPT_SIP_ERR_LOG="LOCALSTATEDIR/log/oils_sip.log";
 OPT_Z3950_CONFIG="SYSCONFDIR/oils_z3950.xml"
 OPT_YAZ_CONFIG="SYSCONFDIR/oils_yaz.xml"
+OPT_YAZ_CERT="SYSCONFDIR/yaz_ssl.pem"
 Z3950_LOG="LOCALSTATEDIR/log/oils_z3950.log"
 SIP_DIR="/opt/SIPServer";

@@ -115,7 +116,7 @@ function stop_sip {

 function start_z3950 {
        do_action "start" $PID_Z3950 "OILS Z39.50 Server";
-       simple2zoom -c $OPT_Z3950_CONFIG -- -f $OPT_YAZ_CONFIG >> "$Z3950_LOG" 2>&1 &
+       simple2zoom -c $OPT_Z3950_CONFIG -- -C $OPT_YAZ_CERT -f $OPT_YAZ_CONFIG >> "$Z3950_LOG" 2>&1 &
        pid=$!;
        echo $pid > $PID_Z3950;
        return 0;

For Koha, a <listen> element should be added to koha-conf.xml, e.g.,


<listen>ssl:@:4210</listen>

zebrasrv will also need to know how to find the SSL certificate:

diff --git a/misc/bin/koha-zebra-ctl.sh b/misc/bin/koha-zebra-ctl.sh
index 3b9cd81..63f0d9c 100755
--- a/misc/bin/koha-zebra-ctl.sh
+++ b/misc/bin/koha-zebra-ctl.sh
@@ -37,7 +37,8 @@ RUNDIR=__ZEBRA_RUN_DIR__
 LOCKDIR=__ZEBRA_LOCK_DIR__
 # you may need to change this depending on where zebrasrv is installed
 ZEBRASRV=__PATH_TO_ZEBRA__/zebrasrv
-ZEBRAOPTIONS="-v none,fatal,warn"
+YAZ_CERT=__KOHA_CONF_DIR__/zebra-ssl.pem
+ZEBRAOPTIONS="-C $YAZ_CERT -v none,fatal,warn"

 test -f $ZEBRASRV || exit 0

And with that, we can test: yaz-client ssl:localhost:4210/CONS or yaz-client ssl:localhost:4210/biblios. Et voila!

02:47:16.655628 IP localhost.4210 > localhost.41440: Flags [P.], seq 86:635, ack 330, win 392, options [nop,nop,TS val 116332994 ecr 116332994], length 549
E..Y..@.@.j..........r...............N.....
............ 2.........,lS...J6...5.p...,<]0....r.....m....Y.H*.em......`....s....n.%..KV2.];.Z..aP.....C..+.,6..^VY.......>..j...D..L..J...rB!............k....9..%H...?bu[........?<       R.......y.....S.uC.2.i6..X..E)..Z..K..J..q   ..m.m.%.r+...?.l....._.8).p$.H.R2...5.|....Q,..Q....9...F.......n....8 ...R.`.&..5..s.q....(.....z9...R..oD............D...jC..?O.+....,7.i.BT...*Q
...5..\-M...1.<t;...8...(.8....a7.......@.b.`n#.$....4...:...=...j....^.0..;..3i.`. f..g.|"l......i.....

Of course, not every Z39.50 client will know how to use TLS… but lots will, as YAZ is the basis for many of them.


The other day I made this blog, galencharlton.com/blog/, HTTPS-only.  In other words, if Eve wants to sniff what Bob is reading on my blog, she’ll need to do more than just capture packets between my blog and Bob’s computer to do so.

This is not bulletproof: perhaps Eve is in possession of truly spectacular computing capabilities or a breakthrough in cryptography and can break the ciphers. Perhaps she works for one of the sites that host external images, fonts, or analytics for my blog and has access to their server logs containing referrer information.  Currently these sites are Flickr (images), Gravatar (more images), Google (fonts), and WordPress (site stats – I will be changing this soon, however). Or perhaps she’s installed a keylogger on Bob’s computer, in which case anything I do to protect Bob is moot.

Or perhaps I am Eve and I’ve set up a dastardly plan to entrap people by recording when they read about MARC records, then showing up at Linked Data conferences and disclosing that activity.  Or vice versa. (Note: I will not actually do this.)

So, yes – protecting the privacy of one’s website visitors is hard; often the best we can do is be better at it than we were yesterday.

To that end, here are some notes on how I made my blog require HTTPS.

Certificates

I got my SSL certificate from Gandi.net. Why them?  Their price was OK, I already register my domains through them, and I like their corporate philosophy: they support a number of free and open source software projects; they’re not annoying about up-selling, and they have never (to my knowledge) run sexist advertising, unlike some of their larger and more well-known competitors. But there are, of course, plenty of options for getting SSL certificates, and once Let’s Encrypt is in production, it should be both cheaper and easier for me to replace the certs next year.

I have three subdomains of galencharlton.com that I wanted a certificate for, so I decided to get a multi-domain certificate.  I consulted this tutorial by rtCamp to generate the CSR.

After following the tutorial to create a modified version of openssl.conf specifying the subjectAltName values I needed, I generated a new private key and a certificate-signing request as follows:

openssl req -new -key galencharlton.com.key \
  -out galencharlton.com.csr \
  -config galencharlton.com.cnf \
  -sha256

The openssl command asked me a few questions, the most important being the value of the common name (CN) field; I used “galencharlton.com” for that, as that’s the primary domain that the certificate protects.

I then entered the text of the CSR into a form and paid the cost of the certificate.  Since I am a library techie, not a bank, I purchased a domain-validated certificate.  That means that all I had to do was prove to the certificate’s issuer that I had control of the three domains that the cert should cover.  That validation could have been done via email to an address at galencharlton.com or by inserting a special TXT record into the DNS zone file for galencharlton.com. I ended up choosing to place a file on the web server whose contents and location were specified by the issuer; once they (or rather, their software) downloaded the test files, they had some assurance that I had control of the domain.

In due course, I got the certificate.  I put it and the intermediate cert specified by Gandi in the /etc/ssl/certs directory on my server and the private key in /etc/ssl/private/.

Operating System and Apache configuration

Various vulnerabilities in the OpenSSL library or in HTTPS itself have been identified and mitigated over the years: suffice it to say that it is a BEASTly CRIME to make a POODLE suffer a HeartBleed — or something like that.

To avoid the known problems, I wanted to ensure that I had a recent enough version of OpenSSL on the web server and had configured Apache to disable insecure protocols (e.g., SSLv3) and eschew bad ciphers.

The server in question is running Debian Squeeze LTS, but since OpenSSL 1.0.x is not currently packaged for that release, I ended up adding Wheezy to the APT repositories list and upgrading the openssl and apache2 packages.

For the latter, after some Googling I ended up adapting the recommended Apache SSL virtualhost configuration from this blog post by Tim Janik.  Here’s what I ended up with:

<VirtualHost _default_:443>
    ServerAdmin gmc@galencharlton.com
    DocumentRoot /var/www/galencharlton.com
    ServerName galencharlton.com
    ServerAlias www.galencharlton.com

    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/galencharlton.com.crt
    SSLCertificateChainFile /etc/ssl/certs/GandiStandardSSLCA2.pem
    SSLCertificateKeyFile /etc/ssl/private/galencharlton.com.key
    Header add Strict-Transport-Security "max-age=15552000"

    # No POODLE
    SSLProtocol all -SSLv2 -SSLv3 +TLSv1.1 +TLSv1.2
    SSLHonorCipherOrder on
    SSLCipherSuite "EECDH+ECDSA+AESGCM EECDH+aRSA+AESGCM EECDH+ECDSA+SHA384 EECDH+ECDSA+SHA256 EECDH+
aRSA+SHA384 EECDH+aRSA+SHA256 EECDH+AESGCM EECDH EDH+AESGCM EDH+aRSA HIGH !MEDIUM !LOW !aNULL !eNULL
!LOW !RC4 !MD5 !EXP !PSK !SRP !DSS"

</VirtualHost>

I also wanted to make sure that folks coming in via old HTTP links would get permanently redirected to the HTTPS site:

<VirtualHost *:80>
    ServerName galencharlton.com
    Redirect 301 / https://galencharlton.com/
</VirtualHost>

<VirtualHost *:80>
    ServerName www.galencharlton.com
    Redirect 301 / https://www.galencharlton.com/
</VirtualHost>

Checking my work

I’m a big fan of the Qualys SSL Labs server test tool, which does a number of things to test how well a given website implements HTTPS:

  • identifying issues with the certificate chain
  • checking whether it supports vulnerable protocol versions such as SSLv3
  • checking whether it supports – and requests – use of sufficiently strong ciphers
  • checking whether it is vulnerable to common attacks

Suffice it to say that I required a couple iterations to get the Apache configuration just right.

WordPress

To be fully protected, all of the content embedded on a web page served via HTTPS must also be served via HTTPS.  In other words, this means that image URLs should require HTTPS – and the redirects in the Apache config are not enough.  Here is the sledgehammer I used to update image links in the blog posts:

create table bkp_posts as select * from wp_posts;

begin;
update wp_posts set post_content = replace(post_content, 'http://galen', 'https://galen') where post_content like '%http://galen%';
commit;

Whee!

I also needed to tweak a couple plugins to use HTTPS rather than HTTP to embed their icons or fetch JavaScript.

Finishing touches

In the course of testing, I discovered a couple more things to tweak:

  • The web server had been using Apache’s mod_php5filter – I no longer remember why – and that was causing some issues when attempting to load the WordPress dashboard.  Switching to mod_php5 resolved that.
  • My domain ownership proof on keybase.io failed after the switch to HTTPS.  I eventually tracked that down to the fact that keybase.io doesn’t have a bunch of intermediate certificates in its certificate store that many browsers do. I resolved this by adding a cross-signed intermediate certificate to the file referenced by SSLCertificateChainFile in the Apache config above.

My blog now has an A+ score from SSL Labs. Yay!  Of course, it’s important to remember that this is not a static state of affairs – another big OpenSSL or HTTPS protocol vulnerability could turn that grade to an F.  In other words, it’s a good idea to test one’s website periodically.

At the first face-to-face meeting of the LITA Patron Privacy Technologies Interest Group at Midwinter, one of the attendees mentioned that they had sent out an RFP last year for library databases. One of the questions on the RFP asked how user passwords were stored — and a number of vendors responded that their systems stored passwords in plain text.

Here’s what I tweeted about that, and here is Dorothea Salo’s reply:

https://twitter.com/LibSkrat/status/561605951656976384

This is a repeatable response, by the way — much like the way a hammer strike to the patellar ligament instigates a reflexive kick, mention of plain-text password storage will trigger an instinctual wail from programmers, sysadmins, and privacy and security geeks of all stripes.

Call it the Vanilla Password Reflex?

I’m not suggesting that you should whisper “plain text passwords” into the ear of your favorite system designer, but if you are the sort to indulge in low and base amusements…

A recent blog post by Eric Hellman discusses the problems with storing passwords in plain text in detail. The upshot is that it’s bad practice — if a system’s password list is somehow leaked, and if the passwords are stored in plain text, it’s trivially easy for a cracker to use those passwords to get into all sorts of mischief.

This matters, even “just” for library reference databases. If we take the right to reader privacy seriously, it has to extend to the databases offered by the library — particularly since many of them have features to store citations and search results in a user’s account.

As Eric mentions, the common solution is to use a one-way cryptographic hash function to transform the user’s password into a bunch of gobbledegook.

For example, “p@ssw05d” might be stored as the following hash:

d242b6313f32c8821bb75fb0660c3b354c487b36b648dde2f09123cdf44973fc

To make it more secure, I might add some random salt and end up with the following salted hash:

$2355445aber$76b62e9b096257ac4032250511057ac4d146146cdbfdd8dd90097ce4f170758a

To log in, the user has to prove that they know the password by supplying it, but rather than compare the password directly, the result of the one-way function applied to the password is compared with the stored hash.

How is this more secure? If a hacker gets the list of password hashes, they won’t be able to deduce the passwords, assuming that the hash function is good enough. What counts as good enough? Well, relatively few programmers are experts in cryptography, but suffice it to say that there does exist a consensus on techniques for managing passwords and authentication.
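
A minimal sketch of that hash-and-compare flow, using PBKDF2 from Python’s standard library (the iteration count and salt size here are illustrative, not a recommendation):

```python
import hashlib
import hmac
import os

def hash_password(password, salt=None):
    """Return (salt, hash) for storage; the password itself is never stored."""
    if salt is None:
        salt = os.urandom(16)  # random salt, unique per user
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return salt, digest

def verify_password(password, salt, stored):
    """Recompute the hash from the supplied password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)
    return hmac.compare_digest(candidate, stored)
```

Note that verification never needs the original password list, only the salts and hashes; that is precisely why a leak of the stored values is far less catastrophic than a leak of plain text passwords.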

The idea of one-way functions to encrypt passwords is not new; in fact, it dates back to the 1960s. Nowadays, any programmer who wants to be considered a professional really has no excuse for writing a system that stores passwords in plain text.

Back to the “Vanilla Password Reflex”. It is, of course, not actually a reflex in the sense of an instinctual response to a stimulus — programmers and the like get taught, one way or another, about why storing plain text passwords is a bad idea.

Where does this put the public services librarian? Particularly the one who has no particular reason to be well versed in security issues?

At one level, it just changes the script: in a well-designed system, it is simply impossible to answer a user who asks what their password is. How to respond to a patron who informs you that they’ve forgotten their password? Let them know that you can change it for them. If they wonder why you can’t just tell them, and if they’re actually interested in the answer, tell them about one-way functions — or just blame the computer; that’s fine too if time is short.

However, libraries and librarians can have a broader role in educating patrons about online security and privacy practices: leading by example. If we insist that the online services we recommend follow good security design; if we use HTTPS appropriately; if we show that we’re serious about protecting reader privacy, it can only buttress programming that the library may offer about (say) using password managers or avoiding phishing and other scams.

There’s also a direct practical benefit: human nature being what it is, many people use the same password for everything. If you crack an ILS’s password list, you’ve undoubtedly obtained a non-negligible set of people’s online banking passwords.

I’ll end this with a few questions. Many public services librarians have found themselves, like it or not, in the role of providing technical support for e-readers, smartphones, and laptops. How often does online security come up during such interactions? How often do patrons come to the library seeking help against the online bestiary of spammers, phishers, and worse? What works in discussing online security with patrons, who of course can be found at all levels of computer savvy? And what doesn’t?

I invite discussion — not just in the comments section, but also on the mailing list of the Patron Privacy IG.

Shortly after it came to light that Adobe Digital Editions was transmitting information about ebook reading activity in the clear, for anybody to snoop upon, I asked a loaded question: does ALA have a role in helping to verify that the software libraries use protect the privacy of readers?

As with any loaded question, I had an answer in mind: I do think that ALA and LITA, by virtue of their institutional heft and influence with librarians, can provide significant assistance in securing library software.

I waited a bit, wondering how the powers that be at ALA would respond. Then I remembered something: an institution like ALA is not, in fact, a faceless, inscrutable organism. Like Soylent Green, ALA is people!

Well, maybe not so much like Soylent Green. My point is that despite ALA’s reputation for being a heavily bureaucratic, procedure-bound organization, it does offer ways for members to take up an idea and run with it.

And that’s what I did — I floated a petition to form a new interest group within LITA, the Patron Privacy Technologies IG. Quite a few people signed it… and it now lives!

Here’s the charge of the IG:

The LITA Patron Privacy Technologies Interest Group will promote the design and implementation of library software and hardware that protects the privacy of library users and maximizes user ability to make informed decisions about the use of personally identifiable information by the library and its vendors.

Under this remit, activities of the Interest Group would include, but are not necessarily limited to:

  1. Publishing recommendations on data security practices for library software.
  2. Publishing tutorials on tools for libraries to use to check that library software is handling patron information responsibly.
  3. Organizing efforts to test commercially available software that handles patron information.
  4. Providing a conduit for responsible disclosure of defects in software that could lead to exposure of library patron information.
  5. Providing sample publicity materials for libraries to use with their patrons in explaining the library’s privacy practices.

I am fortunate to have two great co-chairs, Emily Morton-Owens of the Seattle Public Library and Matt Beckstrom of the Lewis and Clark Library, and I’m happy to announce that the IG’s first face-to-face meeting will be at ALA Midwinter 2015 — specifically tomorrow, at 8:30 a.m. Central Time in Ballroom 1 of the Sheraton in Chicago.

We have two great speakers lined up, Alison Macrina of the Library Freedom Project and Gary Price of INFODocket, and I’m very much looking forward to it.

But I’m also looking forward to the rest of the meeting: this is when the IG will, as a whole, decide how far to reach.  We have a lot of interest and the ability to do things that will teach library staff and our patrons how to better protect privacy, teach library programmers how to design and code for privacy, and verify that our tools match our ideals.

Despite the title of this blog post… it’s by no means my effort alone that will get us anywhere. Many people are already engaging in issues of privacy and technology in libraries, but I do hope that the IG will provide one more point of focus for our efforts.

I look forward to the conversation tomorrow.

I recently circulated a petition to start a new interest group within LITA, to be called the Patron Privacy Technologies IG.  I’ve submitted the formation petition to the LITA Council, and a vote on the petition is scheduled for early November.  I also held an organizational meeting with the co-chairs; I’m really looking forward to what we all can do to help improve how our tools protect patron privacy.

But enough about the IG, let’s talk about the petition! To be specific, let’s talk about when the signatures came in.

I’ve been on Twitter since March of 2009, but a few months ago I made the decision to become much more active there (you see, there was a dearth of cat pictures on Twitter, and I felt it my duty to help do something about it).  My first thought was to tweet the link to a Google Form I created for the petition. I did so at 7:20 a.m. Pacific Time on 15 October:

Since I wanted to gauge whether there was interest beyond just LITA members, I also posted about the petition on the ALA Think Tank Facebook group at 7:50 a.m. on the 15th.

By the following morning, I had 13 responses: 7 from LITA members, and 6 from non-LITA members. An interest group petition requires 10 signatures from LITA members, so at 8:15 on the 16th, I sent another tweet, which got retweeted by LITA:

By early afternoon, that had gotten me one more signature. I was feeling a bit impatient, so at 2:28 p.m. on the 16th, I sent a message to the LITA-L mailing list.

That opened the floodgates: 10 more signatures from LITA members arrived by the end of the day, and 10 more came in on the 17th. All told, 42 responses to the form were submitted between the 15th and the 23rd.

The petition didn’t ask how the responder found it, but if I make the assumption that most respondents filled out the form shortly after they first heard about it, I arrive at my bit of anecdata: over half of the petition responses were inspired by my post to LITA-L, suggesting that the mailing list remains an effective way of getting the attention of many LITA members.

By the way, the petition form is still up for folks to use if they want to be automatically subscribed to the IG’s mailing list when it gets created.

Yesterday I did some testing of version 4.0.1 of Adobe Digital Editions and verified that it is now using HTTPS when sending ebook usage data to Adobe’s server adelogs.adobe.com.

Of course, because the HTTPS protocol encrypts the datastream to that server, I couldn’t immediately verify that ADE was sending only the information that the privacy statement says it is.

Emphasis is on the word “immediately”.  If you want to find out what a program is sending via HTTPS to a remote server, there are ways to get in the middle.  Here’s how I did this for ADE:

  1. I edited the hosts file to point “adelogs.adobe.com” at the address of a server under my control.
  2. I used the CA.pl script from openssl to create a certificate authority of my very own, then generated an SSL certificate for “adelogs.adobe.com” signed by that CA.
  3. I put the certificate for my new certificate authority into the trusted root certificates store on my Windows 7 desktop.
  4. I put the certificate in place on my webserver and wrote a couple of simple CGI scripts to emulate the ADE logging data collector and capture what got sent to them.
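The capture side doesn’t have to be CGI, incidentally. Here’s a hypothetical minimal equivalent using only the Python standard library — the certificate and key file paths are whatever you generated in step 2, and the handler just appends each request body to a log file:

```python
import http.server
import ssl

CAPTURE_LOG = "capture.log"

class CaptureHandler(http.server.BaseHTTPRequestHandler):
    """Accept any POST and record the raw body for later inspection."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        with open(CAPTURE_LOG, "ab") as f:
            f.write(body + b"\n")
        self.send_response(200)
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the console quiet

def make_capture_server(port=443, cert_file=None, key_file=None):
    """Build the capture server; wrap the socket in TLS when a cert is supplied."""
    httpd = http.server.HTTPServer(("", port), CaptureHandler)
    if cert_file:
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        ctx.load_cert_chain(cert_file, key_file)
        httpd.socket = ctx.wrap_socket(httpd.socket, server_side=True)
    return httpd
```

With the hosts-file redirect and the fake CA trusted, the client connects and negotiates TLS happily, and everything it sends lands in `capture.log`.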

I then started up ADE and flipped through a few pages of an ebook purchased from Kobo.  Here’s an example of what is now getting sent by ADE (reformatted a bit for readability):

"id":"F5hxneFfnj/dhGfJONiBeibvHOIYliQzmtOVre5yctHeWpZOeOxlu9zMUD6C+ExnlZd136kM9heyYzzPt2wohHgaQRhSan/hTU+Pbvo7ot9vOHgW5zzGAa0zdMgpboxnhhDVsuRL+osGet6RJqzyaXnaJXo2FoFhRxdE0oAHYbxEX3YjoPTvW0lyD3GcF2X7x8KTlmh+YyY2wX5lozsi2pak15VjBRwl+o1lYQp7Z6nbRha7wsZKjq7v/ST49fJL",
"h":"4e79a72e31d24b34f637c1a616a3b128d65e0d26709eb7d3b6a89b99b333c96e",
"d":[  
   {  
      "d":"ikN/nu8S48WSvsMCQ5oCrK+I6WsYkrddl+zrqUFs4FSOPn+tI60Rg9ZkLbXaNzMoS9t6ACsQMovTwW5F5N8q31usPUo6ps9QPbWFaWFXaKQ6dpzGJGvONh9EyLlOsbJM"
   },
   {  
      "d":"KR0EGfUmFL+8gBIY9VlFchada3RWYIXZOe+DEhRGTPjEQUm7t3OrEzoR3KXNFux5jQ4mYzLdbfXfh29U4YL6sV4mC3AmpOJumSPJ/a6x8xA/2tozkYKNqQNnQ0ndA81yu6oKcOH9pG+LowYJ7oHRHePTEG8crR+4u+Q725nrDW/MXBVUt4B2rMSOvDimtxBzRcC59G+b3gh7S8PeA9DStE7TF53HWUInhEKf9KcvQ64="
   },
   {  
      "d":"4kVzRIC4i79hhyoug/vh8t9hnpzx5hXY/6g2w8XHD3Z1RaCXkRemsluATUorVmGS1VDUToDAvwrLzDVegeNmbKIU/wvuDEeoCpaHe+JOYD8HTPBKnnG2hfJAxaL30ON9saXxPkFQn5adm9HG3/XDnRWM3NUBLr0q6SR44bcxoYVUS2UWFtg5XmL8e0+CRYNMO2Jr8TDtaQFYZvD0vu9Tvia2D9xfZPmnNke8YRBtrL/Km/Gdah0BDGcuNjTkHgFNph3VGGJJy+n2VJruoyprBA0zSX2RMGqMfRAlWBjFvQNWaiIsRfSvjD78V7ofKpzavTdHvUa4+tcAj4YJJOXrZ2hQBLrOLf4lMa3N9AL0lTdpRSKwrLTZAFvGd8aQIxL/tPvMbTl3kFQiM45LzR1D7g=="
   },
   {  
      "d":"bSNT1fz4szRs/qbu0Oj45gaZAiX8K//kcKqHweUEjDbHdwPHQCNhy2oD7QLeFvYzPmcWneAElaCyXw+Lxxerht+reP3oExTkLNwcOQ2vGlBUHAwP5P7Te01UtQ4lY7Pz"
   }
]

In other words, it’s sending JSON containing… I’m not sure.

The values of the various keys in that structure are obviously Base 64-encoded, but when run through a decoder, the result is just binary data, presumably the result of another layer of encryption.
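A quick check bears this out. Decoding the first “d” value from the capture above, for instance, yields 96 bytes of opaque binary rather than anything readable:

```python
import base64

# first "d" value from the captured ADE payload
blob = base64.b64decode(
    "ikN/nu8S48WSvsMCQ5oCrK+I6WsYkrddl+zrqUFs4FSOPn+t"
    "I60Rg9ZkLbXaNzMoS9t6ACsQMovTwW5F5N8q31usPUo6ps9Q"
    "PbWFaWFXaKQ6dpzGJGvONh9EyLlOsbJM"
)
print(len(blob))  # 96
```

96 is a multiple of 16, which is at least consistent with the output of a block cipher such as AES, though that is only a guess.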

Thus, we haven’t actually gotten much further towards verifying that ADE is sending only the data they claim to.  That packet of data could be describing my progress reading that book purchased from Kobo… or it could be sending something else.

That extra layer of encryption might be done as protection against a real man-in-the-middle attack targeted at Adobe’s log server — or it might be obfuscating something else.

Either way, the result remains the same: reader privacy is not guaranteed. I think Adobe is now doing things a bit better than they were when they released ADE 4.0, but I could be wrong.

If we as library workers are serious about protecting patron privacy, I think we need more than assurances — we need to be able to verify things for ourselves. ADE necessarily remains in the “unverified” column for now.