It came to light on Monday that the latest version of Adobe Digital Editions is sending metadata on ebooks that are read through the application to an Adobe server — in clear text.

I’ve personally verified the claim that this is happening, as have lots of other people. I particularly like Andromeda Yelton’s screencast, as it shows some of the steps that others can take to see this for themselves.

In particular, it looks like any ebook that has been opened in Digital Editions or added to a “library” there gets reported on. The original report by Nate Hoffelder at The Digital Reader also said that ebooks that were not known to Digital Editions were being reported, though I and others haven’t seen that — but at the moment, since nobody is saying that they’ve decompiled the program and analyzed exactly when Digital Editions sends its reports, it’s possible that Nate simply fell into a rare execution path.

UPDATE 10 October 2014: Yesterday I was able to confirm that if an ereader device is attached to a PC and is recognized by ADE, metadata from the books on that device can also be sent in the clear.

This move by Adobe, whether or not they’re permanently storing the ebook reading history, and whether or not they think they have good intentions, is bad for a number of reasons:

  • By sending the information in the clear, anybody can intercept it and choose to act on somebody’s choice of reading material.  This applies to governments, corporations, and unenlightened but technically adept parents.  And as far as state actors are concerned – it actually doesn’t matter that Digital Editions isn’t sending information like name and email addresses in the clear; the user’s IP address and the unique ID assigned by Digital Editions will often be sufficient for somebody to, with effort, link a reading history to an individual.
  • The release notes from Adobe gave no hint that Digital Editions was going to start doing this. While Amazon’s Kindle platform also keeps track of reading history, at least Amazon has been relatively forthright about it.
  • The privacy policy and license agreement similarly did not explicitly mention this. There has been some discussion to the effect that, if one reads those documents closely enough, there is an implied suggestion that Adobe can capture and log anything one chooses to do with its software. But even if that’s the case – and I’m not sure that this argument would fly in countries with stronger data privacy protection than the U.S. – sending this information in the clear is completely inconsistent with modern security practices.
  • Digital Editions is part of the toolchain that a number of library ebook lending platforms use.

The last point is key. Everybody should be concerned about an app that spouts reading history in the clear, but librarians in particular have a professional responsibility to protect our users’ reading history.

What does it mean in the here and now? Some specific immediate steps I suggest for libraries are to:

  • Publicize the problem to their patrons.
  • Officially warn their patrons against using Digital Editions 4.0, and point to workarounds like redirecting “adelogs.adobe.com” to “127.0.0.1” in the hosts file (see the example after this list).
  • If they must use Digital Editions to borrow ebooks, recommend the use of earlier versions, which do not appear to spy on users.
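
For example, here is what that workaround looks like as a hosts-file entry (/etc/hosts on Linux and OS X; C:\Windows\System32\drivers\etc\hosts on Windows); it redirects the logging traffic to the local machine, where nothing is listening for it:

# black-hole Digital Editions 4.0 logging traffic
127.0.0.1    adelogs.adobe.com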

However, there are things that also need to be done in the long term.

Accepting DRM has been a terrible dilemma for libraries – enabling and supporting, no matter how passively, tools for limiting access to information flies in the face of our professional values.  On the other hand, without some degree of acquiescence to it, libraries would be even more limited in their ability to offer current books to their patrons.

But as the Electronic Frontier Foundation points out,  DRM as practiced today is fundamentally inimical to privacy. If, following Andromeda Yelton’s post this morning, we value our professional soul, something has to give.

In other words, we have to have a serious discussion about whether we can responsibly support any level of DRM in the ebooks that we offer to our patrons.

But there’s a more immediate step that we can take. This whole thing came to light because a “hacker acquaintance” of Nate’s decided to see what Digital Editions is sending home. And a key point? Once the testing started, it probably didn’t take that hacker more than half an hour to see what was going on, and it may well have taken only five.

While the library profession probably doesn’t count very many professional security researchers among its ranks, this sort of testing is not black magic.  Lots of systems librarians, sysadmins, and developers working for libraries already know how to use tcpdump and Wireshark and the like.
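
For example, a minimal sketch of such a check with tcpdump, printing the ASCII contents of any traffic between the workstation and the logging host named above (that the traffic travels over plain HTTP on port 80 is an assumption consistent with the reports that it is sent in the clear):

sudo tcpdump -A -s 0 'host adelogs.adobe.com and tcp port 80'

If ebook titles and metadata show up readably in the output, the application is indeed leaking reading history in the clear.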

So what do we need to do? We need to stop blindly trusting our tools.  We need to be suspicious, in other words, and put anything that we would recommend to our patrons to the test to verify that it is not leaking patron information.

This is where organizations like ALA can play an important role.  Some things that ALA could do include:

  • Establish a clearinghouse for reports of security and privacy violations in library software.
  • Distribute information on ways to perform security audits.
  • Test library software in house and hire security researchers as needed.
  • Provide institutional and legal support for these efforts.

That last point is key, and is why I’m calling on ALA in particular. There have been plenty of cases where software vendors have sued, or threatened to sue, folks who have pointed out security flaws. Rather than tolerating that sort of chilling effect in the realm of library software, ALA can provide cover for individuals and libraries engaged in the testing that is necessary to protect our users.

A notion that haunts me is found in Neil Gaiman’s The Sandman: the library of the Dreaming, wherein can be found books that no earth-bound librarian can collect.  Books that exist only in the dreams – or passing thoughts – of their authors. The Great American Novel. Every Great American Novel, by all of the frustrated middle managers, farmers, and factory workers who had their heart attack too soon. Every Great Nepalese Novel.  The conclusion of the Wheel of Time, as written by Robert Jordan himself.

That library has a section containing every book whose physical embodiment was stolen.  All of the poems of Sappho. Every Mayan and Olmec text – including the ones that, in the real world, did not survive the fires of the invaders.

Books can be like cockroaches. Text thought long-lost can turn up unexpectedly, sometimes just by virtue of having been left lying around until someone thinks to take a closer look. It is not an impossible hope that one day, another Mayan codex may make its reappearance, thumbing its nose at the colonizers and censors who despised it and the culture and people it came from.

Books are also fragile. Sometimes the censors do succeed in utterly destroying every last trace of a book. Always, entropy threatens all.  Active measures against these threats are required; therefore, it is appropriate that librarians fight the suppression, banning, and challenges of books.

Banned Books Week is part of that fight, and it is important that folks be aware of their freedom to read what they choose – and to be aware that it is a continual struggle to protect that freedom.  Indeed, perhaps “Freedom to Read Week” would better express the proper emphasis on preserving intellectual freedom.

But it’s not enough.

I am also haunted by the books that are not to be found in the Library of the Dreaming – because not even the shadow of their genesis crossed the mind of those who could have written them.

Because their authors were shot for having the wrong skin color.

Because their authors were cheated of an education.

Because their authors were sued into submission for daring to challenge the status quo.  Even within the profession of librarianship.

Because their authors made the decision to not pursue a profession in the certain knowledge that the people who dominated it would challenge their every step.

Because their authors were convinced that nobody would care to listen to them.

Librarianship as a profession must consider and protect both sides of intellectual freedom. Not just consumption – the freedom to read and explore – but also the freedom to write and speak.

The best way to ban a book is to ensure that it never gets written. Justice demands that we struggle against those who would not just ban books, but destroy the will of those who would write them.

I am a firm believer in the power of open source to help libraries build the tools we need to help our patrons and our communities.

Our tools focus our effort. Our effort, of course, does not spring out of thin air; it’s rooted in people.

One of the many currencies that motivates people to contribute to free and open source projects is acknowledgment.

Here are some of the women I’d like to acknowledge for their contributions, direct or indirect, to projects I have been part of. Some of them I know personally, others I admire from afar.

  • Henriette Avram – Love it or hate it, where would we be without the MARC format? For all that we’ve learned about new and better ways to manage metadata, Avram’s work at the Library of Congress started the profession’s proud tradition of sharing its metadata in electronic format.
  • Ruth Bavousett – Ruth has been involved in Koha for years and served as QA team member and translation manager. She is also one of the most courageous women I have the privilege of knowing.
  • Karen Coyle – Along with Diane Hillmann, I look to Karen for leadership in revamping our metadata practices.
  • Nicole Engard – Nicole has also been involved in Koha for years as documentation manager. Besides writing most of Koha’s manual, she is consistently helpful to new users.
  • Katrin Fischer – Katrin is Koha’s current QA manager, and has performed – and continues to perform – a very difficult job with grace and less thanks than she deserves.
  • Ruth Frasur – Ruth is director of the Hagerstown Jefferson Township Public Library in Indiana, which is a member of Evergreen Indiana. Ruth is one of the very few library administrators I know who not only understands open source, but actively contributes to some of the nitty-gritty work of keeping the software documented.
  • Diane Hillmann – Another leader in library metadata.
  • Kathy Lussier – As the Evergreen project coordinator at MassLNC, Kathy has helped to guide that consortium’s many development contributions to Evergreen.  As a participant in the project and a member of the Evergreen Oversight Board, Kathy has also supplied much-needed organizational help – and a fierce determination to help more women succeed in open source.
  • Liz Rea – Liz has been running Koha systems for years, writing patches, maintaining the project’s website, and injecting humor when most needed – a true jill of all trades.

However, there are unknowns that haunt me. Who has tried to contribute to Koha or Evergreen, only to be turned away by a knee-jerk “RTFM” or simply silence? Who might have been interested, only to rightly judge that they didn’t have time for the flak they’d get? Who never got a chance to go to a Code4Lib conference while her male colleague’s funding request got approved three years in a row?

What have we lost? How many lines of code, pages of documentation, hours of help have not gone into the tools that help us help our patrons?

The ideals of free and open source software projects are necessary, but they’re not sufficient to ensure equal access and participation.

The Ada Initiative can help. It was formed to support women in open technology and culture, and runs workshops, assists communities in setting up and enforcing codes of conduct, and promotes ensuring that women have access to positions of influence in open culture projects.

Why is the Ada Initiative’s work important to me? For many reasons, but I’ll mention three. First, because making sure that everybody who wants to work and play in the field of open technology has a real choice to do so is only fair. Second, because open source projects that are truly welcoming to women are much more likely to be welcoming to everybody – and happier, because of the effort spent on taking care of the community. Third, because I know that I don’t know everything – or all that much, really – and I need exposure to multiple points of view to be effective at building tools for libraries.

Right now, folks in the library and archives communities are banding together to raise money for the Ada Initiative. I’ve donated, and I encourage others to do the same. Even better, several folks, including Bess Sadler, Andromeda Yelton, Chris Bourg, and Mark Matienzo are providing matching donations up to a total of $5,120.

Go ahead, make a donation by clicking below, then come back. I’ll wait.

Donate to the Ada Initiative

Money talks – but whether any given open source community is welcoming, both of new people and of new ideas, depends on its current members.

Therefore, I would also like to extend a challenge to men (including myself — accountability matters!) working in open source software projects in libraries. It’s a simple challenge, summarized in a few words: “listen, look, lift up, and learn.”

Listen. Listening is hard.  A coder in a library open source project has to listen to other coders, to librarians, to users – and it is all too easy to ignore or dismiss approaches that are unfamiliar.  It can be very difficult to learn that something you’ve poured a lot of effort into may not work well for librarians – and it can be even harder to hear that you are stepping on somebody’s toes or thoughtlessly stomping on their ideas.

What to do? Pay attention to how you communicate while handling bugs and project correspondence. Do you prioritize bugs filed by men? Do you have a subtle tendency to think to yourself, “oh, she’s just not seeing the obvious thing right in front of her!” if a woman asks a question on the mailing list about functionality she’s having trouble with? If so, make an effort to be even-handed.

Are you receiving criticism? Count to ten, let your hackles down, and try to look at it from your critic’s point of view.

Be careful about nitpicking.  Many a good idea has died after too much bikeshedding – and while that happens to everybody, I have a gut feeling that it’s more likely to happen if the idea is proposed by a woman.

Is a woman colleague confiding in you about concerns she has with community or workplace dynamics? Listen.

Look. Look around you — around your office, the forums, the IRC channels, and Stack Exchanges you frequent. Do you mostly see men who look like yourself?  If so, do what you can to broaden your perspective and your employer’s perspective. Do you have hiring authority? Do you participate in interview panels? You can help change who surrounds you.

Remember that I’m talking about library technology here — even if 70% of the employees of the library you work for are women, if the systems department only employs men, you’re missing other points of view.

Do you have no hiring authority whatsoever? Look around the open source communities you participate in. Is the proportion of men participating openly far higher than the gender ratio in librarianship as a whole?  If so, you can help change that by how you choose to participate in the community.

Lift up. This can take many forms.  In some cases, you can help lift up women in library technology by getting out of the way – in other words, by removing or not supporting barriers to participation such as sexist language on the mailing list or by calling out exclusionary behavior by other men (or yourself!).

Sometimes, you can offer active assistance – but ask first! Perhaps a woman is ready to assume a project leadership role or is ready to grow into it. Encourage her – and be ready to support her publicly. Or perhaps you may have an opportunity to mentor a student – go for it, but know that mentoring is hard work.

But note — I’m not an authority on ways to support women in technology.  One of the things that the Ada Initiative does is run Ally Skills workshops that teach simple techniques for supporting women in the workplace and online.  In fact, if you’re coming to Atlanta this October for the DLF Forum, one is being offered there.

Learn. Something I’m still learning is just the sheer amount of crap that women in technology put up with. Have you ever gotten a death threat or a rape threat for something you said online about the software industry? If you’re a guy, probably not. If you’re Anita Sarkeesian or Quinn Norton, it’s a different story entirely.

If you’re thinking to yourself that “we’re librarians, not gamers, and nobody has ever gotten a death threat during a professional dispute with the possible exception of the MARC format” – that’s not good enough. Or if you think that no librarian has ever harassed another over gender – that’s simply not true. It doesn’t take a death threat to convince a woman that library technology is too hostile for her; a long string of micro-aggressions can suffice. Do you think that librarians are too progressive or simply too darn nice for harassment to be an issue? Read Ingrid Henny Abrams’ posts about the results of her survey on code of conduct violations at ALA.

This is why the Ada Initiative’s anti-harassment work is so important – and to learn more, including links to sample policies, a good starting point is their own conference policies page. (Which, by the way, was quite useful when the Evergreen Project adopted its own code of conduct). Another good starting point is the Geek Feminism wiki.

And, of course, you could do worse than to go to one of the ally skills workshops.

If you choose to take up the challenge, make a note to come back in a year and write down what you’ve learned, what you’ve listened to and seen, and how you’ve helped to lift up others. It doesn’t have to be public – though that would be nice – but the important thing is to be mindful.

Finally, don’t just take my word for it – remember that I’m not an authority on supporting women in technology. Listen to the women who are.

Update: other #libs4ada posts

This afternoon I’m sitting in the new bibliographic environment breakout session at Code4Lib BC. After taking a look at Mark Jordan’s easyLOD, I decided to play around with putting together a web service for Koha that emits RDF when fed a bib ID. Unlike Magnus Enger’s semantikoha prototype, which uses a Ruby library to convert MARC to RDF, I was trying for an approach that used only Perl (plus XS).

There were plenty of building blocks available. Putting them together turned out to be a touch more convoluted than I expected.

The Library of Congress has published an XSL stylesheet for converting MODS to RDF. Converting MARC(XML) to MODS is readily done using other stylesheets, also published by LC.

The path seemed clear for a quick-and-dirty prototype — copy svc/bib to opac/svc/bib, take out the bits for doing updates (we’re not quite ready to make cataloging that collaborative!), and write a few lines to apply two XSLT transformations.

The code was quickly written — but it didn’t work. XML::LibXSLT, which Koha uses to handle XSLT, complained about the modsrdf.xsl stylesheet. Too new! That stylesheet is written in XSLT 2.0, but libxslt, the C library that XML::LibXSLT is based on, only supports XSLT 1.0.

As it turns out, Perl modules that can handle XSLT are rather thin on the ground. What I ended up doing was:

Installing XML::Saxon::XSLT2, which required…

Installing Saxon-HE, a Java XML and XSLT processor that supports XSLT 2.0, which required…

Installing Inline::Java, which required…

Installing a JDK (I happened to choose OpenJDK).

After all that (and a quick tweak to the modsrdf.xsl stylesheet), I ended up with the following code that did the trick:

#!/usr/bin/perl

BEGIN {
    $ENV{'PERL_INLINE_DIRECTORY'} = '/tmp/inline';
}

use Modern::Perl;

use CGI;
use C4::Biblio;
use C4::Context;    # provides config() and preference(), used below
use C4::Templates;
use XML::Saxon::XSLT2;

my $query = CGI->new;
binmode STDOUT, ':encoding(UTF-8)';

# do initial validation
my $path_info = $query->path_info();

my $biblionumber = undef;
if ($path_info =~ m!^/(\d+)$!) {
    $biblionumber = $1;
} else {
    print $query->header(-type => 'text/xml', -status => '400 Bad Request');
    exit 0;    # bail out here; otherwise we would fall through with an undefined bib ID
}

# only retrieval is supported in this stripped-down prototype
if ($query->request_method eq "GET") {
    fetch_rdf($query, $biblionumber);
}

exit 0;

sub fetch_rdf {
    my $query = shift;
    my $biblionumber = shift;
    my $record = GetMarcBiblio($biblionumber);
    if  (defined $record) {
        print $query->header(-type => 'text/xml');
        my $xml = $record->as_xml_record();
        my $base = join('/',
                        C4::Context->config('opachtdocs'),
                        C4::Context->preference('opacthemes'),
                        C4::Templates::_current_language()
                       );
        $xml = transform($xml, "$base/xslt/MARC21slim2MODS3-3.xsl");
        $xml = transform($xml, "$base/xslt/modsrdf.xsl");
        print $xml;
    } else {
        print $query->header(-type => 'text/xml', -status => '404 Not Found');
    }
}

sub transform {
    my $xmlrecord = shift;
    my $xslfilename = shift;

    open my $fh, '<', $xslfilename
        or die "cannot open stylesheet $xslfilename: $!";
    my $trans = XML::Saxon::XSLT2->new($fh);
    return $trans->transform($xmlrecord);

}

This works… but is not satisfying. Making Koha require a JDK just for XSLT 2.0 support is a bit much, for one thing, and it would likely be rather slow if used in production. It’s a pity that there’s still no broad support for XSLT 2.0.

A dead end, most likely, but instructive nonetheless.

Peace Arch. Photo by Daniel Means. Licensed under CC-BY-SA and available at http://www.flickr.com/photos/supa_pedro/389603266.

There is nothing quite like the sense of sheer glee you get when you’re waiting at the border… and have been waiting at the border for a while… and then a new customs inspection lane is opened up. Zoom!

Marlene and I left Seattle this morning to go to the Code4Lib BC conference in Vancouver. Leaving in the morning meant that we missed the lightning talks, and arrived after the breakout sessions had started. Fortunately, folks were quick to welcome us, and I soon fell into the accessibility session.

Accessibility has been on my mind lately, but it’s an area where I’m starting mostly from ground zero. I knew that designing accessible systems is a Good Idea, I knew about the existence of some of the jargon and standards, and I knew that I didn’t know much else — certainly none of the specifics.

Cynthia Ng very kindly shared some pointers with me. For example, it is helpful to know that the Section 508 guidelines are essentially a subset of WCAG 1.0. This is exactly the sort of shortcut (through an apparently intimidating forest) that an expert can effortlessly give to a newbie — and having opportunities to learn from the experts is one of the reasons why I like going to conferences.

The accessibility breakout session charged itself with putting together a list of resources and best practices for accessibility and universal design. As I mentioned above, we arrived in the middle of the breakout session time, but a couple hours was more than enough time to get initial exposure to a lot of ideas and resources. It was exhilarating.

In no particular order, here is a list of various things that I’ll be following up on:

  • The Accessibility Project
  • Guerilla testing
  • The 5 second test
  • Swim lane diagrams
  • The Paciello Group Blog
  • Be careful about putting things in the right sidebar of a three-column layout — a lot of users have been trained by web advertising to completely ignore that region.  Similarly, a graphic with moving parts can get ignored if it looks too much like an ad.
  • The Code4Lib BC accessibility group’s notes
  • Having consistency of branding and look and feel can improve usability — but that can be a challenge when integrating a lot of separate systems (particularly if a library and a vendor have different ideas about whose branding should be foremost).
  • Integrating one’s content strategy with one’s accessibility strategy.  To paraphrase a point that Cynthia made a few times, putting out too much text is a problem for any user.
  • As with so much of software design, iterate early and often. The time to start thinking about accessibility is when you’re 20% of the way through a project, not when you’re 80% done.
  • Standards can help, but only up to a point.  A website could pass an automated WCAG compliance test with flying colors but not actually be usable by anyone.

And there’s another day of conference yet!  I’m quite happy we made the drive up.

Here’s a general question to the world: what reading material do you recommend for folks like me who want to learn more about writing accessible web software?

Today I discovered two things that have been around for a while but which are new to me.

Every now and again I’ve lent my computers’ spare cycles to projects like the Great Internet Mersenne Prime Search and SETI@home, both of which were crowdsourcing scientific computing long before the term “crowdsourcing” became popular.  One of my discoveries today was a project that’s directly related to my professional interests: distributed archiving of websites that are about to go dark.

It all started when this came across my Twitter feed:

@textfiles Yes, you read right, Yahoo! is completely rate-limiting/temp-banning us from making copies of this data they're deleting. ZERG RUSH NEEDED

A Zerg rush on Yahoo?  Say what?  I had visited textfiles.com, an archive of hacker lore, in the past and knew that Jason Scott did interesting things, but had no idea what he was up to now.

It didn’t take much poking around to figure out what’s up.  Yahoo has announced that their Message Boards service is being discontinued at the end of the month.  Of course, there’s no lack of options for places on the web for folks to talk, although I wouldn’t be surprised to hear that there are a few niche communities on the boards that will have to scramble to find a new home.  What can’t be replaced, of course, are the past discussions — and those were made by the users of the service, not by Yahoo.  So far, it doesn’t sound like Yahoo is interested in providing an archive.

That’s where the Archive Team comes in.  From their homepage:

Archive Team is a loose collective of rogue archivists, programmers, writers and loudmouths dedicated to saving our digital heritage. Since 2009 this variant force of nature has caught wind of shutdowns, shutoffs, mergers, and plain old deletions – and done our best to save the history before it’s lost forever.

Sometimes they’ve been able to save the content of a service that’s going dark just by asking for a copy.  Often, however, it has been necessary to crawl the website before the clock runs out.

That’s where the crowdsourcing comes in: by downloading a virtual machine, you too can have your computer become a “Warrior” and use some of its bandwidth to crawl dying websites, then send the data back to the Archive Team’s archive.  From there, the data gets collocated and sent to a variety of places, including the Internet Archive.

This is not necessarily polite archiving.  In the name of getting as complete a capture as possible, the archiving appliance intentionally ignores the robots exclusion protocol that normal web crawlers should follow.  Furthermore, having a crowd of Warriors increases the chance that the archiving will progress even in the face of rate-limiting, as Yahoo is currently imposing on individual computers that download too quickly.

Does this sound messy?  Sure.  Would a cautious institution want to think twice before running a Warrior? Perhaps — the cause is worthy, but the potential for liability is uncertain if a website operator decided to call an archiving effort a distributed denial-of-service attack.

Is it necessary?  I believe that it is, so I’m running a Warrior.

The virtual machine, which runs on top of VirtualBox or the like, is dead simple to use, and you can control which projects the Warrior will participate in.  Besides Yahoo Message Boards, the Archive Team is also currently archiving the blogging service Posterous, which is due to go dark at the end of April.

Since Yahoo Message Boards is going dark less than nine days from now, I encourage folks to consider pitching in now.  Think of it as the WOZ corollary to LOCKSS: Waves of Zergs create the archive.  Then we can have the stuff for Lots of Copies Keep Stuff Safe.

The other discovery I made today?  Just Google for “zerg rush” and wait a moment.

This is the second part in an occasional series on how good data can go bad.

One aspect of the MARC standard that sometimes is forgotten is that it was meant to be a cataloging communications format. One could design an ILS that doesn’t use anything resembling MARC 21 to store or express bibliographic data, but as long as its internal structure is sufficiently expressive to keep track of the distinctions called for by AACR2, in principle it could relegate MARC handling strictly to import and export functionality. By doing so, it would follow a conception of MARC as a lingua franca for bibliographic software.

In practice, of course, MARC isn’t just a common language for machines — it’s also part of a common language for catalogers.  If you say “222” or “245” or “780” to one, you’ve communicated a reasonably precise (in the context of AACR2) identification of a metadata attribute.  Sure, it’s arcane, but then again so is most professional jargon to non-practitioners.  MARC also became the basis of record storage and editing in most ILSs, to the point where the act of cataloging is sometimes conflated with the act of creating and editing MARC records.

But MARC’s origins as a communications format can sometimes conflict with its ad hoc role as a storage format.  Consider this record:

00528dam  22001577u 4500
001 123
100 1  $a Strang, Elizabeth Leonard.
245 10 $a Lectures on landscape and gardening design / $c by Elizabeth Leonard Strang.

A brief bibliographic record, right?  Look at the Leader/05, which stores the record status.  The value ‘d’ means that the record is deleted; other values for that position include ‘n’ for new and ‘c’ for corrected.

But unlike, say, the 245, the Leader/05 isn’t making an assertion about a bibliographic entity.  It’s making an assertion about the metadata record itself, and one that requires more context to make sense.  There can’t be a globally valid assertion that a record is deleted; my public library may have deaccessioned Lectures on landscape and gardening design, but your horticultural library may keep that title indefinitely.

Consequently, the Leader/05 is often ignored when creating or modifying records in an ILS.  For example, if a bib record is present in an Evergreen or Koha database, setting its Leader/05 to ‘d’ does not affect its indexing or display.

However, such records can become undead — not in the context of the ILS, but in the context of exporting them for loading into a discovery layer or a union catalog. Some discovery layers do look at the Leader/05.  If an incoming record is marked as deleted, that is taken as a signal to remove the matching record from the discovery layer’s own indexes.  If there is no matching record, the discovery layer could reasonably ignore an incoming “deleted” record — and I know of at least one that does exactly that.

The result? A record that appears to be perfectly good in the ILS doesn’t show up in the discovery layer.

Context matters.

I’ll finish with a couple SQL queries for finding such undead records, one for Evergreen:

SELECT record
FROM metabib.full_rec mfr
JOIN biblio.record_entry bre ON (bre.id = mfr.record)
WHERE tag = 'LDR'
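-- note: SQL SUBSTRING is 1-based, so position 6 of the stored leader is Leader/05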
AND SUBSTRING(value, 6, 1) = 'd'
AND NOT bre.deleted;

and one for Koha:

SELECT biblionumber
FROM biblioitems 
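-- the XPath substring() function is likewise 1-based: position 6 is Leader/05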
WHERE ExtractValue(marcxml, 'substring(//leader, 6, 1)') = 'd';

CC-BY image of a woodcut of a viper courtesy of the Penn Provenance Project.

Libraries are sneaky, crafty places.  If you walk into one, things may never look the same when you walk out.

Libraries are dangerous places.  If you open your mind in one, you may be forever changed.

And, more mundanely, university libraries are places that employ a lot of work-study students.  I was one of them at Ganser Library at Millersville University.  Although I’ve always been a bookish lad, when I started as a reference shelver at Ganser I wasn’t thinking of the job as anything more than a way to pay the rent while I pursued a degree in mathematics.  And, of course, there were decided limits to how much fascination I found in filing updated pages in a set of the loose-leaf CCH tax codes.  While some of the cases I skimmed were interesting, I can safely say that a career in tax accountancy was not in my future, either then or now.

Did I mention that libraries are crafty?  Naturally, much of the blame for that attaches to the librarians. As time passed, I ended up working in just about every department of the library, from circulation to serials to systems, as if there were a plot to have me learn to love every nook and cranny of that building and the folks who made it live.  By the time I graduated, math degree in hand, I had accepted a job with an ILS vendor, directly on the strength of the work I had done to help the library migrate to the (at the time) hot new ILS.

While writing this post, it has hit me hard how much I owe an incredible debt of gratitude to my mentors at Ganser.  To name some of them, Scott Anderson, Krista Higham, Barbara Hunsberger, Sally Levit, Marilyn Parrish, Elaine Pease, Leo Shelley, Marjorie Warmkessel, and David Zubatsky have each taught me much, professionally and personally.  To be counted among them as a member of the library profession is an honor.

Today I have an opportunity to toot my horn a bit, having been named one of the “Movers and Shakers” this year by Library Journal.  I am grateful for the recognition, as well as the opportunity to sneak a penguin into the pages of LJ.

Original image by Larry Ewing.
Why a penguin? In part, simply because that’s how my whimsy runs. But there’s also a serious side to my choice, and I’m happy that the photographer and editors ran with it. Tux the penguin is a symbol of the open source Linux project, and moreover is a symbol that the Linux community rallies behind. Why have I emphasized community? Because it’s the strength of the library open source communities, particularly those of the Koha and Evergreen projects, that inspire me from day to day. Not that it’s all sunshine and kittens — any strong community will have its share of disappointments and conflicts. However, I deeply believe that open source software is a necessary part of librarians (I use that term broadly) building their own tools with which to share knowledge (and I use that term very broadly) with the wider communities we serve.

The recognition that LJ has given me for my work for Koha and Evergreen is very flattering, but for me it is at heart an opportunity to reflect, and to thank the many friends and mentors in libraryland I have met over the years.

Thanks, and may the work we share ever continue.

In his column in American Libraries today, Will Manley makes a good point that librarians should think twice about agreeing to projects that — no matter how useful — don’t add to the library’s mission. In fact, librarians can even say “no” every now and again. Unfortunately, I found that the column has a few too many cheap shots, detracting from Manley’s message.

Manley’s target? A proposal floated by the U.S. Postal Service to offer retail postal services via partner libraries. It’s understandable that the idea should raise eyebrows among librarians. After all, the IRS program to distribute tax forms through libraries has been a perfect example of an unfunded federal mandate from the point of view of libraries that find themselves turning into ad hoc tax advice services every spring. (And as far as I know, nobody’s offering a joint MLS/tax accountancy degree.) While providing tax forms is a useful service, it’s not clear that it’s one that libraries need to be involved in, or that being involved furthers library aims.

Where Manley goes too far is in a series of lazy clichés about the USPS:

After going billions of dollars into debt and being almost aced out of business by the double whammy of email and private-sector carriers that actually deliver your letters and packages on time and in good condition, the USPS is finally thinking outside of the post office box: The agency has hatched the concept of putting post office kiosks in libraries.

Aced out of business by private competition? There’s no doubt that the environment has drastically changed for the USPS, but it doesn’t follow that the shift from letters to email has made it a dinosaur. A (to say the least) challenging oversight structure and uniquely onerous pension funding requirements imposed on the USPS by Congress have handicapped its ability to react. The USPS covers more territory at cheaper rates than postal systems in many other countries.  Also, it covers rural areas that private firms either would not serve at all or only at exorbitant rates.

Suffice it to say, I generally like the USPS — a stint living in Alaska tends to do that to one. The USPS also has a mandate that is very consonant with library values: universal service.

Of course, whether or not the USPS is fairly treated by Manley doesn’t speak to whether a library should agree to start selling stamps and collecting mail. It’s certainly a stretch from traditional services. But a little digging turned up a big difference from the IRS program: it’s not an unfunded mandate. The “Village Post Office” program, as it’s called, does offer compensation to the small businesses (and libraries!) that operate them. For a struggling library in a rural community whose post office has recently closed or reduced hours, starting a VPO could be a net gain.

Indeed, librarians should know how to say “no”. But they also should know to do their due diligence before deciding.

This is the first part in an occasional series on how good data can go bad.

Consider the following snippets of a MARC21 record for the Spanish edition of the fourth Harry Potter book.

00998nam  2200313 c 4500
...
240 10 $a Harry Potter and the goblet of fire $l Español
245 10 $a Harry Potter y el cáliz de fuego / $c J.K. Rowling ; [traducción, Adolfo Muñoz García y Nieves Martín Azofra]

The original record uses the Unicode character set with the UTF-8 character encoding. However, if you load this record into a modern ILS, e.g., Koha or Evergreen, the title is likely to end up displayed as:

Harry Potter y el c©Łliz de fuego / J.K. Rowling ; [traducci©đn, Adolfo Mu©łoz Garc©Ưa y Nieves Mart©Ưn Azofra]

Too much copyright! This isn’t an electronic course reserves blog!

What happened? Look at the 9th position of the leader (counting from zero), and you’ll see that it is blank. In MARC21, blank means that the record uses the MARC-8 character set, while ‘a’ means that it uses Unicode. Many, if not most, modern MARC tools will go by the Leader/09 to decide if a character conversion is needed. If the leader position is wrong, poor, defenseless diacritics will get mangled.

Why are there so many copyright signs in the mistreated title? As it happens, the UTF-8 representation of many common characters with Western European diacritics starts with byte 195 (or C3 in hexadecimal). What does C3 mean in the MARC-8 character encoding? You’ve guessed it: the copyright symbol.

There are a couple lessons to draw from this. First, using a good character encoding isn’t enough; you must also say what you’re up to. Second, if you look at enough bad data, you will start to recognize patterns on sight. If you deal with a lot of data, that “second sight” is an arcane but useful skill to develop.
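
To make the first lesson concrete, here is a minimal sketch in Perl that flags records whose Leader/09 claims MARC-8 but whose raw bytes contain sequences typical of UTF-8. The file name is hypothetical, and the check is only a crude heuristic:

#!/usr/bin/perl
# sketch: flag records whose Leader/09 is blank (i.e., claims MARC-8)
# but which contain byte pairs that look like two-byte UTF-8 sequences
use Modern::Perl;
use MARC::Batch;

my $batch = MARC::Batch->new('USMARC', 'records.mrc'); # hypothetical file
while (my $record = $batch->next) {
    my $ldr09 = substr($record->leader, 9, 1);
    my $raw   = $record->as_usmarc;
    # a lead byte in C2-DF followed by a continuation byte in 80-BF is a
    # two-byte UTF-8 sequence, e.g., C3 A1 for á
    if ($ldr09 eq ' ' and $raw =~ /[\xC2-\xDF][\x80-\xBF]/) {
        my $f001 = $record->field('001');
        my $id = $f001 ? $f001->data : '(no 001)';
        say "record $id is probably mislabeled as MARC-8";
    }
}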

CC-BY image of a woodcut of a viper courtesy of the Penn Provenance Project.