Metadata – Meta Interchange

On 22 March 2016, the Library of Congress announced [pdf] that the subject heading Illegal aliens will be cancelled and replaced with Noncitizens and Unauthorized immigration. This decision came after a couple of years of lobbying by folks from Dartmouth College (and others) and a resolution [pdf] passed by the American Library Association.

Among librarians, responses to this development seemed to range from “it’s about time” to “gee, I wish my library would pay for authority control” to the Annoyed Librarian’s “let’s see how many MORE clicks my dismissiveness can send Library Journal’s way!” to Alaskan librarians thinking “they got this change made in just two years!?! Getting Denali and Alaska Natives through took decades!”.

Business as usual, in other words. Librarians know the importance of names; that folks will care enough to advocate for changes to LCSH comes as no surprise.

The change also got some attention outside of libraryland: some approval by lefty activist bloggers, a few head-patting “look at what these cute librarians are up to” pieces in mainstream media, and some complaints about “political correctness” from the likes of Breitbart.

And now, U.S. Representative Diane Black has tossed her hat in the ring by announcing that she has drafted (update: and now introduced) a bill to require that

The Librarian of Congress shall retain the headings ‘‘Aliens’’ and ‘‘Illegal Aliens’’, as well as related headings, in the Library of Congress Subject Headings in the same manner as the headings were in effect during 2015.

There’s of course a really big substantive reason to oppose this move by Black: “illegal aliens” is in fact pejorative. To quote Elie Wiesel: “no human being is illegal.” Names matter; names have power: anybody intentionally choosing to refer to another person as illegal is on shaky ground indeed if they wish to not be thought racist.

There are also reasons to simply roll one’s eyes and move on: this bill stands little chance of passing Congress on its own, let alone being signed into law. As electoral catnip to Black’s voters and those of like-minded Republicans, it’s repugnant, but still just a drop in the ocean of topics for reactionary chain letters and radio shows.

Still, there is value in opposing hateful legislation, even if it has little chance of actually being enacted. There are of course plenty of process reasons to oppose the bill:

There are just possibly a few matters that a member of the House Budget Committee could better spend her time on. For example, libraries in her district in Tennessee would benefit from increased IMLS support, to pick an example not-so-randomly.
More broadly, Congress as a whole has much better things to do than to micro-manage the Policy and Standards Division of the Library of Congress.
If Congress wishes to change the names of things, there are over 31,000 post offices to work with. They might also consider changing the names of military bases named after generals who fought against the U.S.
Professionals of any stripe in civil service are owed a degree of deference in their professional judgments by legislators. That includes librarians.
Few, if any, members of Congress are trained librarians or ontologists or have any particular qualifications to design or maintain controlled vocabularies.

However, there is one objection that will not stand: “Congress has no business whatsoever advocating or demanding changes to LCSH.”

If cataloging is not neutral, if the act of choosing names has meaning… it has political meaning.

And if the names in LCSH are important enough for a member of Congress to draft a bill about — even if Black is just grandstanding — they are important enough to defend.

If cataloging is not neutral, then negative reactions must be expected — and responded to.

Updated 2016-04-14: Add link to H.R. 4926.

Jeffrey Beall, in his post about the results of a LITA survey on library standards, gleans the following:

More notable is the absence of Semantic Web standards from the answers to this particular question. Notice that SKOS, RDF, and SPARQL do not even appear in the “other” section of the survey response (see below).
…
The absence of Semantic Web standards among the several dozen responses to question 3 is very telling.

Telling what, though? Not necessarily very much. The survey question that Beall highlights is “Are there particular library-oriented standards important to your work?” That strikes me as a rather broad question. Important to one’s work now? In the future? In an ideal world where all metadata is easily slice-and-diceable by anybody?

If you’re a library technologist working with library data, MARC may well be the most important library standard to you right now — after all, it’s what we have for bibliographic metadata. What about the absence of Semantic Web standards? Well, linked data approaches are still experimental, but then, once upon a time, so was MARC. Suppose ALA had conducted a survey in 1967 (before the MARC pilot program had wrapped up) asking about important standards. The catalog card likely would have been identified as the most important technology for day-to-day work, while MARC would have been at the bottom of the list.

The survey is part of LITA’s efforts to become more active in the development of library information standards. If LITA is to achieve that, we need to not only look at and maintain the past, but more importantly, experiment for the future. I actually am not entirely certain what point Beall is trying to make, but if LITA were to focus on developing the MARC standard to the exclusion of experimenting with other ways of expressing bibliographic metadata, that would be a mistake.

Yesterday the Open Knowledge Foundation announced their principles of open bibliographic data. Following a definition of “bibliographic data” (though I don’t think that the distinction drawn between “core” and “secondary” data is useful here), the principles are:

When publishing bibliographic data make an explicit and robust license statement.
Use a recognized waiver or license that is appropriate for data.
If you want your data to be effectively used and added to by others it should be open as defined by the Open Definition (http://opendefinition.org) — in particular non-commercial and other restrictive clauses should not be used.
Where possible, we recommend explicitly placing bibliographic data in the Public Domain via PDDL or CC0.

I have endorsed the principles and encourage others to do the same.

The principle discouraging data licenses that restrict commercial reuse is an important one. I can see why somebody who is considering releasing a set of bibliographic data into the wild might be tempted to use a license that forbids commercial use. After all, the vast majority of bibliographic records are created or improved by librarians working for non-profit or governmental entities. Although I don’t think anybody ever became rich beyond the dreams of avarice reselling library data, obviously there is some money to be made there, given the existence of commercial and quasi-commercial firms that deal with library metadata. Why should those firms be allowed to make money off the fruits of the labor of countless catalogers without direct financial recompense?

And … I can’t say that I entirely disagree. Libraries spend a lot of money creating metadata in a punishing economy; if some libraries can manage to get some money back to help keep catalogers and metadata specialists employed, so much the better. Until the advent of true artificial intelligence, there will always be an important role for the human creation and maintenance of metadata, though we also need to do a lot better with automating metadata production.

However, bibliographic data is most useful in the aggregate. A single bibliographic record, no matter how well crafted, has very little value. Put enough of them together to describe a library’s collection, and you start to get somewhere: you now have enough to make a catalog. Put a lot of metadata together, and you can do all kinds of interesting things.

It is in the aggregation of metadata where the licensing decisions that libraries make when releasing bibliographic data matter most. The less friction there is to commercial and non-commercial reuse of the data, the more the data will be used and improved.

Consider this: if I, in the course of my duties at a for-profit MPOW, find a file of records that I can do something useful with, I can get started doing that right away if I see a PDDL or CC0 license associated with it. If, instead, I see a no-commercial-use clause, I’ve hit a point of friction. I may choose to track down the contributor and negotiate a separate license, or, more likely, I’ll look for something else to work with. Unless you are the likes of the Library of Congress (i.e., your metadata can’t be ignored), using a non-commercial license when releasing your data simply means that it will be less likely to be used and improved. Worse, if a non-profit decides to aggregate PDDL/CC0 data and commercial-use-restricted data, it is even more difficult for commercial entities to touch the dataset at all — it’s one thing to track down one rights-holder, but dozens?

Metadata is for use. It is also for continual editing and improvement, since metadata is imperfect and incomplete. A library information ecosystem that promotes easy access to metadata and easy sharing might manage to keep up and stay relevant.

Of course, if numerous commercial entities make use of open bibliographic data but never compensate the libraries who paid to create it in the first place, that would over time become a strong disincentive for libraries to release open data. Therefore, I would like to suggest a fifth point — maybe not so much a principle as a recommendation — to commercial entities who make use of open bibliographic data: consider treating all open data, even data in the public domain, as if there were a mild copyleft license attached to it. In other words: give back. If in the course of providing your service you are not only using open data but improving the records, release your improvements as open bibliographic data. Moreover, invest some time in releasing your improvements in a maximally useful way — putting a file of improved data on your webserver is a good start, but if there are ways to contribute back to shared bibliographic databases or a hypothetical peer-to-peer metadata exchange so that the improvements can be more easily reused, consider doing so.

Yesterday I attended the “RDA briefings from test participants” session at ALA Midwinter. I only caught the tail end of Beacher Wiggins’ update from the Library of Congress, but as I understand it, LC will announce their decision regarding the results of their testing of RDA by Annual, if not sooner. One thing Beacher said struck me: regardless of the decision, we live in a world of mixed data and will have to get used to it. Of course, that’s been the status quo for years, if not decades; RDA is now just the latest player in the metadata standards dance. At least one major academic library test partner has already made its decision about adopting RDA: Christopher Cronin from the University of Chicago reported that the catalogers there made a unanimous decision to continue cataloging in RDA after the test is completed.

Besides Christopher, several other test partners relayed their experiences: Penny Baker from the Clark Art Institute Library, Richard Hasenyager from the North East Independent School District, Kathryn La Barre from the UIUC GSLIS program, and Maritta Coppieters from Backstage Library Works. Here’s my idiosyncratic summary of the testers’ experiences:

Changing to RDA won’t be the end of the world.
Everybody is continuing to work in a MARC framework; the testing group has not been experimenting (or had time to experiment?) with alternative metadata carriers or frameworks beyond some tests of creating Dublin Core records.
The testers who work with authority records seem to universally like the new RDA fields.
Nobody is mourning the passing of the rule of three.
Nobody likes sticking both the publication date and the copyright date in the 260$c.
The RDA Toolkit, as a software tool, is still a work in progress. Some like it, some don’t.
A clear understanding of FRBR is required for understanding RDA. During the Q&A, there was a discussion in the audience about training current catalogers; the consensus seems to be that it is difficult to teach FRBR to catalogers steeped in the AACR2 language, and much easier to explain FRBR to library school students. Is this a sign of a generational divide?

I was very sorry that none of the public library testing partners gave a briefing. However, I think Richard Hasenyager’s conclusion about when and if the NE ISD will adopt RDA applies to many public libraries: the ISD is willing to adopt RDA, but they can’t do it by themselves; the systems and materials vendors need to have full support for RDA records before it is economical for the school district to proceed. If libraries are to shift from AACR2 to RDA, this must be addressed. There is already a divide between academic and public library catalogers; having academic libraries do original cataloging in RDA while public libraries copy catalog using whatever records they can get (thereby adopting RDA by default without necessarily being fully invested or trained in RDA) would not be an ideal outcome.

Category: Metadata

Changing LCSH and living dangerously

What counts as an important standard?

Open data, commercialization, and copyleft

Onward, by fits and bounds?
