Bestiary of Monstrous Data #2: I’m not dead yet!

This is the second part in an occasional series on how good data can go bad.

One aspect of the MARC standard that is sometimes forgotten is that it was meant to be a cataloging communications format. One could design an ILS that doesn’t use anything resembling MARC 21 to store or express bibliographic data. As long as its internal structure were expressive enough to keep track of the distinctions called for by AACR2, it could in principle relegate MARC handling strictly to import and export. Such a system would treat MARC as a lingua franca for bibliographic software.

In practice, of course, MARC isn’t just a common language for machines; it’s also part of a common language for catalogers.  If you say “222” or “245” or “780” to one, you’ve communicated a reasonably precise (in the context of AACR2) identification of a metadata attribute.  Sure, it’s arcane, but then again so is most professional jargon to non-practitioners.  MARC has also become the basis of record storage and editing in most ILSs, to the point where the act of cataloging is sometimes conflated with the act of creating and editing MARC records.

But MARC’s origins as a communications format can sometimes conflict with its ad hoc role as a storage format.  Consider this record:

00528dam  22001577u 4500
001 123
100 1  $a Strang, Elizabeth Leonard.
245 10 $a Lectures on landscape and gardening design / $c by Elizabeth Leonard Strang.

A brief bibliographic record, right?  Look at the Leader/05, which stores the record status.  The value ‘d’ means that the record is deleted; other values for that position include ‘n’ for new and ‘c’ for corrected or revised.
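MARC counts leader positions from zero, so Leader/05 is the sixth character of the 24-character leader. The following minimal Python sketch assumes the leader is already available as a string. `record_status` and `RECORD_STATUS` are illustrative names, not part of any library. The code table lists the MARC 21 bibliographic record status values.

```python
# MARC 21 bibliographic Leader/05 (record status) codes.
RECORD_STATUS = {
    "a": "increase in encoding level",
    "c": "corrected or revised",
    "d": "deleted",
    "n": "new",
    "p": "increase in encoding level from prepublication",
}


def record_status(leader: str) -> str:
    """Return the Leader/05 code from a 24-character MARC leader.

    Positions are zero-based, so Leader/05 is leader[5], the sixth character.
    """
    if len(leader) != 24:
        raise ValueError(f"expected a 24-character leader, got {len(leader)}")
    return leader[5]


# The leader of the Strang record shown above.
leader = "00528dam  22001577u 4500"
code = record_status(leader)
print(code, RECORD_STATUS.get(code, "unknown"))  # d deleted
```

The zero-based offset matters when you translate it into one-based string functions such as SQL’s `SUBSTRING`, where the same character is at position 6.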

But unlike, say, the 245, the Leader/05 isn’t making an assertion about a bibliographic entity.  It’s making an assertion about the metadata record itself, and one that requires more context to make sense.  There can’t be a globally valid assertion that a record is deleted; my public library may have deaccessioned Lectures on landscape and gardening design, but your horticultural library may keep that title indefinitely.

Consequently, the Leader/05 is often ignored when creating or modifying records in an ILS.  For example, if a bib record is present in an Evergreen or Koha database, setting its Leader/05 to ‘d’ does not affect its indexing or display.

However, such records can become undead. This doesn’t happen inside the ILS, but when the records are exported for loading into a discovery layer or a union catalog.  Some discovery layers do look at the Leader/05.  If an incoming record is marked as deleted, that is taken as a signal to remove the matching record from the discovery layer’s own indexes.  If there is no matching record, it is reasonable for the discovery layer to ignore an incoming “deleted” record, and I know of at least one that does exactly that.
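The load behavior described above can be sketched in a few lines of Python. This is a hypothetical rule, not any particular discovery layer’s code. It assumes an index keyed on the record’s 001 and treats every status other than ‘d’ as an add or update. `apply_incoming` is an illustrative name.

```python
def apply_incoming(index: dict, record_id: str, leader: str, payload: str) -> str:
    """Apply one incoming record to a discovery-layer index (hypothetical rule).

    Leader/05 = 'd' means "remove your copy"; anything else is an add/update.
    """
    if leader[5] == "d":
        if record_id in index:
            del index[record_id]
            return "removed"
        # Nothing to remove, so the "deleted" record is skipped entirely.
        return "ignored"
    index[record_id] = payload
    return "indexed"


index = {}
# The Strang record: perfectly good in the ILS, but its Leader/05 is 'd'.
result = apply_incoming(index, "123", "00528dam  22001577u 4500",
                        "Lectures on landscape and gardening design")
print(result, len(index))  # ignored 0
```

Under this rule, the record never enters the index, and nothing on the ILS side hints at why.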

The result? A record that appears to be perfectly good in the ILS doesn’t show up in the discovery layer.

Context matters.

I’ll finish with a couple of SQL queries for finding such undead records, one for Evergreen:

-- Leader/05 is zero-based; SQL SUBSTRING is one-based, hence position 6.
SELECT mfr.record
FROM metabib.full_rec mfr
JOIN biblio.record_entry bre ON (bre.id = mfr.record)
WHERE mfr.tag = 'LDR'
AND SUBSTRING(mfr.value, 6, 1) = 'd'
AND NOT bre.deleted;

and one for Koha:

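-- XPath substring() is also one-based, hence position 6 for Leader/05.
-- Newer Koha releases keep MARCXML in biblio_metadata.metadata rather than
-- biblioitems.marcxml; adjust the table and column to match your version.
SELECT biblionumber
FROM biblioitems
WHERE ExtractValue(marcxml, 'substring(//leader, 6, 1)') = 'd';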
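The queries above check the ILS database itself. To catch undead records in an export file before it reaches a discovery layer, the following minimal Python sketch reads binary MARC (ISO 2709) directly. It uses the record length in Leader/00-04 and the base address of data in Leader/12-16, then reports the 001 of each record whose Leader/05 is ‘d’. The sketch assumes that records are concatenated with no separators and that the 001 is UTF-8. The function names are hypothetical. For production use, a MARC library such as pymarc would be more robust.

```python
import sys
from typing import BinaryIO, Iterator, Optional

FIELD_TERMINATOR = b"\x1e"


def iter_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield raw ISO 2709 records, using the record length in Leader/00-04."""
    while True:
        leader = stream.read(24)
        if not leader:
            return
        if len(leader) < 24:
            raise ValueError("truncated leader")
        length = int(leader[:5].decode("ascii"))
        rest = stream.read(length - 24)
        if len(rest) != length - 24:
            raise ValueError("truncated record")
        yield leader + rest


def control_number(record: bytes) -> Optional[str]:
    """Return the 001 value, or None if the record has no 001."""
    base = int(record[12:17].decode("ascii"))  # Leader/12-16: base address of data
    directory = record[24:base - 1]  # 12-byte entries; drop the directory's terminator
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12]
        if entry[:3] == b"001":
            flen = int(entry[3:7].decode("ascii"))
            start = int(entry[7:12].decode("ascii"))
            data = record[base + start:base + start + flen]
            return data.rstrip(FIELD_TERMINATOR).decode("utf-8")
    return None


def undead_control_numbers(stream: BinaryIO) -> Iterator[Optional[str]]:
    """Yield the 001 of every record whose Leader/05 is 'd'."""
    for record in iter_records(stream):
        if record[5:6] == b"d":
            yield control_number(record)


if __name__ == "__main__":
    # Usage: python find_undead.py export.mrc [more.mrc ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as fh:
            for cn in undead_control_numbers(fh):
                print(path, cn)
```

Both `find_undead.py` and `export.mrc` are placeholder names. Reading the length from the leader, rather than splitting on the record terminator, lets the sketch walk the file without decoding field data it doesn’t need.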

 

CC-BY image of a woodcut of a viper courtesy of the Penn Provenance Project.

CC BY-SA 4.0 Bestiary of Monstrous Data #2: I’m not dead yet! by Galen Charlton is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

One thought on “Bestiary of Monstrous Data #2: I’m not dead yet!”

  1. Nice post!

    Some of our vendors for electronic titles also use that field in update files. They’ll send over a ‘d’ in leader/05 when they want us to delete the record from the database. If it doesn’t already, it might not be a bad idea to have Vandelay take a look at that field.
