Notes from Code4Lib BC: gluing together an ugly string to emit RDF

This afternoon I’m sitting in the new bibliographic environment breakout session at Code4Lib BC. After taking a look at Mark Jordan’s easyLOD, I decided to play around with putting together a web service for Koha that emits RDF when fed a bib ID. Unlike Magnus Enger’s semantikoha prototype, which uses a Ruby library to convert MARC to RDF, I was trying for an approach that used only Perl (plus XS).

There were are of building blocks available. Putting them together turned out to be a tick more convoluted than I expected.

The Library of Congress has published an XSL stylesheet for converting MODS to RDF. Converting MARC(XML) to MODS is readily done using other stylesheets, also published by LC.

The path seemed clear for a quick-and-dirty prototype — make a copy of svc/bib, copy it to opac/svc/bib and take out the bits for doing updates (we’re not quite ready to make cataloging that collaborative!), and write a few lines to apply two XSLT transformations.

The code was quickly written — but it didn’t work. XML::LibXSLT, which Koha uses to handle XSLT, complained about the modsrdf.xsl stylesheet. Too new! That stylesheet is written in XSLT 2.0, but libxslt, the C library that XML::LibXSLT is based on, only supports XSLT 1.

As it turns out, Perl modules that can handle XSLT are rather thin on the ground. What I ended up doing was:

Installing XML::Saxon::XSLT2, which required…

Installing Saxon-HE, a Java XML and XSLT processor that supports XSLT 2.0, which required…

Installing Inline::Java, which required…

Installing a JDK (I happened to choose OpenJDK).

After all that (and a quick tweak to the modsrdf.xsl stylesheet, I ended up with the following code that did the trick:

#!/usr/bin/perl

BEGIN {
    $ENV{'PERL_INLINE_DIRECTORY'} = '/tmp/inline';
}

use Modern::Perl;

use CGI;
use C4::Biblio;
use C4::Templates;
use XML::Saxon::XSLT2;

my $query = new CGI;
binmode STDOUT, ':encoding(UTF-8)';

# do initial validation
my $path_info = $query->path_info();

my $biblionumber = undef;
if ($path_info =~ m!^/(\d+)$!) {
    $biblionumber = $1;
} else {
    print $query->header(-type => 'text/xml', -status => '400 Bad Request');
}

# are we retrieving or updating a bib?
if ($query->request_method eq "GET") {
    fetch_rdf($query, $biblionumber);
}

exit 0;

sub fetch_rdf {
    my $query = shift;
    my $biblionumber = shift;
    my $record = GetMarcBiblio($biblionumber);
    if  (defined $record) {
        print $query->header(-type => 'text/xml');
        my $xml = $record->as_xml_record();
        my $base = join('/',
                        C4::Context->config('opachtdocs'),
                        C4::Context->preference('opacthemes'),
                        C4::Templates::_current_language()
                       );
        $xml = transform($xml, "$base/xslt/MARC21slim2MODS3-3.xsl");
        $xml = transform($xml, "$base/xslt/modsrdf.xsl");
        print $xml;
    } else {
        print $query->header(-type => 'text/xml', -status => '404 Not Found');
    }
}

sub transform {
    my $xmlrecord = shift;
    my $xslfilename = shift;

    open my $fh, '<', $xslfilename;
    my $trans = XML::Saxon::XSLT2->new($fh);
    return $trans->transform($xmlrecord);

}

This works… but is not satisfying. Making Koha require a JDK just for XSLT 2.0 support is a bit much, for one thing, and it would likely be rather slow if used in production. It’s a pity that there’s still no broad support for XSLT 2.0.

A dead end, most likely, but instructive nonetheless.

CC BY-SA 4.0 Notes from Code4Lib BC: gluing together an ugly string to emit RDF by Galen Charlton is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.