Playing around with Coce

In the course of looking at the patch for Koha bug 9580 today, I ended up playing around with Coce.

Coce is a piece of software written by Frédéric Demians and licensed under the GPL that implements a cache for URLs of book cover images. It arose during a discussion of cover images on the Koha development mailing list.

The idea behind Coce is that, rather than having the ILS either link directly to cover images by plugging the normalized ISBN into a URL pattern (as is done for Amazon, Baker & Taylor and Syndetics) or call a web service to get the image’s URL (as is done for Google and Open Library), Coce queries the cover image providers itself and returns the image URLs. Furthermore, Coce caches the URLs, meaning that once it determines that the Open Library cover image for ISBN 9780563533191 can be found at http://covers.openlibrary.org/b/id/2520432-L.jpg, it need not ask again, at least for a while.
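To make the caching idea concrete, here is a minimal sketch of a URL cache with an expiry time. It is not Coce's code: a plain Map stands in for Redis, `UrlCache` and `lookupCoverUrl` are hypothetical names, and the lookup is synchronous only to keep the example short (a real provider call would be asynchronous).

```javascript
// Illustrative sketch only: a Map stands in for Redis, and lookupCoverUrl
// is a hypothetical stand-in for a real call to a cover image provider.
class UrlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;       // how long an answer stays valid
    this.now = now;           // injectable clock, handy for testing
    this.entries = new Map(); // ISBN -> { url, expiresAt }
  }

  // Return the cached URL if it is still fresh; otherwise ask the provider
  // once and remember the answer (including "no cover", stored as null).
  get(isbn, fetchUrl) {
    const hit = this.entries.get(isbn);
    if (hit && hit.expiresAt > this.now()) {
      return hit.url;
    }
    const url = fetchUrl(isbn);
    this.entries.set(isbn, { url, expiresAt: this.now() + this.ttlMs });
    return url;
  }
}

// Hypothetical provider lookup with canned data; counts how often it is hit.
let providerCalls = 0;
function lookupCoverUrl(isbn) {
  providerCalls += 1;
  return isbn === "9780563533191"
    ? "http://covers.openlibrary.org/b/id/2520432-L.jpg"
    : null;
}
```

Calling `cache.get("9780563533191", lookupCoverUrl)` twice within the expiry window reaches the provider only once; after the window passes, the next call asks again, which is the "at least for a while" behavior described above.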

Having a cache like this provides some advantages:

  • Caching the result of web service calls reduces the load on the providers. That’s nice for the likes of the Open Library, and while even the most ambitious ILS is not likely to discomfit Amazon or Google, it doesn’t hurt to reduce the risk of getting rate-limited during summer reading.
  • Since Coce queries each provider for valid image URLs, users are less likely to see broken cover images in the catalog.
  • Since Coce can query multiple providers (it currently has support for the Open Library, Google Books, and Amazon’s Product Advertising API), more records can have cover images displayed as compared to using just one source.
  • It lends itself to using one Coce instance to service multiple Koha instances.
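The multiple-provider advantage can be sketched as a simple fallback loop. This is an assumption about the general technique, not a description of how Coce is implemented: the provider functions below are hypothetical stubs with canned data, and the second URL and ISBN are placeholders.

```javascript
// Illustrative sketch only: these provider functions are hypothetical stubs;
// real ones would call each provider's web service.
const providers = {
  openLibrary: (isbn) =>
    isbn === "9780563533191"
      ? "http://covers.openlibrary.org/b/id/2520432-L.jpg"
      : null,
  googleBooks: (isbn) =>
    isbn === "9780000000002" // placeholder ISBN
      ? "http://books.example.org/covers/9780000000002.jpg" // placeholder URL
      : null,
};

// Try each provider in the requested order and return the first cover found.
function firstCover(isbn, order) {
  for (const name of order) {
    const lookup = providers[name];
    if (!lookup) continue; // unknown provider name: skip it
    const url = lookup(isbn);
    if (url) return { provider: name, url };
  }
  return null; // no provider had a cover for this ISBN
}
```

With the order `["googleBooks", "openLibrary"]`, an ISBN that only the Open Library stub knows about still gets a cover, which is why more records end up with images than with a single source.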

There are also some disadvantages:

  • It would be yet another service to maintain.
  • It would be another point of failure. On the other hand, it looks like it would be easy to set up multiple, load-balanced instances of Coce.
  • There is the possibility that image URLs might get cached for too long — although I don’t think any of the cover image services are in the habit of changing the static image URLs just for fun, they don’t necessarily guarantee that they will work forever.

I set up Coce on a Debian Wheezy VM. It was relatively simple to install; for posterity, here is the procedure I used. First, I installed Redis, which Coce uses as its cache:

sudo apt-get install redis-server

Next, I installed Node.js by building a Debian package, then installing it:

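# Build tools needed to compile Node.js and package it with checkinstall
sudo apt-get install python g++ make checkinstall
mkdir ~/src && cd $_
# Fetch and unpack the latest Node.js source release
wget -N http://nodejs.org/dist/node-latest.tar.gz
tar xzvf node-latest.tar.gz && cd node-v*
./configure
# Build Node.js and wrap it in a .deb; checkinstall normally needs root
sudo checkinstall
# The package file name depends on the version built (0.10.15 here)
sudo dpkg -i ./node_0.10.15-1_amd64.deb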

When I got to the point where checkinstall asked me to confirm the metadata for the package, I made sure to remove the “v” from the version number.

Next, I checked out Coce and installed the Node.js packages it needs:

cd ~
git clone https://github.com/fredericd/coce
cd coce
npm install express redis aws-lib util

I then copied “config.json-sample” to “config.json” and customized it. The only change I made, though, was to remove Amazon from the list of providers.

Finally, I started the service:

node webservice.js

On my test Koha system, I installed the patch for bug 9580 and set the two system preferences it introduces so that they pointed to my Coce instance and named the set of cover providers I wanted to use for the test.

The result? It worked: I did an OPAC search, and some of the titles that got displayed had their cover image provided by Google Books, while others were provided by the Open Library.

There are a few rough edges to work out. For example, the desired cover image size should probably be part of the client request to Coce, not part of Coce’s central configuration, and I suspect a bit more work is needed to get it to work properly if the OPAC is run under HTTPS. That said, this looks promising, and I enjoyed the chance to start playing a bit with Redis and Node.js.
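As a rough illustration of letting the client pick the size, the sketch below rewrites the size suffix of an Open Library cover URL, which uses S, M, or L before the file extension. The function name `withOpenLibrarySize` is hypothetical, it only handles Open Library URLs of the shape shown earlier, and it is not something Coce or the bug 9580 patch is known to do.

```javascript
// Illustrative sketch only: swap the size suffix (S, M, or L) in an
// Open Library cover URL. URLs that do not match the expected pattern
// are returned unchanged.
function withOpenLibrarySize(url, size) {
  if (!["S", "M", "L"].includes(size)) {
    throw new Error("size must be S, M, or L");
  }
  return url.replace(
    /^(https?:\/\/covers\.openlibrary\.org\/b\/[a-z]+\/[^/]+)-[SML]\.jpg$/,
    `$1-${size}.jpg`
  );
}
```

For example, `withOpenLibrarySize("http://covers.openlibrary.org/b/id/2520432-L.jpg", "M")` yields the medium-size URL for the same cover. Other providers encode size differently, so a general solution would need one such rule per provider.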

CC BY-SA 4.0 Playing around with Coce by Galen Charlton is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

3 thoughts on “Playing around with Coce”

  1. There are also the legal issues: you have to be certain you are legally allowed to cache the images. I think Open Library is OK, I’m not sure about Google, and I’m sure that you can’t with Amazon or Syndetics.

  2. Well, the legalities are shaky with Amazon anyway — I would argue that many libraries that use Amazon book jackets are doing so on the (reasonable) assumption that Amazon is unlikely to come down hard on non-commercial use, particularly by libraries. Very few library OPACs have (or should have) “as their principal purpose advertising and marketing the Amazon Site and driving sales of products and services on the Amazon Site”, to quote the Amazon Product Advertising API license agreement.

    Coce isn’t caching the images, though, just their URLs — and even if all it did was simply cache whether a given provider had an image for a given ISBN, that would be a win.

    Whether that makes a difference in theory (or practice) with respect to the TOS of a given image provider surely varies. Amazon’s Product Advertising API license agreement appears to forbid caching the images, but it does permit caching links to images (as well as non-image API results) for up to 24 hours.

    I don’t have a copy of the Syndetics license agreement, but I know of at least one ILS that has been doing short-term (as in, measured in minutes) caching of Syndetics content to improve performance for years.

    Of course, nothing says that Coce couldn’t be used to cache and aggregate cover image URLs from strictly free sources, including local cover images from multiple Koha catalogs (though in that case, you probably do want to cache the actual images, not just the URLs).

  3. Back from vacation, I’m just catching up on your blog post. Thanks for testing Koha bug 9580 and Coce.

    You have a point concerning cover image size. The way cover image providers manage image size varies. Open Library and Amazon follow the same model with small, medium, and large sizes. But the meaning of ‘medium’ or ‘small’ may also vary for each provider. You can imagine the complication. I can’t see an easy solution. I (we) need to think about it. Maybe something with a ratio associated with the returned URL, in order to help the client-side JS code to resize the image.
