Yesterday the Open Knowledge Foundation announced their principles of open bibliographic data. Following a definition of “bibliographic data” (though I don’t think that the distinction drawn between “core” and “secondary” data is useful here), the principles are:
- When publishing bibliographic data make an explicit and robust license statement.
- Use a recognized waiver or license that is appropriate for data.
- If you want your data to be effectively used and added to by others it should be open as defined by the Open Definition (http://opendefinition.org) — in particular non-commercial and other restrictive clauses should not be used.
- Where possible, we recommend explicitly placing bibliographic data in the Public Domain via PDDL or CC0.
I have endorsed the principles and encourage others to do the same.
The principle discouraging data licenses that restrict commercial reuse is an important one. I can see why somebody who is considering releasing a set of bibliographic data into the wild might be tempted to use a license that forbids commercial use. After all, the vast majority of bibliographic records are created or improved by librarians working for non-profit or governmental entities. Although I don’t think anybody ever became rich beyond the dreams of avarice reselling library data, obviously there is some money to be made there, given the existence of commercial and quasi-commercial firms that deal with library metadata. Why should those firms be allowed to make money off the fruits of the labor of countless catalogers without direct financial recompense?
And … I can’t say that I entirely disagree. Libraries spend a lot of money creating metadata in a punishing economy; if some libraries can manage to get some money back to help keep catalogers and metadata specialists employed, so much the better. Until the advent of true artificial intelligence, there will always be an important role for the human creation and maintenance of metadata, though we also need to do a lot better with automating metadata production.
However, bibliographic data is most useful in the aggregate. A single bibliographic record, no matter how well crafted, has very little value. Put enough of them together to describe a library’s collection, and you start to get somewhere: you now have enough to make a catalog. Put a lot of metadata together, and you can do all kinds of interesting things.
It is in the aggregation of metadata where the licensing decisions that libraries make when releasing bibliographic data matter most. The less friction there is to commercial and non-commercial reuse of the data, the more the data will be used and improved.
Consider this: if I, in the course of my duties at a for-profit MPOW, find a file of records that I can do something useful with, I can get started doing that right away if I see a PDDL or CC0 license associated with it. If, instead, I see a no-commercial-use clause, I’ve hit a point of friction. I may choose to track down the contributor and negotiate a separate license, or, more likely, I’ll look for something else to work with. Unless you are the likes of the Library of Congress (i.e., your metadata can’t be ignored), using a non-commercial license when releasing your data simply means that it will be less likely to be used and improved. Worse, if a non-profit decides to aggregate PDDL/CC0 data and commercial-use-restricted data, it is even more difficult for commercial entities to touch the dataset at all — it’s one thing to track down one rights-holder, but dozens?
Metadata is for use. It is also for continual editing and improvement, as metadata is imperfect and incomplete. A library information ecosystem that promotes easy access to metadata and easy sharing might manage to keep up and stay relevant.
Of course, if numerous commercial entities make use of open bibliographic data but never compensate the libraries who paid to create it in the first place, that would over time become a strong disincentive for libraries to release open data. Therefore, I would like to suggest a fifth point — maybe not so much a principle as a recommendation — to commercial entities who make use of open bibliographic data: consider treating all open data, even data in the public domain, as if there were a mild copyleft license attached to it. In other words: give back. If in the course of providing your service you are not only using open data but improving the records, release your improvements as open bibliographic data. Moreover, invest some time in releasing your improvements in a maximally useful way — putting a file of improved data on your webserver is a good start, but if there are ways to contribute back to shared bibliographic databases or a hypothetical peer-to-peer metadata exchange so that the improvements can be more easily reused, consider doing so.
Open data, commercialization, and copyleft by Galen Charlton is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Hello Galen,
I agree 100% with your arguments why libraries should publish their data under an open license and especially want to emphasize this point: “It is in the aggregation of metadata where the licensing decisions that libraries make when releasing bibliographic data matter most. The less friction there is to commercial and non-commercial reuse of the data, the more the data will be used and improved.”
I’d like to make one thing clear though: The Principles on Open Bibliographic Data address not only libraries and librarians but also all other producers of bibliographic data like – quoting the principles – “publishers, scholars, online communities of book lovers, social reference management systems, and so on”. The principles hold for all these agents and thus one can say that your proposed fifth point is already part of the original principles.
With Peter Murray-Rust and Jim Pitman, the group that created the principles included two scientists who are specifically interested in journal article data, which normally isn’t produced by libraries. Thus, we are not only talking about data in library catalogs, which mostly covers monographs and journals; we are also talking about article metadata, which plays the really important role in many scientific disciplines. And in this realm of journal article data there is a contrary movement towards a growing commercialization of metadata – one just has to look at services that have emerged in recent years, like Summon, Primo Central and EBSCO Discovery Service, which basically sell licenses to access aggregations of metadata…
Adrian
Thank you for correcting my too-narrow view of the target audience for the principles. I agree that open article metadata is also extremely important. The fact that libraries ceded maintenance of such data to the publishers and aggregators is an understandable historical event but one that hampers broader innovation in discovery tools, particularly for academic libraries.
That said, I think my fifth recommendation does cover a point that is not explicit in the principles as stated: in order to foster the best use and improvement of open bibliographic data, consumers of it should be prepared to contribute their changes back. Unlike a copyleft license such as the GPL, the PDDL and CC0 do not impose such a requirement. While I don’t think that the GPL per se would be suitable for metadata licensing, I would like to see a social norm arise that consumers of open bibliographic data fully participate in the ecosystem by contributing improvements back, even though they’re under no obligation to do so. The distinction I’m trying to draw is between the original producers of bibliographic data and a secondary market of individuals and firms who find ways to mash it up and enhance it.
I think the above article confuses “commercial” and “for-profit”. A non-profit trading company can’t “aggregate PDDL/CC0 data and commercial-use-restricted data” in the line of their trading without getting another licence for the restricted stuff. That’s a mistake that a lot of people make when choosing NC and they hurt lots of non-profits by doing it.
I fully endorse the idea of contributing back, though. It would be good for projects to make that easier, too.