Author Archive

Preconference at ALA: Migrating to Open Source Library Systems

June 23rd, 2010 Galen Charlton 1 comment

There are a lot of programs related to free and open source software at the Annual meeting of the American Library Association this year, but I am particularly looking forward to the LITA preconference on migrating to open source library systems, which I helped organize along with the LITA Open Source Systems Interest Group.  The main speakers are:

  • Terry Reese, Gray Family Chair for Innovative Library Services, Oregon State University, who will be presenting on MarcEdit.
  • David Lindahl, Web Initiatives Manager, University of Rochester, who will be discussing the XC project, evaluating open source with users in mind, migrating and managing metadata, and interoperating the ILS with other open source software.
  • Brenda Chawner, Senior Lecturer, Victoria University of Wellington, who will discuss the Kete community digital archive platform and building communities to contribute to open source software.

I will also be presenting on techniques for loading data into open source ILSs, and I know that a number of people with experience working with and migrating to open source library software will be present to share their expertise.

The preconference will be in the Monroe Room at the Hilton Washington and will run from 9 to 4:30 on Friday, 25 June 2010.  It is a paid preconference of LITA, but I will be blogging it and sharing the thoughts and insights of the speakers and participants with the world.  As far as I know, if you will be attending ALA, it is still possible to register for the preconference.

But wait!  There’s more!  There is another LITA preconference on Friday related to F/OSS, the “Open Source CMS Playroom”, which will be led by Karen Coombs from OCLC and Amanda Hollister from LISHost.  For more details, check out the LITA events page.

Categories: Libraries

Saving keystrokes when sending Koha patches

May 21st, 2010 Galen Charlton No comments

With the recent move of the Koha mailing lists to http://lists.koha-community.org/, the full email address for submitting a patch is now koha-patches@lists.koha-community.org. Since that’s a bit long to type every time you submit a patch, here’s one way to save a few keystrokes.

First, create a text file to define a short alias for the email address. I chose to call the file ~/.git_aliases. The file should contain a line like this:

alias kpatches koha-patches@lists.koha-community.org

Next, tell Git that you want to use this address book file:

$ git config --global sendemail.aliasesfile ~/.git_aliases
$ git config --global sendemail.aliasfiletype mutt

Now, all you have to do is use kpatches in place of the full address. For example:

$ git send-email 0001-bug-4801-fix-paging-in-display-of-staged-bibs-and-im.patch
0001-bug-4801-fix-paging-in-display-of-staged-bibs-and-im.patch

Who should the emails appear to be from? [Galen Charlton ]

Who should the emails be sent to? kpatches

or

$ git send-email --to kpatches 0001-bug-4801-fix-paging-in-display-of-staged-bibs-and-im.patch

Categories: Koha Tags:

Small world (or, did somebody just start a Code4Lib PAC?)

May 13th, 2010 Galen Charlton No comments

I found out today that Craig Lowe, the incoming mayor of Gainesville, used to work as a programmer for the Florida Center for Library Automation, so his election is even better news for libraries in Gainesville and Alachua County than I had thought. Craig has been on the library governing board for a while, but it is nice to know that he has direct experience to bring to bear. (Yes, I voted for him.)

Categories: Gainesville, Libraries Tags:

Mixed messages

May 13th, 2010 Galen Charlton 4 comments

I am glad to see that PTFS and its LibLime division have contributed the developments that PTFS has been working on for the past year or so, including features commissioned by the East Brunswick and Middletown Township public libraries and others. The text of LibLime’s announcement makes it clear that this is meant as a submission to Koha 3.2 and (more so) 3.4:

The code for the individual new features included in this version has also been made available for download from the GIT repository. The features in this release were not ready for 3.2, but, pending acceptance by the 3.4 Koha Release Manager, could be included in release 3.4.

Chris Cormack (as 3.4 release manager) and I (as 3.2 release manager) have started work on integrating this work into Koha. Since 3.2 is in feature freeze, for the most part only the bugfixes from Harley will be included in 3.2, although I am strongly considering bringing in the granular circulation permissions work as well. The majority of the features will make their way into 3.4, although they will go through QA and discussion like any other submission.

So far, so good. As a set of contributions for 3.2 and 3.4, “Harley” represents the continuation of PTFS’ ongoing submissions of code to Koha in the past year. Further, if PTFS is serious about their push for “agile” programming, I hope that they will make a habit of submitting works in progress for discussion and public QA sooner; in some cases, “Harley” features that were obviously completed months ago were not submitted until now.

But here is where the mixed messages come in: “Harley” is prominently listed on koha.org as a release of Koha. Since no PTFS staff are among the elected release managers or maintainers for Koha, that is overreaching. Ever since Koha expanded beyond New Zealand, no vendor has hitherto unilaterally implied that they were doing mainstream releases of Koha outside of the framework of elected release managers.

Before I go further, let me get a couple things out of the way. If somebody wants to enhance Koha and create installation packages of their work in addition to contributing their changes to the Koha project, that’s fine. In fact, if somebody wants to do that without formally submitting their changes, that’s certainly within the bounds of the GPL, although obviously I’d prefer that we have one Koha instead of a bunch of forks of it. If any library wants to download, install, test, and use “Harley”, that’s fine as well. Although there could be some trickiness upgrading from “Harley” to Koha 3.2 or Koha 3.4, it will certainly be possible to do so in the future.

What I am objecting to is the overreach.  Yes, “Harley” is important.  Yes, I hope it will help open a path to resolve other issues between PTFS/LibLime and the rest of the Koha community.  Yes, I thank PTFS for releasing the code, and in particular publishing it in their Git repository.  That doesn’t make it an official release of Koha; it is still just another contribution to the Koha project, the same as if it came from BibLibre, software.coop, Catalyst, Equinox, one of the many individual librarians contributing to Koha, or any other source.

“Harley” is available for download from LibLime’s website at http://www.liblime.com/downloads.  This is where it belongs.  Any vendor-specific distribution of Koha should be retrievable from the vendor’s own website, but it should not be presented as a formal release.  Perhaps there is room to consider having the Koha download service also offer vendor-specific distributions in addition to the main releases, but if that is desired, it should be proposed and discussed on the community mailing lists.

Updating koha.org to remove the implication that “Harley” is an official release is a simple change to make, and I call upon PTFS to do so.

Please see my disclosure statement. In particular, I am release manager for Koha 3.2 and I work for a competitor of PTFS. This post should not be construed as an official statement by Equinox, however, although I stand by my words.

Categories: Koha Tags:

Database server disk space usage in Evergreen

May 9th, 2010 Galen Charlton 3 comments

I’ve had occasion recently to dig into how Evergreen uses space in a PostgreSQL database. Before sharing a couple queries and observations, here’s my number one rule for configuring a database server for Evergreen: allocate enough disk space. If you’re using a dedicated database server, you’ll need disk space to store the following:

  • the database files storing the actual data,
  • current WAL (write-ahead log) files,
  • archived WAL files (which should be backed up off the database server as well),
  • current database snapshots and backups (again, these should be backed up offline as well),
  • scratch space for migrations and data loads,
  • future data, particularly if you’re using Evergreen for a growing consortium, and
  • the operating system, the Postgres software, etc.

Of course, the amount of disk space required just to store the data depends on the number of records you have. A complete sizing formula would take into account the number of bibs, items, patrons, circulation transactions, and monetary transactions you expect to have, but here’s a rule of thumb based on looking at several production Evergreen 1.6 databases and rounding up a bit: allocate at least 50K per bibliographic record.

That’s only the beginning, however. Postgres uses write-ahead logging to record database transactions; this has the effect of adding a 16M file to the pg_xlog directory every so often as users catalog and do circulation.  In turn, the WAL files should get archived periodically by enabling archive mode so that copies exist both on the database server itself and on backup media.

In a busy system, read “quite often” for “every so often” in the second sentence of the previous paragraph. In a system where you’re actively loading data, particularly if you’re also keeping that database up for production use and are therefore keeping WAL archiving on, read “fast and furious”. Why does this matter? If your database server crashes, you will need the most recent full backup and the accumulated archived WAL files since that backup to recover your database. If you don’t keep your WAL files, be prepared for an involuntary fine amnesty and angry catalogers. Conversely, what happens if you run out of space for your archived WAL files?  Postgres will lock up, bringing your Evergreen system to a halt, yielding angry patrons and angry circulation staff.

Archived WAL files don’t need to be kept on the database server forever, fortunately.  After each periodic full backup, archived WAL files made prior to that backup are no longer needed for a point-in-time recovery from that backup. Of course, that assumes everything goes well during the recovery, so you will still want to keep at least a couple generations of full backups and WAL file sequences, including offline backup copies, and also periodically create logical database dumps using pg_dump. LOCKSS isn’t just for digitized scholarly papers.

So, what’s my rule of thumb for estimating total disk space needed for an Evergreen database server? 200K per bibliographic record that you expect to have in your database three years from now. I admit that this is on the high side, and this is not the formula that Equinox’s server people necessarily use for hardware recommendations. However, while disk space may or may not be “cheap”, it is often cheaper than a 2 a.m. wake-up call from the library director.
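As a rough illustration of the two rules of thumb, here is a minimal shell sketch of the arithmetic. The figure of 500,000 projected bibliographic records is a made-up example, and reading “50K” and “200K” as kilobytes is my assumption; substitute your own numbers.

```shell
#!/bin/sh
# Hypothetical input: bib records you expect to have three years from now.
projected_bibs=500000

# Rules of thumb from the post, read here as kilobytes per bib record.
data_kb_per_bib=50     # data files only
total_kb_per_bib=200   # whole database server: WAL, backups, scratch space, growth

kb_per_gb=1048576      # 1024 * 1024

# Integer arithmetic, rounded up to whole gigabytes.
data_gb=$(( (projected_bibs * data_kb_per_bib + kb_per_gb - 1) / kb_per_gb ))
total_gb=$(( (projected_bibs * total_kb_per_bib + kb_per_gb - 1) / kb_per_gb ))

echo "Data files alone: at least ${data_gb} GB"
echo "Whole database server: plan for ${total_gb} GB"
```

For 500,000 records this works out to 24 GB for the data alone and 96 GB for the server as a whole.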

How does this disk space get used? I’ll close with a couple queries to run against your Evergreen database:

select schemaname,
       pg_size_pretty(sum(
         pg_total_relation_size(schemaname || '.' || tablename)
       )::bigint) AS used
from pg_tables
group by schemaname
order by
  sum(pg_total_relation_size(schemaname || '.' || tablename))::bigint desc;

This gives you the amount of space used in each schema. The metabib schema, which contains the indexing tables, will almost certainly be #1.  Depending on how long you’ve been using your Evergreen system, either auditor or biblio will be #2.

select schemaname || '.' || tablename AS tab,
       pg_size_pretty(
         pg_total_relation_size(
           schemaname || '.' || tablename
         )
       ) AS used
from pg_tables
order by pg_total_relation_size(schemaname || '.' || tablename) desc;

This will give you the space used by each table. metabib.real_full_rec will be #1, usually followed by biblio.record_entry. It is interesting to note that although those two tables store essentially the same data, metabib.real_full_rec will typically consume five times as much space as biblio.record_entry.

Categories: Evergreen

Git repo law #1: save the time of the puller

March 29th, 2010 Galen Charlton No comments

Apologies to Ranganathan.

Say you have a Git repository you want to publish, and you’ve set up a Gitweb for it at http://git.example.com/?p=myrepo.git;a=summary.  So far, so good: others can browse your commits and download packages and tarballs.  Suppose you’ve also configured git-daemon(1) to publish the repo using the Git protocol.  Great!  Now suppose you’ve told the world to go to http://git.example.com. The world looks at what you have wrought, and then asks: How can we clone your repository?

Even assuming that you’ve used the default options in your git-daemon configuration, the Git clone URL could be any of the following depending on where your OS distribution’s packagers decided to put things:

  • git://git.example.com/myrepo
  • git://git.example.com/myrepo.git
  • git://git.example.com/git/myrepo
  • git://git.example.com/git/myrepo.git
  • and there are even more possibilities if you did tweak the config

The rub is that Gitweb doesn’t know and can’t know until you tell it.  If you don’t tell it, somebody who wants to clone your repo and who is looking at the Gitweb page can only guess.  If they guess wrong a few times, they may give up.

Fortunately, the solution is easy: to make the Git clone URL display in your Gitweb, go to the repository’s base directory and create a new file called cloneurl and enter the correct clone URL(s), one per line. While you’re at it, make sure that the description file is accurate as well.
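A minimal sketch of that step follows. The repository path, clone URL, and description text are placeholders, not values from a real server; by default the script writes into a scratch directory so it can be tried safely, and you can pass the path of your real repository as the first argument.

```shell
#!/bin/sh
# Hypothetical repository location; defaults to a throwaway directory.
repo_dir="${1:-$(mktemp -d)/myrepo.git}"
mkdir -p "$repo_dir"

# One clone URL per line. Use whichever URL actually works for your
# git-daemon setup; this one is only an example.
printf '%s\n' 'git://git.example.com/git/myrepo.git' > "$repo_dir/cloneurl"

# While you are there, give the repository a meaningful description.
printf '%s\n' 'Example repository (replace with a real description)' > "$repo_dir/description"

echo "Clone URL(s) that Gitweb will display:"
cat "$repo_dir/cloneurl"
```

It is worth running git clone against each URL you list before publishing it.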

Categories: Code4Lib, Miscellany Tags:

Your comment spam will be graded: points off for plagiarism

March 23rd, 2010 Galen Charlton 4 comments

I saw a particularly annoying form of comment spam in Dorothea Salo’s excellent summary of various kinds of open information:

[Screenshot of plagiaristic comment spam]

The author link points to the site of what appears to be a Turkish dietary supplement vendor.  Just a bit off-topic, unless this is somehow a subtle way of announcing that they’re releasing their supplement under an open recipe license.  What really steams me: the text was copied from one of my comments on the post.

Failing grade for plagiarism.

Categories: Miscellany Tags:

Here we go again: state aid to Florida libraries to be eliminated

March 12th, 2010 Galen Charlton No comments

On Wednesday, two committees of the Florida state legislature recommended removing funding for the Florida State Aid to Public Libraries program. This is the second time in as many years that this has happened. To compound the problem, the elimination of state aid would also mean that Florida libraries would no longer qualify for some forms of federal aid.

While a handful of library systems in Florida are independent taxing districts and could (painfully) weather this, elimination of state aid would mean that a lot of rural and city libraries would have to close branches, cut hours, and lay off library staff. Many rural libraries are already operating on shoestrings.

Do you live in Florida? Call your state representative and senator today and ask them to vote to continue funding for state aid to Florida libraries. Also, please ask them to stop this proposal from becoming an annual tradition. No brinkmanship with our libraries, please!

Update 2010-04-28: State aid has been restored! [PDF link] Can we not play this game again next year?

What is truth?

September 14th, 2009 Galen Charlton 2 comments

Not paying close attention to Perl’s definition of truth can sometimes lead to subtle bugs. Consider a simple scalar $x that should contain a string exactly one character wide. If the original value of $x can be undefined and you want to make sure it has a default value of a single space, do not do the following:

$x ||= ' ';

Why not? If $x starts off as '0', a permitted value, this line will change it to ' '. Instead, do this:

$x = ' ' unless defined $x;

Remember, 0, '0', '', and undef all evaluate to Perl’s notion of false.

Categories: Code4Lib, Perl Tags:

Koha Git mini-tutorial: pulling from remote repositories

August 10th, 2009 Galen Charlton 1 comment

Earlier today Chris Cormack and I were chatting on IRC about various ways to manage patches, and decided to stage a little tutorial about how to pull from remote Git repositories:

<chris> speaking of public repo’s … i have been pushing to github for a while
<chris> but i have set up git-daemon on my machine at home too
<gmcharlt> chris: anything you’re ready to have me look at to pull?
<chris> not really
<chris> one interesting thing is the dbix_class branch
<chris> http://git.workbuffer.org/cgi-bin/gitweb.cgi?p=koha.git;a=summary
<gmcharlt> even if it’s trivial, it occurs to me that doing it and writing up how we did it might be useful material for a tutorial blog post or maiilings to koha-devel
<chris> lemme check
<chris> tell ya what
<chris> ill do a history.txt update
<chris> and commit that, and we can pull that
<chris> gmcharlt: http://git.workbuffer.org/cgi-bin/gitweb.cgi?p=koha.git;a=shortlog;h=refs/heads/documentation
<chris> so you can a remote for my repo
<chris> git remote add workbuffer.org git://git.workbuffer.org/git/koha.git
<chris> then git checkout -b documentation --track workbuffer.org/documentation
<chris> (probably need a git fetch workbuffer.org too(
<chris> then you can cherry-pick that commit over
<chris> thats one way to do it
<chris> or you could just checkout a branch
<gmcharlt> chris: yeah, I think I’ll do it as a pull
<chris> checkout -b mydocumentation
<chris> git pull workbuffer.org/documentation
<chris> i think that will do it anyway
<gmcharlt> yeah, then into my staging branch
<gmcharlt> git checkout test
<gmcharlt> git merge mydocumentation/documentation
<gmcharlt> or directly
<gmcharlt> git merge workbuffer.org/documentation
<chris> yep
<chris> i think the pull will do fetch + merge for ya
<gmcharlt> it does indeed
<gmcharlt> fetch first, though
<gmcharlt> lets you do git log --pretty=oneline test workbuffer.org/documentation
<chris> good point
<gmcharlt> chris: well, let’s make it official – send a pull request to the patches list
<chris> will do
<gmcharlt> e.g., Subject: PULL – git://git.workbuffer.org/koha.git – documentation – history changes
<gmcharlt> brief description of changes in body
<gmcharlt> something like that
<chris> works for me
<gmcharlt> “Welcome, all, to DVCS performance theatre”
<chris> off it goes
<chris> this was our first git tutorial right there .. quick someone take photos or something :-)
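
Pulled out of the chat, the fetch, review, and merge sequence looks roughly like the sketch below. So that it runs without network access, a throwaway local repository stands in for the contributor’s public one. The remote name contributor, the directory names, the identities, and the commit contents are all invented for illustration; in real use you would give git remote add the contributor’s published clone URL. The log command uses the range test..contributor/documentation, a small variation on the one in the chat, to list only the incoming commits.

```shell
#!/bin/sh
set -e
work=$(mktemp -d)

# "mine": the integrator's repository, with a staging branch called test.
git init -q "$work/mine"
cd "$work/mine"
git config user.name "Example Integrator"
git config user.email "integrator@example.com"
echo "Koha history" > history.txt
git add history.txt
git commit -q -m "Initial commit"
git checkout -q -b test

# "theirs": the contributor's repository, with a documentation branch.
git clone -q "$work/mine" "$work/theirs"
cd "$work/theirs"
git config user.name "Example Contributor"
git config user.email "contributor@example.com"
git checkout -q -b documentation
echo "2009: first pull tutorial" >> history.txt
git commit -q -a -m "Update history.txt"

# The tutorial proper: add the remote, fetch, review, then merge.
cd "$work/mine"
git remote add contributor "$work/theirs"
git fetch -q contributor
git log --pretty=oneline test..contributor/documentation   # what would come in
git checkout -q test
git merge -q contributor/documentation
```

Fetching first and merging second is the same work that git pull does in one step, with a chance to read the log in between.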

Categories: Koha