May 2010 – Meta Interchange

With the recent move of the Koha mailing lists to http://lists.koha-community.org/, the full email address for submitting a patch is now koha-patches@lists.koha-community.org. Since that’s a bit long to type every time you submit a patch, here’s one way to save a few keystrokes.

First, create a text file file to define a short alias for the email address. I chose to call the file ~/.git_aliases. The file should contain a line like this:

alias kpatches koha-patches@lists.koha-community.org

Next, tell Git that you want to use this address book file:

$ git config --global sendemail.aliasesfile ~/.git_aliases
$ git config --global sendemail.aliasfiletype mutt

Now, all you have to do is use kpatches in place of the full address. For example:

$ git send-email 0001-bug-4801-fix-paging-in-display-of-staged-bibs-and-im.patch
0001-bug-4801-fix-paging-in-display-of-staged-bibs-and-im.patch

Who should the emails appear to be from? [Galen Charlton ]

Who should the emails be sent to? kpatches

or

$ git send-email -to kpatches 0001-bug-4801-fix-paging-in-display-of-staged-bibs-and-im.patch

I found out today that Craig Lowe, the incoming mayor of Gainesville, used to work as a programmer for the Florida Center for Library Automation, so his election is even better news for libraries in Gainesville and Alachua County than I had thought. Craig has been on the library governing board for a while, but it is nice to know that he has direct experience to bear. (Yes, I voted for him.)

I am glad to see that PTFS and its LibLime division have contributed the developments that PTFS has been working on for the past year or so, including features commissioned by the East Brunswick and Middletown Township public libraries and others. The text of LibLime’s announcement makes it clear that this is meant as a submission to Koha 3.2 and (more so) 3.4:

The code for the individual new features included in this version has also been made available for download from the GIT repository. The features in this release were not ready for 3.2, but, pending acceptance by the 3.4 Koha Release Manager, could be included in release 3.4.

Chris Cormack (as 3.4 release manager) and I (as 3.2 release manager) have started work on integrating this work in the Koha. Since 3.2 is in feature freeze, for the most part only the bugfixes from Harley will be included in 3.2, although I am strongly considering bringing in the granular circulation permissions work as well. The majority of the features will make their way into 3.4, although they will go through QA and discussion like any other submission.

So far, so good. As a set of contributions for 3.2 and 3.4, “Harley” represents the continuation of PTFS’ ongoing submissions of code to Koha in the past year. Further, I hope that if PTFS is serious about their push for “agile” programming, that they will make a habit of submitting works in progress for discussion and public QA sooner, as in some cases “Harley” features that were obviously completed months ago were not submitted until now.

But here is where the mixed messages come in: “Harley” is prominently listed on koha.org as a release of Koha. Since no PTFS staff are among the elected release managers or maintainers for Koha, that is overreaching. Ever since Koha expanded beyond New Zealand, no vendor has hitherto unilaterally implied that they were doing mainstream releases of Koha outside of the framework of elected release managers.

Before I go further, let me get a couple things out of the way. If somebody wants to enhance Koha and create installation packages of their work in addition to contributing their changes to the Koha project, that’s fine. In fact, if somebody wants to do that without formally submitting their changes, that’s certainly within the bounds of the GPL, although obviously I’d prefer that we have one Koha instead of a bunch of forks of it. If any library wants to download, install, test, and use “Harley”, that’s fine as well. Although there could be some trickiness upgrading from “Harley” to Koha 3.2 or Koha 3.4, it will certainly be possible to do so in the future.

What I am objecting to is the overreach. Yes, “Harley” is important. Yes, I hope it will help open a path to resolve other issues between PTFS/LibLime and the rest of the Koha community. Yes, I thank PTFS for releasing the code, and in particular publishing it in their Git repository. That doesn’t make it an official release of Koha; it is still just another contribution to the Koha project, the same as if it came from BibLibre, software.coop, Catalyst, Equinox, one of the many individual librarians contributing to Koha, or any other source.

“Harley” is available for download from LibLime’s website at http://www.liblime.com/downloads. This is where it belongs. Any vendor-specific distribution of Koha should be retrievable from the vendor’s own website, but it should not be presented as a formal release. Perhaps there is room to consider having the Koha download service also offer vendor-specific distributions in addition to the main releases, but if that is desired, it should be proposed and discussed on the community mailing lists.

Updating koha.org to remove the implication that “Harley” is an official release is a simple change to make, and I call upon PTFS to do so.

Please see my disclosure statement. In particular, I am release manager for Koha 3.2 and I work for a competitor of PTFS. This post should not be construed as an official statement by Equinox, however, although I stand by my words.

I’ve had occasion recently to dig into how Evergreen uses space in a PostgresSQL database. Before sharing a couple queries and observations, here’s my number one rule for configuring a database server for Evergreen: allocate enough disk space. If you’re using a dedicated database server, you’ll need disk space to store the following:

the database files storing the actual data,
current WAL (write-ahead log) files,
archived WAL files (which should be backed up off the database server as well),
current database snapshots and backups (again, these should be backed up offline as well),
scratch space for migrations and data loads,
future data, particularly if you’re using Evergreen for a growing consortium, and
the operating system, the Postgres software, etc.

Of course, the amount of disk space required just to store the data depends on the number of records you have. A complete sizing formula would take into account the number of bibs, items, patrons, circulation transactions, and monetary transactions you expect to have, but here’s a rule of thumb based on looking at several production Evergreen 1.6 databases and rounding up a bit: allocate at least 50K per bibliographic record.

That’s only the beginning, however. Postgres uses write-ahead logging to record database transactions; this has the effect of adding a 16M file to the pg_xlog directory every so often as users catalog and do circulation. In turn, the WAL files should get archived periodically by enabling archive mode so that copies exist both on the database server itself and on backup media.

In a busy system, read “quite often” for “every so often” in the second sentence of the previous paragraph. In a system where you’re actively loading data, particularly if you’re also keeping that database up for production use and are therefore keeping WAL archiving on, read “fast and furious”. Why does this matter? If your database server crashes, you will need the most recent full backup and the accumulated archived WAL files since that backup to recover your database. If you don’t keep your WAL files, be prepared for an involuntary fine amnesty and angry catalogers. Conversely, what happens if you run out of space for your archived WAL files? Postgres will lock up, bringing your Evergreen system to a halt, yielding angry patrons and angry circulation staff.

Archived WAL files don’t need to be kept on the database server forever, fortunately. After each periodic full backup, archived WAL files made prior to that backup won’t be needed in case you need to do a point in time recovery. Of course, that assumes everything goes well during the recovery, so you will still want to keep at least a couple generations of full backups and WAL file sequences, include offline backup copies, and also periodically create logical database dumps using pg_dump. LOCKSS isn’t just for digitized scholarly papers.

So, what’s my rule of thumb for estimating total disk space needed for an Evergreen database server? 200K per bibliographic record that you expect to have in your database three years from now. I admit that this is on the high side, and this is not the formula that Equinox’s server people necessarily use for hardware recommendations. However, while disk space may or may not be “cheap”, it is often cheaper than a 2 a.m. wake-up call from the library director.

How does this disk space get used? I’ll close with a couple queries to run against your Evergreen database:

select schemaname,
       pg_size_pretty(sum(
         pg_total_relation_size(schemaname || '.' || tablename)
       )::bigint) AS used
from pg_tables
group by schemaname
order by 
  sum(pg_total_relation_size(schemaname || '.' || tablename))::bigint desc;

This gives you the amount of space used in each schema. The metabib schema, which contains the indexing tables, will almost certainly be #1. Depending on how long you’ve been using your Evergreen system, either auditor or biblio will be #2.

select schemaname || '.' || tablename AS tab,
       pg_size_pretty(
         pg_total_relation_size(
           schemaname || '.' || tablename
         )
       ) AS used
from pg_tables
order by pg_total_relation_size(schemaname || '.' || tablename) desc;

This will give you the space used by each table. metabib.real_full_rec will be #1, usually followed by biblio.record_entry. It is interesting to note that although those two tables essentially store exactly the same data, metabib.real_full_rec will typically consume five times as much space as biblio.record_entry.

Month: May 2010

Saving keystrokes when sending Koha patches

Small world (or, did somebody just start a Code4Lib PAC?)

Mixed messages

Database server disk space usage in Evergreen

Share this:

Share this:

Share this:

Share this: