Monthly Archives: October 2005

I’ll have to keep this rather short, since the hotel wireless network is being flaky (as usual). In fact, I am having to write this while standing in the bathroom, which is, oddly, where the wireless reception in my room is best.

Again, the conference organizers have proven why this is the only professional event I schedule in my year. I’ll comment on the earlier days a little later, but while day 3 is still rather fresh in my mind (as fresh as my poor, oversaturated mind can be), I’d like to touch on a few things.

1) Listen to every word Lorcan says, always. It blows my mind what an amazing asset he is to our community and how there are people (in my library) who have no idea who he is. His presentation this morning has completely energized me to kick it up a notch in getting our library more into our (and other) users’ “LifeFlow”.

2) Art and Peter proved that WAG needs no “G”. I’ve been working with Art on this WebDav/OPAC project for months (thanks to SUDOC, as he pointed out), and I never, ever would have dreamed of the things these two are coming up with. Cocoon is, evidently, a very magical beast and the potential of storing these “trails” could have huge implications for the collaborative research environment that we are trying to create at Georgia Tech. Being able to chart the path of scholarship would make it easier to get to the giant to stand upon his shoulders.

3) Internet communication is lousy when trying to develop a new spec. Despite being there at the beginning (and being a very loud proponent of COinS), I could not wrap my head around the use cases for COinS-PMH. Oh, Dan tried to “learn me”, but it really took his presentation today for me to “get it”. I definitely “get it” now, and expect to see COinS-PMH all over Tech.

4) Hackfest is the greatest invention ever. And I honestly couldn’t imagine it working properly at any other conference (sorry, LITA).

5) This is why I’m applying to enroll in the Master’s program for Human-Computer Interaction at Tech. This, coupled with the previous two presentations (Art/Peter’s, Dan’s)… Holy crap. The world would be so different.

The U.S. is screwed. We have sold our souls, culture and future to corporate interests and I’m not sure how we can fix it. As Peter remarked to me, hopefully Cliff Lynch’s vision of a world where everything is digitized except the intellectual output after 1920 will light a fire under us. I fear at that point it may be too late, however. It looks like Canada’s future might be a bit brighter. Even if it isn’t, I’ll get fired up by the revolutionary rhetoric, any day.

Wow, I love this place.

I am still feeling my way around Python. I have yet to grasp the zen of being Pythonic, but I am at least coming to grips with real object orientation (as opposed to the named hashes of PHP) and am actually taking the leap into error handling, which, if you have dealt with any of the myriad bugs in any of my other projects, you’d know has been a bit of a foreign concept to me.

Python project #2 is known as RepoMan (thanks to Ed Summers for the name). It attempts to solve a problem that not one but two other open source projects have already solved admirably (I’ll go into more about this in a bit). RepoMan is an OAI Repository indexer that makes said repository available via SRU. I created it in an attempt to make our DSpace implementation searchable from remote applications (namely, the site search and the upcoming alternative opac). It’s an extremely simple two-script project that has only taken a week to get running, largely due to the existence of two similar and available Python scripts that I could modify for my own use. It’s also due to the help of Ed Summers and Aaron Lav.

The harvester is, basically, Thom Hickey’s one-page OAI harvester with some minor modifications. I have added error handling (the two lines I added to compensate for malformed XML must have been over the “one page limit”) and instead of outputting to a text file, it shoves the records in a Lucene index (thanks to PyLucene). This part still needs some work (I’m not sure what it would do with an “updated” record, for example), but it makes a nice index of the Dublin Core fields, plus a field for the whole record, for “default” searches. This was a good exercise for me to work with XML, Python and Lucene, because I was having some trouble when trying to index the MODS records for the alternative opac.
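To make the shape of that indexing step concrete, here is a minimal sketch. It is not RepoMan's actual code and it does not use PyLucene. It assumes the harvester has already fetched one OAI-PMH `ListRecords` response in `oai_dc` format, and it uses a plain dict as a stand-in for the Lucene index. The function name `records_to_documents` is made up for illustration, and the `default` field here is just the Dublin Core values joined together, where the real index stores the whole record.

```python
import xml.etree.ElementTree as ET

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
DC_NS = "http://purl.org/dc/elements/1.1/"


def records_to_documents(list_records_xml):
    """Turn one OAI-PMH ListRecords response into index-ready documents.

    Returns a dict keyed by OAI identifier (a stand-in for a real index).
    Each value maps a Dublin Core field name to a list of its values,
    plus a catch-all "default" field.
    """
    try:
        root = ET.fromstring(list_records_xml)
    except ET.ParseError:
        # Malformed XML: skip this batch instead of aborting the harvest.
        return {}

    documents = {}
    for record in root.iter("{%s}record" % OAI_NS):
        identifier = record.findtext(
            "{%s}header/{%s}identifier" % (OAI_NS, OAI_NS)
        )
        if identifier is None:
            continue

        fields = {}
        for element in record.iter():
            # Keep only elements in the Dublin Core namespace.
            if element.tag.startswith("{%s}" % DC_NS):
                text = (element.text or "").strip()
                if text:
                    name = element.tag.split("}", 1)[1]
                    fields.setdefault(name, []).append(text)

        # One field holding everything, for "default" searches.
        all_values = [v for values in fields.values() for v in values]
        fields["default"] = [" ".join(all_values)]

        # Keying on the identifier means a re-harvested record
        # overwrites the earlier copy (an assumption of this sketch).
        documents[identifier] = fields
    return documents
```

Keying the documents on the OAI identifier is one possible answer to the “updated” record question, since a re-harvested record simply replaces the old one. With a real Lucene index the equivalent would be deleting by identifier before adding the new document.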

The SRU server is, basically, Dan Chudnov’s SRU implementation for unalog. It needed to be de-Quixotefied and is, in fact, much more robust than Dan’s original (of course, unalog’s implementation doesn’t need to be as “robust”, since the metadata is much more uniform), but certainly having a working model to modify made this go much, much faster. The nice part is that there might be some stuff in there that Dan might want to put back into unalog.

So, here is the result. The operations currently supported are explain and searchRetrieve, and the majority of CQL relations are unsupported, but it does most of the queries I need it to do and, most importantly, it’s less than a week old.
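For readers who have not seen SRU, the sketch below shows roughly what “supports explain and searchRetrieve, with very limited CQL” can look like at the request-handling level. It is an illustration under assumptions, not the code from RepoMan or unalog. The function names and the returned dict shape are invented, only a single `index = term` clause (or a bare term) is understood, and the diagnostic numbers are quoted from memory of the SRU diagnostics list, so they should be checked against the spec.

```python
import re
from urllib.parse import parse_qs

# SRU diagnostic numbers (assumed from the SRU diagnostics list).
UNSUPPORTED_OPERATION = 4
MANDATORY_PARAMETER_MISSING = 7
QUERY_FEATURE_UNSUPPORTED = 48

# A single clause: optional "index =" followed by a quoted phrase or one word.
_CLAUSE = re.compile(
    r'^\s*(?:(?P<index>[\w.]+)\s*=\s*)?(?P<term>"[^"]*"|[^\s"<>=]+)\s*$'
)


def parse_simple_cql(query):
    """Parse 'index = term' or a bare term; anything else is rejected."""
    match = _CLAUSE.match(query)
    if match is None:
        raise ValueError("unsupported CQL: %r" % query)
    index = match.group("index") or "default"
    term = match.group("term").strip('"')
    return index, term


def handle_sru_request(query_string):
    """Dispatch on the SRU 'operation' parameter of a URL query string."""
    params = {key: values[0] for key, values in parse_qs(query_string).items()}
    # A request with no operation is treated as an explain request.
    operation = params.get("operation", "explain")

    if operation == "explain":
        return {"response": "explain"}

    if operation == "searchRetrieve":
        if "query" not in params:
            return {"response": "diagnostic",
                    "code": MANDATORY_PARAMETER_MISSING,
                    "details": "query"}
        try:
            index, term = parse_simple_cql(params["query"])
        except ValueError:
            return {"response": "diagnostic",
                    "code": QUERY_FEATURE_UNSUPPORTED,
                    "details": params["query"]}
        return {"response": "searchRetrieve",
                "index": index,
                "term": term,
                "startRecord": int(params.get("startRecord", 1)),
                "maximumRecords": int(params.get("maximumRecords", 10))}

    # scan and anything else is not implemented in this sketch.
    return {"response": "diagnostic",
            "code": UNSUPPORTED_OPERATION,
            "details": operation}
```

A request such as `operation=searchRetrieve&version=1.1&query=dc.title+%3D+%22open+access%22` would come back as a searchRetrieve on the `dc.title` index for the phrase `open access`, which a real server would then hand to the index and wrap in an SRU XML response. Relations like `>=` and boolean queries fall through to a diagnostic rather than being guessed at.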

So the burning question here is: why on earth would I waste time developing this when OCKHAM’s Harvest-to-Query is out there, and, even more specifically, OCLC’s SRW/U implementation for DSpace is available? Further, I knew full well that these projects existed before I started.

Lemme tell ya.

Harvest-to-Query looked very promising. I began down this road, but stopped about halfway down the installation document. Granted, anything that uses Perl, Tcl and PHP has to be, well, something… After all, those were the first three languages I learned (and in the same order!). Adding in IndexData’s Zebra seemed logical as well, since it has a built-in Z39.50 server. Still, this didn’t exactly solve my problem. I’d have to install yazproxy, as well, in order to achieve my SRU requirement. Requiring Perl, Tcl, PHP, Zebra and yazproxy is a bit much to maintain for this project. Too many dependencies and I am too easily distracted.

OCLC’s SRW/U seemed so obvious. It seemed easy. It seemed perfect. Except our DSpace admin couldn’t get it to work. Oh, I inquired. I nagged. I pestered. That still didn’t make it work. I have very limited permissions on the machine that DSpace runs on (and no permissions for Tomcat), so there was little I could do to help. It would also have served only one specific purpose, and wouldn’t necessarily address any other OAI providers that we might have.

So, enter RepoMan. Another wheel that closely resembles all the other wheels out there, but possibly with minor cosmetic changes. Let a thousand wheels be invented.