
Monthly Archives: March 2007

Before I left for Guatemala, Ian Davis at Talis asked if I could give him a dump of our MARC records to load into Talis Platform. I had been talking in the #code4lib channel about how I was pushing the idea of using Talis Source to make simple, ad-hoc union catalogs; we could make one for Georgia Tech & Emory (we have joint degree programs) or Arche or Georgia Tech/Atlanta-Fulton Public Library, etc. My thinking was that by utilizing the Talis Platform, we could forgo much of the headache in actually making a union catalog for somewhat marginal use cases (the public library one notwithstanding).

About a week after I got back from Guatemala, I had an email from Richard Wallis with some URLs to play around with to access my Bigfoot store. He showed me search services, facet services and augment services. I wasn't able to really dive into it much at the time, but since I'm working on a total site search project for the library, I thought this would be a good chance to kick the tires a bit to include catalog results.

After two days of poking around, I've formed some opinions about it, have some recommendations for it, and have written a Ruby library to access it.

1) The Item Service

This is certainly the most straightforward and, for many people, the most useful service of the bunch. The easiest way to think of the item service is as an HTTP-based Lucene service (a la Solr or Lucene-WS) over your bib records. It returns something OpenSearch-y (it claims to be an RSS 1.0 document), but it doesn't validate. That being said, FeedTools happily consumed it (more on that later), and the semantics should be familiar to anyone who has looked at OpenSearch before. Each item node also contains a Dublin Core representation of the record and a link to a marcxml representation. I'm not sure if there's a description document for Bigfoot.
Although the query syntax is pure Lucene (title:"The Lexus and the Olive Tree"), the downside is that it's not documented anywhere what the indexes are, and I doubt there would be any way to add new ones (for example, my guess is I wouldn't be able to get an index for the 490/440$v that I use for the Umlaut). I don't see returning the results as OAI_DC being too much of a problem, since the RSS item includes a title (which otherwise would have been tricky to pick between the DC and the marcxml). My Ruby library might not generate valid DC; I haven't really looked into it.
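To give a flavor of how simple it is, here's a minimal sketch of an item search over plain HTTP. The endpoint and the query/max parameter names are placeholders standing in for the URLs Richard sent me, so treat them as assumptions rather than the real API:

```ruby
require 'net/http'
require 'uri'
require 'cgi'

# Hypothetical item-service URL -- substitute the search URL for your own store.
STORE_ITEMS = 'http://api.talis.com/stores/mystore/items'

# Run a Lucene-style query and return the raw OpenSearch-y RSS 1.0 response,
# which carries a DC representation and a marcxml link for each item.
def item_search(query, max = 10)
  uri = URI.parse("#{STORE_ITEMS}?query=#{CGI.escape(query)}&max=#{max}")
  Net::HTTP.get_response(uri).body
end

puts item_search('title:"The Lexus and the Olive Tree"')
```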

The docs also mention you can POST items to your Bigfoot store, but they don’t mention what your data needs to look like (MARC?) or what credentials you need to add something (I mean, it must be more than just your store name, right?). My hope is to add this functionality to bigfoot-ruby soon (especially since my data is from a bulk export from last October).

2) The Facet Service

This one is definitely intriguing, since faceted searching is all the rage right now. The search syntax is basically the same as the Item Service's, except you also send a comma-delimited list of the fields you would like to facet on. What you get back is either an XML or XHTML document of your results.

For each field you request, you get back a set of terms (you can specify how many you want, with a default of 5) that appear most frequently in that field. You also get an approximation of how many results you would get in that facet and a URL to search on that facet. It's quite fast, although, realistically, you can't do much with the output of a facet search alone.
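A facet request looks almost identical to an item search; the terms and counts come back as XML. As before, the endpoint and parameter names below (fields, top) are stand-ins, not the documented API:

```ruby
require 'net/http'
require 'uri'
require 'cgi'
require 'rexml/document'

# Hypothetical facet-service URL and parameter names -- adjust for your store.
STORE_FACETS = 'http://api.talis.com/stores/mystore/services/facet'

# Ask for the top N terms in each of the given fields for a query.
def facet_search(query, fields, top = 5)
  uri = URI.parse("#{STORE_FACETS}?query=#{CGI.escape(query)}" +
                  "&fields=#{CGI.escape(fields.join(','))}&top=#{top}")
  REXML::Document.new(Net::HTTP.get_response(uri).body)
end

# Top five subjects, creators and dates for a query
doc = facet_search('subject:guatemala', %w[subject creator date])
```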

Again, it's difficult to know what you can facet on (subject, creator and date are all useful; I'm sure there are others), and the facet that (for me, at least) held the most promise, type, is too broad to do much with (it uses Leader position 7, but lumps the BKS and SER types together in a label called "text"). I would like to see Talis implement something like my MARC::TypedRecord concept so one could facet on things like government document or conference. You could separate newspapers from journals and globes from maps. Still, the text analysis of the non-fixed fields is powerful and useful and beats the hell out of trying to implement something like that locally.
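For those who haven't seen MARC::TypedRecord, the gist is just pulling types apart from the fixed fields instead of stopping at "text". Here's a rough sketch with ruby-marc; the positions follow MARC21, but the labels and the particular distinctions are only examples:

```ruby
require 'marc'  # ruby-marc

# Derive a slightly more useful type than "text" from the fixed fields.
# Leader/07 is bibliographic level; 008/21 distinguishes newspapers from
# other serials. The labels here are illustrative.
def record_type(record)
  case record.leader[7, 1]
  when 'm' then 'book'
  when 's'
    fixed = record['008'] ? record['008'].value : ''
    fixed[21, 1] == 'n' ? 'newspaper' : 'journal'
  else
    'other'
  end
end

MARC::Reader.new('bibs.mrc').each { |rec| puts record_type(rec) }
```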

In bigfoot-ruby, I have provided two ways to do a faceted search: you can just do the search and get back Facet objects containing the terms and search URLs, or you can facet with items, which executes the item searches automatically (in turn getting a definitive number of results for the query, as well). Since I didn't bother to implement threading, getting facets with items can be pretty slow.
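Roughly, that is meant to look like the snippet below; the class and method names are how I'm thinking about it at the moment and will almost certainly change, so don't hold me to them:

```ruby
require 'bigfoot'

# Illustrative only -- the bigfoot-ruby API is still in flux.
search = Bigfoot::Search.new('mystore')

# Plain facet search: Facet objects with terms and search URLs,
# but only approximate result counts.
search.facet('subject:guatemala', %w[subject creator date]).each do |facet|
  puts "#{facet.field}: #{facet.terms.join(', ')}"
end

# Facet with items: runs the item search for each term too, so the counts
# are definitive (and, without threading, noticeably slower).
facets = search.facet_with_items('subject:guatemala', %w[subject])
```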

3) The Augment Service

To be honest, I'm having a hard time figuring out useful scenarios for the augment service. The idea is that you give it the URI of an RSS feed, and the service enhances it with data from your Bigfoot store (at least, that is roughly how I understand it works). Richard's example for me was to feed it the output of an xISBN query (which isn't in RSS 1.0, AFAIK, but, for the sake of example…) and the augment service would fill in the data for the ISBNs your library holds. The API example page mentions Wikipedia, but I don't know where, other than the Talis Platform, you can get Wikipedia entries formatted properly. I tried sending it the results of an Umlaut2 OpenSearch query, but it didn't do anything with it. Presumably the RSS 1.0 feed needs the bib data to be sent in a certain way (my guess is OAI_DC, like the Item Service), but I'm not sure. The only use case I can think of for this service is a much simpler way to check for ISBN concordance (rather than isbn:(123456789X|223456789X|323456789X|etc.)).
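Just to illustrate that alternative: without the augment service, an ISBN concordance check means stitching the xISBN variants into one big OR query for the item service (item_search here is the sketch from earlier, and the ISBNs are the placeholder ones above):

```ruby
# ISBN variants back from an xISBN lookup (placeholder values)
xisbns = %w[123456789X 223456789X 323456789X]

# One big OR'ed item-service query to see which variants the library holds
held = item_search("isbn:(#{xisbns.join('|')})")
```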

Overall, I'm really impressed with the Talis API. It is a LOT easier to use than, say, Z39.50, and by using OpenSearch it seems more natural to integrate into existing web services than SRU.

Bigfoot-ruby is definitely a work in progress. I think I would like to split the Search class into ItemService and FacetService; I don't like how results is an Array for items and a Hash for facets. It just seems sloppy. I need to document it, of course, and I would like to implement item POST. This project also made me realize how bloody slow FeedTools is. I am currently using it in both the Umlaut and the Finding Aids to provide OpenSearch, but I think it's really too sluggish to justify itself.

Thanks, Talis, for getting me started with Bigfoot and giving me the opportunity to play around with it. Also, thanks to Ed Summers for fixing SVN on Code4lib.org. You wouldn’t be able to download it and futz around with it yourself, otherwise.

If YPOW, like MPOW, is an Endeavor Voyager site, you've got some decisions ahead. Francisco Partners, naturally, would like you to migrate to Aleph, and I have no doubt that Ex Libris is, as I write this, busily working on a means to make that easy for Voyager libraries to do. But ILS migrations are painful, no matter how easy the backend process might be. There's staff training, user training, managing new workflows, site integration; lots of things to deal with. Also, the new system's functionality may not map 1:1 to what you currently have. How do you work around services you depended upon?

Since our contracts with Endeavor Information Systems will soon be next to worthless, I propose, Voyager customers, that we take ownership of our systems. For the price of a full Oracle license (or SQL Server? does Voyager support other RDBMSes?), which many of us already have, we can get write permissions to our DB and make our own interfaces. We wouldn't need to worry about staff clients (for now), since we already have cataloging, circulation, acquisitions, etc. modules that work. When we're ready for different functionality, however, we can create new middleware (in fact, I'm planning to break ground on this in the next two weeks) to allow for web clients or, even better, piggyback on Evergreen's staff clients and let somebody else do the hard work. If we had native clients in the new middleware, a library could use any database backend they wanted (just migrate the data from Oracle into something else). The key is write access to the database.

By taking ownership of our ILS, we can push the developments we want, such as NCIP, a 'Next Gen OPAC', better link resolver integration, better metasearch integration, etc., without the pain of starting all over again (and with potentially the same results: who is to say that whatever you choose as an ILS wouldn't eventually get bought and killed off as well?). Putting my money (or lack thereof) where my mouth is, I plan on migrating Fancy Pants to use such a backend (read-only DB access for now; we still have a support contract, after all). I'm calling this project 'Bon Voyage'. After reading Birkin's post on CODE4LIB, I would like to make a similar service for Voyager that would basically take the place of the Z39.50 server and direct access to the database. Fancy Pants wouldn't be integrated into Bon Voyage; it would just be another client (since it was always only meant as a stopgap, anyway).

What we'll have is a framework for getting at the database backend (it'd be safe to say this will be a Rails project) with APIs to access bib, item, patron, etc. information. Once the models are created, it will be relatively simple to transition to 'write' access when that becomes necessary. Making a replacement for WebVoyage would be fairly trivial once the architecture is in place, and web-based staff clients would also be fairly simple. I think EG staff client integration wouldn't be too hard, since it would just be an issue of outputting our data to something the EG clients want (JSON, I believe) and translating the clients' responses. That would need to be investigated more, however (I'm on paternity leave and not doing things like that right now 🙂).
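To make that concrete, here's a first stab at what a read-only Bon Voyage model might look like. The table and column names follow the Voyager reporting schema as I remember it (BIB_TEXT and friends), so verify them against your own database before trusting any of this:

```ruby
# Read-only ActiveRecord models over the Voyager Oracle schema (Rails-style).
# Connection settings would live in database.yml; table/column names are
# assumptions from the reporting views.
class BibText < ActiveRecord::Base
  set_table_name  'bib_text'
  set_primary_key 'bib_id'
end

class ItemRecord < ActiveRecord::Base
  set_table_name  'item'
  set_primary_key 'item_id'
end

# A bib lookup for a WebVoyage replacement then becomes a one-liner:
bib = BibText.find(1234567)
puts "#{bib.title} / #{bib.author}"
```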

Would anybody find this useful?
It seems the money we spend on an ILS could be better spent elsewhere. I don't think this would be a product we could distribute outside of the current Voyager customer base (at least, not until it was completely native… maybe not even then; we'd have to work this out with Francisco Partners, I guess), but I think that base is big enough to be sustainable on its own.

Well, we’re home. Things are going well. Guatemala was a lot of waiting… waiting for Che, waiting for the U.S. Embassy, waiting for his visa, waiting around the hotel, waiting for the airport, waiting in the airport, waiting to land, waiting at customs and immigration…

He is wonderful, though, and we’re really enjoying things right now.

I'm on leave until next Tuesday (after which I'll be working from home until June), but since Che doesn't talk much, it gives me lots of time to think while I'm rocking him to sleep or (even more) lying in bed awake wondering when he's going to wake up next. With that, I've had some 'work related ideas' 🙂

Photographic evidence of Che’s homecoming.