I have mentioned here several times the “alternative to the catalog” project I am trying to implement at Tech. One of the problems I’ve had is giving the project a name that tells people what I’m talking about, without the political hairiness of saying “catalog replacement” (which isn’t technically true, anyway).
In a meeting two weeks ago (about subject guides), I drew the concept of this project on the whiteboard in our conference room. It’s been up ever since, and in the middle I had written “ALTOpac” because it was an easy way to loosely describe the idea so the uninitiated in the room could see where I was starting from. While I was sitting in another meeting today, the capitalized letters jumped out at me: ALTO. It means nothing.
And I like that. Of course it still doesn’t explain what it’s about. That’s what subtitles are for.
Now, let me explain what the hell Alto is and what it is supposed to do.
Alto is a “community-based collection builder and search engine”.
Come to think of it, that might not actually clear anything up.
Let’s back up a bit, shall we?
To say that searching the catalog is “searching our collection” is arbitrary and false. Metasearch doesn’t really solve this problem, since you still point the metasearch engine only at certain assets, and it’s non-trivial to build relationships between those assets. Metasearch is part of the solution, but hardly a panacea.
Again, our “collection” is an ambiguous term and shouldn’t be solely determined by our collection development policies/budget. It is our opinion that if something is important enough to be added to a reserves list (even a web page), it should technically be part of our collection. I would not, however, say it should be cataloged (and that’s why this isn’t a catalog replacement project, see?). If an item is even bookmarked (via a local social bookmarking service, such as unalog or connotea) it should then become part of our collection. A 1927 engineering textbook from Purdue’s catalog? Index it! If a member of our community finds it important enough to want to come back to and share with a group, it’s important enough for us to aggregate into our “collection”. Relevance comes later (keep reading, if you’re interested).
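The inclusion rule here (if it’s reserved, bookmarked, or harvested, it’s in) is simple enough to sketch. This is a minimal Python illustration under stated assumptions, not anything Alto actually runs: the event names (`catalog`, `reserve`, `bookmark`, `harvest`) and the `Collection` class are hypothetical, and items are deduplicated by whatever identifier (URL, bib number, OAI identifier) the event carries. Being in the collection says nothing about being cataloged; the sketch only aggregates.

```python
from dataclasses import dataclass, field

# Hypothetical event types that pull an item into the community "collection".
INCLUDING_EVENTS = {"catalog", "reserve", "bookmark", "harvest"}


@dataclass
class Item:
    identifier: str  # URL, OAI identifier, bib record number, etc.
    title: str
    sources: set = field(default_factory=set)  # which kinds of events touched it


class Collection:
    """Aggregates items from any community activity, deduplicated by identifier."""

    def __init__(self):
        self.items = {}

    def record_event(self, event_type, identifier, title):
        # Events outside the inclusion rule (e.g. a plain page view) are ignored.
        if event_type not in INCLUDING_EVENTS:
            return None
        # The first event creates the item; later events just add their source.
        item = self.items.setdefault(identifier, Item(identifier, title))
        item.sources.add(event_type)
        return item

    def __contains__(self, identifier):
        return identifier in self.items
```

A bookmark and a reserve-list entry for the same web page collapse into one item that remembers both sources, which is the kind of signal the relevance ranking can use later.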
There are also relationships that our community (for the sake of argument, let’s start with “Georgia Tech”) builds that are highly relevant for finding connections between disparate “things”. For example, the items put on reserve for a particular course share an umbrella of commonality that should be put to use for anyone who runs across any one of them. The relevance ranking should be boosted even further for a user who happens to be a member of the group in question (for instance, someone enrolled in the class).
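One way to picture both ideas (related items through a shared group, plus a bigger boost for group members) is a re-ranking pass over the scores a text search returns. The multiplicative boosts, their values, the function names, and the course identifiers in the usage are all hypothetical; this is a sketch of the idea, not a tuned ranking model.

```python
def related_items(item_id, item_groups):
    """Other items sharing at least one group (course reserve list, project) with item_id."""
    groups = item_groups.get(item_id, set())
    return sorted(other for other, g in item_groups.items()
                  if other != item_id and g & groups)


def rerank(base_scores, item_groups, user_groups,
           member_boost=1.5, community_boost=1.2):
    """
    base_scores: dict item_id -> relevance score from the text search
    item_groups: dict item_id -> set of group ids that selected the item
    user_groups: set of group ids the searching user belongs to
    Returns (item_id, score) pairs, highest score first.
    """
    ranked = []
    for item_id, score in base_scores.items():
        groups = item_groups.get(item_id, set())
        if groups & user_groups:
            score *= member_boost      # chosen by a group the user belongs to
        elif groups:
            score *= community_boost   # chosen by some other community group
        ranked.append((item_id, score))
    # Sort by descending score; break ties by id so the order is stable.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked
```

With `item_groups = {"a": {"ME2110"}, "b": {"CS1301"}, "d": {"ME2110"}}`, finding `a` points you to `d`, and a student in the hypothetical `ME2110` sees `a` ranked above equally scored community picks.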
If Alto has a citation-management-esque feature, users can group relevant resources together very specifically, by project. Resources can be anything: books, websites, articles, searches, chat transcripts, trails, you name it.
And all of this should feed the “relevance beast”, as it were.
So that’s some background. Given that we’ll have some formal subject classifications for these objects (from the OPAC, from metasearch, or wherever), we should be able to bridge the formal classifications and the folksonomy to make sense of how people have classified their saved things.
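A simple way to start building that bridge is to count how often each user tag co-occurs with each formal subject heading on the same saved item. The sketch below assumes each item arrives as a pair of (formal headings, user tags); the `min_count` threshold and the sample headings are hypothetical. Co-occurrence counting is a plain stand-in here, not a claim about how Alto would do the mapping.

```python
from collections import Counter, defaultdict


def tag_subject_map(items, min_count=2):
    """
    items: iterable of (subjects, tags) pairs, one per saved item, where
           subjects are formal headings (e.g. from a MODS <subject>) and
           tags come from the social bookmarking service.
    Returns dict tag -> [(subject, count), ...], most frequent first,
    keeping only pairs seen at least min_count times.
    """
    cooccur = defaultdict(Counter)
    for subjects, tags in items:
        # Use sets so a duplicated tag or heading on one item counts once.
        for tag in set(tags):
            for subject in set(subjects):
                cooccur[tag][subject] += 1
    return {
        tag: [(s, n) for s, n in counts.most_common() if n >= min_count]
        for tag, counts in cooccur.items()
    }
```

If two saved items tagged `bridges` both carry the heading “Bridges -- Design”, that heading becomes a candidate formal equivalent for the tag; one-off pairings fall below the threshold.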
We can then begin to cluster search results. Format, subject, concept, group, policies… all of these can be browsed once the search begins. The search results will be a combination of metadata objects and library content. If some of the results appear in a given “subject guide”, the guide will become a suggested resource (and will, in turn, push some of its resources into the result set).
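The browsing and subject-guide behavior can be sketched in two small functions: facet counts over the result set, and a check for which guides overlap the hits. The record fields (`id`, `format`, `subject`, `group`), the overlap threshold, and the guide names are hypothetical; this shows the shape of the idea, not a real result schema.

```python
from collections import Counter


def facet_counts(results, fields=("format", "subject", "group")):
    """Count facet values across a result set so users can browse after searching."""
    facets = {f: Counter() for f in fields}
    for record in results:
        for f in fields:
            values = record.get(f, [])
            if isinstance(values, str):  # allow single-valued fields
                values = [values]
            facets[f].update(values)
    return facets


def suggest_guides(results, guides, min_overlap=1):
    """
    guides: dict guide name -> set of item ids listed in that subject guide.
    Returns (suggested guide names, extra item ids the guides push into the results).
    """
    hit_ids = {r["id"] for r in results}
    suggested, extra = [], set()
    for name, members in guides.items():
        if len(members & hit_ids) >= min_overlap:
            suggested.append(name)
            extra |= members - hit_ids  # guide items not already in the results
    return sorted(suggested), extra
```

Raising `min_overlap` would make a guide show up only when several of its items appear in the results, which may be less noisy for broad guides.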
The goal is to open up the silos we have built around our resources and services. Doing so would break down the ambiguity between “collections”, “services” and “policies”, since they’re all interrelated.
How do we plan to do this? Glad you asked (you’re still reading, right?)!
We’ve exported all of the bib records from our catalog. The plan is to use METS as our wrapper around MODS. We’ll then harvest our institutional repository and index our website. That’s a pretty good base to start with. All of this will be stored in a dbXML database and indexed with Lucene.
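To make “METS as a wrapper around MODS” concrete, here is a minimal Python sketch that embeds a MODS record in a METS `dmdSec` via `mdWrap MDTYPE="MODS"`, using the standard METS and MODS v3 namespaces. In practice the MODS would come from converting the exported bib records (the Library of Congress publishes a MARCXML-to-MODS XSLT); the function name, the `OBJID` value, and the tiny title/subject record are hypothetical, and storage in dbXML and Lucene indexing aren’t shown.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)


def q(ns, tag):
    """Qualified name in ElementTree's {namespace}tag form."""
    return f"{{{ns}}}{tag}"


def wrap_mods_in_mets(record_id, title, subjects):
    """Build a minimal METS document whose descriptive metadata is an embedded MODS record."""
    mets = ET.Element(q(METS_NS, "mets"), {"OBJID": record_id})
    dmd = ET.SubElement(mets, q(METS_NS, "dmdSec"), {"ID": "DMD1"})
    wrap = ET.SubElement(dmd, q(METS_NS, "mdWrap"), {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(wrap, q(METS_NS, "xmlData"))

    # The MODS record itself: a title plus one <subject><topic> per heading.
    mods = ET.SubElement(xml_data, q(MODS_NS, "mods"))
    title_info = ET.SubElement(mods, q(MODS_NS, "titleInfo"))
    ET.SubElement(title_info, q(MODS_NS, "title")).text = title
    for heading in subjects:
        subject = ET.SubElement(mods, q(MODS_NS, "subject"))
        ET.SubElement(subject, q(MODS_NS, "topic")).text = heading

    # METS requires a structMap; one div pointing at the dmdSec is enough here.
    struct = ET.SubElement(mets, q(METS_NS, "structMap"))
    ET.SubElement(struct, q(METS_NS, "div"), {"DMDID": "DMD1"})
    return ET.tostring(mets, encoding="unicode")
```

The wrapper leaves room to add other sections later (file pointers, or annotations in their own metadata sections) without touching the MODS itself.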
If users want to harvest a collection from CiteSeer or OAIster, they will be able to, and it will become part of our collection. Annotations, links to reviews, and links to content to be indexed will all be made available.
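Harvesting from sources like these would likely go through OAI-PMH. Below is a minimal sketch of the two pieces a harvester needs: building a `ListRecords` request for `oai_dc` (or continuing with a `resumptionToken`, which OAI-PMH treats as an exclusive argument), and pulling identifiers, titles, and subjects out of one response page. It assumes the source exposes an OAI-PMH endpoint; the base URL is a placeholder, and HTTP fetching and error handling are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, resumption_token=None):
    """Build an OAI-PMH ListRecords request for Dublin Core metadata."""
    if resumption_token:
        # resumptionToken is exclusive: only verb may accompany it.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (records, resumption_token) from one ListRecords response page."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iter(f"{OAI}record"):
        header = rec.find(f"{OAI}header")
        if header.get("status") == "deleted":
            continue  # deleted records carry no metadata
        records.append({
            "identifier": header.findtext(f"{OAI}identifier"),
            "title": [t.text for t in rec.iter(f"{DC}title")],
            "subject": [s.text for s in rec.iter(f"{DC}subject")],
        })
    # An empty or missing token means this was the last page.
    token = root.findtext(f".//{OAI}resumptionToken")
    return records, (token or None)
```

A harvester would loop, requesting `list_records_url(base, token)` until the token comes back as `None`, and feed each record into the same collection as everything else.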
I’m leaving a lot out and glossing over some of this… but it starts to put the idea on “paper” for me to come back to later.