
Monthly Archives: July 2006

It’s been a while since I’ve written anything about the Ümlaut. It’s been a while since I’ve written about anything, really. Lots of reasons for that: I’ve been frantically trying to pull the Ümlaut together in time to launch for the fall semester, and I’ve got this little bit of business going on…

Still, it’s probably time to touch on some of the changes that have happened.

  1. The backend has been completely updated

     The initial design was… shaky… at best. While the new backend is probably still shaky (it is, after all, my creation), it’s certainly more thought out. Incoming requests are split into their referent and referrers (see Thom Hickey’s translation of these arcane terms), and the referent is checked against a ‘Ferret store’ of referents. The rationale here is that citations arrive in all sorts of states of completeness and quality, so we do some fulltext searching against an index of previously resolved referents to see if we’ve dealt with this citation before.

     It then stores the referent and the response object as Marshaled data, which is great, except that the binary output royally screws up trying to tail -f the logs.

  2. New design

     Heather King, our web designer here at Tech, has vastly improved the design. There’s still quite a bit more to do (isn’t there always?), but we’ve got a good, eminently usable interface to build upon. The bits that need to be cleaned up (mainly references to other citations) won’t be that hard to fix.

  3. Added Social Bookmarking Support

     Well, read-only support. Connotea, Yahoo’s MyWeb, and Unalog were pretty easy to add courtesy of their existing APIs. The downside is that I can only hope to find bookmarks by URL, which… doesn’t work well. I really wish Connotea would get some sort of fielded searching going on. Del.icio.us support, which would be great, can’t really happen until they ease the restrictions on how often you can hit it.

     CiteULike was a bit more of a hack, since it has no API. Instead, I find CiteULike pages in the Google/Yahoo searches I was already doing, grab the RSS/PRISM data and tags, and then do a search (again retrieving the RSS/PRISM) with the tags from the first article. It’s working pretty well, although I need to work out title matching, since both Yahoo and Google truncate HTML titles. I plan on adding the CiteULike journal table-of-contents feeds this way, too.

  4. Improved performance thanks to the magic of AJAX

     Let’s face it: the Ümlaut was bloody slow. There were a couple of reasons for this, but it was mainly due to hitting Connotea and trying to harvest any OAI records it could find for the given citation. The kicker was that this was probably unnecessary for the majority of users. Now we’ve pushed the social bookmarkers and the OAI repositories to a ‘background task’ that gets called via JavaScript when the page renders. It’s not technically AJAX so much as a remote procedure call, but AJAX is WebTwoFeriffic! Besides, this is a Rails project. Gotta kick up the buzz technologies.

  5. Now storing subjects in anticipation of a recommender service

     The Ümlaut now grabs subject headings from PubMed (MeSH); LCSH from our catalogs; SFX subjects; tags from Connotea, CiteULike, MyWeb, and Unalog; and subjects from the OAI repositories, and stores them with the referent. It also stores all of these in the Ferret store. The goal here is to search on subject headings to find recommendations for other similar items. As of this writing, there is only one citation with subject associations, so there’s nothing really to see here.
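To make the “have we seen this citation before?” lookup a little more concrete: the real code searches a Ferret (Ruby Lucene) fulltext index, but the matching idea can be sketched in plain Ruby with a naive token-overlap score. Everything here — the class name, the threshold, the scoring — is made up for illustration, not pulled from the actual backend.

```ruby
# Naive stand-in for the Ferret-backed referent lookup: score an incoming
# citation against previously resolved referents by token overlap.
# (The real app uses a fulltext index; this just illustrates the idea.)
class ReferentStore
  def initialize
    @referents = []   # previously resolved citations, as hashes
  end

  def add(referent)
    @referents << referent
  end

  # Return the best previously resolved match, or nil if nothing
  # overlaps enough (the threshold is arbitrary).
  def find_match(citation, threshold = 0.5)
    incoming = tokens(citation)
    best = @referents.max_by { |r| overlap(incoming, tokens(r)) }
    best && overlap(incoming, tokens(best)) >= threshold ? best : nil
  end

  private

  def tokens(referent)
    referent.values.join(' ').downcase.scan(/\w+/).uniq
  end

  def overlap(a, b)
    return 0.0 if a.empty?
    (a & b).length.to_f / a.length
  end
end

store = ReferentStore.new
store.add(:title  => 'An Architecture for the Aggregation and Analysis of Scholarly Usage Data',
          :author => 'Bollen')
# A sloppier citation for the same article still finds the stored referent:
match = store.find_match(:title => 'Aggregation and analysis of scholarly usage data')
```

A real fulltext engine handles stemming, field weighting, and ranking for you, which is the whole point of using Ferret instead of something like this.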
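On the Marshal bit, and why it wrecks the logs: Marshal serializes a Ruby object graph to a compact binary format, so anything written that way shows up as line noise in tail -f. A quick round-trip in plain Ruby (nothing Ümlaut-specific about it):

```ruby
# Marshal round-trips arbitrary Ruby objects, but its output is binary —
# which is why Marshaled referents turn the logs into garbage.
referent = { :title => 'Some article', :issn => '1234-5678' }

blob     = Marshal.dump(referent)   # compact binary string (starts with version bytes)
restored = Marshal.load(blob)       # identical object graph back

restored == referent                # => true
```

The obvious fix, if readable logs matter, is to log a human-friendly summary separately and keep the Marshaled blob out of the log stream.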
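For the URL-based bookmark lookups: Connotea keys bookmarks on an MD5 digest of the bookmarked URI, so a lookup amounts to hashing the URL and fetching a document for it. The endpoint path below is an assumption from memory, so treat the whole thing as a sketch of the shape, not the real client code.

```ruby
require 'digest/md5'

# Sketch of a lookup-by-URL against a Connotea-style API. The endpoint
# path is an assumption, not gospel — the real point is that the only
# handle we have for finding bookmarks is the bookmarked URL itself.
def connotea_lookup_url(bookmarked_url)
  digest = Digest::MD5.hexdigest(bookmarked_url)
  "http://www.connotea.org/data/uri/#{digest}"   # hypothetical endpoint shape
end

lookup = connotea_lookup_url('http://dx.doi.org/10.1000/example')
# Fetching that URL would return the bookmark data (if any) for the URI —
# which is exactly why lookup-by-URL works so poorly for citations that
# live at many different URLs.
```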
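The CiteULike scrape-and-search dance hinges on pulling tags out of the RSS. With stdlib REXML, that part looks something like the sketch below — the feed structure is simplified, and the assumption that tags arrive as dc:subject elements may not match CiteULike’s real feeds exactly.

```ruby
require 'rexml/document'

# Pull titles and tags out of a CiteULike-style RSS feed. Assumes tags
# come through as dc:subject elements — an illustrative guess at the
# feed structure, not a verified schema.
def extract_tags(rss_xml)
  doc = REXML::Document.new(rss_xml)
  items = []
  doc.elements.each('rss/channel/item') do |item|
    tags = item.elements.to_a.select { |e| e.prefix == 'dc' && e.name == 'subject' }
    items << { :title => item.elements['title'].text,
               :tags  => tags.map { |t| t.text } }
  end
  items
end

feed = <<XML
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <item>
      <title>Usage data aggregation</title>
      <dc:subject>bibliometrics</dc:subject>
      <dc:subject>usage_data</dc:subject>
    </item>
  </channel>
</rss>
XML

first = extract_tags(feed).first
# first[:tags] is what would drive the follow-up tag search against CiteULike
```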

The big todo that’s on my plate for the rest of the week is adding statistics gathering. I’ve got my copy of An Architecture for the Aggregation and Analysis of Scholarly Usage Data by Bollen and Van de Sompel printed out and I plan on incorporating their bX concept for this.