
One of the byproducts of the “Communicat” work I had done at Georgia Tech was a variant of Ed Summers’ ruby-marc that went into more explicit detail about the contents of the MARC record (as opposed to ruby-marc, which focuses on its structure).  It had been living for the last couple of years as a branch within ruby-marc, but this was never a particularly ideal approach.  These enhancements were out of scope for ruby-marc as a general MARC parser/writer, so it’s not as if this branch was ever going to see the light of day as trunk.  As a result, it was a massive pain in the butt for me to use locally:  I couldn’t easily install it as a gem (since it would have replaced the real ruby-marc, which I use far too much to live without), which meant that I had to explicitly include it in whatever projects I wanted to use it in and update any included paths accordingly.

So as I found myself, yet again, copying the TypedRecords directory into another local project (this one to map MARC records to RDF), I decided it was time to make this its own project.

One of the amazingly wonderful aspects of Ruby is the notion of “opening up an object or class”.  For those not familiar with Ruby, the language allows you to take basically any object or class, redefine it and add your own attributes, methods, etc.  So if you feel that there is some particular functionality missing from a given Ruby object, you can just redefine it, adding or overriding the existing methods, without having to reimplement the entire thing.  So, for example:

class String
  def shout
    "#{self.upcase}!!!!"
  end
end

str = "Hello World"
str.shout
=> "HELLO WORLD!!!!"

And just like that, your String objects gained the ability to get a little louder and a little more obnoxious.

So rather than design the typed records concept as a replacement for ruby-marc, it made more sense to treat it as an extension to ruby-marc.  By monkey patching, the regular MARC parser/writer remains the same, but if you want to look a little more closely at the contents, it overrides the behavior of the original classes and objects and adds a whole bunch of new functionality.  For MARC records, it’s analogous to how Facets adds all kinds of convenience methods to String, Fixnum, Array, etc.
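To make the approach concrete, here is roughly what that looks like (a simplified sketch of the technique, not enhanced-marc’s actual internals): reopen ruby-marc’s MARC::Record and hang new methods off of it.

require 'marc'

# Reopen ruby-marc's MARC::Record and bolt a new method onto it.
# This is a toy illustration of the monkey-patching technique, not
# the real enhanced-marc code.
module MARC
  class Record
    # Leader position 6 holds the MARC "type of record" code.
    def type_of_record
      leader[6, 1]
    end
  end
end

Every MARC::Record the ordinary parser hands you now responds to type_of_record, with no changes to ruby-marc itself.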

So, now it has its own github project:  enhanced-marc.

If you want to install it:

  gem sources -a http://gems.github.com
  sudo gem install rsinger-enhanced_marc

There are some really simple usage instructions on the project page, and I’ll try to get the rdocs together as soon as I can.  In a nutshell, it works almost exactly like ruby-marc does:

require 'enhanced_marc'

records = []
reader = MARC::Reader.open('marc.dat')
reader.each do |record|
  records << record
end

As it parses each record, it examines the leader to determine what kind of record it is:

  • MARC::BookRecord
  • MARC::SerialRecord
  • MARC::MapRecord
  • MARC::ScoreRecord
  • MARC::SoundRecord
  • MARC::VisualRecord
  • MARC::MixedRecord

and adds a bunch of format-specific methods appropriate for, say, a map (roughly sketched below).
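Under the hood, that dispatch presumably keys off the leader’s type-of-record code (position 6) and bibliographic level (position 7).  Something along these lines (a hypothetical sketch; the code mappings come from the MARC spec, not from the library’s source):

def record_class_for(leader)
  # Leader/06 is the type-of-record code; leader/07 is the
  # bibliographic level ('s' indicates a serial).
  case leader[6, 1]
  when 'a', 't'
    leader[7, 1] == 's' ? MARC::SerialRecord : MARC::BookRecord
  when 'e', 'f' then MARC::MapRecord
  when 'c', 'd' then MARC::ScoreRecord
  when 'i', 'j' then MARC::SoundRecord
  when 'g', 'k', 'o', 'r' then MARC::VisualRecord
  when 'p' then MARC::MixedRecord
  else MARC::Record
  end
end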

It’s then possible to extract either the MARC codes or the human-readable (English) strings that the codes represent:

record.class
=> MARC::SerialRecord
record.frequency
=> "d"
record.frequency(true)
=> "Daily"
record.serial_type(true)
=> "Newspaper"
record.is_conference?
=> false

or, say:

record.class
=> MARC::VisualRecord
record.is_govdoc?
=> true
record.audience_level
=> "j"
record.material_type(true)
=> "Videorecording"
record.technique(true)
=> "Animation"

And so on.

There is still quite a bit I need to add.  It pretty much ignores mixed records at the moment.  It’s something I’ll eventually need to get to, but these are uncommon enough that it’s currently a lower priority.  I also need to provide some methods that evaluate the 007 field.  I haven’t gotten to this yet, just because it’s a ton of tedium.  It would be useful, though, so I want to get it in there.
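For what it’s worth, the 007 methods would probably end up looking something like this (entirely hypothetical: the method name and the partial code table here are mine; 007/00 is the category-of-material position):

module MARC
  class Record
    # Partial 007/00 category-of-material table, for illustration only.
    PHYSICAL_CATEGORIES = {
      'a' => 'Map',
      'c' => 'Electronic resource',
      'h' => 'Microform',
      's' => 'Sound recording',
      'v' => 'Videorecording'
    }

    # Hypothetical helper: return the 007/00 code, or a human-readable
    # label if passed true (mirroring the frequency(true) convention).
    def physical_category(human_readable = false)
      field = self['007']
      return nil unless field
      code = field.value[0, 1]
      human_readable ? PHYSICAL_CATEGORIES[code] : code
    end
  end
end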

If there is interest, it could perhaps be extended to include authority records or holdings records.  It would also be handy to have convenience methods on the data fields:

record.isbn
=> "0977616630"
record.control_number
=> "793456"

Anyway, hopefully somebody might find this to be useful.

For a couple of months this year, the library world was aflame with rage at the proposed OCLC licensing policy regarding bibliographic records.  It was a justifiable complaint, although I basically stayed out of it:  it just didn’t affect me very much.  After much gnashing of teeth, petitions, open letters from consortia, etc., OCLC eventually rescinded its proposal.

Righteous indignation: 1, “the man”: 0.

While this could certainly be counted as a success (I think, although it means we default to the much more ambiguous 1987 guidelines), there is a bit of a mixed message here about where the library community’s priorities lie.  It’s great that you now have the right to share your data, but, really, how do you expect to do it?

It has been a little over a year since the Jangle 1.0 specification was released; 15 months or so since all of the major library vendors (with one exception) agreed to the Digital Library Federation’s “Berkeley Accord”; and we’re at the anniversary of the workshop where the vendors actually agreed on how we would implement a “level 1” DLF API.

So far, not a single vendor at the table has honored their commitment, and I have seen no sign of any intention to do so, with the exception of Koha (although, interestingly, not by the company represented in the Accord).

I am going to focus here on the DLF ILS-DI API, rather than Jangle, because it is something we all agreed to.  For all intents and purposes, Jangle and the ILS-DI are interchangeable:  I think anybody who has invested any energy in either project would be thrilled if either one actually caught on and was implemented in a major ILMS.  Both specifications share the same scope and purpose.  The resources required to support one would be the same as the other; the only difference between the two is the client-side interface.  Jangle technically meets all of the recommendations of the ILS-DI, but not the bindings that we, the vendors, agreed to (although there is an ‘adapter’ to bridge that gap).  Despite having spent the last two years of my life working on Jangle, I would be thrilled to no end if the ILS-DI saw broad uptake.  I couldn’t care less about the serialization; I only care about the access.

There is only one reason that the vendors are not honoring their commitment:  libraries aren’t demanding that they do.

Why is this?  Why the rally to ensure that our bibliographic data is free for us to share when we lack the technology to actually do the sharing?

When you look at the open source OPAC replacements (I’m only going to refer to the OSS ones here, because they are transparent, as opposed to their commercial counterparts):  VuFind, Blacklight, Scriblio, etc., and take stock of the hoops that have to be jumped through to populate their indexes and check availability, most libraries would throw their hands in the air and walk away.  There are batch dumps of MARC records.  Rsync jobs to get the data to the OPAC server.  Cron jobs to get the MARC into the discovery system.  Screen scrapers and one-off “drivers” to parse holdings and status.  It is a complete mess.

It’s also the case for every Primo, Encore, Worldcat Local, AquaBrowser, etc. that isn’t sold to an internal customer.

If you’ve ever wondered why the third party integration and enrichment services are ultimately somewhat unsatisfying (think BookSite.com or how LibraryThing for Libraries is really only useful when you can actually find something), this is it.  The vendors have made it nearly impossible for a viable ecosystem to exist because there is no good way to access the library’s own data.

And it has got to stop.

For the OCLC withdrawal to mean anything, libraries have either got to put pressure on their vendors to support one of the two open APIs, migrate to a vendor that does support the open APIs, or circumvent the vendors entirely by implementing the specifications themselves (and sharing with their peers).  This cartel of closed access is stifling innovation and, ultimately, hurting the library users.

I’ll hold up my end (and ensure it’s ILS-DI compatible via this) and work towards it being officially supported here, but the 110 or so Alto customers aren’t exactly going to make or break this.

Hold your vendor’s feet to the fire and insist they uphold their commitment.