Archive

SuDoc

Jonathan Rochkind recently started a thread on the Code4lib mailing list asking how to register an info URI for SuDocs.  Ray Denenberg responded with an explanation of the process.  I won’t get into my opinions of info URIs or the merits of either side of the ensuing debate that spun out from this thread, but my takeaway was that Jonathan wasn’t really looking for an info URI, anyway.

What Jonathan wanted was:

  • A generic URI model to define SuDocs
  • For this model to be maintained and hosted by somebody other than him
  • If possible, the URIs be resolvable to something that made sense for the supplied SuDoc

I think these are reasonable desires.

I also thought that there were existing structures out there that could meet his requirements without going through the “start up costs” of registering an info URI.  Also, info URIs are not of the web, so after going through the work of creating a ‘standard’, you cannot actually use it directly to figure out what the SuDoc is referring to.

SuDocs (like all other aspects of Government Documents) are arcane, niche and not understood by anyone other than GovDoc librarians, who seem to be rare.  That being said, there is a pretty convenient web presence implicit in SuDoc — in order for a SuDoc to exist, it needs to appear in the GPO’s catalog.  Since anything that appears in the GPO’s catalog can be seen on the web, we have a basis for a dereferenceable URI structure.

The GPO uses Ex Libris’ Aleph and whatever Aleph’s out of the box web OPAC is for their catalog.  Last week, I was Googling for some information about SRU and Aleph and it led me to this page about constructing CCL queries into Aleph (note, please disregard almost everything written on this page about SRU, CQL, etc., as it’s almost completely b.s.).  Figuring there must some way to search on SuDocs, I tried a couple of combinations of things, until I found this page in the GPO catalog.  Ok, so the index for SuDocs is called “GVD”.

This gives us URLs like:  http://catalog.gpo.gov/F/?func=find-c&ccl_term=GVD%3DE%202.11/3:EL%202

Now, this could work, but it’s incredibly awkward.  It’s also extremely fragile since it’s software (in this case, Aleph) dependent, and if it was to break, requires the GPO to redirect us to the right place.

This is, of course, exactly what PURLs were designed to do.  I had never actually set up a PURL and almost didn’t for this, since the purl.org service said that it wasn’t working properly so it would be disabled for another week.  However, all the links were there, so I forged ahead.  I was in the process of setting up a regular PURL, when I ran across partial redirects.  I figured something like this had to exist for PURLs that were used for RDF vocabularies and the like, but wasn’t aware of how they work.

Anyway, they’re extremly simple.  Basically you set up a base URL (http://purl.org/NET/foo/) and anything requested past that base URL (e.g. http://purl.org/NET/foo/bar) will be redirected to the PURL endpoint verbatim.

So, I set up a partial redirect PURL at the base:  http://purl.org/NET/sudoc/

The expectation that it would be followed by a properly URL escaped SuDoc:  E 2.11/3:EL 2 becomes http://purl.org/NET/sudoc/E%202.11/3:EL%202 which then tacks that SuDoc onto http://catalog.gpo.gov/F/?func=find-c&ccl_term=GVD%3D and redirects you to http://catalog.gpo.gov/F/?func=find-c&ccl_term=GVD%3DE%202.11/3:EL%202.

What you have then is a unique identifier for a SuDoc that resolves to a human readable representation of what the URI stands for.  If the GPO changes OPACs or Ex Libris changes Aleph’s URL scheme or the GPO comes up with a better representation of SuDoc, it doesn’t matter as long as the actual SuDoc class number can be used to redirect the user to the new location.

Obviously, there’s an expectation here that PURLs remain indefinitely and that purl.org is never lost to a third party that repurposes it for other uses.  However, there are major parts of the web that rely on purl.org, so there are a lot of people that would fight to not see this happen.

Basically, I think these are the sorts of simple solutions that I feel we should be using to solve these sorts of problems on the web.  We are no longer the center of the information universe and it’s time that we accepted that and begin to use the tools that the rest of the world is using to solve the same problems that everybody else is dealing with.

How many other ‘identifiers’ could be mint persistent, dereferenceable URIs this way?  I look forward to finding out.

By the way, there are currently three possible points of failure for this URI scheme:  purl.org, GPO and me.  I would prefer not to be a single point of failure, so if you would like to be added as a maintainer to this PURL, please let me know and I would be happy to add you.