Commoditizing the Stack

I had the opportunity to attend and present at the excellent ELAG conference last week in Bratislava, Slovakia.  The event was advertised as being somewhat of a European Code4Lib, but in reality, the format seemed to me to be more in line with Access, which in my mind is a plus.

Being the ugly American that I am, I made a series of provocative statements, both in my presentation and in the Twitter “back channel” (or whatever they call hashtagging an event), about vendors, library standards, and a seeming disdain for both.  I feel like I should probably clarify my position here a bit, since Twitter is a terrible medium for in-depth communication and I didn’t go into much detail in my presentation (outside of saying vendor development teams were populated by scallywags and ne’er-do-wells from previous gigs in finance, communications and publishing).

Here is the point I was angling towards in my presentation:  your Z39.50 implementation is never going to get any better than it was in 2001.  Outside of critical bug fixes, I would wager the Z39.50 implementation has not even been touched since it was introduced, never mind improved.  The reason for this is my above “joke” about the development teams being staffed by people that do not have a library background.  They are literally just ignoring the Z-server and praying that nothing breaks in unit and regression testing.  There are only a handful of people who understand how Z39.50 works, and they are all employed by IndexData.  For everybody else, it’s just voodoo that was there when they got here, but is a requirement for each patch and release.

Thing is, even as hardware gets faster and ILSes (theoretically) get more sophisticated, the Z-server just gets worse.  You would think that if this is the most common and consistent mechanism for getting data out of ILSes, we would have seen some improvement in implementations as the need for better interoperability increases, but that is just not a reality I have witnessed.  The last two ILSes I primarily worked with (Voyager and Unicorn) I would routinely, and entirely accidentally, bring down completely just by trying to use the Z39.50 server as a data source in applications.  For the Umlaut, I had to export the Voyager bib database into an external Zebra index to prevent the ILS from crashing multiple times a day just to look up incoming OpenURL requests.  Let me note that the vast majority of these lookups were just ISSN or ISBN searches.  Unsurprisingly, the Zebra index held up with no problems.  It’s still working, in fact.
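
To give a sense of how modest those requests were, here is a rough sketch (in Python) of the kind of lookup the Umlaut needed: take an incoming ISBN or ISSN and ask a standalone Zebra index for it over SRU, rather than hammering the ILS’s own Z-server.  The endpoint, database name, and CQL index below are all hypothetical; they depend entirely on how the Zebra instance is configured.

    # A sketch only: the host, database and index names are made up.
    from urllib.parse import urlencode
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    SRU_NS = "http://www.loc.gov/zing/srw/"
    BASE = "http://zebra.example.org:9999/bibs"  # hypothetical SRU endpoint

    def lookup(identifier, index="bath.isbn"):
        """Return the hit count and records for a single identifier lookup."""
        params = urlencode({
            "version": "1.1",
            "operation": "searchRetrieve",
            "query": f'{index}="{identifier}"',
            "maximumRecords": "1",
        })
        with urlopen(f"{BASE}?{params}") as response:
            tree = ET.parse(response)
        hits = int(tree.findtext(f"{{{SRU_NS}}}numberOfRecords") or 0)
        return hits, tree.findall(f".//{{{SRU_NS}}}record")

    hits, records = lookup("9780000000002")  # placeholder ISBN
    print(f"{hits} match(es)")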

Talis uses Zebra for Alto.  It’s probably the main reason we can check off “SRU Support” in an RFP when practically nobody else can.  But, again, this means the Z/SRU server is sort of “outside” the development plan, delegated to IndexData.  Our SRU servers technically aren’t even conformant to the spec, since we don’t serve explain documents.  I’m not sure anybody at Talis was even aware of this until I pointed it out last year.
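
Checking that particular bit of conformance is about a five-line exercise.  Here is a quick sketch, against a hypothetical endpoint, that just asks an SRU server for its explain record and reports whether anything that looks like one comes back.

    # Quick-and-dirty conformance check against a hypothetical SRU endpoint.
    # The namespace is the ZeeRex one used by SRU explain records.
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    ZEEREX_NS = "http://explain.z3950.org/dtd/2.0/"

    def serves_explain(base_url):
        with urlopen(f"{base_url}?version=1.1&operation=explain") as response:
            tree = ET.parse(response)
        return tree.find(f".//{{{ZEEREX_NS}}}explain") is not None

    print(serves_explain("http://sru.example.org/catalogue"))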

All of this is not intended to demonize vendors (really!) or bite the hand that feeds me.  It’s also not intended to denigrate library standards.  I’m merely trying to be pragmatic and, more importantly, I’m hoping we can make library development a less frustrating and backwards exercise for all parties (even the cads and scallywags).

My point is that initiatives like the DLF ILS-DI make a lot of sense on paper.  I completely understand why they chose to implement their model using a handful of library standards (OAI-PMH, SRU).  The standards are there; why not use them?  The problem is in the reality of the situation.  If the specification “requires” SRU for search, how many vendors do you think will just slap Yaz Proxy in front of their existing (shaky, flaky) Z39.50 server and call it a day?  The OAI-PMH provider should be pretty trivial, but I would not expect any company to provide anything innovative with regard to sets or different metadata formats.
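
For what it’s worth, the OAI-PMH half really is that trivial; a bare-bones harvester is little more than a loop over resumption tokens.  The sketch below runs against a hypothetical provider URL and only asks for oai_dc, which, not coincidentally, is about all I would expect most implementations to ever offer.

    # A bare-bones OAI-PMH harvester sketch; the base URL is hypothetical.
    from urllib.parse import urlencode
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    OAI_NS = "http://www.openarchives.org/OAI/2.0/"
    BASE = "http://ils.example.org/oai"

    def harvest(metadata_prefix="oai_dc"):
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        while True:
            with urlopen(f"{BASE}?{urlencode(params)}") as response:
                tree = ET.parse(response)
            for record in tree.iter(f"{{{OAI_NS}}}record"):
                yield record
            token = tree.findtext(f".//{{{OAI_NS}}}resumptionToken")
            if not token:
                break
            # subsequent pages are requested with the token alone
            params = {"verb": "ListRecords", "resumptionToken": token}

    for record in harvest():
        print(record.findtext(f".//{{{OAI_NS}}}identifier"))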

As long as libraries are not going to be writing the software they use themselves, they need to reconcile themselves to the fact that the software they buy is more than likely not going to be written by librarians or library technologists.  If this is the case, what’s the better alternative?  Clinging to half-assed implementations of our incredibly niche standards?  Or figuring out what technologies are developing outside of the library realm that could be used to deliver our data and services?  Is there really, honestly, no way we could figure out how to use OpenSearch to do the things we expect SRU to do?
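
As a sketch of what I mean: an OpenSearch consumer only has to fetch a description document, pick a URL template, and substitute {searchTerms}.  The description URL below is made up; the namespace is the OpenSearch 1.1 one.

    # Sketch of a generic OpenSearch client; the description URL is made up.
    from urllib.parse import quote
    from urllib.request import urlopen
    import xml.etree.ElementTree as ET

    OS_NS = "http://a9.com/-/spec/opensearch/1.1/"

    def search_url(description_url, terms, mime="application/atom+xml"):
        """Build a query URL from an OpenSearch description document."""
        with urlopen(description_url) as response:
            tree = ET.parse(response)
        for url in tree.iter(f"{{{OS_NS}}}Url"):
            if url.get("type") == mime:
                return url.get("template").replace("{searchTerms}", quote(terms))
        raise ValueError(f"no {mime} template advertised")

    print(search_url("http://catalogue.example.org/opensearch.xml", "dublin core"))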

I realize I have an axe to grind here, but this isn’t really about Jangle.

I have seen OpenURL bandied about as a “solution” to problems outside of its current primary use of “retrieving context-based services from scholarly citations” (I know this is not OpenURL’s sole intended use case, but it’s all it’s being used for.  Period).  The most recent example of this was in a workshop (that I didn’t participate in) at ELAG about how libraries could share social data, such as tagging, reviews, etc., in order to create the economies of scale needed to make these concepts work satisfactorily.  Since they needed a way to “identify” things in their collections (books, journals, articles, maps, etc.), somebody had the (understandable, re: DLF) idea to use OpenURL as the identifier mechanism.

I realize that I have been accused of being “allergic” to OpenURL, but in general, my advice is this: if you have a problem and you think OpenURL is the answer, there is probably a simpler and better solution if you approach it from outside of a library POV.

The drawbacks of Z39.88 for this scenario are numerous, but I didn’t go into detail with my criticisms on Twitter.  Here are a few reasons why I would recommend against OpenURL for this (and they are not exclusive to this potential application):

  1. OpenURL context objects are not identifiers.  They are a means to describe a resource, not identify it.  A context object may contain an identifier in its description.  Use that, scrap the rest of it (there’s a rough sketch of this after the list).
  2. Because a context object is a description and not an identifier, it would have to be parsed to try to figure out what exactly it is describing.  This is incredibly expensive, error prone and more sophisticated than necessary.
  3. It was not entirely clear how the context objects would be used in this scenario.  Would they just be embedded in, say, an XML document as a clue as to what is being tagged or reviewed?  Or would the consuming service actually be an OpenURL resolver that took these context objects and returned some sort of response?  If it’s the former, what would the base URI be?  If it’s the latter… well, there’s a lot there, but let’s start simple, what sort of response would it return?
  4. There is no current infrastructure defined in OpenURL for these sorts of requests.  While there are metadata formats that could handle journals, articles, books, etc., it seems as though this would just scratch the surface of what would need context objects (music, maps, archival collections, films, etc.).  There are no ‘service types’ defined for this kind of usage (tags, reviews, etc.). The process for adding metadata formats or community profiles is not nimble, which would make it prohibitively difficult to add new functionality when the need arises.
  5. Such an initiative would have to expect to interoperate with non-library sources.  Libraries, even banding together, are not going to have the scale or attraction of LibraryThing, Freebase, IMDB, Amazon, etc.  It is not unreasonable to say that an expectation that any of these services would really adopt OpenURL to share data is naive and a waste of time and energy.
  6. There’s already a way to share this data, called SIOC.  What we should be working towards, rather than pursuing OpenURL, is designing a URI structure for these sorts of resources in a service like this.  Hell, I could even be talked into info URIs over OpenURLs for this.
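
To make point 1 concrete, here is a rough sketch with a made-up KEV context object: if there’s an rft_id, take it and throw the rest away; if there isn’t, you are back to reverse-engineering a description.

    # The context object below is invented for illustration.
    from urllib.parse import parse_qs

    kev = ("url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:book"
           "&rft.btitle=An+Example+Book&rft.isbn=9780000000002"
           "&rft_id=info:oclcnum/00000000")

    params = parse_qs(kev)
    identifiers = params.get("rft_id", [])
    if identifiers:
        print("use the identifier:", identifiers[0])
    else:
        # no identifier, so we're stuck guessing from the description
        description = {k: v for k, v in params.items() if k.startswith("rft.")}
        print("fall back to the description:", description)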

We could further isolate ourselves by insisting on using our standards.  Navel gaze, keep the data consistent and standard.  To me, however, it makes more sense to figure out how to bridge this gap.  After all, the real prize here is to be able to augment our highly structured metadata with the messy, unstructured web.  A web that isn’t going to fiddle around with OpenURL.  Or Z39.50.  Or NCIP.  I have a feeling the same is ultimately true with our vendors.

There comes a point that we have to ask if our relentless commitment to library-specific standards (in cases when there are viable alternatives) is actually causing more harm than help.

11 comments
  1. robert forkel said:

    absolutely agree. it’s funny how deep you can go into the rather small niche that’s library standards. but having come to web development from the academic/library angle, i actually met oai-pmh before realizing the potential of http.

    i think part of the problem is that library software often still doesn’t really live on the web. so there’s always a good reason to ignore web-standard xyz, from which point on it’s just easy to ignore them altogether.

  2. Peter van Boheemen said:

    I was in the workshop and I do agree that OpenURL describes a resource and is not an identifier. And it is true that, despite Herbert Van de Sompel’s efforts in writing OpenURL 1.0, OpenURL is in fact used only in the library world. You should provide a clear identifier like ISSN, ISBN or even DOI (who is using that except for commercial publishers of scientific journals?), but lots of stuff does not have these identifiers; you will be left with descriptive information only, and then you would have to admit that the only standard used is OpenURL. Or can you name the ‘web standard we are easily ignoring’? Please let us know!

  3. robert forkel said:

    @peter: you are right. there isn’t a suitable web standard for every library problem. and i even admit that oai-pmh was a good idea considering that ATOM wasn’t much of an option when it was invented. but once the web comes up with a competing standard (e.g. OpenSearch vs. SRU), i think there’s not much to be gained by clinging to our own standards. the library community just doesn’t have the pull – i think.

    as a disclaimer i have to add, that i don’t really work in the ILS part of digital libraries, rather in the publishing world.

  4. nicomo said:

    The question I have around these issues is: is it a technical issue at all? For instance, regardless of its technical value, the ILS-DI people managed to get some vendors on board from the very beginning. Koha will comply with ILS-DI soon, as will Aleph (I think). Others may follow.
    What would it take for jangle to gain acceptance?
    Isn’t this also largely a matter of “politics” (in a very general sense of the word)?

    Besides, I don’t think the ILS-DI document really has the very strong, required binding you mention with standards like SRU or OAI-PMH. For instance, the function for HarvestBibliographicRecords recommends an OAI-PMH binding, but allows for others.
    Similarly, GetRecords mentions SRU/W, OAI-PMH and “Web Service Call” as *possible* bindings. The “Search” service mentions OpenSearch. Etc.

    It’s still unclear to me, but on paper at least, I’m wondering: could AtomPub be the binding? OpenSearch certainly is possible in both jangle and ils-di.

    I really wonder, and I have not done my homework about this so it’s a genuine question I ask : couldn’t jangle more or less comply with the ils-di document? I’m unclear about the opposition you seem to imply between the two.

  5. till said:

    Your remarks about OpenURL to solve one problem in our workshop are right (I was sceptical at first, too), but they miss one point: we have no identifier suitable for identifying all resources that are connected to social network data. I felt the workshop preferred kind of a “pragmatic approach”, and so we came to the point that describing a resource instead of identifying it may be better. You can imagine the discussions about identification of resources in a library environment (many different content models in practice, FRBR, …).
    I wouldn’t take the results of this workshop as carved into rocks (as we say in Germany), so it’s good to have discussions like this. And we must take a look at SIOC (I didn’t know about this until Jakob pointed me to it, but I admit I really haven’t bothered about all this before, and I confess it might have been a mistake to join a workshop on this topic totally unprepared, but when you know how “workshops” at German conferences work, you don’t bother to prepare anymore…, better next year, I promise).
    Another point that I still feel is very important: social network data is connected to social beings. After a discussion on that, the workshop decided not to consider it, for “pragmatic reasons” (further complexity, no idea how “living systems” could handle that data…, correct me if I’m wrong). I know I tend to annoy my workshop fellows with that, but after rethinking it again, I still feel it is very important to consider the social beings…

    But back to the point: do we need identification or description of resources? And in either case, how do we do it?

  6. Ross said:

    I understand the point you’re trying to make about the identifiers. What I’m trying to say, however, is that OpenURL is still not going to really cut it. There is too much that you currently cannot describe in OpenURL (maps, movies, music, collections, realia, subjects, math equations, data sets, etc.) unless you use a more ambiguous metadata format, such as Dublin Core (or explicitly with MarcXML).

    And if you’re going to use Dublin Core (or MarcXML), why not just use Dublin Core (and leave out the massive overhead of OpenURL)?

    Instead, as I tried to point out above, it would be a much more beneficial use of time and energy to agree on some rules for URIs (say, LC permalinks or Worldcat URLs, or even Amazon ids) for things, and if they don’t appear there, then try x, y and z.

    These same sorts of rules would have to be drafted for an OpenURL approach, anyway, so why not lower the bar and make this an option for both library and non-library sources?
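
    Just to sketch what I mean by those rules (the URI patterns and the order here are invented; the point is that the agreement would be about URIs, not about OpenURL plumbing):

        # Hypothetical resolution order: try each URI pattern in turn and
        # keep the first one that answers.
        from urllib.error import HTTPError, URLError
        from urllib.request import Request, urlopen

        PATTERNS = [
            "http://www.worldcat.org/isbn/{id}",
            "http://openlibrary.org/isbn/{id}",
            "http://www.example.org/thing/{id}",  # whatever "z" ends up being
        ]

        def canonical_uri(identifier):
            for pattern in PATTERNS:
                uri = pattern.format(id=identifier)
                try:
                    with urlopen(Request(uri, method="HEAD")) as response:
                        if response.status < 400:
                            return uri
                except (HTTPError, URLError):
                    continue
            return None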

  7. Ross said:

    Nicolas, I largely agree that this is mainly a matter of politics. Also, the libraries themselves are partially to blame for not forcing the vendors to comply with these sorts of standards better.

    I think it’s great that Koha can claim compliance (I assume we’re talking Koha 3?). It seems like all it would require for Koha is the OAI-PMH provider (since you have SRU via Zebra), which is trivial.

    I didn’t know about Aleph.

    Jangle meets the general ‘requirements’ of their abstract API (where there are no specific bindings or responses, really), but it takes a little bit of work to meet the Berkeley Accord, due to Jangle’s separation of interests with regard to ‘resources’ and ‘items’. The ‘HarvestExpandedRecords’ functionality is subsequently a more expensive operation than it would be for a system that didn’t differentiate these kinds of data.

    The same functionality can be achieved via Jangle, you would just take a somewhat different approach than bundling all these things as one ‘record’.

    Ultimately, though, you’re absolutely right. The general ambiguity around the original ‘spec’ could lead me to claim that Jangle was ILS-DI compliant. Ex Libris could probably claim the same with their X-Services. Neither of these meet the Berkeley Accord.

    If ExL can claim the first working Berkeley Accord compliant “legacy” ILS, that will go a long way to prompting the competition to follow suit.

  8. lbjay said:

    “vendor development teams were populated by scallywags and ne’er-do-wells from previous gigs in finance, communications and publishing”

    Hey now, Ross. I may have been a right scallywag, but I assure you I did quite well at it.

  9. nicomo said:

    Ross: about compliance: yes, we’re talking about Koha 3. Even: about BibLibre’s own dev branch. It’s not been rolled out in the community’s HEAD branch, let alone in the stable version yet.

    As for the OAI-PMH server: it’s there already, in stable.

    About Aleph: I said I “think”. Only hearsay.

    About the Berkeley thing: I’m surprised it actually makes OAI-PMH a requirement (“Both full and differential harvesting options are expected to be supported through an OAI-PMH interface”), while the ils-di document itself does not (OAI-PMH is only recommended, iirc).
    And didn’t Talis sign up to it?

  10. Ross said:

    Talis did, yup. I was personally the delegate that committed us.

    Technically, this was met via the Alto Jangle connector, since the ILS-DI adapter can handle the OAI-PMH and availability lookups.

    Practically, that’s not a real answer or commitment, but I do think that by providing a Jangle interface it opens the door to actually making good on our promise and to provide a means to go beyond the Berkeley Accord (which was intended to be step 1, but now might be the alpha and the omega).

  11. Chris Keene said:

    Hi Ross,

    Do you know if there are any docs/info for installing Jangle on to a Talis Alto server?

    (I wonder if there is an argument that it should be included in a future Alto upgrade so that OAI-PMH etc comes ‘out of the box’?)

    Cheers
    Chris Keene
