Monthly Archives: April 2008

Something I’ve taken upon myself since I joined Talis is making ActiveRDF a viable client for accessing the Platform.  While this is mostly selfishness on my part (I want to keep developing in Ruby and there’s basically no RDF support for it right now, plus this gives me a chance to learn about the RDF/SPARQL-y aspects of the Platform), I also think that libraries like this can only help democratize the Platform.

So far, it’s been pretty ugly.  Granted, I haven’t had much time to work on it, but the time I have spent has convinced me there will be a lot of work to do.  Couple this with some of the things that make the Platform difficult to work with in Ruby anyway (read: Digest Authentication) and this might be more of an uphill battle than I’ll ever have time for, but I figure it’s either this or go back to Python, and I’m not quite ready to give up on Ruby yet.
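To give a flavor of the Digest Authentication problem: Net::HTTP has no digest support of its own, so you’re stuck doing the challenge/response dance by hand.  Here’s a rough sketch assuming the net-http-digest_auth gem does the header math for you (the store URL and credentials below are placeholders, not a real store):

require 'uri'
require 'net/http'
require 'net/http/digest_auth' # assumes the net-http-digest_auth gem

uri = URI.parse('http://api.talis.com/stores/mystore/services/sparql') # placeholder
uri.user = 'username'
uri.password = 'password'

http = Net::HTTP.new(uri.host, uri.port)

# the first request exists only to provoke the 401 challenge
req = Net::HTTP::Get.new(uri.request_uri)
res = http.request(req)

# build the Authorization header from the WWW-Authenticate challenge
# and re-issue the request
digest = Net::HTTP::DigestAuth.new
auth = digest.auth_header(uri, res['www-authenticate'], 'GET')
req = Net::HTTP::Get.new(uri.request_uri)
req.add_field('Authorization', auth)
res = http.request(req)

Two round trips and a third-party gem just to make one authenticated GET.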

Currently, performance against the Platform is abysmal with ActiveRDF, so I’ll need to think of shortcuts to improve it (I’m not even considering write access at present).  Here’s some code (as much for my own benefit as anything, so I can remember what I’ve done) to work with Ian Davis’ Quotations Book Example store:

require 'time' # otherwise ActiveRDF starts freaking out about DateTime
require 'active_rdf'

# Less than ideal, but without this ActiveRDF appends
# ^^<http://www.w3.org/2001/XMLSchema#string> to string literals even if you
# don't want to send the datatype.  I haven't actually tried it with other
# datatypes to see how this breaks down the road.
$activerdf_without_xsdtype = true

ConnectionPool.set_data_source(:type => :sparql, :results => :sparql_xml,
  :engine => :joseki, :url => "http://api.talis.com/stores/iand-dev2/services/sparql")

Namespace.register :foaf, "http://xmlns.com/foaf/0.1/"
Namespace.register :dc, "http://purl.org/dc/elements/1.1/"
Namespace.register :quote, "http://purl.org/vocab/quotation/schema"

QUOTE::Quotations.find_by_dc::creator("Loren, Sophia").each do |quote|
  # print the important stuff from each graph

  # http://purl.org/vocab/quotation/schema#quote has to be manually added as
  # a predicate; the "#" seems to cause problems
  quote.add_predicate(:quote, QUOTE::quote)
  puts quote.quote
  puts quote.subject
  puts quote.rights
  puts quote.isPrimaryTopicOf
end

If you actually try to execute this, you’ll see that it takes a long time to run (God help you if you try it on QUOTE::Quotations.find_by_dc::subject("Age and Aging")).  A really long time.

If you set some environment vars before you go into irb:

$ export ACTIVE_RDF_LOG_LEVEL=0
$ export ACTIVE_RDF_LOG=./activerdf.log

then you can tail -f activerdf.log and see what exactly is happening.

After ActiveRDF does its initial SPARQL query (SELECT DISTINCT ?s WHERE { ?s <http://purl.org/dc/elements/1.1/creator> "Loren, Sophia" . }), it does two things for every attribute access in the block:

  1. a SPARQL query for every predicate associated with the URI (http://api.talis.com/stores/iand-dev2/services/sparql?query=SELECT+DISTINCT+%3Fp+WHERE+%7B+%3Chttp%3A%2F%2Fapi.talis.com%2Fstores%2Fiand-dev2%2Fitems%2F1187139384317%3E+%3Fp+%3Fo+.+%7D+)
  2. a SPARQL query for the value of the attribute (predicate):  http://api.talis.com/stores/iand-dev2/services/sparql?query=SELECT+DISTINCT+%3Fo+WHERE+%7B+%3Chttp%3A%2F%2Fapi.talis.com%2Fstores%2Fiand-dev2%2Fitems%2F1187139384317%3E+%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2Fcreator%3E+%3Fo+.+%7D

for every predicate in the graph.  You can imagine how crazily inefficient this is: to get every value for a resource, you have to make a separate HTTP request for each predicate.
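In principle, all of those per-attribute round trips could collapse into one query that grabs every predicate and object for the resource at once; this is me speculating about a shortcut, not anything ActiveRDF does today:

SELECT ?p ?o WHERE { <http://api.talis.com/stores/iand-dev2/items/1187139384317> ?p ?o . }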

Obviously this would be a lot easier if it used DESCRIBE rather than SELECT, but without a real RDF library to parse the resulting graph, I’m not sure how ActiveRDF would deal with what the triple store returned.
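The DESCRIBE query itself couldn’t be simpler; it’s the graph that comes back (typically RDF/XML) that needs a real parser:

DESCRIBE <http://api.talis.com/stores/iand-dev2/items/1187139384317>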

So, anyway, these are some of the hurdles in making ActiveRDF work with the Platform, but I’m not quite ready to throw in the towel yet.

After several months of trying, Jangle.org is finally starting to take off.  I set up a Drupal instance yesterday on our new web host.

When I was still at Georgia Tech, one of the things I was trying to work on was a framework to consistently and easily expose the library’s data from its various silos into external services. In that case, my initial focus was the Sakai implementation that we were rolling into production, but the intention was to make it as generic as possible (i.e. the opposite of a “Blackboard Building Block”) so it could be consumed and reconstituted into as many applications as we wanted.

Coincidentally (and, for me, conveniently), Talis was also thinking about such a framework that would supply a generic SOA layer to libraries (and potentially beyond) and contacted me about possibly collaborating with them on it as an open source project. Obviously that relationship changed a bit when they hired me and they put me and my colleague Elliot Smith (reports of his demise have been greatly exaggerated) in charge of trying to get this project off the ground. Thankfully, Elliot is the other Talis malcontent who prefers Ruby, so our early prototypes all focused on Rails (the Java that originally seeded the project, like all Java, made my eyes glaze over).

We had a hard time getting anywhere at first. Even setting aside the fact that he and I were an ocean apart, we really had no idea what it was we should be building or why it would be useful to Talis (after all, they are paying the bills), since they already have an SOA product, Keystone. Also, we didn’t want to recreate Apache Synapse or Kuali Rice. In essence, we were trying to find a solution to a problem we hadn’t really defined yet.

In December and early January, I drove across town for a couple of meetings with Mike Rylander, Bill Erickson and Jason Etheridge from Equinox to try to generate interest in Jangle and, at the same time, solicit ideas from them as to what this project should look like and do. Thankfully, they gave me both.

Jangle still foundered a bit through February. We were waiting for the DLF’s ILS and Discovery Systems API recommendation to come out (since we had targeted that as a goal), and Elliot produced a prototype in JRuby (we had long since abandoned Rails for this) that effectively consumed the Java classes used for Keystone and rewrote them for Jangle.  The problem we were still facing, though, was that we were effectively just creating another niche library interface from scratch, and there were too many possible avenues for accomplishing that.  Our freedom was paralyzing us.

I gave a lightning talk on Jangle at Code4lib 2008 that was big on rah-rah rhetoric (free your data!) and short on details (since we hadn’t really come up with any yet); it generated some interest and a few more subscriptions to our Google Group.  A week later, the DLF met with the vendors to talk about their recommendation.  I attended by phone.  While in many ways I feel the meeting was a wash, it did help define for me what Jangle needed to do.

At the end of my first meeting with Equinox, Mike Rylander asked me if we had considered supporting the Atom Publishing Protocol in Jangle.  At the time, I hadn’t.  In fact, I didn’t until I sat on the phone for 8 hours listening to the vendors hem and haw over the DLF’s recommendation.  The more I sat there (with my ear getting sore), the more I realized that AtomPub might be a good constraint to get things moving (as well as useful for appealing to non-library developers).

We are just now starting to work out how this spec might function.  Basically, there are two parts.  First is the Jangle “core”, an AtomPub interface for external clients.  It’s at this level that we need to model how library resources map to Atom (and other common web data structures, like vCard) and where we need to extend Atom to include data like MARC (when necessary).  The Jangle core also proxies these requests to the service “connectors” and translates their responses back to the AtomPub client.  The connectors are service-specific applications that take the particular schema and values in, say, an ILS’s RDBMS and put them into a more generic syntax to send back to the Jangle core.  Right now, the proposal is that all communication between the core and connectors would be JSON over HTTP (again, to help forward momentum).
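To make that concrete, here’s a minimal sketch of what a connector could look like: a tiny Ruby HTTP service that hands JSON back to the core.  Every path and field name below is hypothetical (the spec is precisely what we haven’t written yet); the point is just the shape of the arrangement, a plain HTTP service answering in JSON:

require 'webrick'
require 'json'
require 'time'

# A hypothetical Jangle connector: the core asks it for resources over
# HTTP and it answers with JSON.  Nothing about the path or the field
# names is part of any actual spec.
server = WEBrick::HTTPServer.new(:Port => 9292)

server.mount_proc('/resources') do |req, res|
  # a real connector would pull this from, say, the ILS's RDBMS
  record = {
    'id'      => '1234',
    'title'   => 'An example record',
    'updated' => Time.now.xmlschema
  }
  res['Content-Type'] = 'application/json'
  res.body = record.to_json
end

trap('INT') { server.shutdown }
server.start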

So at this point you may be asking: why AtomPub, rather than implementing the recommendations of the DLF directly?  The recommendation assumes the vendors will be complicit, uniform and timely in implementing their APIs, and I cynically feel that is unrealistic.  I also think a common, consistent interface helps build the kind of interoperability the DLF group is advocating, since you’d only have to write one, say, NCIP adapter and it would work for every service that has a Jangle connector.  Also, by leveraging non-library technologies, it opens up our data to groups outside our walls.

So, if you’re interested in freeing your data (rah-rah!), come help us build this spec.  We’re trying to conform to the Rogue ’05 specification that Dan Chudnov came up with for developing this, so while it will still be a painful process, it won’t be painful and long. 🙂  In other words, this ain’t NISO.