For a long time, I was massively confused about what the Platform was or did.  Months after I started at Talis, I was still fairly unclear about what it actually did.  I’ve now got my head around it, use it, and have a pretty good understanding of why and how it’s useful, but I fully realize that a lot of people (and by people I really mean library people) don’t, and don’t really care to learn.

What they want is Solr.  Actually, no, what they want is a magical turnkey system that takes their crappy OPAC (or whatever) data and transmogrifies it into a modern, of-the-web type discovery system.  What is powering that discovery system is mostly irrelevant if it behaves halfway decently and is pretty easy to get up and running for a proof-of-concept.  These two points, of course, are why Solr is so damned popular; to say that it meets those criteria is a massive understatement.  The front-end of that Solr index is another story entirely, but Solr itself is a piece of cake.

Almost from the time I started at Talis I have thought that a Solr-clone API for the Platform would make sense.  Although the Platform doesn’t have all of the functionality of Solr, it has several of the sexy bits (Lucene syntax and faceting, for example), and if it had some way to respond to an out-of-the-box Solr client, it seemed to me that it would be a lot easier to turn an off-the-shelf Solr-powered application (à la VuFind or Blacklight) into a Platform-powered, RDF/linked data application with minimal customization.  The Platform is not Solr and in many ways is quite different from Solr — but if it can exploit its similarities with Solr enough to leverage Solr’s pretty awesome client base, it’ll be easier to open the door to the things the Platform is good at.  Alternately, if the search capabilities of the Platform become too limited compared to Solr, the data is open — just index it in Solr.  Theoretically, if the API is a Solr-clone, you should be able to point your application at either.
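To make that concrete, here’s a rough sketch of the sort of parameter translation a Solr-clone API has to do.  The Solr-side parameter names (q, start, rows, facet.field) are the real ones; the Platform-side names here (query, offset, max, fields) are purely illustrative assumptions on my part, not the Platform’s actual search parameters:

```ruby
require 'cgi'

# Map a Solr-style query string onto the kind of parameters a Platform
# store's search service might take.  Platform-side names are made up
# for illustration.
def solr_params_to_platform(query_string)
  solr = CGI.parse(query_string)   # { "q" => ["..."], "rows" => ["20"], ... }
  platform = {}
  platform['query']  = solr['q'].first if solr['q'].any?
  platform['offset'] = solr['start'].first || '0'
  platform['max']    = solr['rows'].first  || '10'
  # Solr allows facet.field to repeat; fold the repeats into one list
  platform['fields'] = solr['facet.field'].join(',') unless solr['facet.field'].empty?
  platform
end

solr_params_to_platform('q=sophia+loren&start=0&rows=20&facet.field=subject')
```

The interesting part is everything this sketch leaves out — facet counts, highlighting, sorting — which is exactly where a clone starts earning its keep.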

The proof-of-concept project I’m working on right now is basically a reënvisioned Communicat:  a combination discovery interface; personal and group resource collection aggregator; resource-list content management system (for course reserves, say, or subject/course guides, etc.);  and “discovered” resources (articles, books, etc.) cache and recommendation service.  None of these would be terribly sophisticated at a first pass; I’m just trying to get (and show) a clearer understanding of how a Communicat might work.  As such, I’m trying to do as little development from the ground up as I can get away with.

I’ll go into more detail later as it starts to get fleshed out some, but for the discovery and presentation piece, I plan on using Blacklight.  Of the OSS discovery interfaces, it’s the most versatile for the wide variety of resources I would hope to be in a Communicat-like system.  It’s also Ruby, so I feel the most comfortable hacking away at it.  It also meant I needed the aforementioned Solr-like API for the Platform, so I hastily cobbled together something using Pho and Sinatra.  I’m calling it pret-a-porter, and the sources are available on Github.

You can see it in action here.  The first part of the path corresponds with whatever Platform store you want to search.  The only “Response Writers” available are Ruby and JSON (I’ll add an XML response as soon as I can — I just needed Ruby for Blacklight and JSON came basically for free along with it).  It’s incredibly naive and rough at this point, but it’s a start.  Most importantly, I have Blacklight working against it.  Here’s Blacklight running off of a Prism 3 store.  It took a little bit of customization of Blacklight to make this work, but it would still be interchangeable with a Solr index (assuming you were still planning on using the Platform for your data storage).  When I say a “little bit”, I mean very little.  Both pieces (pret-a-porter and the Blacklight implementation) took less than three days total to get running.
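For the curious, Solr’s Ruby response writer (wt=ruby) just emits a Ruby hash literal that the client evals.  Here’s a minimal sketch of the shape pret-a-porter has to reproduce — the doc contents below are made up, not what a real Platform store returns:

```ruby
# Build a Solr-shaped response hash.  'responseHeader' and
# 'response' (with numFound/start/docs) are the fields a Solr
# client like Blacklight expects to find.
def solr_style_response(docs, total, start = 0)
  {
    'responseHeader' => { 'status' => 0, 'params' => { 'wt' => 'ruby' } },
    'response' => {
      'numFound' => total,
      'start'    => start,
      'docs'     => docs
    }
  }
end

response   = solr_style_response([{ 'id' => '1', 'title' => 'Example' }], 1)
serialized = response.inspect   # roughly what goes over the wire for wt=ruby
round_trip = eval(serialized)   # what the consuming Solr client does with it
```

This is also why JSON came “basically for free”: the same hash handed to a JSON serializer is a passable wt=json response.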

If only the rest of the Communicat could come together that quickly!

Something I’ve taken it upon myself to do since I joined Talis is make ActiveRDF a viable client to access the Platform.  While this is mostly selfishness on my part (I want to keep developing in Ruby and there’s basically no RDF support right now, plus this gives me a chance to learn about the RDF/SPARQL-y aspects of the Platform), I also think that libraries like this can only help democratize the Platform.

So far, it’s been pretty ugly.  I haven’t had much time to work on it, granted, but the time I’ve spent on it has made me think that there will be a lot of work to do.  Couple this with some of the things that make the Platform difficult to work with in Ruby anyway (read: Digest Authentication) and this might be a more uphill battle than I’ll ever have time for, but I figure it’s either this or go back to Python and I’m not quite ready to give up on Ruby yet.

Currently, performance is abysmal with ActiveRDF against the Platform, so I’ll need to think of shortcuts to improve that (I’m not even considering write access presently).  Here’s some code (this is as much for my benefit, so I can remember what I’ve done) to work with Ian Davis’ Quotations Book Example store:

require 'time' # otherwise ActiveRDF starts freaking out about DateTime
require 'active_rdf'

# Less than ideal, but without this, ActiveRDF sends ^^<> with string
# literals even if you don't want to send the datatype.  I haven't
# actually tried it with other datatypes to see how this breaks down
# the road.
$activerdf_without_xsdtype = true

ConnectionPool.set_data_source(:type => :sparql, :results => :sparql_xml, :engine => :joseki, :url => "")

Namespace.register :foaf, ""
Namespace.register :dc, ""
Namespace.register :quote, ""

QUOTE::Quotations.find_by_dc::creator("Loren, Sophia").each do |quote|
  # print the important stuff from each graph

  # quote has to be manually added as a predicate --
  # the "#" in the namespace seems to cause problems
  quote.add_predicate(:quote, QUOTE::quote)
  puts quote.quote
  puts quote.subject
  puts quote.rights
  puts quote.isPrimaryTopicOf
end

If you actually try to execute this, you’ll see that it takes a long time to run (God help you if you try it on QUOTE::Quotations.find_by_dc::subject("Age and Aging")).  A really long time.

If you set this environment variable before you go into irb:

$ export ACTIVE_RDF_LOG=./activerdf.log

then you can tail -f activerdf.log and see what exactly is happening.

After ActiveRDF does its initial SPARQL query (SELECT DISTINCT ?s WHERE { ?s <> "Loren, Sophia" . }), it’s doing two things for every resource in the block:

  1. a SPARQL query for every predicate associated with the URI
  2. a SPARQL query for the value of the attribute (predicate)

for every predicate in the graph.  You can imagine how crazily inefficient this is, since to get every value for a resource, you have to make a separate HTTP request for each one.

Obviously this would be a lot easier if it used DESCRIBE rather than SELECT, but without a real RDF library to parse the resulting graph, I’m not sure how ActiveRDF would deal with what the triple store returned.
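For reference, both single-request alternatives are trivial to construct — the resource URI below is a made-up example, not one from the Quotations Book store:

```ruby
# One SELECT that returns every predicate/value pair for a resource in a
# single request.  The response is still SPARQL results (not a graph),
# so ActiveRDF could parse it without a full RDF library.
def all_properties_query(uri)
  "SELECT ?p ?o WHERE { <#{uri}> ?p ?o . }"
end

# The DESCRIBE form -- also one request, but the response is an RDF
# graph, which is exactly where ActiveRDF would need a real parser.
def describe_query(uri)
  "DESCRIBE <#{uri}>"
end

all_properties_query('http://example.org/quotations/1')
describe_query('http://example.org/quotations/1')
```

The SELECT ?p ?o form seems like the pragmatic middle ground: one round trip per resource instead of one per predicate, without changing what the client has to parse.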

So, anyway, these are some of the hurdles in making ActiveRDF work with the Platform, but I’m not quite ready to throw in the towel, yet.