Saturday, March 20, 2010

Accessing DBpedia's SPARQL endpoint with Sesame

I'd like to share a tip on using Sesame as a client for SPARQL endpoints. This may be rather trivial to some but perhaps new to others.

We are developing ROC (Rapid Ontology Construction; see our paper@ASWC'08), a tool that lets domain experts quickly build a basic vocabulary for their domain, reusing existing terminology whenever possible. ROC asks the domain expert for a set of keywords that are 'core' terms of the domain, and then queries remote sources for concepts matching those terms. These are presented to the user, who can iteratively select terms from the list, find relations to other terms, and expand the set of terms and relations. The resulting vocabulary (or 'proto-ontology', basically a SKOS-like thesaurus) can be used as is, or can serve as the starting point from which a knowledge engineer builds a more comprehensive domain ontology.

ROC is developed on top of Sesame, and up until now we simply supplied the tool with 'remote sources' by adding data to a locally running Sesame repository. To interact with the LOD (Linked Open Data) cloud, we obviously needed something a bit less awkward, so I started looking into ways to extend the tool so that it can query linked open data directly.

Fortunately, I didn't have to look far, because Sesame actually already supports this: Sesame's client-server protocol is a superset of the SPARQL protocol. This I already knew, but what I hadn't yet tried was to see whether that meant you could use Sesame's client libraries to query any SPARQL endpoint (instead of just connecting to a remote Sesame server, which is what they are primarily designed for, after all). And guess what, it turns out that you can!
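
For context, the SPARQL protocol side of this is plain HTTP: a query operation can be sent as a GET request with the query text in a URL-encoded `query` parameter. The sketch below only builds such a URL, using nothing but the Java standard library. It is not how Sesame does it internally (Sesame may well send its requests differently, for example via POST), and the class and method names (`SparqlGetUrl`, `build`) are made up for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper, not part of Sesame: builds the GET URL for a
// SPARQL protocol query operation against a given endpoint.
final class SparqlGetUrl {

    // Appends the SPARQL query as a URL-encoded "query" parameter.
    static String build(String endpointURL, String sparqlQuery) {
        try {
            String encoded = URLEncoder.encode(sparqlQuery, "UTF-8");
            // Use '&' if the endpoint URL already carries parameters.
            String separator = endpointURL.contains("?") ? "&" : "?";
            return endpointURL + separator + "query=" + encoded;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory for every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints the request URL for a small test query.
        System.out.println(build("http://dbpedia.org/sparql",
                "SELECT * WHERE {?X ?P ?Y} LIMIT 10"));
    }
}
```

Pasting the printed URL into a browser is a quick way to check that an endpoint is reachable before pointing any client library at it.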

Here's a bit of code that connects to DBpedia's SPARQL endpoint and fires a query. The idea is simply to reuse Sesame's HTTPRepository class, supply the endpoint URL as the server, and specify an empty repository ID:
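import org.openrdf.query.BindingSet;
import org.openrdf.query.QueryLanguage;
import org.openrdf.query.TupleQuery;
import org.openrdf.query.TupleQueryResult;
import org.openrdf.repository.RepositoryConnection;
import org.openrdf.repository.http.HTTPRepository;

// Treat the SPARQL endpoint as the "server" and leave the repository ID empty.
String endpointURL = "http://dbpedia.org/sparql";
HTTPRepository dbpediaEndpoint = 
         new HTTPRepository(endpointURL, "");
dbpediaEndpoint.initialize();

RepositoryConnection conn = 
         dbpediaEndpoint.getConnection();
try {
  String sparqlQuery = 
         " SELECT * WHERE {?X ?P ?Y} LIMIT 10 ";
  // Pass the query language constant and the query string.
  TupleQuery query = conn.prepareTupleQuery(QueryLanguage.SPARQL, sparqlQuery);
  TupleQueryResult result = query.evaluate();
  try {
    while (result.hasNext()) {
      BindingSet bindings = result.next(); // advance to the next solution
      // ... do something linked and open
    }
  }
  finally {
    result.close(); // release the result before closing the connection
  }
}
finally {
  conn.close();
}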
Now, I'm perhaps easily impressed, but to me this was beautifully easy. This makes integrating arbitrary linked open data into most of our Sesame-based tooling (including ROC) completely painless.