http://people.csail.mit.edu/pcm/tempISWC/workshops/SWPM2010/InvitedPaper_6.pdf
http://purl.org/net/biordfmicroarray/demo
(Note: all resources for Eric’s slides are available under Sample Queries: http://www.w3.org/2010/Talks/1208-egp-swobjects/goProt.zip . The cookbook queries below do not correspond to the queries in that zip file; they were originally on the paper handout. File names in this document have been changed to avoid clashing with the names in the zip file.)
Courtesy Nigam Shah (NCBO), excerpted from e-mail communications
To connect to UCSC database(s) from the command line:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu
Add a parameter to select the database and execute a query from the command line:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A <dbname> -e '<myselectstatement>'
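For scripting, the same invocation can be assembled programmatically. The following is a minimal Python sketch and not part of the original e-mail: the helper name build_mysql_command is hypothetical, and it reuses only the flags shown above (--user, --host, -A, -e). It builds the argument list without connecting to anything.

```python
import shlex

UCSC_HOST = "genome-mysql.cse.ucsc.edu"  # host and user exactly as given above
UCSC_USER = "genome"


def build_mysql_command(dbname, statement):
    """Return the argv list for the mysql client invocation shown above.

    Hypothetical helper: it only assembles arguments, it does not run them.
    """
    return [
        "mysql",
        f"--user={UCSC_USER}",
        f"--host={UCSC_HOST}",
        "-A", dbname,
        "-e", statement,
    ]


if __name__ == "__main__":
    argv = build_mysql_command("uniProt", "select * from gene where acc = 'P04637'")
    # shlex.join quotes the statement so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Passing the returned list to subprocess.run would execute the query, provided the mysql client is installed and the UCSC server is reachable.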
For example, for human P53 (UniProt ID P04637), run the following at the mysql prompt:
mysql> select * from uniProt.gene where acc ='P04637';
or, equivalently, from the command line:
mysql --user=genome --host=genome-mysql.cse.ucsc.edu -A uniProt -e 'select * from gene where acc ="P04637"'
You will get two rows back. Then run:
mysql> select * from go.gene_product where symbol in ('TP53', 'P53');
You will get three rows back. For each of the IDs (column 1), look up the GO annotation:
mysql> use go
Database changed
mysql> select * from association where gene_product_id in (17440, 3431471, 3586773);
Putting it all together in one query:
select uniProt.gene.val from uniProt.gene, go.gene_product, go.association
where uniProt.gene.acc ='P04637'
and go.gene_product.symbol = uniProt.gene.val
and go.gene_product.id = go.association.gene_product_id;
Or, for more human-readable output:
select uniProt.gene.val, go.association.term_id, go.term.name from uniProt.gene, go.gene_product, go.association, go.term
where uniProt.gene.acc ='P04637'
and go.gene_product.symbol = uniProt.gene.val
and go.gene_product.id = go.association.gene_product_id
and go.association.term_id = go.term.id;
So we just looked up the annotations for all genes (in multiple species) whose symbols match the gene names of the human protein identified by UniProt ID P04637. You will get 158 results, and you can make the output more informative by adding columns such as the species name through the relevant joins to the species table in the "go" database.
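To try the shape of this join without access to the UCSC server, the sketch below rebuilds it on two in-memory SQLite databases attached under the names uniProt and go. This is an illustration only: the table layouts contain just the columns used above, every row is an invented toy value (not real UCSC or GO data), and SQLite stands in for MySQL.

```python
import sqlite3


def build_toy_databases():
    """Create in-memory stand-ins for the uniProt and go databases.

    All rows are invented toy values, not real UCSC or GO data.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("ATTACH DATABASE ':memory:' AS uniProt")
    conn.execute("ATTACH DATABASE ':memory:' AS go")
    conn.executescript("""
        CREATE TABLE uniProt.gene (acc TEXT, val TEXT);
        CREATE TABLE go.gene_product (id INTEGER, symbol TEXT);
        CREATE TABLE go.association (id INTEGER, term_id INTEGER, gene_product_id INTEGER);
        CREATE TABLE go.term (id INTEGER, name TEXT);

        INSERT INTO uniProt.gene VALUES ('P04637', 'TP53'), ('P04637', 'P53'), ('X00000', 'ABC1');
        INSERT INTO go.gene_product VALUES (1, 'TP53'), (2, 'P53'), (3, 'ABC1');
        INSERT INTO go.association VALUES (10, 100, 1), (11, 101, 2), (12, 102, 3);
        INSERT INTO go.term VALUES (100, 'toy term A'), (101, 'toy term B'), (102, 'toy term C');
    """)
    return conn


# Same four-table join as above, with an ORDER BY added for stable output.
JOIN_QUERY = """
    select uniProt.gene.val, go.association.term_id, go.term.name
    from uniProt.gene, go.gene_product, go.association, go.term
    where uniProt.gene.acc = 'P04637'
      and go.gene_product.symbol = uniProt.gene.val
      and go.gene_product.id = go.association.gene_product_id
      and go.association.term_id = go.term.id
    order by go.association.term_id
"""


def annotations_for_p04637(conn):
    """Run the join and return (symbol, term_id, term_name) rows."""
    return conn.execute(JOIN_QUERY).fetchall()


if __name__ == "__main__":
    for row in annotations_for_p04637(build_toy_databases()):
        print(row)
```

Only the two symbols recorded for accession P04637 reach the result; the third toy gene is filtered out by the acc condition before any GO lookup happens.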
cmd> sparql --debug 1 --stem http://ucsc.example/uniProt/ -S mysql://genome@genome-mysql.cse.ucsc.edu/uniProt --serve http://localhost:8001/uniProt
cmd> sparql --debug 1 --stem http://ucsc.example/go/ -S mysql://genome@genome-mysql.cse.ucsc.edu/go --serve http://localhost:8003/go
goProt3.rq:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX go: <http://www.geneontology.org/dtd/go.dtd>
PREFIX gene: <http://yetanothergenevocabulary.org#>
SELECT ?gene_symbol ?goterm
{
_:gene uniprot:id 'P04637' ;
skos:prefLabel ?gene_symbol .
?go_product gene:symbol ?gene_symbol .
?go_id gene:product ?go_product .
?go_id go:term ?goterm_id .
?goterm_id rdfs:label ?goterm .
}
goProt3.map:
# Common RDF vocabularies:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Uniprot and GO:
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX go: <http://www.geneontology.org/dtd/go.dtd>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX gene: <http://yetanothergenevocabulary.org#>
# Direct graph tables:
PREFIX Ugene: <http://ucsc.example/uniProt/gene#>
PREFIX gene_product: <http://ucsc.example/go/gene_product#>
PREFIX association: <http://ucsc.example/go/association#>
PREFIX term: <http://ucsc.example/go/term#>
<uniProt> CONSTRUCT
{
_:gene uniprot:id ?id ; skos:prefLabel ?gene_symbol
}
WHERE
{
SELECT (fn:lower-case(?u_gene_symbol) AS ?gene_symbol)
{
SERVICE <http://localhost:8001/uniProt>
{
_:gene Ugene:acc ?id ; Ugene:val ?u_gene_symbol
}
}
}
<go> CONSTRUCT
{
?go_product gene:symbol ?gene_symbol .
?go_id gene:product ?go_product .
?go_id go:term ?goterm_id .
?goterm_id rdfs:label ?goterm .
}
WHERE
{
SERVICE <http://localhost:8003/go>
{
?gp gene_product:Symbol ?gene_symbol .
?association association:gene_product_id ?gp .
?association association:term_id ?t .
?t term:name ?goterm
}
}
Create the above files and execute:
sparql -m goProt3.map goProt3.rq
@@ introduction to shared names @@
Now use the tool to create *computable* shared names;
create goProt4.rq and goProt4.map with the following contents:
goProt4.rq:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX go: <http://www.geneontology.org/dtd/go.dtd>
PREFIX gene: <http://yetanothergenevocabulary.org#>
SELECT ?symbol ?label
{
<http://www.uniprot.org/uniprot/P04637>
skos:prefLabel ?symbol .
?product gene:symbol ?symbol .
?id gene:product ?product .
?id go:term ?goterm .
?goterm rdfs:label ?label .
}
goProt4.map:
# Common RDF vocabularies:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
# Uniprot and GO:
PREFIX uniprot: <http://purl.uniprot.org/core/>
PREFIX unigene: <http://www.uniprot.org/uniprot/>
PREFIX go: <http://www.geneontology.org/dtd/go.dtd>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX gene: <http://yetanothergenevocabulary.org#>
# Direct graph tables:
PREFIX Ugene: <http://ucsc.example/uniProt/gene#>
PREFIX gene_product: <http://ucsc.example/go/gene_product#>
PREFIX association: <http://ucsc.example/go/association#>
PREFIX term: <http://ucsc.example/go/term#>
<uniProt> CONSTRUCT
{
?gene uniprot:id ?id ; skos:prefLabel ?gene_symbol
}
WHERE
{
SELECT (fn:concat(unigene:, ?id) AS ?gene)
(fn:lower-case(?u_gene_symbol) AS ?gene_symbol)
{
SERVICE <http://localhost:8001/uniProt>
{
_:gene Ugene:acc ?id ; Ugene:val ?u_gene_symbol
}
}
}
<go> CONSTRUCT
{
?go_product gene:symbol ?gene_symbol .
?go_id gene:product ?go_product .
?go_id go:term ?goterm_id .
?goterm_id rdfs:label ?goterm .
}
WHERE
{
SERVICE <http://localhost:8003/go>
{
?gp gene_product:Symbol ?gene_symbol .
?association association:gene_product_id ?gp .
?association association:term_id ?t .
?t term:name ?goterm
}
}
Create the above files and execute:
sparql -m goProt4.map goProt4.rq
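What makes the shared name "computable" is that the gene node is no longer a blank node but an IRI derived from the accession, so goProt4.rq can name <http://www.uniprot.org/uniprot/P04637> directly. The Python sketch below mimics the two expressions in the <uniProt> rule; the function names are hypothetical, and plain string operations stand in for fn:concat and fn:lower-case.

```python
UNIPROT_BASE = "http://www.uniprot.org/uniprot/"  # the unigene: prefix in goProt4.map
UNIPROT_ID = "http://purl.uniprot.org/core/id"
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"


def shared_gene_name(accession):
    """Mirror fn:concat(unigene:, ?id): append the accession to the base IRI."""
    return UNIPROT_BASE + accession


def normalized_symbol(symbol):
    """Mirror fn:lower-case(?u_gene_symbol)."""
    return symbol.lower()


def uniprot_triples(accession, symbol):
    """Return the two triples described by the <uniProt> CONSTRUCT template."""
    gene = shared_gene_name(accession)
    return [
        (gene, UNIPROT_ID, accession),
        (gene, SKOS_PREF_LABEL, normalized_symbol(symbol)),
    ]


if __name__ == "__main__":
    for triple in uniprot_triples("P04637", "TP53"):
        print(triple)
```

Because the name is a pure function of the accession, any other source that applies the same rule arrives at the same IRI without coordination.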
The full query:
PREFIX diseasome: <http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX stat: <http://purl.org/net/biordfmicroarray/stat#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX biordf: <http://purl.org/net/biordfmicroarray/ns#>
PREFIX neurolex: <http://neurolex.org/wiki/Category:>
PREFIX doid: <http://purl.org/obo/owl/DOID#>
SELECT DISTINCT ?diseaseName ?geneLabel ?geneName WHERE {
#Retrieve a list of overexpressed genes in the entorhinal cortex of AD patients
{
?sampleList biordf:patients_have_disease ?alzheimers .
FILTER (?alzheimers = doid:DOID_10652 )
?experimentSet dct:isPartOf ?microarray_experiment ;
biordf:has_input_value ?sampleList ;
biordf:differentially_expressed_gene ?gene ;
biordf:has_ouput_value ?foldChange .
?gene rdfs:label ?geneLabel ;
biordf:name ?geneName .
?foldChange rdf:value ?foldChangeValue ;
stat:p_value ?pval .
#Apply filters to constrain the amount of results
FILTER (xsd:float(?foldChangeValue) > 0)
FILTER (xsd:float(?pval) < 0.001 )
}
#Find most recently updated SPARQL endpoint that contains information about genes and diseases.
{
?source rdf:type void:Dataset ;
void:sparqlEndpoint ?srvc ;
dct:issued ?issued ;
dct:subject diseasome:diseases ;
dct:subject diseasome:genes .
OPTIONAL {
?source1 rdf:type void:Dataset ;
void:sparqlEndpoint ?srvc2 ;
dct:issued ?issued2 ;
dct:subject diseasome:diseases ;
dct:subject diseasome:genes .
FILTER (?issued2 > ?issued)
}
FILTER (!BOUND(?srvc2))
}
#Get associated diseases from most recently updated Diseasome server.
SERVICE ?srvc {
?diseasomeGene rdfs:label ?geneLabel .
?disease diseasome:associatedGene ?diseasomeGene.
?disease rdfs:label ?diseaseName .
}
}
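The middle block of this query selects the most recently issued dataset with the OPTIONAL / FILTER (!BOUND(...)) idiom: a candidate survives only if the OPTIONAL part fails to find any dataset with a later dct:issued value. The same logic can be sketched in Python as follows; the endpoint URLs and dates are hypothetical placeholders, not real VoID descriptions.

```python
from datetime import date


def most_recent_endpoints(datasets):
    """Keep each dataset for which no other dataset has a later issue date.

    Mirrors OPTIONAL { ... FILTER (?issued2 > ?issued) } followed by
    FILTER (!BOUND(?srvc2)): a row survives only when the OPTIONAL part
    found no strictly newer candidate.
    """
    survivors = []
    for endpoint, issued in datasets:
        newer = [other for other, issued2 in datasets if issued2 > issued]
        if not newer:  # corresponds to !BOUND(?srvc2)
            survivors.append(endpoint)
    return survivors


# Hypothetical endpoints and issue dates, for illustration only.
TOY_DATASETS = [
    ("http://example.org/sparql/a", date(2009, 5, 1)),
    ("http://example.org/sparql/b", date(2010, 11, 3)),
    ("http://example.org/sparql/c", date(2010, 2, 14)),
]

if __name__ == "__main__":
    print(most_recent_endpoints(TOY_DATASETS))
```

Note that if two datasets share the latest date, both survive, so ?srvc may bind to more than one endpoint.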
mArray.rq:
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <http://easytorememberpredicates.com/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?gene ?geneLabel ?geneDescription ?otherDiseases
{
?gene :expressed_id ?disease .
?gene :fold_change ?fold_change .
?gene rdfs:label ?geneLabel .
?gene rdfs:comment ?geneDescription .
FILTER regex(?disease, "Alzheimer")
FILTER (xsd:float(?fold_change) > 0)
?diseasomeGene rdfs:label ?geneLabel .
?diseasomeGene :also_involved_in ?otherDiseases .
}
mArray.map:
# Common RDF vocabularies:
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX : <http://easytorememberpredicates.com/>
# contextual:
PREFIX diseasome: <http://www4.wiwiss.fu-berlin.de/diseasome/resource/diseasome/>
PREFIX stat: <http://purl.org/net/biordfmicroarray/stat#>
PREFIX biordf: <http://purl.org/net/biordfmicroarray/ns#>
PREFIX neurolex: <http://neurolex.org/wiki/Category:>
PREFIX doid: <http://purl.org/obo/owl/DOID#>
<mArray> CONSTRUCT
{
?gene :expressed_id ?disease .
?gene :fold_change ?fold_change .
?gene rdfs:label ?geneLabel .
?gene rdfs:comment ?geneDescription .
?diseasomeGene rdfs:label ?geneLabel .
?diseasomeGene :also_involved_in ?otherDiseases .
}
WHERE
{
SERVICE <http://hcls.deri.org/sparql>
{
?experimentSet dct:isPartOf ?microarray_experiment ;
biordf:has_input_value ?sampleList ;
biordf:differentially_expressed_gene ?gene ;
biordf:has_ouput_value ?foldChange .
?sampleList biordf:patients_have_disease ?alzheimers .
?gene rdfs:label ?geneLabel ;
biordf:name ?geneDescription .
?foldChange rdf:value ?fold_change ;
stat:p_value ?pval .
FILTER (xsd:float(?pval) < 0.001 )
?alzheimers rdfs:label ?disease .
?diseasomeGene rdfs:label ?geneLabel .
?diseasomeDisease diseasome:associatedGene ?diseasomeGene .
?diseasomeDisease rdfs:label ?otherDiseases .
}
}
<diseasome> CONSTRUCT
{
?diseasomeGene :also_involved_in ?otherDiseases .
}
WHERE {
SERVICE <http://hcls.deri.org/sparql> {
?diseasomeGene rdfs:label ?geneLabel .
?diseasomeDisease diseasome:associatedGene ?diseasomeGene .
?diseasomeDisease rdfs:label ?otherDiseases .
}
}
Appendix
Suppose someone wants to ask a question which spans a bunch of query
services, but doesn't (care to) enumerate those resources. The
ChainingMapper takes rules (currently written as SPARQL CONSTRUCTs)
and maps a query over the consequents of those rules to a query over
the antecedents of those rules. This can be used to partition queries
into SERVICE graphs.
For a terse example, imagine service S12 includes triples with
predicates <p1> and <p2>, and S3 includes predicates <p3>. Voilà, a
SPARQL invocation and the transformed query:
Query and mapping rules:
SPARQL -npq \
-M 'CONSTRUCT { ?s <p1> ?o1 ; <p2> ?o2 } WHERE { SERVICE <S12> { ?s <p1> ?o1 ; <p2> ?o2 } }' \
-M 'CONSTRUCT { ?s <p3> ?o } WHERE { SERVICE <S3> { ?s <p3> ?o } }' \
-e 'SELECT * WHERE { ?x1 <p1> ?n1 ; <p2> ?n2 ; <p3> ?n3 }'
Transformed query:
SELECT * WHERE {
SERVICE <S12> { ?x1 <p1> ?n1 . ?x1 <p2> ?n2 }
SERVICE <S3> { ?x1 <p3> ?n3 }
}
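The effect of this transformation can be imitated with a much simpler procedure than the real ChainingMapper uses. The Python sketch below is only an illustration under a strong assumption: each predicate is offered by exactly one service, so partitioning reduces to a lookup by predicate. The function names are hypothetical.

```python
def partition_by_service(query_triples, service_predicates):
    """Group query triples under the single service that offers their predicate.

    Simplification: assumes each predicate is offered by exactly one service.
    """
    groups = {}
    for subject, predicate, obj in query_triples:
        providers = [s for s, preds in service_predicates.items() if predicate in preds]
        if len(providers) != 1:
            raise ValueError(f"expected exactly one provider for {predicate}, got {providers}")
        groups.setdefault(providers[0], []).append((subject, predicate, obj))
    return groups


def render(groups):
    """Print the groups as a SELECT with one SERVICE block per service."""
    lines = ["SELECT * WHERE {"]
    for service, triples in groups.items():
        body = " . ".join(" ".join(triple) for triple in triples)
        lines.append(f"  SERVICE <{service}> {{ {body} }}")
    lines.append("}")
    return "\n".join(lines)


QUERY = [("?x1", "<p1>", "?n1"), ("?x1", "<p2>", "?n2"), ("?x1", "<p3>", "?n3")]
SERVICES = {"S12": {"<p1>", "<p2>"}, "S3": {"<p3>"}}

if __name__ == "__main__":
    print(render(partition_by_service(QUERY, SERVICES)))
```

Run on the query and services above, this reproduces the two SERVICE blocks of the transformed query.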
If the system has a list of SERVICE graphs, then the user just has to
ask the base question and the system handles query federation. What
if some predicates can be obtained from multiple endpoints?
SPARQL -npq \
-M 'CONSTRUCT { ?s <p1> ?o1 ; <p2> ?o2 } WHERE { SERVICE <S12> { ?s <p1> ?o1 ; <p2> ?o2 } }' \
-M 'CONSTRUCT { ?s <p2> ?o2 ; <p3> ?o3 } WHERE { SERVICE <S23> { ?s <p2> ?o2 ; <p3> ?o3 } }' \
-e 'SELECT * WHERE { ?x1 <p1> ?n1 ; <p2> ?n2 ; <p3> ?n3 }'
Now we expect <p2> to come from both S12 and S23:
SELECT * WHERE { {
SERVICE <S12> { ?x1 <p1> ?n1 . ?x1 <p2> ?n2 }
SERVICE <S23> { ?x1 <p2> ?_0x8752540_0_o2 . ?x1 <p3> ?n3 }
} UNION {
SERVICE <S12> { ?x1 <p1> ?n1 . ?x1 <p2> ?_0x874df10_1_o2 }
SERVICE <S23> { ?x1 <p2> ?n2 . ?x1 <p3> ?n3 }
} }
So our query for predicates <p1>, <p2> and <p3> could be answered by
<S12> providing <p1> and <p2> and <S23> providing <p3>, or by <S12>
providing only <p1> and <S23> providing <p2> and <p3> (hence the UNION).
What are those odd variables like _0x8752540_0_o2? In order to be
sound with respect to the rules, the ChainingMapper has to ensure that
the complete antecedent of each rule is satisfied. Looking at the
first side of the UNION, <S23> is expected to match { ?s <p2> ?o2 ;
<p3> ?o3 }, so the generated query must make sure that for any { ?x1
<p3> ?n3 }, ?x1 also has a property <p2> (though the value is
unimportant).
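The generation of such placeholder variables can be sketched in a few lines of Python. This is a simplified illustration, not the ChainingMapper's actual code: the function name, the tag argument and the ?_fresh_N_name naming scheme are all hypothetical, and a real implementation must keep the invented names unique across the whole generated query.

```python
def instantiate_antecedent(rule_body, bindings, tag="fresh"):
    """Rewrite every triple of a rule body, inventing variables where needed.

    `bindings` maps rule variables to query terms. A rule variable without a
    binding gets a new name, so the whole antecedent is still required to
    match even though the query never asked for that value.
    """
    fresh = {}

    def substitute(term):
        if not term.startswith("?"):
            return term  # IRIs and literals pass through unchanged
        if term in bindings:
            return bindings[term]
        if term not in fresh:
            fresh[term] = f"?_{tag}_{len(fresh)}_{term[1:]}"
        return fresh[term]

    return [tuple(substitute(term) for term in triple) for triple in rule_body]


# Antecedent of the S23 rule above.
S23_BODY = [("?s", "<p2>", "?o2"), ("?s", "<p3>", "?o3")]
# Matching the query triple { ?x1 <p3> ?n3 } against the rule binds ?s and ?o3.
BINDINGS = {"?s": "?x1", "?o3": "?n3"}

if __name__ == "__main__":
    for triple in instantiate_antecedent(S23_BODY, BINDINGS):
        print(" ".join(triple))
```

The unbound ?o2 becomes a placeholder, which plays the same role as ?_0x8752540_0_o2 in the generated query: it forces ?x1 to have a <p2> arc in S23 without constraining its value.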
Since those rules are too coarse, we can write more, simpler rules:
SPARQL -npq \
-M 'CONSTRUCT { ?s <p1> ?o1 } WHERE { SERVICE <S12> { ?s <p1> ?o1 } }' \
-M 'CONSTRUCT { ?s <p2> ?o2 } WHERE { SERVICE <S12> { ?s <p2> ?o2 } }' \
-M 'CONSTRUCT { ?s <p2> ?o2 } WHERE { SERVICE <S23> { ?s <p2> ?o2 } }' \
-M 'CONSTRUCT { ?s <p3> ?o3 } WHERE { SERVICE <S23> { ?s <p3> ?o3 } }' \
-e 'SELECT * WHERE { ?x1 <p1> ?n1 ; <p2> ?n2 ; <p3> ?n3 }'
and see a simpler federated query:
SELECT * WHERE { {
SERVICE <S12> { ?x1 <p1> ?n1 }
SERVICE <S12> { ?x1 <p2> ?n2 }
SERVICE <S23> { ?x1 <p3> ?n3 }
} UNION {
SERVICE <S12> { ?x1 <p1> ?n1 }
SERVICE <S23> { ?x1 <p3> ?n3 }
SERVICE <S23> { ?x1 <p2> ?n2 }
} }
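With one rule per predicate and service, the UNION branches are simply all combinations of "which service supplies which predicate". The Python sketch below enumerates them for the four rules above; it is an illustration of the counting argument only, and the function name is hypothetical.

```python
from itertools import product


def federation_alternatives(query_predicates, providers):
    """Return one {predicate: service} assignment per UNION branch."""
    choices = [providers[predicate] for predicate in query_predicates]
    return [dict(zip(query_predicates, combo)) for combo in product(*choices)]


# Which services offer which predicate, read off the four -M rules above.
PROVIDERS = {"<p1>": ["S12"], "<p2>": ["S12", "S23"], "<p3>": ["S23"]}

if __name__ == "__main__":
    for branch in federation_alternatives(["<p1>", "<p2>", "<p3>"], PROVIDERS):
        print(branch)
```

Only <p2> has two providers, so there are 1 x 2 x 1 = 2 alternatives, matching the two sides of the UNION in the federated query.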
I got this related question:
> Let's say we have two endpoints - S1 and S2 - registered with the federator.
>
> Triples in S1:
>
> ?x rdf:type ?y
>
> Triples in S2:
>
> ?a rdf:type foaf:person
> ?b foaf:mbox ?c
>
> Now, I get a query:
>
> Select ?x1
> Where {
> ?x1 rdf:type foaf:person
> ?x1 foaf:mbox ?y1
> }
>
> What does the mapper do in this instance? Is there a way to tell the
> mapper to look only at S2 (or only S1 to get bindings for the first
> triple in the where clause)? Or, is the federator expected to return
> the most complete answer by doing a union across both endpoints?
We can test this:
Query:
SPARQL -npq \
-M 'CONSTRUCT { ?x a ?y } WHERE { SERVICE <S1> { ?x a ?y } }' \
-M 'CONSTRUCT { ?a a <person> . ?b <mbox> ?c } WHERE { SERVICE <S2> { ?a a <person> . ?b <mbox> ?c } }' \
-e 'SELECT ?x1 WHERE { ?x1 a <person> . ?x1 <mbox> ?y1 }'
and get the expected UNION because two services can produce rdf:type
arcs:
Result (query):
SELECT ?x1
WHERE
{
{
SERVICE <S1>
{
?x1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <person> .
}
SERVICE <S2>
{
?_0x8580540_0_a <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <person> .
?x1 <mbox> ?y1 .
}
}
UNION
SERVICE <S2>
{
?x1 <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <person> .
?x1 <mbox> ?y1 .
}
}