RDF is a directed, labeled graph data model for representing information in the Web. SPARQL can be used to express queries across diverse data sources, whether the data is stored natively as
RDF or viewed as RDF via middleware. This specification defines the syntax and semantics of SPARQL 1.1 Federated Query extension for executing queries distributed over different SPARQL endpoints.
The SERVICE
keyword extends SPARQL 1.1 to support queries that merge data distributed across the Web.
This specification is published by the RDF Star Working Group as part of the update of specifications for format and errata.
The growing number of SPARQL query services offer data consumers an opportunity to merge data distributed across the Web. This specification defines the syntax and semantics of the
SERVICE
extension to the SPARQL 1.1 Query Language. This extension allows a query author to direct a portion of a query to a particular SPARQL endpoint. Results are returned to the
federated query processor and are combined with results from the rest of the query.
This document uses the same namespaces as from the SPARQL 1.1 Query document.
Result sets are illustrated in tabular form as in the SPARQL 1.1 Query document.
x | y | z |
---|---|---|
"Alice" | <http://example/a> |
A 'binding' is a pair (variable, RDF term). There
are three variables: x
, y
and z
(shown as column headers). Each solution is shown as one row in the body of the table. Here, there is a single
solution, in which variable x
is bound to "Alice"
, variable y
is bound to http://example/a
, and variable z
is not bound to
an RDF term. Variables are not required to be bound in a solution.
The following terms are defined in SPARQL 1.1 Query Language [SQRY] and reused in this document:
RDF URI
reference
)The SERVICE
keyword instructs a federated query processor to invoke a portion of a SPARQL query against a remote SPARQL endpoint. This section presents examples of how to use
the SERVICE
keyword. The following sections define the syntax and semantics of this extension.
This example shows how to query a remote SPARQL endpoint and join the returned data with the data from the local RDF Dataset. Consider a query to find the names of the people we know.
Data about the names of various people is available at the http://people.example.org/sparql
endpoint:
@prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix : <http://example.org/> . :people15 foaf:name "Alice" . :people16 foaf:name "Bob" . :people17 foaf:name "Charles" . :people18 foaf:name "Daisy" .and one wants to combine with a local FOAF file
http://example.org/myfoaf.rdf
that contains the single triple:
<http://example.org/myfoaf/I> <http://xmlns.com/foaf/0.1/knows> <http://example.org/people15> .
Query:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name FROM <http://example.org/myfoaf.rdf> WHERE { <http://example.org/myfoaf/I> foaf:knows ?person . SERVICE <http://people.example.org/sparql> { ?person foaf:name ?name . } }
This query, on the data above, has one solution:
Query Result:
name |
---|
"Alice" |
Imagine we want to query people and optionally obtain their interests and the names of people they know. Imagine for instance, two endpoints containing data about people:
Data in the default graph at remote SPARQL endpoint: http://people.example.org/sparql
@prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix : <http://example.org/> . :people15 foaf:name "Alice" . :people16 foaf:name "Bob" . :people17 foaf:name "Charles" . :people17 foaf:interest <http://www.w3.org/2001/sw/rdb2rdf/> .
and data in the default graph the remote SPARQL endpoint: http://people2.example.org/sparql
@prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix : <http://example.org/> . :people15 foaf:knows :people18 . :people18 foaf:name "Mike" . :people17 foaf:knows :people19 . :people19 foaf:name "Daisy" .
Query:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?person ?interest ?known WHERE { SERVICE <http://people.example.org/sparql> { ?person foaf:name ?name . OPTIONAL { ?person foaf:interest ?interest . SERVICE <http://people2.example.org/sparql> { ?person foaf:knows ?known . } } } }
This query, on the data above, has three solutions:
Query Result:
person | interest | known |
---|---|---|
"Alice" | ||
"Bob" | ||
"Charles" | <http://www.w3.org/2001/sw/rdb2rdf/> | <http://example.org/people19> |
Notice that in the query above there is a nested SERVICE
in the OPTIONAL
clause. This query requires the SPARQL query service at
http://people.example.org/sparql
to support basic federated query.
The execution of a SERVICE
pattern may fail due to several reasons: the remote service may be down, the service IRI may not be dereferenceable, or the endpoint may return an
error to the query. Normally, under such circumstances the invoked query containing a SERVICE
pattern fails as a whole. Queries may explicitly allow failed SERVICE
requests with the use of the SILENT
keyword. The SILENT
keyword indicates that errors encountered while accessing a remote SPARQL endpoint should be ignored while
processing the query. The failed SERVICE
clause is treated as if it had a result of a single solution with no bindings.
In the following query the SILENT
keyword is present. If the remote SPARQL endpoint is not available because the SPARQL endpoint does not exist, it is down or it is not
accessible the query will return a solution sequence of one empty solution mapping. If the SILENT
keyword is not present, the query will stop and return the error.
Data in <http://people.example.org/sparql>
endpoint:
<http://example.org/people15> <http://xmlns.com/foaf/0.1/name> "Charles" .
Query:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?name WHERE { SERVICE SILENT <http://people.example.org/sparql> { <http://example.org/people15> foaf:name ?name . } }
Query result if an error occurs while querying the remote SPARQL endpoint:
name |
---|
SPARQL 1.1 Query includes the VALUES
clause (VALUES), which can be used to provide an unordered
solution sequence that is joined with the results of the query evaluation. Implementers of SPARQL 1.1 Federated Query may use the VALUES
clause to constrain the results received
from a remote endpoint based on solution bindings from evaluating other parts of the query.
The following example shows how SERVICE
and VALUES
can work together. Suppose a query that asks for all instances of foaf:Person in the default graph and also
their known people in the remote endpoint http://example.org/sparql
:
Data in the default graph:
@prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix : <http://example.org/> . :a a foaf:Person ; foaf:name "Alan" ; foaf:mbox; "alan@example.org" . :b a foaf:Person ; foaf:name "Bob" ; foaf:mbox "bob@example.org" .
and data in the default graph the remote SPARQL endpoint http://example.org/sparql
:
@prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix : <http://example.org/> . :a foaf:knows :b . :b foaf:knows :c . :c foaf:knows :a . :a foaf:interest "SPARQL 1.1 Basic Federated Query" . :b foaf:interest "SPARQL 1.1 Query" . :c foaf:interest "RDB2RDF Direct mapping" .
Query:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?s { ?s a foaf:Person . SERVICE <http://example.org/sparql> {?s foaf:knows ?o } }
When the original query is executed naively, with an unconstrained service call the endpoint may return more results than necessary. It may also happen that the SPARQL endpoint will not
return all of them. Many existing SPARQL endpoints have restrictions in the number of results they return and may miss the ones matching subjects ?s
from the local default graph.
Thus, an implementation of a query planner for federated queries may decide to decompose the query into two queries instead, where first the bindings from the local default graph are
evaluated:
Query:
PREFIX : <http://example.org/> PREFIX foaf: <http://xmlns.com/foaf/0.1/> SELECT ?s { ?s a foaf:Person }
This query, on the data above, has two solutions:
Query Result:
s |
---|
<http://example.org/a> |
<http://example.org/b> |
Next, dispatch to the remote endpoint <http://example.org/sparql> a constrained query with the solutions for ?s
:
PREFIX foaf: <http://xmlns.com/foaf/0.1/> PREFIX : <http://example.org/> SELECT * {?s foaf:knows ?o } VALUES (?s) { (:a) (:b) }
This query on <http://example.org/sparql> has two solutions:
Query Result:
s | o |
---|---|
<http://example.org/a> | <http://example.org/b> |
<http://example.org/b> | <http://example.org/c> |
Both intermediary solution sequences can now be joined together, and projected to the requested variables to get to the expected two solution mappings for the overall query:
s |
---|
<http://example.org/a> |
<http://example.org/b> |
The SERVICE
extension is defined as an additional type of GroupGraphPattern, with an accompanying addition to SPARQL Query 1.1's Transform (syntax form):
If the form is
GroupGraphPattern
From the Translate Graph Patterns section of [SPARQL 1.1 Query Language] we extend the transformation of GroupGraphPattern to define the transformation of
SERVICE
patterns:
Let FS := the empty set Let G := the empty pattern, Z, a basic graph pattern which is the empty set. Let SilentOp := boolean, indicating SERVICE error behavior. For each element E in the GroupGraphPattern If E is of the form FILTER(expr) FS := FS ∪ {expr} End If E is of the form OPTIONAL{P} Let A := Transform(P) If A is of the form Filter(F, A2) G := LeftJoin(G, A2, F) Else G := LeftJoin(G, A, true) End End If E is of the form MINUS{P} G := Minus(G, Transform(P)) End If E is of the form BIND(expr AS var) G := Extend(G, var, expr) End If E is any other form Let A := Transform(E) G := Join(G, A) End If E is of the form SERVICE [SILENT] IRI {P} Let G := Join(G, Service(IRI, Transform(P), SilentOp)) End End If FS is not empty: Let X := Conjunction of expressions in FS G := Filter(X, G) The result is G.
The evaluation of SERVICE
is defined in terms of the SPARQL Results [RESULTS]
returned by a SPARQL Protocol [SPROT] execution of the nested graph pattern:
Definition: Evaluation of a Service Pattern
LetΩ0
the solution set with one empty solution, andthen:
SELECT * WHERE Q
against the service endpoint with IRI iri where Q is the serialization
of P in SPARQL syntax, in case of a successful service invocation according to the SPARQL protocol, and otherwiseΩ0.
in case SilentOp is true, and otherwiseIn the folowing section we introduce two examples showing the evaluation of SERVICE
patterns in the SPARQL algebra:
Example: a SERVICE
graph pattern in a series of joins:
Example: a SERVICE SILENT
graph pattern in a series of joins:
In the this section we do not present official evaluation semantics for the SPARQL pattern SERVICE VAR
. We only provide indications about how the evaluation of the SPARQL
pattern SERVICE VAR
can be evaluated.
A variable used in place of a service IRI indicates that the service call for any solution depends on that variable's binding in that solution. For instance, the default graph may contain data about which services contain data about project endpoints. We assume the following data on various projects that contains information about SPARQL endpoints where data about these projects (using the DOAP vocabulary) can be queried from:
@prefix void: <http://rdfs.org/ns/void#> . @prefix dc: <http://purl.org/dc/elements/1.1/> . @prefix doap: <http://usefulinc.com/ns/doap#> . [] dc:subject "Querying RDF" ; void:sparqlEndpoint <http://projects1.example.org/sparql> . [] dc:subject "Querying RDF remotely" ; void:sparqlEndpoint <http://projects2.example.org/sparql> . [] dc:subject "Updating RDF remotely" ; void:sparqlEndpoint <http://projects3.example.org/sparql> .
Data in the default graph at remote SPARQL endpoint http://projects2.example.org/sparql:
_:project1 doap:name "Query remote RDF Data" . _:project1 doap:created "2011-02-12"^^xsd:date . _:project2 doap:name "Querying multiple SPARQL endpoints" . _:project2 doap:created "2011-02-13"^^xsd:date .
Data in the default graph at remote SPARQL endpoint http://projects3.example.org/sparql:
_:project3 doap:name "Update remote RDF Data" . _:project3 doap:created "2011-02-14"^^xsd:date .
We now want to query the project names of projects on the subject "remote":
PREFIX void: <http://rdfs.org/ns/void#> PREFIX dc: <http://purl.org/dc/elements/1.1/> PREFIX doap: <http://usefulinc.com/ns/doap#> SELECT ?service ?projectName WHERE { # Find the service with subject "remote". ?p dc:subject ?projectSubject ; void:sparqlEndpoint ?service . FILTER regex(?projectSubject, "remote") # Query that service projects. SERVICE ?service { ?project doap:name ?projectName . } }
In the following table we present the intuitive solutions for this query with the data above:
Query Result:
service | title |
---|---|
<http://projects2.example.org/sparql> | "Query remote RDF Data" |
<http://projects2.example.org/sparql> | "Querying multiple SPARQL endpoints" |
<http://projects3.example.org/sparql> | "Update remote RDF Data" |
A SERVICE
clause involving a variable can be executed as a series of separate invocations of SPARQL query services. The results of each invocation are combined using union.
The query engine must determine the possible target SPARQL query services. The exact mechanism for doing this is not defined in this document. Execution order may also be used to determine
the list of services to to be tried. The example above suggests a specific order of execution: evaluating the basic graph pattern and filter outside the SERVICE
block first will
yield bindings for ?service
which may then be used to evaluate the SERVICE
block:
?p dc:subject ?projectSubject ; void:sparqlEndpoint ?service FILTER regex(?projectSubject, "remote")
Once ?service
has been evaluated it is possible to execute SERVICE
for each value of ?service
:
SERVICE ?service { ?project doap:name ?projectName . }
Note that blank nodes are unique to any document which serializes them. Also, SERVICE
calls depend on the SPARQL Protocol [SPROT] which transfers serialized
RDF documents making blank nodes unique between service calls.
See section 4 SPARQL 1.1 Federated Query Grammar regarding conformance of SPARQL Query strings that include the SPARQL 1.1 Federated Query Extensions. See section 3.1 Definition
of SERVICE for conformance of query results for the SERVICE
keyword.
This specification is intended for use in conjunction with the SPARQL 1.1 Query Language. See that specification for its conformance criteria.
SPARQL queries using SERVICE
imply that a URI will be dereferenced, and that the result will be incorporated into a working data set. All of the security issues of SPARQL Protocol 1.1 [SPROT] Section 3.1 SPARQL 1.1 Query [SQRY] Section 21, and Uniform Resource
Identifier (URI): Generic Syntax [RFC3986] Section 7 should be considered.
The SPARQL 1.1 Federated Query document is a product of the whole of the W3C SPARQL Working Group, and our thanks for discussions, comments and reviews go to all present and past members.
In addition, we have had comments and discussions with many people through the working group comments list. All comments go to making a better document. Carlos would also like to particularly thank Jorge Pérez, Oscar Corcho and Marcelo Arenas for their discussions on the syntax and semantics of the Federated query extension.
TODO
TODO
TODO