Copyright © 2023 the Contributors to the Reconciliation Service API v0.2 Specification, published by the Entity Reconciliation Community Group under the W3C Community Final Specification Agreement (FSA). A human-readable summary is available.
This document describes the reconciliation service API, a protocol edited by the W3C Entity Reconciliation Community Group. It is intended as a comprehensive and definitive specification of this API in its given state. Various aspects of this API need to be improved, as hinted by notes throughout this document.
This specification was published by the Entity Reconciliation Community Group. It is not a W3C Standard nor is it on the W3C Standards Track. Please note that under the W3C Community Final Specification Agreement (FSA) other conditions apply. Learn more about W3C Community and Business Groups.
Members of the Community Group are encouraged to contribute to this document by documenting the current behaviour of the reconciliation API. The ReSpec Editor's Guide can be used to learn more about the markup to use in this document.
GitHub Issues are preferred for discussion of this specification. Alternatively, you can send comments to our mailing list. Please send them to public-reconciliation@w3.org (subscribe, archives).
This section is non-normative.
Integrating data from sources which do not share common unique identifiers often requires matching (or
Various mechanisms exist to state the equivalence between two URIs: for instance, such a correspondence can be stated with the owl:sameAs property [owl-features], or using looser notions of equivalence defined in SKOS [skos-primer]. But such statements must in turn be themselves findable.
One can aggregate owl:sameAs statements from various sources to infer identities by transitivity, but this is a subtle art as some data sources can erroneously equate different concepts [beek-2018].
After all, any quest towards building a universal identifier system which avoids duplicates is necessarily doomed.
Data publishers use different granularities to model the world. Concepts have fluctuating boundaries across languages, cultures and time.
In practice, we can determine if two database records refer to the same entity by comparing their attributes. For instance, two entries about cities bearing the same name, in the same country and with the same mayor are likely to refer to the same city. The reconciliation API that we present here makes it easier to discover such matches. It is a protocol that a data provider can implement, enabling its consumers to efficiently match their own data to the entities represented by the provider.
By nature, reconciliation is a heuristic process. Different entities can have many identical characteristics, leading to false positives. The same entity can be represented in different ways by two databases, for instance by spelling names differently, leading to false negatives. This problem has been extensively studied and many heuristics have been proposed to tackle it [christen-2012]. The reconciliation API is agnostic about the particulars of the heuristics involved: it lets data providers choose how they want to determine which of their entities are good candidates for a particular query. What it provides is a web API to let users obtain these candidate entities without having to implement the underlying reconciliation heuristics themselves, nor download the entire contents of the target database.
This API was originally designed by Metaweb as a protocol used between Freebase and Gridworks (now known as OpenRefine).
Freebase was a free crowdsourced knowledge graph, storing data about a broad range of topics and exposed on the web as linked data.
OpenRefine is a tool which was originally designed to help populate this knowledge graph by importing data into it.
It supports a range of operations which help the user reshape their data to prepare it for ingestion in a data model such as Freebase's.
One of these operations is
The reconciliation API was then turned into a generic protocol that any database could implement.
This made it possible to register such a database into OpenRefine by adding it as a
This API was documented on OpenRefine's wiki as a living document which evolved gradually, as OpenRefine improved. In addition to its core feature, fetching reconciliation candidates matching a given query, services are optionally able to implement additional endpoints which ease the integration of the service in OpenRefine's UI, by providing previews for entities (with a Preview Service) and auto-completion for various inputs (with Suggest Services). In 2018, a Data Extension Service was added, letting consumers pull data from the target database once they have reconciled their records.
In 2019 the W3C Entity Reconciliation Community Group was formed, with the intention of promoting and improving this API outside the strict scope of the OpenRefine project. This document is an attempt to better specify this API.
A list of known public endpoints is maintained by the community, where they can also be tried out interactively. OpenRefine's wiki also hosts a list of reconcilable data sources, which also includes non-hosted or discontinued services. Existing clients to the API, such as OpenRefine or Cocoda, can be used to interact with reconciliation services.
This section summarizes the differences between successive versions of the API.
Initial documentation of the reconciliation API as supported by OpenRefine 3.0 to 3.2.
Initial improvements to the specifications made by our Community Group. Most of them are backwards-compatible, except for the requirement to support CORS for cross-origin access.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY, MUST, and SHOULD in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This section documents the data model behind the reconciliation API. A reconciliation service lets users match their data against entities exposed by the service. Matching can be refined by filtering by type or properties with property values. The purpose of this section is to define these notions.
An entity is a record in the data source exposed by the service. It comprises the following fields:
id
name
description
type
Reconciliation services can define in their service manifest a view template for entities, which associates to each entity a corresponding URI, by inserting its identifier in the template.
A view template is a string which contains the {{id}} substring.
For each entity, replacing {{id}} in the template by the entity's identifier MUST result in a valid URI. To guarantee the correctness of the formed URI, clients MUST percent-encode component-delimiting reserved characters in the identifier, i.e. encode it as a URI component [RFC3986].
Similarly, it is possible to associate to each matching feature a URL where documentation about the feature is provided, by means of a view template. Inserting any feature identifier in this template generates the URL for the feature.
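The template expansion described above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name expand_view_template and the example.com template are hypothetical, and urllib.parse.quote with safe="" is one way to encode an identifier as a URI component.

```python
from urllib.parse import quote


def expand_view_template(template: str, identifier: str) -> str:
    """Insert a percent-encoded identifier into a view template."""
    if "{{id}}" not in template:
        raise ValueError("a view template must contain the {{id}} substring")
    # safe="" makes quote() encode "/" as well, so the identifier
    # is treated as a single URI component.
    return template.replace("{{id}}", quote(identifier, safe=""))


# Hypothetical template and identifier, for illustration only.
print(expand_view_template("https://example.com/entity/{{id}}", "a/b c"))
```

The same function applies to the view template for matching features, since both templates use the {{id}} placeholder.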
A type represents a category of entities. It comprises the following fields:
id
name
broader
A property represents a type of attribute that entities can have in the data source. It comprises the following fields:
id
name
A property value can be any of the following:
A reconciliation service MUST define two URIs, exposed in its service manifest:
an identifier space, which is a URI describing the entity identifiers used in this service, such as http://www.wikidata.org/entity/ or https://d-nb.info/gnd/. This URI MAY resolve to a page describing these entities and their identifiers;
a schema space, which is a URI describing the schema used in this service, such as http://www.w3.org/2004/02/skos/core#Concept or https://schema.org/Thing. This URI MAY resolve to a page describing this type.
If two different reconciliation services expose the same entities and properties, then they SHOULD use the same identifier and schema space URIs, signalling that (for instance) the Data Extension service of the first one can be used on reconciliation candidates by the second one.
The notions of identifier and schema space have been inherited from the API's original purpose, when it was specific to Freebase. Their original meaning was to be understood within Freebase's own data model.
This section documents how reconciliation services are exposed as HTTP(S) services and how they can announce the features of the API they implement.
The endpoint of a reconciliation service is a URL from which the reconciliation service is offered.
When the reconciliation service endpoint is queried with a HTTP GET query without parameters, the service manifest MUST be returned.
A service manifest consists of the following fields:
versions
The list of API versions supported by this service, for instance ["0.1", "0.2"]. Since this field did not exist in version 0.1, services which do not declare a versions field are expected to only support version 0.1.
name
identifierSpace
schemaSpace
defaultTypes
documentation
logo
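serviceVersion
A string representing the version of the software which exposes this service, not to be confused with versions, which is about the versions of the reconciliation API supported by the service;
view
An object with a field url. Its value is a view template for entities;
feature_view
An object with a field url. Its value is a view template for matching features;
preview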
suggest
entity
property
type
extend
batchSize
authentication
For instance, a service could expose the following minimal service manifest:
{
"versions": ["0.2"],
"name": "VIAF",
"identifierSpace": "http://vocab.getty.edu/doc/#GVP_URLs_and_Prefixes",
"schemaSpace": "http://vocab.getty.edu/doc/#The_Getty_Vocabularies_and_LOD"
}
A more complete example, with some optional services implemented:
{
"versions": ["0.2"],
"defaultTypes": [
{
"id": "/ulan",
"name": "ULAN search"
},
{
"id": "/tgn",
"name": "TGN search"
},
{
"id": "/aat",
"name": "AAT search"
},
{
"id": "/all",
"name": "Search all Vocabs"
}
],
"identifierSpace": "http://vocab.getty.edu/doc/#GVP_URLs_and_Prefixes",
"name": "Getty Vocabularies Reconciliation Service",
"batchSize": 50,
"preview": {
"height": 200,
"url": "https://services.getty.edu/vocab/reconcile/preview?id={{id}}",
"width": 350
},
"schemaSpace": "http://vocab.getty.edu/doc/#The_Getty_Vocabularies_and_LOD",
"suggest": {
"property": {
"service_path": "/suggest/property",
"service_url": "https://services.getty.edu/vocab/reconcile"
}
},
"view": {
"url": "http://vocab.getty.edu/page/{{id}}"
}
}
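A client typically inspects the manifest before using a service. The sketch below shows one way to do so in Python, assuming the manifest has already been parsed from JSON into a dict; the function names are hypothetical. It applies the rule that a missing versions field implies version 0.1 only, and treats the presence of the preview, extend and suggest keys as the announcement of the corresponding optional services.

```python
def supported_versions(manifest: dict) -> list[str]:
    """Return the API versions announced by a service manifest."""
    # The versions field did not exist in 0.1, so its absence means 0.1 only.
    return list(manifest.get("versions", ["0.1"]))


def optional_services(manifest: dict) -> dict[str, bool]:
    """Report which optional services the manifest declares."""
    suggest = manifest.get("suggest", {})
    return {
        "preview": "preview" in manifest,
        "extend": "extend" in manifest,
        "suggest_entity": "entity" in suggest,
        "suggest_property": "property" in suggest,
        "suggest_type": "type" in suggest,
    }


manifest = {
    "versions": ["0.2"],
    "name": "Getty Vocabularies Reconciliation Service",
    "preview": {"url": "https://services.getty.edu/vocab/reconcile/preview?id={{id}}",
                "width": 350, "height": 200},
    "suggest": {"property": {"service_url": "https://services.getty.edu/vocab/reconcile",
                             "service_path": "/suggest/property"}},
}
print(supported_versions(manifest), optional_services(manifest))
```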
In the interest of protecting the data sent as reconciliation queries, all endpoints of reconciliation services SHOULD be available over HTTPS [RFC7230] [SECURING-WEB]. This does not apply to locally hosted services.
All HTTP(S) endpoints exposed by the service MUST enable access by CORS [cors] to enable web-based clients to access the service from a different domain without exposing themselves to untrusted third-party code.
Some clients might only require cross-origin access on some particular endpoints, which are called directly by a web UI. Since this depends on the architecture of the client, this cannot be relied upon and cross-origin access MUST be implemented for all endpoints in a uniform way.
In addition, endpoints exposed by the service MAY support JSONP [JSONP], which enables older web-based clients to access the service from a different domain.
Services SHOULD use the broad spectrum of HTTP status codes [RFC2616] [RFC6585] to expose errors, for instance due to malformed or too frequent queries.
The response body of such error responses is not specified so far.
Services MAY request users to provide an authentication token when making queries. They can do so by adding a security scheme to their manifest. Security schemes are defined in [OPENAPIS] and support authentication by API key, HTTP Authentication [RFC7617], OAuth 2 [RFC6749] and OpenID Connect.
For instance, the following security scheme indicates that basic HTTP authentication is required on this endpoint:
{
"type": "http",
"scheme": "basic"
}
Requiring an API key passed as a query parameter can be expressed as follows:
{
"type": "apiKey",
"name": "api_key",
"in": "query"
}
If a security scheme is provided in the service manifest, all queries to the service MUST provide the corresponding credentials, except for retrieving the service manifest itself. When invalid authentication is supplied in any HTTP request, the service MUST return an HTTP 401 error.
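As an illustration, a client could attach credentials to a request according to the two security schemes shown above. The following Python sketch is only an assumption about how a client might be organised: the function apply_security_scheme and the shape of the credentials dict are hypothetical, and only basic HTTP authentication and API keys (in the query string or in a header) are handled.

```python
import base64
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def apply_security_scheme(scheme: dict, url: str, headers: dict, credentials: dict):
    """Return (url, headers) with credentials attached as the scheme requires."""
    headers = dict(headers)  # do not mutate the caller's dict
    if scheme["type"] == "http" and scheme["scheme"].lower() == "basic":
        raw = f'{credentials["username"]}:{credentials["password"]}'.encode("utf-8")
        headers["Authorization"] = "Basic " + base64.b64encode(raw).decode("ascii")
    elif scheme["type"] == "apiKey" and scheme["in"] == "query":
        parts = urlsplit(url)
        params = parse_qsl(parts.query, keep_blank_values=True)
        params.append((scheme["name"], credentials["api_key"]))
        url = urlunsplit(parts._replace(query=urlencode(params)))
    elif scheme["type"] == "apiKey" and scheme["in"] == "header":
        headers[scheme["name"]] = credentials["api_key"]
    else:
        # OAuth 2 and OpenID Connect are outside the scope of this sketch.
        raise NotImplementedError(f"unsupported security scheme: {scheme}")
    return url, headers


url, headers = apply_security_scheme(
    {"type": "apiKey", "name": "api_key", "in": "query"},
    "https://example.com/reconcile", {}, {"api_key": "secret"},
)
print(url)
```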
This section specifies how clients can send reconciliation queries to services and how services respond to them.
A reconciliation query consists of the following fields.
At least one of query or properties must be supplied, but all other fields are optional.
query
type
limit
properties
An array of objects mapping property identifiers (in the pid field) to one or more property values (in the v field). These are used to further filter the set of candidates (similar to a WHERE clause in SQL), by allowing clients to specify other attributes of entities that should match, beyond their name in the query field. How reconciliation services handle this further restriction ("must match all properties" or "should match some"), and how it affects the score, is up to the service. A reconciliation service that supports properties SHOULD provide a suggest service for discovering these properties;
type_strict
One of "should", "all" or "any".
A reconciliation query batch is a set of reconciliation queries indexed by string identifiers.
Minimal example of a reconciliation query batch with mandatory fields only:
{
"q1": {
"query": "Hans-Eberhard Urbaniak"
},
"q2": {
"query": "Ernst Schwanhold"
}
}
Full example of a reconciliation query batch with all optional fields:
{
"q0": {
"query": "Christel Hanewinckel",
"type": "DifferentiatedPerson",
"limit": 5,
"properties": [
{
"pid": "professionOrOccupation",
"v": "Politik*"
},
{
"pid": "affiliation",
"v": "http://d-nb.info/gnd/2022139-3"
}
],
"type_strict": "should"
},
"q1": {
"query": "Franz Thönnes",
"type": "DifferentiatedPerson",
"limit": 5,
"properties": [
{
"pid": "professionOrOccupation",
"v": "Politik*"
},
{
"pid": "affiliation",
"v": "http://d-nb.info/gnd/2022139-3"
}
],
"type_strict": "should"
}
}
For a single property it is possible to provide multiple values as an array. The values provided do not need to have the same type. In the following example a string and a reconciled value are provided as values for the same property.
{
"q0": {
"query": "Christel Hanewinckel",
"type": "DifferentiatedPerson",
"limit": 5,
"properties": [
{
"pid": "professionOrOccupation",
"v": [
"Politik*",
{
"id": "wissenschaftler",
"name": "Wissenschaftler(in)"
}
]
}
],
"type_strict": "should"
}
}
A JSON schema to validate the serialization of a query batch is available.
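Short of running the JSON schema, a client can apply a few of the constraints stated above before sending a batch. The Python sketch below is a partial check only: the function names are hypothetical, and the requirement that each property carries both pid and v is an assumption drawn from the examples rather than a quoted rule.

```python
ALLOWED_TYPE_STRICT = {"should", "all", "any"}


def check_query(query: dict) -> list[str]:
    """Return a list of problems found in a single reconciliation query."""
    problems = []
    if "query" not in query and "properties" not in query:
        problems.append("at least one of 'query' or 'properties' must be supplied")
    if "type_strict" in query and query["type_strict"] not in ALLOWED_TYPE_STRICT:
        problems.append("'type_strict' must be one of should, all, any")
    for prop in query.get("properties", []):
        # Assumption based on the examples: each entry pairs a pid with a v.
        if "pid" not in prop or "v" not in prop:
            problems.append("each property needs both 'pid' and 'v'")
    return problems


def check_batch(batch: dict) -> dict[str, list[str]]:
    """Map each query identifier to its problems, omitting valid queries."""
    report = {key: check_query(query) for key, query in batch.items()}
    return {key: problems for key, problems in report.items() if problems}


batch = {
    "q1": {"query": "Hans-Eberhard Urbaniak"},
    "q2": {"limit": 5, "type_strict": "maybe"},
}
print(check_batch(batch))
```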
The meaning of the type_strict field is unclear: it is inherited from Freebase's API but is not used or documented in OpenRefine.
A reconciliation candidate represents an entity as a response to a reconciliation query. It is proposed to the client as a potential matching entity for this query. It contains the following fields:
id
name
description
type
score
features
match
A matching feature is a numerical or boolean value which can be used to determine how likely it is for the candidate to be the correct entity. It contains the following fields:
id
for instance "name_tfidf" or "pagerank". This id must be unique among all the matching features returned for a given candidate;
value
score
field). By exposing individual features in their responses, services make it possible for clients
to compute matching scores which fit their use cases better.
Example of a reconciliation candidate with all possible fields:
{
"id": "1117582299",
"name": "Urbaniak, Hans-Eberhard",
"score": 85.71888,
"features": [
{
"id": "name_tfidf",
"value": 378.239
},
{
"id": "pagerank",
"value": -3.1209
},
{
"id": "type_match",
"value": 10.329
},
{
"id": "deprecated",
"value": false
}
],
"match": true,
"type": [
{
"id": "AuthorityResource",
"name": "Normdatenressource"
},
{
"id": "DifferentiatedPerson",
"name": "Individualisierte Person",
"broader": [
{
"id": "Person",
"name": "Person"
}
]
}
]
}
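Because candidates can carry individual matching features, a client may combine them into a score of its own. The Python sketch below uses a plain weighted sum, which is just one possible choice; the function name client_score and the weights are hypothetical and would need tuning for any real use case.

```python
def client_score(candidate: dict, weights: dict) -> float:
    """Combine a candidate's matching features into a client-side score."""
    total = 0.0
    for feature in candidate.get("features", []):
        weight = weights.get(feature["id"], 0.0)  # unknown features are ignored
        # float() maps booleans to 1.0 / 0.0 and leaves numbers unchanged.
        total += weight * float(feature["value"])
    return total


candidate = {
    "id": "1117582299",
    "features": [
        {"id": "name_tfidf", "value": 378.239},
        {"id": "pagerank", "value": -3.1209},
        {"id": "deprecated", "value": False},
    ],
}
# Hypothetical weights: favour name similarity, penalise deprecated records.
weights = {"name_tfidf": 0.01, "pagerank": 1.0, "deprecated": -100.0}
print(client_score(candidate, weights))
```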
A reconciliation result is a set of reconciliation candidates. It is serialized in JSON as an array of such reconciliation candidate objects. This array SHOULD be sorted by decreasing score.
A reconciliation result batch is a set of reconciliation results indexed by string identifiers of the corresponding reconciliation query batch.
Full example of a reconciliation result batch:
{
"q1": {
"result": [
{
"id": "120333937",
"name": "Urbaniak, Regina",
"description": "1969-| Diss. Fachbereich Mathematik",
"score": 53.015232,
"match": false,
"features": [
{
"id": "name_tfidf",
"value": 378.239
},
{
"id": "pagerank",
"value": -3.1209
},
{
"id": "type_match",
"value": 10.329
},
{
"id": "deprecated",
"value": false
}
],
"type": [
{
"id": "AuthorityResource",
"name": "Normdatenressource"
},
{
"id": "DifferentiatedPerson",
"name": "Individualisierte Person"
}
]
},
{
"id": "1127147390",
"name": "Urbaniak, Jan",
"description": "Universität Wrocław, Niederlandestudien",
"score": 52.357353,
"match": false,
"type": [
{
"id": "AuthorityResource",
"name": "Normdatenressource"
},
{
"id": "DifferentiatedPerson",
"name": "Individualisierte Person"
}
]
}
]
},
"q2": {
"result": [
{
"id": "123064325",
"name": "Schwanhold, Ernst",
"description": "1948-| Mitglied des Deutschen Bundestages, SPD (1993)",
"score": 86.43497,
"features": [
{
"id": "name_tfidf",
"value": 334.188
},
{
"id": "pagerank",
"value": -4.1581
},
{
"id": "type_match",
"value": 13.78
},
{
"id": "deprecated",
"value": false
}
],
"match": true,
"type": [
{
"id": "AuthorityResource",
"name": "Normdatenressource"
},
{
"id": "DifferentiatedPerson",
"name": "Individualisierte Person"
}
]
},
{
"id": "116362988X",
"name": "Schwanhold, Nadine",
"description": "Dissertation Potsdam, Universität, Mathematik-Naturwissenschaftliche Fakultät, 2017",
"score": 62.04763,
"match": false,
"type": [
{
"id": "AuthorityResource",
"name": "Normdatenressource"
},
{
"id": "DifferentiatedPerson",
"name": "Individualisierte Person"
}
]
}
]
}
}
A JSON schema to validate the serialization of a reconciliation result batch is available.
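A client receiving such a result batch often wants to know which queries can be matched without human review. The following Python sketch assumes that the match field flags a candidate the service considers a confident match, an interpretation suggested by the examples; the function name automatic_matches is hypothetical. Candidates are re-sorted by score because the ordering is only a SHOULD.

```python
def automatic_matches(result_batch: dict) -> dict:
    """Map each query identifier to the id of its best flagged candidate, or None."""
    matches = {}
    for query_id, entry in result_batch.items():
        candidates = sorted(
            entry.get("result", []),
            key=lambda c: c.get("score", 0.0),
            reverse=True,
        )
        flagged = [c for c in candidates if c.get("match")]
        matches[query_id] = flagged[0]["id"] if flagged else None
    return matches


result_batch = {
    "q1": {"result": [
        {"id": "120333937", "name": "Urbaniak, Regina", "score": 53.015232, "match": False},
        {"id": "1127147390", "name": "Urbaniak, Jan", "score": 52.357353, "match": False},
    ]},
    "q2": {"result": [
        {"id": "116362988X", "name": "Schwanhold, Nadine", "score": 62.04763, "match": False},
        {"id": "123064325", "name": "Schwanhold, Ernst", "score": 86.43497, "match": True},
    ]},
}
print(automatic_matches(result_batch))
```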
The primary role of a reconciliation service is to translate reconciliation query batches to reconciliation result batches over HTTP.
A reconciliation service MUST support HTTP POST requests on its endpoint with application/x-www-form-urlencoded bodies containing a reconciliation query batch (serialized in JSON) in a form element named queries.
POST / queries=<URL-encoded reconciliation query batch>
Similarly, a reconciliation service SHOULD support HTTP GET requests with a reconciliation query batch in a query string parameter named queries.
GET /?queries=<URL-encoded reconciliation query batch>
In both cases, the service returns the corresponding reconciliation result batch serialized in JSON.
The POST method is the primary way to send reconciliation queries to a service since it does not restrict the length of the query batches. The GET method is useful for interactive debugging of reconciliation queries in a web browser, for instance.
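The encoding of the POST body can be sketched with Python's standard library as follows. The function names and the example.com endpoint are hypothetical; the request object is built but not sent, so the sketch runs without network access. The second function shows how a service might decode the same body.

```python
import json
from urllib.parse import parse_qs, urlencode
from urllib.request import Request


def build_reconcile_request(endpoint: str, batch: dict) -> Request:
    """Build (without sending) a POST request carrying a query batch."""
    # The batch is serialized in JSON, then form-encoded under the key "queries".
    body = urlencode({"queries": json.dumps(batch)}).encode("ascii")
    return Request(
        endpoint,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def decode_reconcile_body(body: bytes) -> dict:
    """Service side: recover the query batch from a form-encoded body."""
    form = parse_qs(body.decode("ascii"))
    return json.loads(form["queries"][0])


batch = {"q0": {"query": "Franz Thönnes", "type": "DifferentiatedPerson", "limit": 5}}
request = build_reconcile_request("https://example.com/reconcile", batch)
print(request.get_method(), request.data[:40])
```

Passing the request to urllib.request.urlopen would send it; the response body would then be parsed with json.load.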
This section is non-normative.
The way candidates are retrieved from the underlying database and scored against the query is left entirely at the discretion of the service. However, services should retrieve and score the candidates of each query in a batch independently of the other queries in the same batch, or in previous ones.
It is also expected that reconciliation queries where query matches exactly the name of an entity in the database and with no other constraint should return at least this entity, unless it is hidden by many namesakes. Similarly, supplying an entity identifier as query should return the corresponding entity as a candidate, with a high score.
Deciding on a scoring method is one of the main difficulties in developing a reconciliation service. Services are encouraged to expose as many matching features as they deem useful, in particular features which require knowledge of global statistics on the database or other attributes. Examples include:
Many open source reconciliation services are available and these might provide some inspiration concerning indexing and scoring methods when developing new services. See External Resources for some examples.
This section specifies how reconciliation services can provide embeddable HTML previews of their entities, which clients can display in their user interface.
Reconciliation services MAY offer a preview service by providing the preview metadata as an object stored in the service manifest under the key preview. It consists of the following fields, all mandatory:
url
A URL template containing {{id}} such that replacing {{id}} by an entity identifier encoded as a URI component yields the preview URL for that entity. This preview URL MUST resolve to an HTML page summarizing the entity. It SHOULD render appropriately in an <iframe> whose dimensions are specified by the service in the following fields;
width
The width of the <iframe> element where to render an entity preview;
height
The height of the <iframe>.
For instance, a service may expose the following preview metadata:
{
"url": "https://example.com/api/preview?id={{id}}",
"width": 200,
"height": 100
}
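Given such preview metadata, a client could build the embedding markup along the following lines. This Python sketch is illustrative only: the function name preview_iframe is hypothetical, and the identifier is taken from the example request shown later in this section.

```python
from html import escape
from urllib.parse import quote


def preview_iframe(preview: dict, identifier: str) -> str:
    """Return an <iframe> element embedding the preview of one entity."""
    # Encode the identifier as a URI component before inserting it.
    url = preview["url"].replace("{{id}}", quote(identifier, safe=""))
    return '<iframe src="{}" width="{}" height="{}"></iframe>'.format(
        escape(url, quote=True),  # protect the attribute value
        int(preview["width"]),
        int(preview["height"]),
    )


preview = {"url": "https://example.com/api/preview?id={{id}}", "width": 200, "height": 100}
print(preview_iframe(preview, "H-34bd8e0bba"))
```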
A preview service is queried by resolving the preview URL for an entity. The URL must resolve to an HTML document.
For instance, assuming the example preview metadata above, the service could respond to a preview request as follows:
GET /api/preview?id=H-34bd8e0bba
<html>
<head><meta charset="utf-8" /></head>
<body>
<h1>Cumulonimbus</h1>
<p>Type of cloud</p>
</body>
</html>
This section specifies how reconciliation services can provide auto-complete endpoints for their entities, properties and types. A reconciliation service can offer a suggest service for any of these three classes. For instance, a service which only exposes a single type might not want to expose a suggest service for types. These suggest services can be used by clients to let users select an entity, property or type manually, at various stages of their reconciliation workflows. Suggest services for entities, properties and types are declared independently in the service manifest by providing a suggest metadata for them.
A suggest metadata object consists of the following fields:
service_url
service_path
A path which is concatenated to service_url to obtain the full URL of the suggest service;
flyout_service_url
An optional URL, which defaults to service_url;
flyout_service_path
An optional path which is concatenated to flyout_service_url to obtain the full URL of the flyout service. The absence of this parameter indicates that no flyout service is associated with this suggest service.
For instance, a suggest metadata could be as follows:
{
"service_url": "https://example.com/api",
"service_path": "/suggest",
"flyout_service_path": "/suggest/flyout/${id}"
}
This declares a suggest service at https://example.com/api/suggest with an associated flyout endpoint at https://example.com/api/suggest/flyout/${id}.
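The way these URLs are derived from the metadata can be sketched in Python. The function names are hypothetical, and the fallback of flyout_service_url to service_url is inferred from the example above, where the flyout endpoint is built on service_url.

```python
from urllib.parse import quote


def suggest_endpoints(meta: dict):
    """Return (suggest URL, flyout URL template or None) for a suggest metadata."""
    suggest_url = meta["service_url"] + meta["service_path"]
    flyout_path = meta.get("flyout_service_path")
    if flyout_path is None:
        # No flyout path means no flyout service for this suggest service.
        return suggest_url, None
    # Assumption: the flyout base URL falls back to service_url when absent.
    flyout_base = meta.get("flyout_service_url", meta["service_url"])
    return suggest_url, flyout_base + flyout_path


def flyout_url(template: str, identifier: str) -> str:
    """Replace ${id} by the identifier, encoded as a URI component."""
    return template.replace("${id}", quote(identifier, safe=""))


meta = {
    "service_url": "https://example.com/api",
    "service_path": "/suggest",
    "flyout_service_path": "/suggest/flyout/${id}",
}
suggest_url, flyout_template = suggest_endpoints(meta)
print(suggest_url, flyout_url(flyout_template, "Q38274"))
```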
A suggest service MUST accept GET queries with the following URL-encoded parameters:
prefix
cursor
A response to a suggest query consists of the following fields:
result
id
name
description
name
;notable
id
and name
field which represent the type.
The key notable comes from a notion of notable types that existed in Freebase.
For instance, a suggest service for entities could return the following response:
{
"result": [
{
"name": "cumulonimbus",
"description": "genus of clouds, dense towering vertical cloud associated with thunderstorms and atmospheric instability",
"id": "Q182311",
"notable": [
{
"name": "cloud genera",
"id": "Q1840368"
}
]
},
{
"name": "Cumulopuntia",
"description": "genus of plants",
"id": "Q310599",
"notable": [
{
"name": "taxon",
"id": "Q16521"
}
]
},
{
"name": "cumulonimbus incus",
"description": "variety of cloud",
"id": "Q1358304",
"notable": []
}
]
}
A suggest service for properties could return the following response:
{
"result": [
{
"name": "coordinate location",
"description": "geocoordinates of the subject. For Earth, please note that only WGS84 coordinating system is supported at the moment",
"id": "P625"
},
{
"name": "place of birth",
"description": "most specific known (e.g. city instead of country, or hospital instead of city) birth location of a person, animal or fictional character",
"id": "P19"
},
{
"name": "located in time zone",
"description": "time zone for this item",
"id": "P421"
}
]
}
And a suggest service for types could return the following response:
{
"result": [
{
"id": "Work",
"name": "Werk"
},
{
"id": "MusicalWork",
"name": "Werk der Musik",
"broader": [
{
"id": "Work",
"name": "Werk"
}
]
},
{
"id": "BuildingOrMemorial",
"name": "Bauwerk oder Denkmal"
},
{
"id": "VersionOfAMusicalWork",
"name": "Fassung eines Werks der Musik"
}
]
}
JSON schemas to validate suggest responses are available for entities, for properties and for types.
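A client-side view of a suggest exchange might look like the Python sketch below. The function names and the endpoint are hypothetical; the cursor parameter is simply passed through when provided, without this sketch assuming anything about its semantics.

```python
from urllib.parse import urlencode


def suggest_query_url(endpoint: str, prefix: str, cursor=None) -> str:
    """Build the GET URL of a suggest query."""
    params = {"prefix": prefix}
    if cursor is not None:
        params["cursor"] = cursor
    return endpoint + "?" + urlencode(params)


def suggestion_labels(response: dict) -> list:
    """Extract (id, name) pairs from a parsed suggest response."""
    return [(item["id"], item["name"]) for item in response.get("result", [])]


response = {
    "result": [
        {"name": "cumulonimbus", "id": "Q182311"},
        {"name": "Cumulopuntia", "id": "Q310599"},
    ]
}
print(suggest_query_url("https://example.com/api/suggest", "cumulo"))
print(suggestion_labels(response))
```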
This section is non-normative.
It is generally expected by users that an entity suggest query where prefix is the name of an entity should return this entity in the suggest response, unless that entity is hidden behind many other namesakes.
Similarly, supplying an entity identifier as prefix should return this entity in the suggest response.
Analogous expectations apply for property and type suggest services.
As the prefix name suggests, suggest services are expected to perform prefix search on their database of records, such that a suggest service can be used to provide auto-completion as users type names or identifiers in a field.
A flyout service provides small previews of suggested elements. These previews are designed to be shown when hovering over a suggested element. When a suggest service supports flyout, it declares the flyout endpoint in its suggest metadata.
A preview for a suggested entity, property or type can then be obtained at the flyout endpoint by replacing ${id} by the identifier for the entity, property or type, encoded as a URI component. Upon a GET query to this URL, the service returns a JSON response consisting of an object with the following fields:
id
html
For instance, if a service's flyout endpoint is https://example.com/suggest/entities/flyout?id=${id}, then by retrieving https://example.com/suggest/entities/flyout?id=Q38274, one might get the following response:
{
"id": "Q38274",
"html": "<p style=\"font-size: 0.8em; color: black;\">Thai musician</p>"
}
Flyout services were used by Freebase and are mostly redundant with the description field in suggest responses.
Given that they allow services to return arbitrary HTML content, they also pose a security threat to clients. It is therefore proposed that this functionality is dropped in the future.
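One cautious way for a client to handle flyout responses in the meantime is to discard the markup and keep only the text. The Python sketch below does this with the standard html.parser module; it is one possible mitigation rather than a recommendation of this specification, and the names are hypothetical. The resulting string must still be displayed as plain text, since the contents of elements such as script are kept as text.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect the text content of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def flyout_text(flyout: dict) -> str:
    """Return the text of a flyout response with markup removed."""
    parser = _TextOnly()
    parser.feed(flyout["html"])
    parser.close()
    # Collapse runs of whitespace left behind by removed tags.
    return " ".join("".join(parser.chunks).split())


flyout = {
    "id": "Q38274",
    "html": "<p style=\"font-size: 0.8em; color: black;\">Thai musician</p>",
}
print(flyout_text(flyout))
```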
This section specifies how reconciliation services can let clients fetch the values of some properties on a selection of entities.
A data extension service MUST support data extension query requests.
A data extension service SHOULD provide data extension property proposals.
A data extension service MAY support data extension property settings.
The data extension metadata is an object stored in the service manifest in the extend field. It consists of the following settings, all optional:
propose_properties
service_url
service_path
property_settings
A data extension property setting consists of:
name
label
type
One of "number", "text", "checkbox", or "select". This determines which type of value the property setting is expected to store: clients SHOULD render this setting with the corresponding HTML element;
default
help_text
choices
If type is select, an array of property setting choices.
Example of data extension metadata with all optional fields:
{
"propose_properties": {
"service_url": "https://lobid.org",
"service_path": "/gnd/reconcile/properties"
},
"property_settings": [
{
"name": "limit",
"label": "Limit",
"type": "number",
"default": 0,
"help_text": "Maximum number of values to return per row (0 for no limit)"
},
{
"name": "content",
"label": "Content",
"type": "select",
"default": "literal",
"help_text": "Content type: ID or literal",
"choices": [
{
"value": "id",
"name": "ID"
},
{
"value": "literal",
"name": "Literal"
}
]
}
]
}
A data extension property proposal service returns properties for a given type identifier.
The service MUST support HTTP GET requests with a type query string parameter.
The service SHOULD support an optional limit query string parameter to control the number of proposed properties.
The service URL and path are declared in the data extension metadata of the service manifest.
GET /properties?type=<type identifier>[&limit=<limit>]
A data extension property proposal response consists of:
properties
type
limit
Example of a data extension property proposal response:
{
"limit": 5,
"type": "DifferentiatedPerson",
"properties": [
{
"id": "affiliation",
"name": "Affiliation"
},
{
"id": "geographicAreaCode",
"name": "Ländercode"
},
{
"id": "preferredName",
"name": "Bevorzugter Name"
},
{
"id": "professionOrOccupation",
"name": "Beruf oder Beschäftigung"
},
{
"id": "variantName",
"name": "Varianter Name"
}
]
}
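A client could derive the property proposal URL from the data extension metadata as sketched below in Python. The function name is hypothetical, and concatenating service_url and service_path is an assumption made by analogy with suggest metadata.

```python
from urllib.parse import urlencode


def propose_properties_url(extend_metadata: dict, type_id: str, limit=None) -> str:
    """Build the GET URL requesting property proposals for a type."""
    proposal = extend_metadata["propose_properties"]
    params = {"type": type_id}
    if limit is not None:
        params["limit"] = limit  # optional, services SHOULD support it
    # Assumption: the endpoint is service_url followed by service_path.
    return proposal["service_url"] + proposal["service_path"] + "?" + urlencode(params)


extend_metadata = {
    "propose_properties": {
        "service_url": "https://lobid.org",
        "service_path": "/gnd/reconcile/properties",
    }
}
print(propose_properties_url(extend_metadata, "DifferentiatedPerson", limit=5))
```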
A data extension query request lets clients fetch the values of some properties on a selection of entities.
The fact that a reconciliation service offers data extension MUST be announced by including a data extension metadata in the extend field of the service manifest.
A data extension service MUST support HTTP POST requests with application/x-www-form-urlencoded bodies containing a data extension query in a form element named extend.
POST / extend=<URL-encoded data extension query>
A data extension service SHOULD support HTTP GET requests with a data extension query in a query string parameter named extend.
GET /?extend=<URL-encoded data extension query>
A data extension query consists of:
Example of a data extension query:
{
"ids": [
"10662041X",
"1064905412"
],
"properties": [
{
"id": "variantName",
"settings": {
"limit": "5"
}
},
{
"id": "geographicAreaCode",
"settings": {
"limit": "1",
"content": "id"
}
},
{
"id": "professionOrOccupation"
},
{
"id": "wikidataId"
}
]
}
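A query of this shape can be assembled and form-encoded as in the Python sketch below. The function name build_extend_body is hypothetical, and the structure (an ids array and a properties array with optional settings) is taken from the example above rather than from a normative field list.

```python
import json
from urllib.parse import urlencode


def build_extend_body(entity_ids, properties):
    """Return (query dict, form-encoded body) for a data extension request.

    properties is a list of (property id, settings dict or None) pairs.
    """
    query = {"ids": list(entity_ids), "properties": []}
    for prop_id, settings in properties:
        prop = {"id": prop_id}
        if settings:
            prop["settings"] = dict(settings)
        query["properties"].append(prop)
    # The query is serialized in JSON, then form-encoded under the key "extend".
    return query, urlencode({"extend": json.dumps(query)})


query, body = build_extend_body(
    ["10662041X", "1064905412"],
    [("variantName", {"limit": "5"}), ("professionOrOccupation", None)],
)
print(body[:60])
```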
A data extension response consists of metadata and rows.
The metadata contains the properties used for data extension, as requested in the data extension query. If properties have entities as values, they MAY specify a type in the metadata.
The rows object contains, for each entity identifier in the data extension query, for each property identifier in the metadata, the property values of that property in that entity.
If the property values are entities, their identifiers are expected to be in the service's identifier space.
If that is not the case, the service MUST specify in the meta section the endpoint of another reconciliation service whose identifier space contains the returned entities. This endpoint is specified on a column-per-column basis.
Response example for the data extension query from the previous example:
{
"meta": [
{
"id": "variantName",
"name": "Varianter Name"
},
{
"id": "geographicAreaCode",
"name": "Ländercode"
},
{
"id": "professionOrOccupation",
"name": "Beruf oder Beschäftigung",
"type": {
"id": "SubjectHeading",
"name": "Schlagwort"
}
},
{
"id": "wikidataId",
"name": "Wikidata ID",
"service": "https://www.wikidata.org/api/reconcile"
}
],
"rows": {
"10662041X": {
"variantName": [
{
"str": "Stryi-Leitgeb, Gerda"
},
{
"str": "Leitgeb, Gerda Stryi-"
}
],
"geographicAreaCode": [
{
"str": "http://d-nb.info/standards/vocab/gnd/geographic-area-code#XA-DE"
}
],
"professionOrOccupation": [
{
"id": "4037223-6",
"name": "Malerin",
"description": "Beruf"
},
{
"id": "4033430-2",
"name": "Künstlerin"
}
],
"wikidataId": [
{
"id": "Q3874347",
"name": "Gerda Stryi-Leitgeb"
}
]
},
"1064905412": {
"variantName": [
{}
],
"geographicAreaCode": [
{
"str": "http://d-nb.info/standards/vocab/gnd/geographic-area-code#XA-DE"
}
],
"professionOrOccupation": [
{
"id": "4002844-6",
"name": "Architekt"
}
],
"wikidataId": [
{
"id": "Q3874347",
"name": "Gerda Stryi-Leitgeb"
}
]
}
}
}
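To load such a response into a tabular tool, a client can flatten the rows object using the column order given in meta. The Python sketch below handles only the value shapes visible in the example (objects with str, objects with id, and empty objects); other kinds of property values would need additional cases. The function names are hypothetical.

```python
def cell_text(value: dict) -> str:
    """Render one property value as text."""
    if "str" in value:
        return value["str"]
    if "id" in value:
        return value["id"]  # entity value: keep its identifier
    return ""  # empty object, as in the variantName cell of the example


def flatten_extension(response: dict):
    """Return (column ids, rows) where each row is [entity id, cell, cell, ...]."""
    columns = [column["id"] for column in response["meta"]]
    table = []
    for entity_id, row in response["rows"].items():
        cells = ["; ".join(cell_text(v) for v in row.get(c, [])) for c in columns]
        table.append([entity_id] + cells)
    return columns, table


response = {
    "meta": [{"id": "variantName", "name": "Varianter Name"},
             {"id": "professionOrOccupation", "name": "Beruf oder Beschäftigung"}],
    "rows": {
        "10662041X": {
            "variantName": [{"str": "Stryi-Leitgeb, Gerda"}, {"str": "Leitgeb, Gerda Stryi-"}],
            "professionOrOccupation": [{"id": "4037223-6", "name": "Malerin"}],
        },
        "1064905412": {"variantName": [{}], "professionOrOccupation": []},
    },
}
columns, table = flatten_extension(response)
print(columns, table)
```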
This appendix provides JSON schemas [json-schema] which can be used to validate the JSON serialization of various elements as specified by these specifications.
The manifest schema can be used to validate a service manifest.
{
"$schema": "http://json-schema.org/schema#",
"$id": "https://reconciliation-api.github.io/specs/0.2/schemas/manifest.json",
"type": "object",
"description": "This validates a service manifest, describing the features supported by the endpoint.",
"properties": {
"versions": {
"type": "array",
"description": "The list of API versions supported by this service.",
"items": {
"type": "string"
},
"contains": {
"enum": ["0.2"]
}
},
"name": {
"type": "string",
"description": "A human-readable name for the service or data source"
},
"identifierSpace": {
"type": "string",
"description": "A URI describing the entity identifiers used in this service"
},
"schemaSpace": {
"type": "string",
"description": "A URI describing the schema used in this service"
},
"documentation": {
"type": "string",
"description": "A URI which hosts documentation about this service"
},
"serviceVersion": {
"type": "string",
"description": "A string representing the version of the software which exposes this service"
},
"logo": {
"type": "string",
"description": "A URI to a square image which can be used as logo for this service"
},
"authentication": {
"$ref": "http://swagger.io/v2/schema.json#/definitions/securityDefinitions/additionalProperties"
},
"view": {
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "A template to transform an entity identifier into the corresponding URI",
"pattern": ".*\\{\\{id\\}\\}.*"
}
},
"required": [
"url"
]
},
"feature_view": {
"type": "object",
"properties": {
"url": {
"type": "string",
"description": "A template to transform a matching feature identifier into the corresponding URI",
"pattern": ".*\\{\\{id\\}\\}.*"
}
},
"required": [
"url"
]
},
"defaultTypes": {
"type": "array",
"description": "A list of default types that are considered good generic choices for reconciliation",
"items": { "$ref": "type.json" },
"uniqueItems": true
},
"suggest": {
"type": "object",
"description": "Settings for the suggest protocol, to auto-complete entities, properties and types",
"definitions": {
"service_definition": {
"type": "object",
"properties": {
"service_url": {
"type": "string"
},
"service_path": {
"type": "string"
},
"flyout_service_url": {
"type": "string"
},
"flyout_service_path": {
"type": "string",
"pattern": ".*\\$\\{id\\}.*"
}
},
"required": []
}
},
"properties": {
"entity": {
"$ref": "#/properties/suggest/definitions/service_definition"
},
"property": {
"$ref": "#/properties/suggest/definitions/service_definition"
},
"type": {
"$ref": "#/properties/suggest/definitions/service_definition"
}
}
},
"preview": {
"type": "object",
"description": "Settings for the preview protocol, for HTML previews of entities",
"properties": {
"url": {
"type": "string",
"pattern": ".*\\{\\{id\\}\\}.*",
"description": "A URL pattern which transforms the entity ID into a preview URL for it"
},
"width": {
"type": "integer",
"description": "The width of the iframe where to include the HTML preview"
},
"height": {
"type": "integer",
"description": "The height of the iframe where to include the HTML preview"
}
},
"required": [
"url",
"width",
"height"
]
},
"extend": {
"type": "object",
"description": "Settings for the data extension protocol, to fetch property values",
"properties": {
"propose_properties": {
"type": "object",
"description": "Location of the endpoint to propose properties to fetch for a given type",
"properties": {
"service_url": {
"type": "string"
},
"service_path": {
"type": "string"
}
}
},
"property_settings": {
"type": "array",
"description": "Definition of the settings configurable by the user when fetching a property",
"items": {
"oneOf": [
{
"type": "object",
"description": "Defines a numerical setting on a property",
"properties": {
"type": {
"type": "string",
"enum": [
"number"
]
},
"default": {
"type": "number"
},
"label": {
"type": "string"
},
"name": {
"type": "string"
},
"help_text": {
"type": "string"
}
},
"required": [
"type",
"label",
"name"
]
},
{
"type": "object",
"description": "Defines a string setting on a property",
"properties": {
"type": {
"type": "string",
"enum": [
"text"
]
},
"default": {
"type": "string"
},
"label": {
"type": "string"
},
"name": {
"type": "string"
},
"help_text": {
"type": "string"
}
},
"required": [
"type",
"label",
"name"
]
},
{
"type": "object",
"description": "Defines a boolean setting on a property",
"properties": {
"type": {
"type": "string",
"enum": [
"checkbox"
]
},
"default": {
"type": "boolean"
},
"label": {
"type": "string"
},
"name": {
"type": "string"
},
"help_text": {
"type": "string"
}
},
"required": [
"type",
"label",
"name"
]
},
{
"type": "object",
"description": "Defines a setting with a fixed set of choices",
"properties": {
"type": {
"type": "string",
"enum": [
"select"
]
},
"default": {
"type": "string"
},
"label": {
"type": "string"
},
"name": {
"type": "string"
},
"help_text": {
"type": "string"
},
"choices": {
"type": "array",
"items": {
"type": "object",
"properties": {
"value": {
"type": "string"
},
"name": {
"type": "string"
}
},
"required": [
"value",
"name"
]
}
}
},
"required": [
"type",
"label",
"name",
"choices"
]
}
]
}
}
}
}
},
"required": [
"versions",
"name",
"identifierSpace",
"schemaSpace"
]
}
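The following Python sketch hand-checks a few of the constraints expressed by the manifest schema: the four required fields, the presence of "0.2" in versions, and the {{id}} placeholder in the view URL template. It is a partial illustration only and does not replace validation against the schema itself. The manifest values (service name and example.org URLs) and the function names `manifest_problems` and `entity_url` are invented for the example.

```python
import re
from typing import Any, Dict, List

REQUIRED_FIELDS = ("versions", "name", "identifierSpace", "schemaSpace")
ID_PLACEHOLDER = re.compile(r"\{\{id\}\}")


def manifest_problems(manifest: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a manifest (empty list if none)."""
    problems = [
        f"missing required field: {field}"
        for field in REQUIRED_FIELDS
        if field not in manifest
    ]
    if "versions" in manifest and "0.2" not in manifest["versions"]:
        problems.append('"versions" does not contain "0.2"')
    view = manifest.get("view")
    if view is not None and not ID_PLACEHOLDER.search(view.get("url", "")):
        problems.append('"view.url" lacks the {{id}} placeholder')
    return problems


def entity_url(manifest: Dict[str, Any], entity_id: str) -> str:
    """Expand the view URL template for a given entity identifier."""
    return manifest["view"]["url"].replace("{{id}}", entity_id)


# Hypothetical manifest, for illustration only.
manifest = {
    "versions": ["0.1", "0.2"],
    "name": "Example reconciliation service",
    "identifierSpace": "http://example.org/entity/",
    "schemaSpace": "http://example.org/schema/",
    "view": {"url": "https://example.org/entity/{{id}}"},
}
print(manifest_problems(manifest))
print(entity_url(manifest, "10662041X"))
```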
The reconciliation query batch schema can be used to validate the JSON serialization of any reconciliation query batch, i.e. the payload of a GET/POST to the reconciliation endpoint.
{
"$schema": "http://json-schema.org/schema#",
"$id": "https://reconciliation-api.github.io/specs/0.2/schemas/reconciliation-query.json",
"type": "object",
"description": "This schema validates the JSON serialization of any reconciliation query batch, i.e. the payload of a GET/POST to a reconciliation endpoint.",
"definitions": {
"property_value": {
"oneOf": [
{
"type": "string"
},
{
"type": "number"
},
{
"type": "boolean"
},
{
"type": "object",
"description": "A property value which represents another entity, for instance if it was previously reconciled itself",
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string"
}
},
"required": [
"id"
]
}
]
}
},
"patternProperties": {
"^.*$": {
"type": "object",
"properties": {
"query": {
"type": "string",
"description": "A string to be matched against the name of the entities"
},
"type": {
"description": "Either a single type identifier or a list of type identifiers",
"oneOf": [
{
"type": "string"
},
{
"type": "array",
"items": {
"type": "string"
}
}
]
},
"limit": {
"type": "number",
"description": "The maximum number of candidates to return"
},
"properties": {
"type": "array",
"description": "An optional list of property mappings to refine the query",
"items": {
"type": "object",
"properties": {
"pid": {
"type": "string",
"description": "The identifier of the property, whose values will be compared to the values supplied"
},
"v": {
"description": "A value (or array of values) to match against the property values associated with the property on each candidate",
"oneOf": [
{
"$ref": "#/definitions/property_value"
},
{
"type": "array",
"items": {
"$ref": "#/definitions/property_value"
}
}
]
}
},
"required": [
"pid",
"v"
]
}
},
"type_strict": {
"type": "string",
"description": "A classification of the type matching strategy when multiple types are supplied",
"enum": [
"any",
"should",
"all"
]
}
},
"anyOf": [
{
"required": [
"query"
]
},
{
"required": [
"properties"
],
"properties": {
"properties": {
"type": "array",
"minItems": 1
}
}
}
],
"additionalProperties": false
}
}
}
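To make the shape of a query batch concrete, here is a minimal Python sketch that assembles one and enforces the rule encoded by the anyOf clause above: each query needs either a query string or a non-empty properties list. The helper name `make_query` and the batch keys "q0" and "q1" are arbitrary choices for this example; the property and entity identifiers are borrowed from the data extension example earlier in this document.

```python
import json
from typing import Any, Dict, List, Optional, Union

TYPE_STRICT_VALUES = ("any", "should", "all")


def make_query(
    query: Optional[str] = None,
    type_: Optional[Union[str, List[str]]] = None,
    limit: Optional[int] = None,
    properties: Optional[List[Dict[str, Any]]] = None,
    type_strict: Optional[str] = None,
) -> Dict[str, Any]:
    """Build a single reconciliation query, omitting fields that were not supplied."""
    candidate = {
        "query": query,
        "type": type_,
        "limit": limit,
        "properties": properties,
        "type_strict": type_strict,
    }
    result = {key: value for key, value in candidate.items() if value is not None}
    # Mirrors the anyOf clause: a query string or at least one property mapping.
    if "query" not in result and not result.get("properties"):
        raise ValueError("a query needs a 'query' string or at least one property")
    if "type_strict" in result and result["type_strict"] not in TYPE_STRICT_VALUES:
        raise ValueError(f"type_strict must be one of {TYPE_STRICT_VALUES}")
    return result


batch = {
    "q0": make_query(query="Gerda Stryi-Leitgeb", limit=3),
    "q1": make_query(
        properties=[{"pid": "professionOrOccupation", "v": {"id": "4037223-6"}}]
    ),
}
payload = json.dumps(batch)
print(payload)
```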
The reconciliation result batch schema can be used to validate the JSON serialization of any reconciliation result batch.
{
"$schema": "http://json-schema.org/schema#",
"$id": "https://reconciliation-api.github.io/specs/0.2/schemas/reconciliation-result-batch.json",
"type": "object",
"description": "This schema can be used to validate the JSON serialization of any reconciliation result batch.",
"patternProperties": {
"^.*$": {
"type": "object",
"properties": {
"result": {
"type": "array",
"items": {
"type": "object",
"properties": {
"id": {
"type": "string",
"description": "Entity identifier of the candidate"
},
"name": {
"type": "string",
"description": "Entity name of the candidate"
},
"description": {
"type": "string",
"description": "Optional description of the candidate entity"
},
"score": {
"type": "number",
"description": "Number indicating how likely it is that the candidate matches the query"
},
"features": {
"type": "array",
"description": "A list of features which can be used to derive a matching score",
"items": {
"type": "object",
"properties": {
"id": {
"type": "string",
"description": "A unique string identifier for the feature"
},
"value": {
"description": "The value of the feature for this reconciliation candidate",
"oneOf": [
{
"type": "boolean"
},
{
"type": "number"
}
]
}
}
}
},
"match": {
"type": "boolean",
"description": "Boolean value indicating whether the candidate is a certain match or not."
},
"type": {
"type": "array",
"description": "Types the candidate entity belongs to",
"items": {
"oneOf": [
{
"type": "object",
"description": "A type can be given by id and name",
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string"
}
},
"required": [
"id"
]
},
{
"type": "string",
"description": "Alternatively, if only a string is given, it is treated as the id"
}
]
}
}
},
"required": [
"id",
"name",
"score"
]
}
}
},
"required": [
"result"
]
}
}
}
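A client receiving a result batch typically has to choose one candidate per query. The Python sketch below shows one possible policy, which is an assumption of this example rather than a requirement: prefer a candidate flagged with match, and otherwise fall back to the highest score. The function name `best_candidate`, the candidate names and the scores are all invented for illustration.

```python
from typing import Any, Dict, Optional


def best_candidate(query_result: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Pick a candidate from one entry of a result batch, or None if there is none."""
    candidates = query_result.get("result", [])
    for candidate in candidates:
        # "match" is optional in the schema, so treat an absent flag as False.
        if candidate.get("match"):
            return candidate
    # No flagged match: fall back to the highest score ("score" is required).
    return max(candidates, key=lambda c: c["score"], default=None)


# Hypothetical result batch with made-up scores.
result_batch = {
    "q0": {
        "result": [
            {"id": "1064905412", "name": "Candidate A", "score": 42.0, "match": False},
            {"id": "10662041X", "name": "Candidate B", "score": 86.4, "match": False},
        ]
    },
    "q1": {"result": []},
}

for key, query_result in result_batch.items():
    chosen = best_candidate(query_result)
    print(key, chosen["id"] if chosen else None)
```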
The suggest entities response schema can be used to validate the JSON serialization of any suggest response for entities.
{
"$schema": "http://json-schema.org/schema#",
"$id": "https://reconciliation-api.github.io/specs/0.2/schemas/suggest-entities-response.json",
"type": "object",
"description": "This schema can be used to validate the JSON response of a suggest service for entities.",
"properties": {
"result": {
"type": "array",
"items": {
"type": "object",
"properties": {
"id": {
"type": "string",
"description": "Identifier of the suggested entity"
},
"name": {
"type": "string",
"description": "Name of the suggested entity"
},
"description": {
"type": "string",
"description": "An optional description which can be provided to disambiguate namesakes, providing more context."
},
"notable": {
"type": "array",
"description": "Types the suggested entity belongs to",
"items": {
"oneOf": [
{
"type": "object",
"description": "A type can be given by id and name",
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string"
}
},
"required": [
"id"
]
},
{
"type": "string",
"description": "Alternatively, if only a string is given, it is treated as the id"
}
]
}
}
},
"required": [
"id",
"name"
]
}
}
},
"required": [
"result"
]
}
The suggest properties response schema can be used to validate the JSON serialization of any suggest response for properties.
{
"$schema": "http://json-schema.org/schema#",
"$id": "https://reconciliation-api.github.io/specs/0.2/schemas/suggest-types-response.json",
"type": "object",
"description": "This schema can be used to validate the JSON response of a suggest service for types.",
"properties": {
"result": {
"type": "array",
"items": { "$ref": "type.json" }
}
},
"required": [
"result"
]
}
The suggest types response schema can be used to validate the JSON serialization of any suggest response for types.
{
"$schema": "http://json-schema.org/schema#",
"$id": "https://reconciliation-api.github.io/specs/0.2/schemas/suggest-types-response.json",
"type": "object",
"description": "This schema can be used to validate the JSON response of a suggest service for types.",
"properties": {
"result": {
"type": "array",
"items": { "$ref": "type.json" }
}
},
"required": [
"result"
]
}
The data extension property proposal schema validates data extension property proposal responses.
{
"$schema": "http://json-schema.org/schema#",
"$id": "https://reconciliation-api.github.io/specs/draft/schemas/data-extension-property-proposal.json",
"type": "object",
"description": "This schema can be used to validate the JSON response of a property proposal endpoint (part of the data extension feature).",
"properties": {
"properties": {
"type": "array",
"items": {
"type": "object",
"properties": {
"id": {
"type": "string",
"description": "Identifier of the suggested property"
},
"name": {
"type": "string",
"description": "Name of the suggested property"
},
"description": {
"type": "string",
"description": "An optional description which can be provided to disambiguate namesakes, providing more context."
}
},
"required": [
"id",
"name"
]
}
},
"type": {
"type": "string",
"description": "The identifier of the type for which those properties are suggested"
},
"limit": {
"type": "number",
"description": "The maximum number of results requested."
}
},
"required": [
"properties"
]
}
The data extension query schema validates data extension queries.
{
"$schema": "http://json-schema.org/schema#",
"$id": "https://reconciliation-api.github.io/specs/0.2/schemas/data-extension-query.json",
"type": "object",
"description": "This schema validates a data extension query",
"properties": {
"ids": {
"type": "array",
"description": "The list of entity identifiers to fetch property values from",
"items": {
"type": "string"
}
},
"properties": {
"type": "array",
"description": "The list of properties to fetch, with their optional configuration",
"items": {
"type": "object",
"properties": {
"id": {
"type": "string"
},
"settings": {
"type": "object"
}
},
"required": [
"id"
]
}
}
},
"required": [
"ids",
"properties"
]
}
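As a sketch of how a client could produce a document accepted by this schema, the following Python builds a data extension query from entity identifiers and property identifiers, and then checks which requested cells are absent from a response. The helper names `make_extension_query` and `missing_cells` are hypothetical, and the "limit" setting is an invented example of a property setting: actual setting names are those a service declares under property_settings in its manifest.

```python
from typing import Any, Dict, Iterable, List, Tuple, Union

PropertySpec = Union[str, Tuple[str, Dict[str, Any]]]


def make_extension_query(
    ids: Iterable[str], properties: Iterable[PropertySpec]
) -> Dict[str, Any]:
    """Build a query; each property is an id or an (id, settings) pair."""
    items: List[Dict[str, Any]] = []
    for prop in properties:
        if isinstance(prop, str):
            items.append({"id": prop})
        else:
            pid, settings = prop
            items.append({"id": pid, "settings": dict(settings)})
    return {"ids": list(ids), "properties": items}


def missing_cells(
    query: Dict[str, Any], response: Dict[str, Any]
) -> List[Tuple[str, str]]:
    """List (entity id, property id) pairs requested but absent from the rows."""
    rows = response.get("rows", {})
    return [
        (entity_id, prop["id"])
        for entity_id in query["ids"]
        for prop in query["properties"]
        if prop["id"] not in rows.get(entity_id, {})
    ]


query = make_extension_query(
    ["10662041X", "1064905412"],
    ["variantName", ("wikidataId", {"limit": "1"})],  # "limit" is hypothetical
)
print(query)
```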
The data extension response schema validates data extension responses.
{
"$schema": "http://json-schema.org/schema#",
"$id": "https://reconciliation-api.github.io/specs/0.2/schemas/data-extension-response.json",
"type": "object",
"description": "This schema validates a data extension response",
"properties": {
"meta": {
"type": "array",
"items": {
"type": "object",
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string"
},
"type": {
"type": "object",
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string"
}
},
"required": [
"id"
]
},
"service": {
"type": "string",
"format": "uri",
"pattern": "^https?://"
}
},
"required": [
"id",
"name"
]
}
},
"rows": {
"type": "object",
"patternProperties": {
".*": {
"type": "object",
"patternProperties": {
".*": {
"type": "array",
"items": {
"oneOf": [
{
"type": "object",
"additionalProperties": false
},
{
"type": "object",
"properties": {
"id": {
"type": "string"
},
"name": {
"type": "string"
},
"description": {
"type": "string"
}
},
"required": [
"id",
"name"
],
"additionalProperties": false
},
{
"type": "object",
"properties": {
"str": {
"type": "string"
}
},
"required": [
"str"
],
"additionalProperties": false
},
{
"type": "object",
"properties": {
"float": {
"type": "number"
}
},
"required": [
"float"
],
"additionalProperties": false
},
{
"type": "object",
"properties": {
"int": {
"type": "integer"
}
},
"required": [
"int"
],
"additionalProperties": false
},
{
"type": "object",
"properties": {
"date": {
"type": "string",
"description": "Date and time formatted in ISO format",
"pattern": "^(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9])T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\\.[0-9]+)?(Z|[+-](?:2[0-3]|[01][0-9]):[0-5][0-9])?$"
}
},
"required": [
"date"
],
"additionalProperties": false
}
]
}
}
}
}
}
}
},
"required": [
"rows",
"meta"
]
}
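The date pattern in this schema is dense, so the Python sketch below applies it to a few sample strings to show what it accepts. The pattern is copied from the schema with the JSON string escaping removed; the function name `is_schema_date` and the sample values are chosen for this illustration. Note that the pattern checks only the shape of the string: it requires a time part, and it does not reject impossible calendar dates such as 31 February.

```python
import re

# Same pattern as in the schema above, written as a Python raw string.
DATE_PATTERN = re.compile(
    r"^(-?(?:[1-9][0-9]*)?[0-9]{4})-(1[0-2]|0[1-9])-(3[01]|0[1-9]|[12][0-9])"
    r"T(2[0-3]|[01][0-9]):([0-5][0-9]):([0-5][0-9])(\.[0-9]+)?"
    r"(Z|[+-](?:2[0-3]|[01][0-9]):[0-5][0-9])?$"
)


def is_schema_date(value: str) -> bool:
    """True if the whole string has the shape required by the date pattern."""
    return DATE_PATTERN.fullmatch(value) is not None


samples = [
    "2019-03-01T12:30:00Z",       # date, time and UTC designator
    "2019-03-01T12:30:00+01:00",  # explicit offset
    "-0044-03-15T00:00:00",       # negative year, no time zone
    "2019-03-01",                 # rejected: the time part is mandatory
    "2019-13-01T00:00:00Z",       # rejected: month 13
    "2019-02-31T00:00:00Z",       # accepted: calendar validity is not checked
]
for sample in samples:
    print(sample, is_schema_date(sample))
```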