This document reports on evidence and implementations of the Data on the Web Best Practices Candidate Recommendation. In particular, it demonstrates that the DWBP are already in use and are also implementable.
One of the main goals of the Data on the Web Best Practices (DWBP) is to facilitate interaction between publishers and consumers of data on the Web. A set of 35 Best Practices was created to cover different challenges related to data publishing and consumption: Metadata, Data licenses, Data provenance, Data quality, Data versioning, Data identification, Data formats, Data vocabularies, Data access and APIs, Data preservation, Feedback, Data enrichment and Data republication.
To show that the DWBP are implementable as well as broadly adopted and referenced by well-known organizations, we collected evidence in the form of datasets, data portals, documents, references and guidelines (Section 2). We used two forms to collect this evidence: the DWBP evidence form and the DWBP template form. The results are summarized in this report.
To strengthen the evidence of DWBP adoption beyond the results collected from the surveys, we also present our evaluation of how the DWBP are currently adopted by the major data catalog solutions, including CKAN, Socrata, DKAN, JUNAR, ArcGIS Open Data and OPENDATASOFT (Section 3). Finally, we present examples to illustrate that each one of the DWBP is implementable (Section 4).
We followed the steps described below to collect evidence for the DWBP:
As noted, to have a broader coverage of the DWBP adoption we considered different types of evidence:
As described in the DWBP charter, to move on to Proposed Recommendation, evidence will be adduced in order to demonstrate that each of the best practices has been recommended or adopted in at least two environments, such as data portals and formal policies. Evidence of implementation was gathered from existing datasets and data portals, which already implement the proposed best practices, as well as from national or sector-specific guidelines that reference the DWBP and documents available on the Web.
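The criterion above (each best practice recommended or adopted in at least two environments) can be checked mechanically once the evidence is tabulated. The following is a minimal sketch, not the procedure the working group used: the `evidence` records, the environment names and the function names are hypothetical placeholders introduced only for illustration.

```python
from collections import defaultdict

# Hypothetical evidence records: (best practice, environment that adopts or recommends it).
# These pairs are illustrative placeholders, not data taken from the report.
evidence = [
    ("BP1", "portal-a"),
    ("BP1", "guideline-b"),
    ("BP2", "portal-a"),
]


def environments_per_bp(records):
    """Group the distinct environments reported for each best practice."""
    grouped = defaultdict(set)
    for bp, environment in records:
        grouped[bp].add(environment)
    return grouped


def meets_exit_criterion(records, minimum=2):
    """Map each best practice to True when it has at least `minimum` distinct environments."""
    grouped = environments_per_bp(records)
    return {bp: len(envs) >= minimum for bp, envs in grouped.items()}


result = meets_exit_criterion(evidence)
print(result)
```

Environments are kept in a set so that the same portal or guideline reported twice for one best practice is counted only once.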
The table below shows the evidence collected for each one of the DWBP.
The following table shows the organizations and implementers that contributed DWBP evidence in the form of Datasets, Data Portals and Vocabularies.
* This column indicates if a data catalog solution is used to provide the data. The data catalog can be based on an existing solution like CKAN or can be a proprietary one.
The following table shows the organizations and implementers that contributed DWBP evidence in the form of Documents and References.
The following table shows the organizations and implementers that contributed DWBP evidence in the form of Guidelines.
One of our main concerns when we started to collect evidence for each one of the DWBP was to have implementations from well-known organizations as well as high-profile datasets and data portals worldwide, like DBpedia, Data.gov.uk, Data.gov and World Bank. The tables presented in the previous section show that we accomplished this goal. The DWBP evidence was collected from well-known organizations and projects, including the ones mentioned above as well as BBC, Twitter, Europeana, Pacific Northwest National Laboratory and OpenStreetMap. In terms of geographical coverage, we collected implementations from several countries, including Brazil, France, Ireland, New Zealand, Spain, UK, USA and Italy. It is also important to note that the evidence in the form of guidelines concerns several governmental organizations from Europe. Another important characteristic of the DWBP implementations is their broad domain coverage: they refer to different domains, like Government, Environment and Healthcare, as described in the graphic below.
As we can observe in the graphic below, there is a broad adoption of the DWBP related to Metadata (BP1 and BP2), Data Licenses (BP4), Data Identification (BP9 and BP10), Data Formats (BP12 and BP14), Vocabularies (BP15 and BP16), Data Access (BP23, BP24, BP25 and BP26) and Feedback (BP29). On the other hand, for others, such as Preserve identifiers (BP27), Assess dataset coverage (BP28), Provide real-time access (BP20) and Provide an explanation for data that is not available (BP22), evidence was more difficult to collect, especially from datasets and data portals. This is explained by comments received during the evidence gathering process, which are also available in the DWBP evidence form. Bill Roberts from SWIRRL, for example, made the following comment about one of the Data Preservation best practices: "Too difficult to test in a meaningful way. In this system, no datasets have yet been taken offline, so the archiving process has not been developed." Similarly, he commented on the Best Practice Provide real-time access: "The system does not currently hold dataset collected in 'real time'. Generally the data is statistical in nature and goes through a slower collection and processing cycle."
In this section we present some more evidence that shows the adoption of the DWBP. Rather than specific datasets or data portals, we use the following data catalog solutions as evidence: CKAN, Socrata, DKAN, JUNAR, ArcGIS Open Data and OPENDATASOFT. For each one of the DWBP, we show the list of data catalog solutions that implement it.
As we may notice, there is no evidence for some of the DWBP. This happens because these Best Practices do not concern the solution used for making the data available on the Web, i.e. the data catalog solution, as explained below.
None of the data catalog solutions implements BP27. In general, when a dataset is not available, only a 404 error message is returned.
Some Best Practices related to metadata are only partially implemented by the data catalog solutions. Almost all of the solutions are compatible with DCAT, so the metadata covered by DCAT may be completely or partially available in both human-readable and machine-readable formats. In most cases, partial implementation means that only a human-readable or only a machine-readable version of the metadata is available, as detailed below.
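To make the distinction between the two metadata forms concrete, the sketch below renders one dataset description both as machine-readable JSON-LD using DCAT and Dublin Core terms and as a human-readable HTML fragment. It is an illustration under stated assumptions, not the output of any of the data catalog solutions evaluated here: the dataset fields, the example.org URL and the function names are hypothetical.

```python
import json
from html import escape

# Hypothetical dataset description; the title, description and URLs are placeholders.
dataset = {
    "title": "Example bus stops",
    "description": "Illustrative dataset used only for this sketch.",
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "download_url": "https://example.org/data/bus-stops.csv",
}


def to_jsonld(record):
    """Machine-readable form: a DCAT dataset description serialized as JSON-LD."""
    document = {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": record["title"],
        "dct:description": record["description"],
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dct:license": {"@id": record["license"]},
            "dcat:downloadURL": {"@id": record["download_url"]},
        },
    }
    return json.dumps(document, indent=2)


def to_html(record):
    """Human-readable form: the same fields rendered as an HTML fragment."""
    license_url = escape(record["license"])
    return (
        f"<h1>{escape(record['title'])}</h1>\n"
        f"<p>{escape(record['description'])}</p>\n"
        f"<p>License: <a href=\"{license_url}\">{license_url}</a></p>"
    )


print(to_jsonld(dataset))
print(to_html(dataset))
```

Generating both forms from a single record keeps the human-readable and machine-readable versions consistent, which is the gap left when only one of the two is published.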
As a general analysis with regard to the Data on the Web Challenges, we can say that the Metadata, Data Licenses and Data Formats challenges are main concerns of the data catalog solutions. The Data Access challenge has also been recognized as an important one, except when it concerns real-time data. There is a consensus on the use of Data Access APIs. The major data catalog solutions also deal with the Data Identification challenge, although only part of the problem has been solved. The Data Vocabularies challenge has also been treated as an important one, since data catalog solutions reuse existing vocabularies, e.g. DCAT, when publishing metadata about the data catalogs. Other challenges, like Data Provenance, Data Versioning and Feedback, have been dealt with only superficially in the data catalog solutions. In general, Data Quality, Data Preservation, Data Enrichment and Data Republication are challenges not yet explored by the major data catalog solutions.
The following list shows the set of best practices linked to the DWBP document:
The editors gratefully acknowledge the contributions made to gathering evidence for the DWBP by all members of the working group, especially Annette Greiner, Antoine Isaac, Carlos Laufer, Christophe Guéret, Deirdre Lee, Eric Stephan, Makx Dekkers, Martin Alvarez-Espinar, Peter Winstanley, Phil Archer and Riccardo Albertoni.
The editors would also like to thank the following people for the evidence they provided: Bill Roberts, Christophe Guéret, Diogo Cortiz, Fábio Rodrigues, Eduardo Rodrigues Vasconcelos, Gregor Boyd, Herbert Van de Sompel, Jefferson Rafael Silva, João Victor Pacheco Dias, José Marcio Martins Junior, Laura Manley, Markus Freudenberg, Milos Jovanovik, Rafael Sá Anselmo, Reinaldo Ferraz and Williams Alcântara.