Dataset Exchange Working Group Charter

The mission of the Dataset Exchange WG is to:

Join the Dataset Exchange Working Group.

Start date: 20 January 2020
End date: 31 December 2021
Charter Extension: See Change History.
Chairs: Peter Winstanley, The Scottish Government; A N Other
Team Contacts: Dave Raggett (0.1 FTE)
Meeting Schedule:
  • Teleconferences: 1-hour calls will be held weekly.
  • Face-to-face: twice per year, expected to include the W3C's annual Technical Plenary week.

Goals

Sharing data among researchers, governments and citizens, whether openly or not, requires the provision of metadata. Different communities use different metadata standards to describe their datasets, some of which are highly specialized. At a general level W3C’s Data Catalog Vocabulary, DCAT, is in widespread use, but so too are CKAN’s native schema, schema.org's dataset description vocabulary, ISO 19115, DDI, SDMX, CERIF, VoID, INSPIRE and, in the healthcare and life sciences domain, the Dataset Description vocabulary and DATS (ref) among others. This variety is a clear indication that no single vocabulary offers a complete and universally accepted solution.

DCAT has known gaps in coverage, for example around time series and versions. DCAT has been successful and is in wide use, but these gaps must be addressed if usage is to continue to grow across different communities and the variety of metadata schemas is to reduce.

Maximizing interoperability between services such as data catalogs, e-Infrastructures and virtual research environments requires not just the use of standard vocabularies but also of application profiles. These define how a vocabulary is used, for example by providing cardinality constraints and/or enumerated lists of allowed values such that data can be validated. The development of several application profiles based on DCAT, such as the European Commission's DCAT-AP, is particularly noteworthy in this regard.
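To make the idea of validating against an application profile concrete, the following is a minimal sketch in Python. It assumes a metadata record is a plain mapping from property names to lists of values. The `PropertyRule` class, the `EXAMPLE_PROFILE` constraints and the `validate` function are hypothetical and chosen only for illustration; they are not taken from DCAT-AP or any other published profile.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class PropertyRule:
    """Constraints that a (hypothetical) profile places on one property."""
    min_count: int = 0                              # cardinality lower bound
    max_count: Optional[int] = None                 # cardinality upper bound, None = unbounded
    allowed_values: Optional[FrozenSet[str]] = None  # enumerated list, None = any value


# Illustrative profile only: these constraints are assumptions of this sketch.
EXAMPLE_PROFILE: Dict[str, PropertyRule] = {
    "dct:title": PropertyRule(min_count=1),
    "dct:license": PropertyRule(min_count=1, max_count=1),
    "dct:accrualPeriodicity": PropertyRule(
        max_count=1,
        allowed_values=frozenset({"daily", "weekly", "monthly", "annual"}),
    ),
}


def validate(record: Dict[str, List[str]], profile: Dict[str, PropertyRule]) -> List[str]:
    """Return a list of human-readable violations; an empty list means the record conforms."""
    errors: List[str] = []
    for prop, rule in profile.items():
        values = record.get(prop, [])
        if len(values) < rule.min_count:
            errors.append(f"{prop}: expected at least {rule.min_count} value(s), found {len(values)}")
        if rule.max_count is not None and len(values) > rule.max_count:
            errors.append(f"{prop}: expected at most {rule.max_count} value(s), found {len(values)}")
        if rule.allowed_values is not None:
            for value in values:
                if value not in rule.allowed_values:
                    errors.append(f"{prop}: value {value!r} is not in the allowed list")
    return errors
```

The same vocabulary terms can be constrained differently by different profiles, which is why a record that conforms to one profile may fail validation against another.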

Rather than limit the number of metadata standards and application profiles in use, systems should be able to expose and ingest (meta)data according to multiple standards through transparent and sustainable interfaces. We thus need a mechanism for servers to indicate the available standards and application profiles, and for clients to choose an appropriate one. This leads to the concept of content negotiation by application profile, which is orthogonal to the content negotiation by data format and language that is already part of HTTP. A new RFC on this topic, currently under development at IETF with input from the Dataset Exchange Working Group, is based on the draft presented at the SDSVoc workshop. DXWG's definition of what is meant by "application profile" and its view of how clients and servers may interact in different ways based on these profiles, combined with this external work, will provide a powerful means to exchange data in any format (JSON, RDF, XML, etc.) according to declared structures against which the data can be validated.
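The selection step at the heart of content negotiation by application profile can be sketched as follows. This is an illustration only: it assumes the client sends a comma-separated list of profile URIs with optional `q` weights, modelled on the q-value syntax of the HTTP `Accept` header. The preference string format, the function names and the `example.org` profile URIs are all assumptions of this sketch, not something defined by this charter.

```python
from typing import List, Optional, Set, Tuple


def parse_preferences(preference_value: str) -> List[Tuple[str, float]]:
    """Parse '<uri>;q=0.8, <uri2>' into (uri, weight) pairs, highest weight first."""
    prefs: List[Tuple[str, float]] = []
    for item in preference_value.split(","):
        item = item.strip()
        if not item:
            continue
        parts = [p.strip() for p in item.split(";")]
        uri = parts[0].strip("<>")
        weight = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    weight = float(param[2:])
                except ValueError:
                    weight = 0.0  # treat a malformed weight as "not acceptable"
        prefs.append((uri, weight))
    # Python's sort is stable, so ties keep the order the client listed them in.
    prefs.sort(key=lambda pair: -pair[1])
    return prefs


def choose_profile(preference_value: str, available: Set[str],
                   default: Optional[str] = None) -> Optional[str]:
    """Return the client's most preferred profile that the server can supply."""
    for uri, weight in parse_preferences(preference_value):
        if weight > 0 and uri in available:
            return uri
    return default  # server falls back to its default representation, if any


if __name__ == "__main__":
    server_profiles = {
        "https://example.org/profile/dcat-ap",
        "https://example.org/profile/schema-org",
    }
    wanted = "<https://example.org/profile/iso19115>;q=0.9, <https://example.org/profile/dcat-ap>;q=0.5"
    print(choose_profile(wanted, server_profiles))
```

Because the profile is negotiated separately from the media type and language, the same selection could be combined with any serialization the server supports.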

The goals of the working group are to maintain version 2 of DCAT and extend the standard to version 3, in line with work done to date and with the ongoing work on dataset exchange being undertaken by communities more generally, and to develop to Recommendation the work on content negotiation by profile undertaken in the 2017-2019 charter period.

Scope

DCAT is formulated as an RDF vocabulary and is expected to remain so. However, as with all earlier work, the Working Group is agnostic about data formats. Methods for expressing DCAT in other (existing) formats are in scope.

Government data, scientific research data, industry/enterprise and cultural heritage data, whether shared openly or not, are all explicitly in scope. The working group will primarily look at cross-domain requirements.

Input Documents

The following documents SHOULD be considered by the Working Group as direct inputs to the specifications to be developed.

Existing Project Materials

In the course of the first charter period of the Dataset Exchange Working Group, material was developed that was not used in the documents published as Recommendations and Notes. This includes various items on the DXWG GitHub and the project wiki, and in particular those GitHub issues marked as Future-work.

W3C documents
Non-W3C documents

DCAT must take account of current practice in many different communities. The following list is therefore not exhaustive.

Out of Scope

The Dataset Exchange Working Group will not create application profiles or metadata standards that only apply to very specific domains (such as particle physics, accountancy, oncology, etc.).

Success Criteria

In order to advance to Proposed Recommendation, the WG should show that each term in the revised version of DCAT is used in multiple catalogs and related systems. As a minimum, evidence will be adduced that each term has been published and consumed independently at least once, although a higher number is expected for the majority of terms.
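As an illustration of how this minimum criterion could be checked, the sketch below tallies implementation evidence per term and reports which terms still lack a publisher or a consumer. The evidence format (term, system name, role) and the function name are hypothetical; the charter does not prescribe how evidence is recorded.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# One piece of evidence: (term, name of the system, "publish" or "consume").
Evidence = Tuple[str, str, str]


def terms_lacking_evidence(terms: Iterable[str],
                           evidence: Iterable[Evidence]) -> Dict[str, List[str]]:
    """Map each term to the roles ("publish", "consume") for which no evidence exists."""
    publishers: Dict[str, Set[str]] = defaultdict(set)
    consumers: Dict[str, Set[str]] = defaultdict(set)
    for term, system, role in evidence:
        if role == "publish":
            publishers[term].add(system)
        elif role == "consume":
            consumers[term].add(system)

    missing: Dict[str, List[str]] = {}
    for term in terms:
        gaps = []
        if not publishers[term]:
            gaps.append("publish")
        if not consumers[term]:
            gaps.append("consume")
        if gaps:
            missing[term] = gaps
    return missing
```

An empty result means every listed term meets the minimum of one publishing and one consuming system; it says nothing about the higher numbers expected for the majority of terms.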

For the content negotiation by application profile specification, each fallback mechanism identified by the Working Group is expected to have two independent implementations. The DXWG is not responsible for proving implementations of the RFC defined at IETF.

Each specification should contain a section detailing any known security or privacy implications for implementers, Web authors, and end users.

Each specification should have a testing plan, that is, some guide to help implementers know whether they have followed the specification correctly.

Deliverables

Recommendation-track Deliverables

The Working Group will deliver the following W3C normative specifications (titles of the documents are provisional; some documents listed below may be grouped into one document or split into several constituent documents):

DCAT 3

An update and expansion of the current DCAT Recommendation. The new version may deprecate terms, but MUST maintain backwards compatibility.

Content Negotiation by Application Profile

An explanation of how to implement the expected RFC and suitable fallback mechanisms as discussed at the SDSVoc workshop.

Other Deliverables

Other non-normative documents may be created such as:

  • A use case and requirements document
  • A test suite for content negotiation by application profile
  • A primer on the uses of DCAT
  • Guidance on publishing application profiles of vocabularies
  • Subject to its capacity, the working group may choose to develop additional relevant vocabularies in response to community demand.

Timeline - Expected Delivery Dates

  • Define GitHub milestones based on the backlog of use cases and requirements (UCR) from DCAT 2, also considering any new use cases from the community; these will be distilled into a UCR document to guide the work on DCAT 3 and subsequent revisions - Q1-2 2020
  • Get a first public working draft (FPWD) for DCAT 3 - Q3-4 2020
  • Get to Candidate Recommendation (CR) the content negotiation by application profile work - Q1 2020
  • Get to CR all Recommendation Track documents - Q4 2021

Transition to Evergreen Standards Process

It is expected that, within the charter period, the deliverables will transition to the W3C "evergreen" standards process and procedures at an appropriate time.

Coordination

For all specifications, this Working Group will seek horizontal review for accessibility, internationalization, performance, privacy, and security with the relevant Working and Interest Groups, and with the TAG. Invitation for review must be issued during each major standards-track document transition, including FPWD and CR, and should be issued when major changes occur in a specification.

Additional technical coordination with the following Groups will be made, per the W3C Process Document:

W3C Groups

Internationalization Activity

Ensure that multilinguality concerns continue to be properly reflected in the DCAT revision.

Privacy Interest Group

Ensure that privacy concerns are addressed, for example, if a dataset includes personally identifiable information.

Web Application Security Working Group

Ensure that no security vulnerabilities are introduced, in particular by the content negotiation by application profile specification.

Shape Expressions Community Group

The work of this CG is of direct relevance to the concept of application profiles.

The RDF Data Shapes Working Group

This WG is expected to have completed its work shortly after the DXWG is formed. However, efforts should be made to liaise with its community.

schema.org for datasets Community Group

This CG is clearly of high relevance to the DXWG.

Open Digital Rights Language (ODRL) Community Group

Ensure that the mechanisms being standardized by the ODRL Community Group for machine readable permissions, obligations, licenses, rights etc. are given due consideration.

External Organizations

European Commission's ISA Programme

This is the body responsible for interoperability across the EU and whose outputs include various application profiles of DCAT such as DCAT-AP, GeoDCAT-AP and StatDCAT-AP.

Research Data Alliance

Many of the issues raised in the DXWG are of direct relevance to work at the RDA around metadata, citation and more. It is important to align and not duplicate effort.

GO-FAIR, FORCE11, CODATA (the International Science Council's Committee on Data) and FAIRsharing.org

Findability of data assets is potentially made much easier if there are adequate and interoperable catalogs. As this is a key goal of the FAIR Principles [FAIR] we need to coordinate with communities such as FORCE11, CODATA, GO-FAIR and FAIRsharing.org.

EPOS, the European Plate Observing System

EPOS use DCAT as the basis for their dataset catalog application profile. Their datasets bring together information on geo-hazards and those geodynamic phenomena (including geo-resources) relevant to the environment and human welfare. Datasets come from a wide, international community and so are an important route for the working group to gather input and feedback on its deliverables.

Participation

To be successful, this Working Group is expected to have 6 or more active participants for its duration, including representatives from key implementers and users (e.g., governments and research data managers) of its specifications, and active Editors. The Chairs, specification Editors, and Test Leads are expected to contribute half of a day per week towards the Working Group. There is no minimum requirement for other Participants.

The group encourages questions, comments and issues on its public mailing lists and document repositories, as described in Communication.

The group also welcomes non-Members to contribute technical submissions for consideration upon their agreement to the terms of the W3C Patent Policy.

Communication

Technical discussions for this Working Group are conducted in public: the meeting minutes from teleconference and face-to-face meetings will be archived for public review, and technical discussions and issue tracking will be conducted in a manner that can be both read and written to by the general public. Working Drafts and Editor's Drafts of specifications will be developed on a public repository, and may permit direct public contribution requests. The meetings themselves are not open to public participation, however.

Information about the group (including details about deliverables, issues, actions, status, participants, and meetings) will be available from the Dataset Exchange Working Group home page.

This group primarily conducts its technical work on the public mailing list public-dxwg-wg@w3.org (archive). The public is invited to review, discuss and contribute to this work.

The group may use a Member-confidential mailing list for administrative purposes and, at the discretion of the Chairs and members of the group, for member-only discussions in special cases when a participant requests such a discussion.

Decision Policy

This group will seek to make decisions through consensus and due process, per the W3C Process Document (section 3.3). Typically, an editor or other participant makes an initial proposal, which is then refined in discussion with members of the group and other reviewers, and consensus emerges with little formal voting being required.

However, if a decision is necessary for timely progress, but consensus is not achieved after careful consideration of the range of views presented, the Chairs may call for a group vote, and record a decision along with any objections.

To afford asynchronous decisions and organizational deliberation, any resolution (including publication decisions) taken in a face-to-face meeting or teleconference will be considered provisional. A call for consensus (CfC) will be issued for all resolutions (for example, via email and/or web-based survey), with a response period from one week to 10 working days, depending on the Chairs' evaluation of the group consensus on the issue. If no objections are raised on the mailing list by the end of the response period, the resolution will be considered to have consensus as a resolution of the Working Group.

All decisions made by the group should be considered resolved unless and until new information becomes available, or unless reopened at the discretion of the Chairs or the Director.

This charter is written in accordance with the W3C Process Document (Section 3.4, Votes), and includes no voting procedures beyond what the Process Document requires.

Patent Policy

This Working Group operates under the W3C Patent Policy (5 February 2004 Version). To promote the widest adoption of Web standards, W3C seeks to issue Recommendations that can be implemented, according to this policy, on a Royalty-Free basis. For more information about disclosure obligations for this group, please see the W3C Patent Policy Implementation.

Licensing

This Working Group will use the W3C Document license for all its deliverables.

About this Charter

This charter has been created according to section 5.2 of the Process Document. In the event of a conflict between this document or the provisions of any charter and the W3C Process, the W3C Process shall take precedence.

Charter History

The following table lists details of all changes from the initial charter, per the W3C Process Document (section 5.2.3):

Charter Period Start Date End Date Changes
Initial Charter .. .. ..
Charter Extension .. .. ..

Bibliography

Informative references

[FAIR]
The FAIR Guiding Principles for scientific data management and stewardship. Mark D. Wilkinson et al. Nature. Scientific Data, vol. 3, Article nr. 160018. URL: https://doi.org/10.1038/sdata.2016.18