DRAFT Dataset Exchange Working Group Charter

The mission of the Dataset Exchange Working Group is to:

Join the Dataset Exchange Working Group.

This proposed charter is available on GitHub. Feel free to raise issues.

Charter Status: See the group status page and detailed change history.
Start date: [dd monthname yyyy] (date of the "Call for Participation", when the charter is approved)
End date: [dd monthname yyyy] (date of the "Call for Participation", when the charter is approved)
Chairs: Franck Cotton, Making Sense; Albert Meroño-Peñuela, King's College London
Team Contacts: Pierre-Antoine Champin (0.1 FTE)
Meeting Schedule: 1-hour teleconferences will be held every other week. Face-to-face meetings may be scheduled by consent of the participants, usually no more than one per year.

Motivation and Background

Sharing data among researchers, governments and citizens, whether openly or not, requires the provision of metadata. Different communities use different metadata standards to describe their datasets, some of which are highly specialized. At a general level, W3C’s Data Catalog Vocabulary, DCAT, is in widespread use, but so too are CKAN’s native schema, schema.org's dataset description vocabulary, ISO 19115, DDI, SDMX, CERIF, VoID, INSPIRE and, in the healthcare and life sciences domain, the Dataset Description vocabulary and DATS (ref), among others. The machine learning, AI, and scientific data communities have recently proposed Croissant and RO-Crate for similar purposes. This variety is a clear indication that no single vocabulary offers a complete and universally accepted solution.

A success of a previous WG charter was the publication of DCAT-3, which evolved through multiple drafts to improve clarity, extend functionality, and align with best practices. In particular, it introduced dataset series as first-class entities, refined versioning and inverse property handling, enhanced multilingual and accessibility support, improved alignment with external standards, and strengthened guidance on security, privacy, and metadata usage.

However, despite all those improvements, DCAT has known gaps in coverage, for example in describing the variables encoded within datasets in an explicit and semantically interoperable manner. Variables are a fundamental concept in both dataset design and dataset search, and are partly covered in existing recommendations such as the Metadata Vocabulary for Tabular Data (used to annotate tabular data) and the RDF Data Cube Vocabulary, QB (used to describe statistical data and time series). While these recommendations have also been successful and are in wide use, this gap must be addressed if usage is to continue to grow across different communities and the variety of metadata schemas is to be reduced.

One goal of this Working Group is to develop a new model for machine-actionable and interoperable dataset variable specifications, extending DCAT and QB for compatibility. Another goal is to maintain version 3 of DCAT and QB, as these vocabularies are key components of the W3C's approach to dataset exchange and their maintenance is essential for ensuring their relevance and usefulness to the community.

Scope

DCAT and QB are formulated as RDF vocabularies and are expected to remain so. However, as with all earlier work, the Working Group is agnostic about data formats. Methods for expressing DCAT or QB in other (existing) formats are in scope.

Government data, scientific research data, industry/enterprise and cultural heritage data, whether shared openly or not, are all explicitly in scope. The Working Group will primarily look at cross-domain requirements.

Input Documents

The following documents SHOULD be considered by the Working Group as direct inputs to the specifications to be developed.

Existing Project Materials

Material from the DXWG GitHub repository and the project wiki that has not yet been included in the Working Group's deliverables, in particular GitHub issues marked as Future-work.

W3C documents

Non-W3C documents

The WG must take account of current practice in many different communities. The following list is therefore not exhaustive.

Out of Scope

The following features are out of scope and will not be addressed by this Working Group.

The Dataset Exchange Working Group will not create application profiles or metadata standards that only apply to very specific domains (such as particle physics, accountancy or oncology).

Deliverables

Updated document status is available on the group publication status page.

Draft state indicates the state of the deliverable at the time of the charter approval. Expected completion indicates when the deliverable is projected to become a Recommendation, or otherwise reach a stable state.

Normative Specifications

The Working Group will deliver the following W3C normative specifications:

Data Catalog Vocabulary (DCAT) - Version 3

DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. This document defines the schema and provides examples for its use.

Maintenance of the current DCAT-3 Recommendation.

Draft state: Maintenance

Expected completion: Not specified
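
As a rough illustration of the kind of catalog record DCAT describes, the following sketch builds a JSON-LD description of one dataset with one distribution. The class and property names (dcat:Dataset, dcat:distribution, dcat:Distribution, dcat:downloadURL, dcat:mediaType, dcterms:title) come from DCAT and Dublin Core Terms; the helper function, the example.org IRIs and the title are hypothetical and only serve the example.

```python
import json

# Namespace IRIs for DCAT and Dublin Core Terms.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dcterms": "http://purl.org/dc/terms/",
}


def describe_dataset(dataset_iri, title, download_url, media_type):
    """Build a minimal JSON-LD description of a dataset with one distribution.

    This helper is hypothetical; it is not part of DCAT or of any library.
    """
    return {
        "@context": CONTEXT,
        "@id": dataset_iri,
        "@type": "dcat:Dataset",
        "dcterms:title": title,
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": download_url},
            "dcat:mediaType": {"@id": media_type},
        },
    }


# Hypothetical identifiers, for illustration only.
record = describe_dataset(
    "https://example.org/dataset/air-quality",
    "Air quality measurements",
    "https://example.org/files/air-quality.csv",
    "https://www.iana.org/assignments/media-types/text/csv",
)
print(json.dumps(record, indent=2))
```

A real catalog record would normally carry more metadata (publisher, license, temporal coverage and so on); the sketch keeps only the dataset and distribution relationship.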

RDF Data Cube Vocabulary (QB)

The Data Cube vocabulary provides means to publish multi-dimensional data, such as statistics, on the web in such a way that it can be linked to related data sets and concepts. This document defines the schema and provides examples for its use.

Maintenance of the current QB Recommendation

Draft state: Maintenance

Expected completion: Not specified
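
To make the idea of multi-dimensional data concrete, the sketch below turns a few statistical rows into qb:Observation resources attached to a qb:DataSet, serialized as JSON-LD. Only qb:Observation and qb:dataSet are QB terms; the ex: properties, the example.org IRIs, the helper function and the figures are hypothetical. The sketch is also simplified: a complete cube would declare its dimensions and measures in a qb:DataStructureDefinition, which is omitted here.

```python
import json

QB = "http://purl.org/linked-data/cube#"
EX = "https://example.org/stats/"  # hypothetical namespace for dimensions and measures


def rows_to_observations(dataset_iri, rows):
    """Convert plain rows into a JSON-LD graph of qb:Observation resources.

    Hypothetical helper; dimension and measure values are kept as plain
    literals for brevity.
    """
    observations = []
    for index, row in enumerate(rows):
        observations.append({
            "@id": f"{dataset_iri}/obs/{index}",
            "@type": "qb:Observation",
            "qb:dataSet": {"@id": dataset_iri},
            "ex:refArea": row["area"],          # dimension (assumed)
            "ex:refPeriod": row["year"],        # dimension (assumed)
            "ex:population": row["population"],  # measure (assumed)
        })
    return {"@context": {"qb": QB, "ex": EX}, "@graph": observations}


# Made-up rows, for illustration only.
rows = [
    {"area": "Region A", "year": 2020, "population": 1000},
    {"area": "Region A", "year": 2021, "population": 1010},
]
cube = rows_to_observations("https://example.org/dataset/population", rows)
print(json.dumps(cube, indent=2))
```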

Content Negotiation by Profile

This document describes how Internet clients may negotiate for content provided by servers based on data profiles to which the content conforms. This is distinct from negotiating by Media Type or Language: a profile may specify the content of information returned, which may be a subset of the information the responding server has about the requested resource, and may be structured in a specific way to meet interoperability requirements of a community of practice.

Draft state: Working Draft

Expected completion: Not specified

Adopted Draft: Content Negotiation by Profile, W3C Working Draft 26 November 2019

Exclusion Draft: Content Negotiation by Profile, W3C First Public Working Draft 18 December 2018. The associated Call for Exclusion opened on 2018-12-18 and ended on 2019-05-17.
Exclusion Draft Charter: https://www.w3.org/2017/dxwg/charter
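
The negotiation described for Content Negotiation by Profile can be sketched from the server's point of view as follows. The sketch assumes a client lists the profiles it prefers in an Accept-Profile request header, as IRIs in angle brackets with optional q-values, which is the form used in the Working Draft and may still change. The function names and the example.org profile IRIs are hypothetical, and the parser is deliberately naive: it splits on commas, so it would mishandle an IRI that itself contains a comma.

```python
import re

# One header item: <iri> optionally followed by ;q=<value between 0 and 1>.
ITEM = re.compile(r"\s*<([^>]+)>\s*(?:;\s*q=([01](?:\.[0-9]{0,3})?))?\s*$")


def parse_accept_profile(header_value):
    """Return (iri, q) pairs ordered from most to least preferred."""
    preferences = []
    for position, item in enumerate(header_value.split(",")):
        match = ITEM.match(item)
        if not match:
            continue  # skip malformed entries
        q = float(match.group(2)) if match.group(2) else 1.0
        preferences.append((match.group(1), q, position))
    # Highest q first; ties keep the order given by the client.
    preferences.sort(key=lambda p: (-p[1], p[2]))
    return [(iri, q) for iri, q, _ in preferences]


def select_profile(header_value, supported):
    """Pick the client's most preferred profile that the server supports."""
    for iri, q in parse_accept_profile(header_value):
        if q > 0 and iri in supported:
            return iri
    return None  # no acceptable profile; the server falls back to its default


# Hypothetical profile IRIs, for illustration only.
PROFILE_A = "https://example.org/profile/a"
PROFILE_B = "https://example.org/profile/b"
header = f"<{PROFILE_B}>;q=0.5, <{PROFILE_A}>"
chosen = select_profile(header, {PROFILE_A, PROFILE_B})
print(chosen)
```

A server that selects a profile this way would then state, in its response, which profile the returned representation conforms to, so that the client can check the outcome of the negotiation.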

Other Deliverables

Other non-normative documents may be created such as:

  • A specification for the variable description vocabulary (VVD)
  • A use case and requirements document for the use of VVD, possibly in combination with related standards.
  • Technical notes on bindings between VVD and other related W3C and non-W3C standards (DDI-CDI, RDF Data Cube, COW, SSN/SOSA, R2ML, etc)
  • A test suite for content negotiation by application profile
  • Guidance on publishing application profiles of vocabularies.
  • Subject to its capacity, the working group may choose to develop additional relevant vocabularies in response to community demand.

Timeline

  • September 2025: First teleconference
  • November 2025: First face-to-face meeting
  • April 2026: Requirements and Use Cases for VVD
  • August 2026: FPWD for VVD
  • November 2026: Second face-to-face meeting
  • April 2027: Bindings between VVD and other W3C/non-W3C specifications
  • August 2027: Guidance on publishing variable descriptions with VVD

Success Criteria

In order to advance to Proposed Recommendation, each normative specification is expected to have at least two independent interoperable implementations of every feature defined in the specification, where interoperability can be verified by passing open test suites.

There should be testing plans for each specification, starting from the earliest drafts. For vocabularies such as DCAT, QB or VVD, the WG should show that each term is used in multiple catalogs and related systems. As a minimum, evidence will be adduced that each term has been published and consumed independently at least once, although a higher number is expected for the majority of terms.

To promote interoperability, all changes made to specifications in Candidate Recommendation or to features that have deployed implementations should have tests. Testing efforts should be conducted via the Web Platform Tests project.

Each specification should contain separate sections detailing all known security and privacy implications for implementers, Web authors, and end users.

This Working Group expects to follow the TAG Web Platform Design Principles.

Coordination

For all specifications, this Working Group will seek horizontal review for accessibility, internationalization, privacy, and security with the relevant Working and Interest Groups, and with the TAG. Invitation for review must be issued during each major standards-track document transition, including FPWD. The Working Group is encouraged to engage collaboratively with the horizontal review groups throughout development of each specification. The Working Group is advised to seek a review at least 3 months before first entering CR and is encouraged to proactively notify the horizontal review groups when major changes occur in a specification following a review.

Additional technical coordination with the following Groups will be made, per the W3C Process Document:

W3C Groups

Data Shapes Working Group

The work of this WG produces the W3C Shapes Constraint Language Recommendation. Efforts should be made to liaise with its community.

schema.org for datasets Community Group

This CG is clearly of high relevance to the DXWG.

JSON-LD Working Group

JSON-LD is a key technology for the publication of datasets on the Web, and so the DXWG should coordinate with the JSON-LD WG to ensure that any new features in DCAT, QB or VVD can be easily represented with JSON-LD.

RDF & SPARQL Working Group

The DXWG should coordinate with the RDF & SPARQL Working Group to ensure that any new features in DCAT, QB or VVD can be easily represented with RDF 1.2 and queried in SPARQL.

Web Machine Learning Working Group

Datasets that use DCAT, QB or VVD should be easily consumed in ML workflows and used for inference.

Open Digital Rights Language (ODRL) Community Group

Ensure that the mechanisms of the W3C ODRL Recommendation being maintained by the ODRL Community Group for machine readable permissions, obligations, licenses, rights etc. are given due consideration.

External Organizations

DDI Alliance

The DDI Alliance is an international membership organization that creates and maintains technical standards for describing research data in the social, demographic, economic, and health sciences such as DDI Cross-Domain Integration (DDI-CDI).

MLCommons

MLCommons is an Artificial Intelligence engineering consortium, built on a philosophy of open collaboration to improve AI systems by proposing standards such as Croissant.

European Commission's ISA Programme

This is the body responsible for interoperability across the EU and whose outputs include various application profiles of DCAT such as DCAT-AP, GeoDCAT-AP and StatDCAT-AP.

Research Data Alliance

Many of the issues raised in the DXWG are of direct relevance to work at the RDA around metadata, citation and more. It is important to align and not duplicate effort.

GO-FAIR, FORCE11, CODATA (the International Science Council's Committee on Data) and FAIRsharing.org

Findability of data assets is potentially made much easier if there are adequate and interoperable catalogs. As this is a key goal of the FAIR Principles [FAIR] we need to coordinate with communities such as FORCE11, CODATA, GO-FAIR and FAIRsharing.org.

EPOS, the European Plate Observing System

EPOS use DCAT as the basis for their dataset catalog application profile. Their datasets bring together information on geo-hazards and those geodynamic phenomena (including geo-resources) relevant to the environment and human welfare. Datasets come from a wide, international community and so are an important route for the working group to gather input and feedback on its deliverables.

Participation

To be successful, this Working Group is expected to have 6 or more active participants for its duration, including representatives from the key implementors of this specification, and active Editors and Test Leads for each specification. The Chairs, specification Editors, and Test Leads are expected to contribute half of a working day per week towards the Working Group. There is no minimum requirement for other Participants.

The group encourages questions, comments and issues on its public mailing lists and document repositories, as described in Communication.

The group also welcomes non-Members to contribute technical submissions for consideration upon their agreement to the terms of the W3C Patent Policy.

Participants in the group are required (by the W3C Process) to follow the W3C Code of Conduct.

Communication

Technical discussions for this Working Group are conducted in public: the meeting minutes from teleconference and face-to-face meetings will be archived for public review, and technical discussions and issue tracking will be conducted in a manner that can be both read and written to by the general public. Working Drafts and Editor's Drafts of specifications will be developed in public repositories and may permit direct public contribution requests. The meetings themselves are not open to public participation, however.

Information about the group (including details about deliverables, issues, actions, status, participants, and meetings) will be available from the Dataset Exchange Working Group home page.

Most Dataset Exchange Working Group teleconferences will focus on discussion of particular specifications, and will be conducted on an as-needed basis.

This group primarily conducts its technical work on the public mailing list public-dxwg-wg@w3.org (archive) and on GitHub issues. The public is invited to review, discuss and contribute to this work.

The group may use a Member-confidential mailing list for administrative purposes and, at the discretion of the Chairs and members of the group, for member-only discussions in special cases when a participant requests such a discussion.

Decision Policy

This group will seek to make decisions through consensus and due process, per the W3C Process Document (section 5.2.1, Consensus). Typically, an editor or other participant makes an initial proposal, which is then refined in discussion with members of the group and other reviewers, and consensus emerges with little formal voting being required.

However, if a decision is necessary for timely progress and consensus is not achieved after careful consideration of the range of views presented, the Chairs may call for a group vote and record a decision along with any objections.

To afford asynchronous decisions and organizational deliberation, any resolution (including publication decisions) taken in a face-to-face meeting or teleconference will be considered provisional. A call for consensus (CfC) will be issued for all resolutions (for example, via email, GitHub issue or web-based survey), with a response period from one week to 10 working days, depending on the Chairs' evaluation of the group consensus on the issue. If no objections are raised by the end of the response period, the resolution will be considered to have consensus as a resolution of the Working Group.

All decisions made by the group should be considered resolved unless and until new information becomes available or unless reopened at the discretion of the Chairs.

This charter is written in accordance with the W3C Process Document (Section 5.2.3, Deciding by Vote) and includes no voting procedures beyond what the Process Document requires.

Patent Policy

This Working Group operates under the W3C Patent Policy (Version of 15 September 2020). To promote the widest adoption of Web standards, W3C seeks to issue Web specifications that can be implemented, according to this policy, on a Royalty-Free basis. For more information about disclosure obligations for this group, please see the licensing information.

Licensing

This Working Group will use the W3C Software and Document license for all its deliverables.

About this Charter

This charter has been created according to section 3.4 of the Process Document. In the event of a conflict between this document or the provisions of any charter and the W3C Process, the W3C Process shall take precedence.

Charter History

The following table lists details of all changes from the initial charter, per the W3C Process Document (section 4.3, Advisory Committee Review of a Charter):

Charter Period Start Date End Date Changes
Initial Charter 2017-05-05 2019-06-30 -
Charter Extension 2019-07-01 2019-12-31 -
Charter Extension 2020-01-01 2020-01-31 -
Rechartered 2020-02-05 2022-01-31 DCAT 3, new document license
Charter Extension 2022-02-05 2022-06-30 -
Rechartered 2022-06-28 2024-06-30 Use Patent Policy 2020. Changed in Team FTE (+0.1%)
Charter Extension 2024-07-01 2024-12-31 -
Charter Extension 2025-01-01 2025-06-30 -
Rechartered TBD TBD

Change log

Changes to this document are documented in this section.