Dataset Exchange Working Group Charter
The mission of the Dataset Exchange WG is to:
- Maintain and revise the Data Catalog Vocabulary, DCAT, taking into account feature requests from the DCAT user community.
- Define and publish guidance on the specification and use of application profiles when requesting and serving data on the Web.
Start date | 20 January 2020 |
---|---|
End date | 31 December 2021 |
Charter Extension | See Change History. |
Chairs | Peter Winstanley (The Scottish Government), Caroline Burle (NIC.br - Brazilian Network Information Center) |
Team Contacts | Philippe Le Hegaret (0.05 FTE) |
Meeting Schedule | Teleconferences: 1-hour calls will be held weekly. Face-to-face: twice per year, expected to include the W3C's annual Technical Plenary week. |
Goals
Sharing data among researchers, governments and citizens, whether openly or not, requires the provision of metadata. Different communities use different metadata standards to describe their datasets, some of which are highly specialized. At a general level W3C’s Data Catalog Vocabulary, DCAT, is in widespread use, but so too are CKAN’s native schema, schema.org's dataset description vocabulary, ISO 19115, DDI, SDMX, CERIF, VoID, INSPIRE and, in the healthcare and life sciences domain, the Dataset Description vocabulary and DATS (ref) among others. This variety is a clear indication that no single vocabulary offers a complete and universally accepted solution.
DCAT has known gaps in coverage, for example around time series and versions. DCAT has been successful and is in wide use, but these gaps must be addressed if usage is to continue to grow across different communities and the variety of metadata schemas is to reduce.
Maximizing interoperability between services such as data catalogs, e-Infrastructures and virtual research environments requires not just standard vocabularies but also application profiles. These define how a vocabulary is used, for example by providing cardinality constraints and/or enumerated lists of allowed values, so that data can be validated. The development of several application profiles based on DCAT, such as the European Commission's DCAT-AP, is particularly noteworthy in this regard.
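As an illustration only, the idea of validating data against cardinality constraints and enumerated value lists can be sketched in a few lines of Python. The profile below is hypothetical: the property names, the limits and the allowed values are assumptions made for this example and are not taken from DCAT, DCAT-AP or any other published profile.

```python
from typing import Dict, List, Optional, Set, Tuple

# (minimum count, maximum count or None for unbounded, allowed values or None)
Constraint = Tuple[int, Optional[int], Optional[Set[str]]]

# Hypothetical profile, invented for this sketch.
EXAMPLE_PROFILE: Dict[str, Constraint] = {
    "title": (1, None, None),                               # at least one title
    "license": (1, 1, None),                                # exactly one license
    "update_frequency": (0, 1, {"daily", "monthly", "annual"}),  # optional, enumerated
}


def validate(record: Dict[str, List[str]],
             profile: Dict[str, Constraint]) -> List[str]:
    """Return a list of human-readable violations; an empty list means valid."""
    errors: List[str] = []
    for prop, (min_count, max_count, allowed) in profile.items():
        values = record.get(prop, [])
        if len(values) < min_count:
            errors.append(f"{prop}: expected at least {min_count} value(s), "
                          f"found {len(values)}")
        if max_count is not None and len(values) > max_count:
            errors.append(f"{prop}: expected at most {max_count} value(s), "
                          f"found {len(values)}")
        if allowed is not None:
            for value in values:
                if value not in allowed:
                    errors.append(f"{prop}: value {value!r} is not in the allowed list")
    return errors
```

In practice such constraints would more likely be expressed in a dedicated constraint language than in application code; the sketch only shows what "validating against a profile" amounts to.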
Rather than limit the number of metadata standards and application profiles in use, systems should be able to expose and ingest (meta)data according to multiple standards through transparent and sustainable interfaces. We thus need a mechanism for servers to indicate the available standards and application profiles, and for clients to choose an appropriate one. This leads to the concept of content negotiation by application profile, which is orthogonal to the content negotiation by data format and language that is already part of HTTP. A new Internet Draft on profile negotiation, currently under development at IETF with input from the Dataset Exchange Working Group, is based on the draft presented at the SDSVoc workshop. The DXWG's definition of what is meant by "application profile" and its view of how clients and servers may interact in different ways based on these profiles, combined with this external work, will provide a powerful means to exchange data in any format (JSON, RDF, XML etc.) according to declared structures against which the data can be validated.
The goals of the working group are to maintain version 2 of DCAT and extend the standard to version 3, in line with work done to date and the ongoing work on dataset exchange being undertaken by communities more generally, and to advance to Recommendation the work on content negotiation by profile undertaken in the 2017-2019 charter period.
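The server-side selection step of such a negotiation might look roughly like the following Python sketch. It assumes that the client sends a weighted list of profile URIs in the form `<uri>;q=0.8, <uri>`, by analogy with how q-values work in HTTP `Accept` headers; that syntax, the function names and the `urn:example:` profile identifiers are all assumptions made for illustration and are not taken from the Internet Draft or from any DXWG deliverable.

```python
from typing import List, Optional, Tuple


def parse_profile_preferences(header_value: str) -> List[Tuple[str, float]]:
    """Parse '<uri>;q=0.8, <uri2>' into (uri, q) pairs, highest q first.

    Assumes profile URIs contain no commas or semicolons.
    """
    preferences: List[Tuple[str, float]] = []
    for item in header_value.split(","):
        item = item.strip()
        if not item:
            continue
        parts = [p.strip() for p in item.split(";")]
        uri = parts[0].strip("<>")
        q = 1.0  # default weight when no q parameter is given
        for param in parts[1:]:
            if param.startswith("q="):
                q = float(param[2:])
        preferences.append((uri, q))
    # sorted() is stable, so equal weights keep the client's original order
    return sorted(preferences, key=lambda pair: pair[1], reverse=True)


def choose_profile(header_value: str, available: List[str]) -> Optional[str]:
    """Return the client's most preferred profile that the server can serve."""
    for uri, q in parse_profile_preferences(header_value):
        if q > 0 and uri in available:
            return uri
    return None  # no match: the server would fall back to a default or report failure
```

Because the choice of profile is independent of the choice of media type, a server could apply a function like `choose_profile` alongside its ordinary format negotiation rather than in place of it.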
Scope
DCAT is formulated as an RDF vocabulary and is expected to remain so. However, as with all earlier work, the Working Group is agnostic about data formats. Methods for expressing DCAT in other (existing) formats are in scope.
Government data, scientific research data, industry/enterprise and cultural heritage data, whether shared openly or not, are all explicitly in scope. The working group will primarily look at cross-domain requirements.
Input Documents
The following documents SHOULD be considered by the Working Group as direct inputs to the specifications to be developed.
Existing Project Materials
In the course of the first charter period of the Dataset Exchange Working Group, material was developed that was not used in the documents published as Recommendations and Notes. This includes material on the DXWG GitHub and the project wiki, in particular the GitHub issues marked as Future-work.
W3C documents
- DCAT version 2 and the HCLS Community profile
- The Data Quality and Dataset Usage vocabularies
- The Smart Data & Smarter Descriptions (SDSVoc) workshop report, in particular the section on content negotiation by application profile.
- Data on the Web Best Practices
Non-W3C documents
DCAT must take account of current practice in many different communities. The following list is therefore not exhaustive.
- DCAT-AP and related work, such as DCAT-AP-NO, DCAT-AP-IT, GeoDCAT-AP, DCAT-AP.de etc.
- schema.org's dataset description vocabulary
- Other related vocabularies such as CERIF (the Common European Research Information Format), DBpedia DataID, DDI (the Data Documentation Initiative), DataCite, Hypercat etc.
- The FAIR Principles (a community-based movement to make data assets findable, accessible, interoperable and reusable).
- Description Set Profiles (constraint language for Dublin Core Application Profiles).
Out of Scope
The Dataset Exchange Working Group will not create application profiles or metadata standards that only apply to very specific domains (such as particle physics, accountancy, oncology etc.).
Success Criteria
In order to advance to Proposed Recommendation, the WG should show that each term in the revised version of DCAT is used in multiple catalogs and related systems. As a minimum, evidence will be adduced that each term has been published and consumed independently at least once, although a higher number is expected for the majority of terms.
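One way to track the minimum evidence described above is sketched below in Python. The sketch reads the criterion as "at least one publishing and at least one consuming implementation per term", which is an interpretation rather than something the charter spells out; the function name, the record layout and the implementation names in the usage example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Each evidence record: (term, role, implementation name),
# where role is either "published" or "consumed".
Evidence = Tuple[str, str, str]

REQUIRED_ROLES: Set[str] = {"published", "consumed"}


def terms_lacking_evidence(terms: Iterable[str],
                           evidence: Iterable[Evidence]) -> Dict[str, List[str]]:
    """Map each under-evidenced term to the roles still missing for it."""
    roles_seen: Dict[str, Set[str]] = defaultdict(set)
    for term, role, _implementation in evidence:
        roles_seen[term].add(role)
    missing: Dict[str, List[str]] = {}
    for term in terms:
        gap = REQUIRED_ROLES - roles_seen[term]
        if gap:
            missing[term] = sorted(gap)
    return missing


# Usage with made-up implementation names:
report = terms_lacking_evidence(
    ["dcat:Dataset", "dcat:keyword"],
    [
        ("dcat:Dataset", "published", "catalog-a"),
        ("dcat:Dataset", "consumed", "harvester-b"),
        ("dcat:keyword", "published", "catalog-a"),
    ],
)
```

Here `report` lists `dcat:keyword` as still lacking a consuming implementation, while `dcat:Dataset` meets the minimum.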
For the content negotiation by application profile specification, each fallback mechanism identified by the Working Group is expected to have two independent implementations. The DXWG is not responsible for proving implementations of the RFC defined at IETF.
Each specification should contain a section detailing any known security or privacy implications for implementers, Web authors, and end users.
Each specification should have a testing plan, that is, a guide that helps implementers determine whether they have followed the specification correctly.
Deliverables
Recommendation-track Deliverables
The Working Group will deliver the following W3C normative specifications (titles of the documents are provisional; some documents listed below may be grouped into one document or split into several, constituent documents):
- DCAT 3
An update and expansion of the current DCAT Recommendation. The new version may deprecate terms, but MUST maintain backwards compatibility.
- Content Negotiation by Application Profile
An explanation of how to implement the expected RFC and suitable fallback mechanisms as discussed at the SDSVoc workshop.
Other Deliverables
Other non-normative documents may be created such as:
- A use case and requirements document
- A test suite for content negotiation by application profile
- A primer on the uses of DCAT
- Guidance on publishing application profiles of vocabularies.
- Subject to its capacity, the working group may choose to develop additional relevant vocabularies in response to community demand.
Timeline - Expected Delivery Dates
- Define GitHub milestones based on the backlog of use cases and requirements (UCR) from DCAT 2, also considering any new use cases from the community; these will be distilled into a UCR document to guide the work on DCAT 3 and subsequent revisions - Q1-2 2020
- Get a first public working draft (FPWD) for DCAT 3 - Q3-4 2020
- Get to Candidate Recommendation (CR) the content negotiation by application profile work - Q1 2020
- Get to CR all Recommendation Track documents - Q4 2021
Transition to Evergreen Standards Process
The participants of the Working Group look forward to moving this work to an "Evergreen standard" model as is being discussed for Process 2020.
Coordination
For all specifications, this Working Group will seek horizontal review for accessibility, internationalization, performance, privacy, and security with the relevant Working and Interest Groups, and with the TAG. Invitation for review must be issued during each major standards-track document transition, including FPWD and CR, and should be issued when major changes occur in a specification.
Additional technical coordination with the following Groups will be made, per the W3C Process Document:
W3C Groups
- Internationalization Activity
Ensure that multilinguality concerns continue to be properly reflected in DCAT revision.
- Privacy Interest Group
Ensure that privacy concerns are addressed, for example, if a dataset includes personally identifiable information.
- Web Application Security Working Group
Ensure, in particular, that the content negotiation by application profile specification introduces no security vulnerabilities.
- Shape Expressions Community Group
The work of this CG is of direct relevance to the concept of application profiles.
- SHACL Community Group
This CG is continuing work on the W3C Shapes Constraint Language Recommendation. Efforts should be made to liaise with its community.
- schema.org for datasets Community Group
This CG is clearly of high relevance to the DXWG.
- Open Digital Rights Language (ODRL) Community Group
Ensure that the mechanisms of the W3C ODRL Recommendation being maintained by the ODRL Community Group for machine readable permissions, obligations, licenses, rights etc. are given due consideration.
External Organizations
- European Commission's ISA Programme
This is the body responsible for interoperability across the EU and whose outputs include various application profiles of DCAT such as DCAT-AP, GeoDCAT-AP and StatDCAT-AP.
- Research Data Alliance
Many of the issues raised in the DXWG are of direct relevance to work at the RDA around metadata, citation and more. It is important to align and not duplicate effort.
- GO-FAIR, FORCE11, CODATA (the International Science Council's Committee on Data) and FAIRsharing.org
Findability of data assets is potentially made much easier if there are adequate and interoperable catalogs. As this is a key goal of the FAIR Principles [FAIR] we need to coordinate with communities such as FORCE11, CODATA, GO-FAIR and FAIRsharing.org.
- EPOS, the European Plate Observing System
EPOS use DCAT as the basis for their dataset catalog application profile. Their datasets bring together information on geo-hazards and those geodynamic phenomena (including geo-resources) relevant to the environment and human welfare. Datasets come from a wide, international community and so are an important route for the working group to gather input and feedback on its deliverables.
Participation
To be successful, this Working Group is expected to have 6 or more active participants for its duration, including representatives from key implementers and users (e.g., governments and research data managers) of its specifications, and active Editors. The Chairs, specification Editors, and Test Leads are expected to contribute half a day per week towards the Working Group. There is no minimum requirement for other Participants.
The group encourages questions, comments and issues on its public mailing lists and document repositories, as described in Communication.
The group also welcomes non-Members to contribute technical submissions for consideration upon their agreement to the terms of the W3C Patent Policy.
Communication
Technical discussions for this Working Group are conducted in public: the meeting minutes from teleconference and face-to-face meetings will be archived for public review, and technical discussions and issue tracking will be conducted in a manner that can be both read and written to by the general public. Working Drafts and Editor's Drafts of specifications will be developed on a public repository, and may permit direct public contribution requests. The meetings themselves are not open to public participation, however.
Information about the group (including details about deliverables, issues, actions, status, participants, and meetings) will be available from the Dataset Exchange Working Group home page.
This group primarily conducts its technical work on the public mailing list public-dxwg-wg@w3.org (archive). The public is invited to review, discuss and contribute to this work.
The group may use a Member-confidential mailing list for administrative purposes and, at the discretion of the Chairs and members of the group, for member-only discussions in special cases when a participant requests such a discussion.
Decision Policy
This group will seek to make decisions through consensus and due process, per the W3C Process Document (section 3.3). Typically, an editor or other participant makes an initial proposal, which is then refined in discussion with members of the group and other reviewers, and consensus emerges with little formal voting being required.
However, if a decision is necessary for timely progress, but consensus is not achieved after careful consideration of the range of views presented, the Chairs may call for a group vote, and record a decision along with any objections.
To afford asynchronous decisions and organizational deliberation, any resolution (including publication decisions) taken in a face-to-face meeting or teleconference will be considered provisional. A call for consensus (CfC) will be issued for all resolutions (for example, via email and/or web-based survey), with a response period from one week to 10 working days, depending on the chair's evaluation of the group consensus on the issue. If no objections are raised on the mailing list by the end of the response period, the resolution will be considered to have consensus as a resolution of the Working Group.
All decisions made by the group should be considered resolved unless and until new information becomes available, or unless reopened at the discretion of the Chairs or the Director.
This charter is written in accordance with the W3C Process Document (Section 3.4, Votes), and includes no voting procedures beyond what the Process Document requires.
Patent Policy
This Working Group operates under the W3C Patent Policy (5 February 2004 Version). To promote the widest adoption of Web standards, W3C seeks to issue Recommendations that can be implemented, according to this policy, on a Royalty-Free basis. For more information about disclosure obligations for this group, please see the W3C Patent Policy Implementation.
Licensing
This Working Group will use the W3C Document license for all its deliverables.
About this Charter
This charter has been created according to section 5.2 of the Process Document. In the event of a conflict between this document or the provisions of any charter and the W3C Process, the W3C Process shall take precedence.
Charter History
The following table lists details of all changes from the initial charter, per the W3C Process Document (section 5.2.3):
Charter Period | Start Date | End Date | Changes |
---|---|---|---|
Initial Charter | .. | .. | .. |
Charter Extension | .. | .. | .. |
Bibliography
Informative references
- [FAIR] The FAIR Guiding Principles for scientific data management and stewardship. Mark D. Wilkinson et al. Nature. Scientific Data, vol. 3, Article nr. 160018. URL: https://doi.org/10.1038/sdata.2016.18