Dataset Usage Vocabulary

Abstract

Datasets published on the web are accessed and experienced by consumers in a variety of ways, but little information about these experiences is typically conveyed. Dataset publishers often lack feedback from consumers about how datasets are used. Consumers lack an effective way to discuss experiences with fellow collaborators and to explore material that cites the dataset. A dataset, as defined by DCAT, is a collection of data, published or curated by a single agent, and available for access or download in one or more formats. The Dataset Usage Vocabulary (DUV) is used to describe consumer experiences, citations, and feedback about a dataset from the human perspective.

The DUV specifies a number of foundational concepts used to collect dataset consumer feedback and experiences and to cite references associated with a dataset. With these concepts, APIs can be written to support collaboration across the web by publishing consumer opinions and experiences in a structured form, and to give data consumers and producers a means to advertise and search for published open dataset usage.

Status of This Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is a draft document which may be merged with the Data Quality Vocabulary or remain as a standalone document. Feedback is sought on the overall direction being taken as much as the specific details of the proposed vocabulary.

This document was published by the Data on the Web Best Practices Working Group as a First Public Working Draft. If you wish to make comments regarding this document, please send them to public-dwbp-comments@w3.org (subscribe, archives). All comments are welcome.

Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 August 2014 W3C Process Document.

Prefix	Namespace
dcat	http://www.w3.org/ns/dcat#
dct	http://purl.org/dc/terms/
dctype	http://purl.org/dc/dcmitype/
foaf	http://xmlns.com/foaf/0.1/
rdf	http://www.w3.org/1999/02/22-rdf-syntax-ns#
rdfs	http://www.w3.org/2000/01/rdf-schema#
skos	http://www.w3.org/2004/02/skos/core#
vcard	http://www.w3.org/2006/vcard/ns#
xsd	http://www.w3.org/2001/XMLSchema#
duv	http://www.w3.org/ns/duv#
oa	http://www.w3.org/ns/oa#
rev	http://purl.org/stuff/rev#
prov	http://www.w3.org/ns/prov#
cito	http://purl.org/spar/cito/
bibo	http://purl.org/ontology/bibo/

5. Relationship to other Vocabularies

The DUV is a “glue” vocabulary reusing and extending existing vocabulary classes and properties to support citation, feedback, and usage. This section provides our rationale and approach for vocabulary selection and re-use.

Core to the dataset usage vocabulary is the “dataset”. The DUV uses the Data Catalog (DCAT) vocabulary dcat:Dataset class and all properties associated with the class. From a data usage perspective the DUV can be considered an extension of the dcat:Dataset.

The Web Annotation Vocabulary is used to describe duv:Feedback as a subclass inheriting the behavior of oa:Annotation. A crucial part of the Web Annotation Model is the set of “motivations” that describe the role of a particular Annotation. Each duv:Feedback must have at least one oa:motivatedBy property relating it to an instance of oa:Motivation. A subset of the Motivation instances is important for describing feedback to data publishers and blog-style exchanges between dataset consumers. Because the Web Annotation vocabulary provides a generic way of annotating any web resource, it is also recommended for annotating a dcat:Dataset for uses beyond the scope of the DUV.
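
As a minimal sketch of the motivation requirement described above, the following Turtle attaches a motivation to a feedback annotation through the Web Annotation property oa:motivatedBy. The resource names (:comment3, :dataset-03312004, :laufer) and the body text are illustrative. The choice of oa:commenting as the motivation instance is an assumption; this draft does not yet say which motivation instances apply to feedback.

```turtle
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix duv:  <http://www.w3.org/ns/duv#> .
@prefix :     <http://example.org#> .

# Illustrative only: oa:commenting is one of the motivation instances
# defined by the Web Annotation vocabulary; its use here is an assumption.
:comment3
   a duv:Feedback ;
   oa:motivatedBy oa:commenting ;
   # A literal body mirrors the style used in Example 2 of this draft.
   oa:hasBody "Column headers are missing units." ;
   oa:hasTarget :dataset-03312004 ;
   oa:annotatedBy :laufer ;
.

:dataset-03312004
   a dcat:Dataset ;
.
```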

The Provenance Ontology (PROV-O) is a vocabulary used by data providers to pass details about the data history to data users. Properties associated with prov:Activity provide relationships (prov:used, prov:generated) from a historical perspective, using past-tense forms of words and phrases. The duv:Application and duv:WebThing classes reuse these properties by creating sub-properties of the PROV-O properties to describe usage from a present-tense perspective.
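
The sub-property approach can be sketched as follows. Only the duv:generates declaration mirrors the vocabulary specification in this draft; treating duv:consumes as a sub-property of prov:used is an assumption made here for illustration.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix duv:  <http://www.w3.org/ns/duv#> .

# Stated in the vocabulary specification of this draft.
duv:generates
   a rdf:Property ;
   rdfs:subPropertyOf prov:generated ;
.

# Assumption: the draft does not declare this relationship explicitly.
duv:consumes
   a rdf:Property ;
   rdfs:subPropertyOf prov:used ;
.
```

Under these declarations, an RDFS-aware processor that encounters a duv:consumes statement could also infer the corresponding prov:used statement, so PROV-based tools would still see the usage.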

Both the Citation Typing Ontology (CiTO) and Dublin Core vocabularies are used to describe citations and references between datasets and cited sources.

6. Examples

This section shows some examples to illustrate the application of the Dataset Usage Vocabulary.

Example 1 - Usage: A 2-D plot application developed by Laufer can be used to create temperature plots and consumes temperature readings from a dataset to produce the plot. A data logger used to provide temperature readings uses a configuration file for operation of the data logger.

Example 1

@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix duv:  <http://www.w3.org/ns/duv#> .
@prefix :     <http://example.org#> .

:laufer
  a foaf:Agent, foaf:Person;
  foaf:givenName "Laufer";
  foaf:mbox <mailto:laufer@example.org>;
  duv:develops :xyplotter;
.

:xyplotter
  a duv:Application;
  rdfs:label "2dplotter" ;
  duv:consumes :dataset-03312004 ;
  duv:developedBy :laufer ;
.

:insitu-measurement-data-logger
  a duv:WebThing;
  rdfs:label "surface meteorology data logger" ;
  duv:consumes :configfile ;
.

:configfile-csv
  a dcat:Distribution;
.

:configfile
  a dcat:Dataset ;
  dct:title "configuration settings" ;
  dcat:distribution :configfile-csv ;
.

:dataset-Jan-Mar-2004-csv 
  a dcat:Distribution;
.

:dataset-03312004
  a dcat:Dataset;
  dct:title "Mars Quarterly Temperature Plot"; 
  dcat:distribution :dataset-Jan-Mar-2004-csv;
.

Example 2 - Feedback: Laufer provides feedback about the temperature readings dataset.

Example 2

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix oa:   <http://www.w3.org/ns/oa#>  .
@prefix duv: <http://www.w3.org/ns/duv#> .
@prefix : <http://example.org#> .

:laufer
   a foaf:Person  ;
   foaf:givenName "Laufer"  ;
   foaf:mbox <mailto:laufer@example.org> ;
.

:dataset-03312004
   a dcat:Dataset ;
   dct:title "Mars Quarterly Temperature Plot" ; 
.

:comment1
   a duv:Feedback ;
   oa:hasBody "Written in MS-DOS text format." ;
   oa:hasTarget :dataset-03312004 ;
   oa:annotatedBy :laufer ;
.

:comment2
   a duv:Feedback;
   duv:has_rating "3 Star";
   oa:hasBody "Linked Data Rating";
   oa:hasTarget :dataset-03312004;
.

Example 3 - Citation: A technical report :paperA identified by a DOI cites the dataset. The :dataset-03312013 is also identified by a digital object identifier (DOI).

Example 3

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix oa:   <http://www.w3.org/ns/oa#>  .
@prefix cito: <http://purl.org/spar/cito/> .
@prefix duv: <http://www.w3.org/ns/duv#> .
@prefix : <http://example.org#> .

:dataset-03312013
    a dcat:Dataset;
    dct:identifier "doi:10.1038/ex2158";
    dct:title "Mars Quarterly Temperature Plot"@en ;
    dct:alternative "Qtrly Temp Plot"@en;
    dct:description "This plot features average surface temperatures measured by the Mars Land Rover. "@en ;
    dct:created "2013-03-31T15:18:00Z"^^xsd:dateTime ;
    dct:creator "Laufer" ;
    dct:license <http://creativecommons.org/licenses/by-sa/3.0/> ;
    dcat:keyword "Mars";
    dct:language <http://www.lexvo.org/page/iso639-3/eng> ;
    cito:isCitedAsDataSourceBy :paperA ; 
.

:thisCitation
    a duv:Citation;
    cito:hasCitingEntity :paperA;
    cito:hasCitedEntity :dataset-03312013;
.

:paperA
   a foaf:Document;
   dct:identifier "doi:20.1055/ex7758";
   dct:title "Mars Weather Technical Report"@en;
   duv:cites :dataset-03312013;
.

8. Vocabulary Specification

Note

This is an initial proposal of DUV classes and properties. We are still evaluating the use of classes like duv:Citation and duv:Feedback.

8.1 Class: Agent

RDF Class:	foaf:Agent
Definition	An agent (e.g. person, group, software or physical artifact).
rdfs:isDefinedBy	http://xmlns.com/foaf/spec/#term_Agent
Label	Agent

8.2 Class: Annotation

RDF Class:	oa:Annotation
Definition	Information about a web resource or associations between resources.
rdfs:isDefinedBy	http://www.w3.org/ns/oa
Label	Annotation

Property: title

RDF Property:	dct:title
Definition	A name given to the Annotation
Range	rdfs:Literal

Property: description

RDF Property:	dct:description
Definition	A free-text description of the Annotation
Range	rdfs:Literal

8.3 Class: Application

RDF Class:	duv:Application
Definition	Software that is capable of reading and processing a corresponding dataset.
Label	Application

Issue 2

Should we use Software or earl:Software instead of duv:Application? Issue-170

Property: title

RDF Property:	dct:title
Definition	A name given to the Application
Range	rdfs:Literal

Property: description

RDF Property:	dct:description
Definition	A free-text description of the Application
Range	rdfs:Literal

Property: developedBy

RDF Property:	duv:developedBy
Definition	Describes the agent associated with the development of an application
Range	foaf:Agent
Label	developed by
rdfs:isDefinedBy	http://www.w3.org/ns/duv

Issue 3

Should dct:creator or doap:developer be used instead of duv:developedBy? Issue-171

Property: consumes

RDF Property:	duv:consumes
Definition	A dataset being consumed by an application.
Range	dcat:Dataset
Label	consumes
rdfs:isDefinedBy	http://www.w3.org/ns/duv

Issue 4

Should duv:consumes be used instead of duv:consumed? Should we be able to reify Consumption? Issue-177

Property: generates

RDF Property:	duv:generates
Definition	Usage experience associated with the dataset being generated.
Range	dcat:Dataset
Label	generates
rdfs:isDefinedBy	http://www.w3.org/ns/duv
rdfs:subPropertyOf	prov:generated
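
No example in this draft uses duv:generates, so the following is a minimal sketch of how it might appear. The resources :xyplotter and :dataset-03312004 are borrowed from Example 1; :plot-Jan-Mar-2004 and its title are hypothetical and introduced only for illustration.

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix duv:  <http://www.w3.org/ns/duv#> .
@prefix :     <http://example.org#> .

:xyplotter
   a duv:Application ;
   duv:consumes :dataset-03312004 ;
   # Hypothetical output: the rendered plot published as its own dataset.
   duv:generates :plot-Jan-Mar-2004 ;
.

:plot-Jan-Mar-2004
   a dcat:Dataset ;
   dct:title "Temperature plot, January to March 2004" ;
.
```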

8.4 Class: Dataset

RDF Class:	dcat:Dataset
Definition	A collection of data, published or curated by a single agent, and available for access or download in one or more formats.
rdfs:isDefinedBy	http://www.w3.org/ns/dcat
Label	Dataset
rdfs:subClassOf	dctype:Dataset

Property: title

RDF Property:	dct:title
Definition	A name given to the Dataset
Range	rdfs:Literal

Property: description

RDF Property:	dct:description
Definition	A free-text account of the Dataset
Range	rdfs:Literal

8.5 Class: Document

RDF Class:	foaf:Document
Definition	The Document class represents those things which are, broadly conceived, 'documents'.
rdfs:isDefinedBy	http://xmlns.com/foaf/spec/#term_Document
Label	Document

Property: title

RDF Property:	dct:title
Definition	A name given to the Document
Range	rdfs:Literal

Property: description

RDF Property:	dct:description
Definition	A free-text account of the Document
Range	rdfs:Literal

Property: cites

RDF Property:	duv:cites
Definition	The citing entity cites the cited entity, either directly and explicitly (as in the reference list of a journal article), indirectly (e.g. by citing a more recent paper by the same group on the same topic), or implicitly (e.g. as in artistic quotations or parodies, or in cases of plagiarism).
Range	dcat:Dataset
Label	cites
rdfs:isDefinedBy	http://www.w3.org/ns/duv

8.6 Class: Activity

RDF Class:	prov:Activity
Definition	An activity is something that occurs over a period of time and acts upon or with entities; it may include consuming, processing, transforming, modifying, relocating, using, or generating entities.
rdfs:isDefinedBy	http://www.w3.org/ns/prov
Label	Activity

8.7 Class: Rating

RDF Class:	duv:Rating
Definition	Metric used to evaluate the dataset.
rdfs:isDefinedBy	http://www.w3.org/ns/duv
Label	Rating

8.8 Class: Citation

RDF Class:	duv:Citation
Definition	A citation in a document that references a dataset.
rdfs:isDefinedBy	http://www.w3.org/ns/duv
Label	Citation
rdfs:subClassOf	cito:CitationAct

Issue 5

The use of cito:CitationAct and duv:Citation is under evaluation. Issue-173

Property: hasCitingEntity

RDF Property:	cito:hasCitingEntity
Definition	The citation act relates to the entity containing that citation.
Range	foaf:Document
Label	has citing entity
rdfs:isDefinedBy	http://purl.org/spar/cito/hasCitingEntity

8.9 Class: Feedback

RDF Class:	duv:Feedback
Definition	Feedback on the dataset. Expresses whether the dataset was useful or not, for example.
rdfs:isDefinedBy	http://www.w3.org/ns/duv
Label	Feedback
rdfs:subClassOf	oa:Annotation

Issue 6

The definition of duv:Feedback needs to be reviewed because it is not clear if it should be a subclass of oa:Annotation or just an instance of oa:Motivation. Issue-178

Property: endorses

RDF Property:	duv:endorses
Definition	Feedback in which an agent endorses the dataset.
Range	dcat:Dataset
Label	endorses
rdfs:isDefinedBy	http://www.w3.org/ns/duv

Property: annotatedBy

RDF Property:	oa:annotatedBy
Definition	Identifies the agent responsible for creating the feedback Annotation.
Range	foaf:Agent
Label	annotatedBy

Property: retains

RDF Property:	duv:retains
Definition	A feedback annotation may refer to another feedback annotation.
Range	duv:Feedback
Label	retains
rdfs:isDefinedBy	http://www.w3.org/ns/duv

Issue 7

The meaning of duv:retains needs to be clarified and more examples will be provided. Issue-174

Property: has_rating

RDF Property:	duv:has_rating
Definition	An optional rating provided as part of feedback.
Range	duv:Rating
Label	has_rating
rdfs:isDefinedBy	http://www.w3.org/ns/duv
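
Because the range of duv:has_rating is duv:Rating, a rating can be modelled as a resource rather than as a plain literal. The sketch below is one possible reading. The draft defines no properties for duv:Rating, so the use of rdfs:label to carry the rating text, and the names :comment4 and :rating-3-star, are assumptions.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix duv:  <http://www.w3.org/ns/duv#> .
@prefix :     <http://example.org#> .

:comment4
   a duv:Feedback ;
   oa:hasTarget :dataset-03312004 ;
   # The rating is a resource, matching the declared range duv:Rating.
   duv:has_rating :rating-3-star ;
.

# Assumption: rdfs:label holds the human-readable rating value.
:rating-3-star
   a duv:Rating ;
   rdfs:label "3 Star" ;
.
```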

Property: has_datasetCorrection

RDF Property:	duv:has_datasetCorrection
Definition	An optional data correction provided as part of feedback.
Range	duv:DatasetCorrection
Label	has_datasetCorrection
rdfs:isDefinedBy	http://www.w3.org/ns/duv

8.10 Class: DatasetCorrection

RDF Class:	duv:DatasetCorrection
Definition	A dataset correction suggested by a user as part of feedback.
rdfs:isDefinedBy	http://www.w3.org/ns/duv
Label	DatasetCorrection
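
A feedback annotation carrying a suggested correction might look like the sketch below. The draft defines no properties for duv:DatasetCorrection, so describing the correction with rdfs:comment is an assumption, and :comment5, :correction1, and the correction text are hypothetical.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix oa:   <http://www.w3.org/ns/oa#> .
@prefix duv:  <http://www.w3.org/ns/duv#> .
@prefix :     <http://example.org#> .

:comment5
   a duv:Feedback ;
   oa:hasTarget :dataset-03312004 ;
   oa:annotatedBy :laufer ;
   duv:has_datasetCorrection :correction1 ;
.

# Assumption: rdfs:comment carries the free-text correction.
:correction1
   a duv:DatasetCorrection ;
   rdfs:comment "Readings for one day appear to be in Fahrenheit rather than Celsius." ;
.
```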

8.11 Class: WebThing

RDF Class:	duv:WebThing
Definition	A Web of Things (WoT) device, sensor, or hardware on the Web that consumes a dataset.
rdfs:isDefinedBy	http://www.w3.org/ns/duv
Label	WebThing

Issue 8

Should prov:SoftwareAgent be used instead of Application/WebThing? Issue-176

9. DUV Requirements

This section shows some of the requirements that motivated the development of the Dataset Usage Vocabulary. These requirements were derived from the use cases described in the Data on the Web Best Practices Use Cases & Requirements document.

R-TrackDataUsage

It should be possible to track the usage of data.

The capability to track data usage can help enhance the reputation of datasets. Records of data usage show the successful outcomes of that usage and the entities associated with it, such as the persons, organisations, applications, and research projects that have used the datasets. This increases trust in the data. It also provides provenance about how the data has been versioned over time.

Use Case	R-TrackDataUsage Benefits
Airborne Snow Observatory	Data is used in the decision-making process by water reservoir managers. The capability to track usage of data will lead to identification of the decisions and policy changes made by authorities based on this data. It will also list applications, tools and frameworks suitable for analysis of this kind of data.
LandPortal	Data is used in research, policy making, journalism, development, investments, governance, food security, poverty, and gender issues. Usage tracking will help in assessing the impact of published data.
LusTRE	Data is made public for reuse and reference in nature conservation activities. Information about use of this data will determine the impact of this framework. Usage of this data MUST lead to future publications of less heterogeneous data and increasing use of standardised thesauri.
Open Experimental Field Studies	Data is used in computational models and studies. The capability to track usage of data will enable data publishers to identify the user communities making use of this data. It will also identify combined use of multiple datasets in one large study, which in turn identifies related datasets that can be recommended to future users.
RDESC	Data is published in Linked Data format for discovery and recommendation of related datasets. The capability to keep track of its usage will list the tools and applications suitable for use with this data. Because RDESC is not a data publisher but more of a data facilitator, usage tracking will identify frequently searched datasets and the trends in temporal, spatial and domain-specific search queries.
UKOpenResearchForum	Data is published with intelligent openness to support research projects. The capability to track data usage will provide adequate acknowledgement to the data originator.

R-UsageFeedback

Data consumers should have a way of sharing feedback and rating data.

User feedback is important for addressing data quality concerns about a published dataset. Different users may have different experiences with the same dataset, so it is important to capture the context in which the data was used and the profile of the user who used it. R-UsageFeedback should also provide a way for users to communicate suggested corrections and updates to the datasets back to the data publisher. Data publishers should have a review mechanism to incorporate submitted corrections.

Use Case	R-UsageFeedback Benefits
Airborne Snow Observatory	Data grows rapidly each year. User feedback can report issues of data completeness and correctness.
DadosGovBr	Data comes from various publishers. As a catalog, the site has faced several challenges, one of which was integrating the various technologies and formulas used by publishers to provide datasets in the portal. User feedback can report on the usability of those technologies and formulas. User feedback can also be used to crowdsource discrepancies in the vocabularies used to describe datasets.
LusTRE	Data multilingualism is one of the challenges for this use case. User feedback can be used to crowdsource multilingual text alignment.
Open Experimental Field Studies	Data is used in computational models and studies. User feedback can be used to identify the good-quality data required for good-quality research. Completeness, time resolution and usability can be captured using user feedback.
RDESC	RDESC curates different data sources and publishes metadata in Linked Data format. User feedback is useful for assessing metadata quality. Availability of the source datasets, correctness of persistent URIs, correctness of the concepts defined in RDESC (such as FOAF Agents, Organizations, and Physical Properties), and usability of the search interface can be captured in user feedback.

R-Citable

It should be possible to cite data on the Web.

Use Case	R-Citable Benefits
Open Experimental Field Studies	Various experiments and field studies are performed to generate data that is used in computational models and larger studies. The capability to capture all citations of the published data can justify the effort spent on publishing. Citation information can be used to identify the user communities interested in the data source.
LATimes	On 27 March 2014, the LA Times published the story "Women earn 83 cents for every $1 men earn in L.A. city government". It was based on an infographic released by LA's City Controller, Ron Galperin. The report could only cite the data portal as a whole. It could not cite the exact dataset because the URI was too long.
RDESC	RDESC is a data curator, so it uses data from different sources. This usage is not communicated to data publishers because the publishers provide no mechanism for doing so.
