This document is the reference glossary used by the Data on the Web Working Group.


The deliverables of the Best Practices for Data on the Web include documents that aim to facilitate the work between data consumers and data publishers. To fulfill this mission, the WG decided to build a glossary that establishes common terminology between data consumers and data publishers.

A mental model is provided to delimit the scope.


A dataset is defined as a collection of data, published or curated by a single agent, and available for access or download in one or more formats. A dataset does not have to be available as a downloadable file.


A Citation may be direct and explicit (as in the reference list of a journal article), indirect (e.g. a citation to a more recent paper by the same research group on the same topic), or implicit (e.g. in artistic quotations or parodies, or in cases of plagiarism).

From: CiTO

Data Consumer

For the purposes of this WG, a Data Consumer is a person or group accessing, using, and potentially performing post-processing steps on data.

From: Strong, Diane M., Yang W. Lee, and Richard Y. Wang. "Data quality in context." Communications of the ACM 40.5 (1997): 103-110.

Data format

Data Format is defined as a specific convention for data representation, i.e. the way that information is encoded and stored for use in a computer system, possibly constrained by a formal data type or set of standards.

From: DH Curation Guide

Data producer

Data Producer is a person or group responsible for generating and maintaining data.

From: Strong, Diane M., Yang W. Lee, and Richard Y. Wang. "Data quality in context." Communications of the ACM 40.5 (1997): 103-110.

Data representation

Data representation is any convention for the arrangement of symbols in such a way as to enable information to be encoded by a data producer and later decoded by data consumers.

From: DH Curation Guide
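
As a minimal sketch of this definition, the Python example below has a producer and a consumer share one convention for arranging bytes. The layout (a big-endian unsigned 32-bit station identifier followed by a 32-bit float temperature) and all names are assumptions made for illustration; they do not come from the cited source.

```python
import struct

# Hypothetical convention shared by producer and consumer:
# big-endian, unsigned 32-bit station id, then a 32-bit float temperature.
LAYOUT = ">If"

def encode(station_id: int, temperature: float) -> bytes:
    """Producer side: arrange the values as bytes following LAYOUT."""
    return struct.pack(LAYOUT, station_id, temperature)

def decode(payload: bytes) -> tuple:
    """Consumer side: recover the values by applying the same convention."""
    station_id, temperature = struct.unpack(LAYOUT, payload)
    return station_id, temperature

payload = encode(42, 21.5)
station_id, temperature = decode(payload)
```

The consumer can only recover the original values because it applies the same layout the producer used; with a different layout the same bytes would decode to different values or fail to decode.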


Feedback is a forum used to collect messages posted by consumers about a particular topic. Messages can include replies to other consumers. Datetime stamps are associated with each message and the messages can be associated with a person or submitted anonymously.

From: (1) SIOC, (2) Annotation#Motivation
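
The structure described above could be modelled roughly as follows. This is only a sketch: the class and field names (`Feedback`, `Message`, `posted_at`, `author`, `replies`) are hypothetical and are not taken from SIOC or any other vocabulary.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

@dataclass
class Message:
    body: str
    posted_at: datetime                 # datetime stamp attached to every message
    author: Optional[str] = None        # None models an anonymous submission
    replies: List["Message"] = field(default_factory=list)

    def reply(self, body: str, author: Optional[str] = None) -> "Message":
        """Attach a reply from another consumer and return it."""
        msg = Message(body=body, posted_at=datetime.now(timezone.utc), author=author)
        self.replies.append(msg)
        return msg

@dataclass
class Feedback:
    topic: str                          # the topic the messages are about
    messages: List[Message] = field(default_factory=list)

forum = Feedback(topic="Example dataset")
first = Message("A column seems to be missing.", datetime.now(timezone.utc), author="alice")
forum.messages.append(first)
anonymous_reply = first.reply("I see the same problem.")
```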

To better understand why an annotation (see Annotation) was created, SKOS is used to show inter-related annotations between communities with more meaningful distinctions than a simple class/subclass tree.

Data Preservation

Data Preservation is defined by APA as "The processes and operations in ensuring the technical and intellectual survival of objects through time". This is part of a data management plan focusing on preservation planning and metadata. Whether it is worthwhile to put effort into preservation depends on the (future) value of the data, the resources available, and the opinion of the stakeholders (the designated community).

Data Archiving

Data Archiving is the set of practices around the storage and monitoring of the state of digital material over the years.

These tasks are the responsibility of a Trusted Digital Repository (TDR), also sometimes referred to as a Long-Term Archive Service (LTA). Often such services follow the Open Archival Information System (OAIS) reference model, which defines the archival process in terms of ingest, monitoring, and re-use of data.

File Format

File Format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free and may be either unpublished or open.

Examples of file formats: txt, pdf, ps, avi, gif, and jpg.

From: Wikipedia

Machine Readable Data

Machine Readable Data are data in formats that may be readily parsed by computer programs without access to proprietary libraries. For example, CSV and the RDF Turtle family of formats for graphs are machine readable, but PDF and JPEG are not.

From: Linked Data Glossary
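
To illustrate what "readily parsed" can mean in practice, the sketch below reads CSV text using only Python's standard library. The column names and values are made up for the example.

```python
import csv
import io

# Made-up CSV content; in practice this would be read from a published file.
CSV_TEXT = "city,population\nAlpha,100\nBeta,250\n"

def read_rows(text: str) -> list:
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))

rows = read_rows(CSV_TEXT)

# Once parsed, the values can be processed directly by a program.
total = sum(int(row["population"]) for row in rows)
```

No proprietary library is involved at any step, which is the property the definition singles out.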


Vocabulary is a collection of "terms" for a particular purpose. Vocabularies can range from simple, such as the widely used RDF Schema, FOAF, and Dublin Core Metadata Element Set, to complex vocabularies with thousands of terms, such as those used in healthcare to describe symptoms, diseases, and treatments. Vocabularies play a very important role in Linked Data, specifically to help with data integration. The use of this term overlaps with Ontology.

From: Linked Data Glossary

Structured data

Structured Data refers to data that conforms to a fixed schema. Relational databases and spreadsheets are examples of structured data.
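
A rough way to picture "conforms to a fixed schema" is a check that every record has exactly the expected fields with the expected types. The schema and records below are hypothetical and serve only as an illustration.

```python
# Hypothetical fixed schema: field name -> expected Python type.
SCHEMA = {"id": int, "name": str, "price": float}

def conforms(record: dict, schema: dict = SCHEMA) -> bool:
    """Return True if the record has exactly the schema's fields with matching types."""
    if record.keys() != schema.keys():
        return False
    return all(isinstance(record[name], expected) for name, expected in schema.items())

good = {"id": 1, "name": "widget", "price": 9.99}
bad = {"id": "1", "name": "widget"}  # wrong type for id, and price is missing
```

Here `conforms(good)` holds while `conforms(bad)` does not, in the same way a relational table accepts rows that match its column definitions and rejects those that do not.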