Data on the Web Best Practices: Data Quality Vocabulary

Abstract

This document provides a framework in which the quality of a dataset can be described, whether by the dataset publisher or by a broader community of users. It does not provide a formal, complete definition of quality; rather, it sets out a consistent means by which information can be provided so that a potential user of a dataset can make their own judgment about its fitness for purpose.

Prefix	Namespace
daq	http://purl.org/eis/vocab/daq#
dcat	http://www.w3.org/ns/dcat#
dcterms	http://purl.org/dc/terms/
dqv	http://www.w3.org/ns/dqv#
duv	http://www.w3.org/ns/duv#
oa	http://www.w3.org/ns/oa#
prov	http://www.w3.org/ns/prov#

4. Vocabulary specification

Note

This section is work in progress. Further tables specifying individual classes and properties will be added later.

4.1 Class: Quality Measure

DQV defines quality measures as specific instances of DQV observations, adapting the DAQ quality metrics framework [ DaQ ], [ DaQ-RDFCUBE ]:

A Quality Dimension is a characteristic of a dataset relevant to the consumer (e.g., the availability of a dataset).
A Quality Metric gives a procedure for measuring a data quality dimension, which is abstract, by observing a concrete quality indicator. There are usually multiple metrics per dimension; e.g., availability can be indicated by the accessibility of a SPARQL endpoint, or of an RDF dump. The value of a metric can be numeric (e.g., for the metric “human-readable labeling of classes, properties and entities”, the percentage of entities having an rdfs:label or rdfs:comment) or boolean (e.g., whether or not a SPARQL endpoint is accessible).

For example, a dimension could be "multilinguality" and two metrics could be "ratio of literals with language tags" and "number of different language tags".
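The multilinguality example could be written down roughly as follows. This is only a sketch: the identifiers :multilinguality, :languageTagRatio and :languageTagCount are hypothetical names in the http://example.org/ namespace, and the labels are illustrative.

```turtle
@prefix :     <http://example.org/> .
@prefix dqv:  <http://www.w3.org/ns/dqv#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Hypothetical dimension: the abstract characteristic being assessed.
:multilinguality
	a dqv:Dimension ;
	rdfs:label "Multilinguality"@en .

# Two hypothetical metrics that make the same dimension measurable.
:languageTagRatio
	a dqv:Metric ;
	rdfs:label "Ratio of literals with language tags"@en ;
	dqv:hasDimension :multilinguality .

:languageTagCount
	a dqv:Metric ;
	rdfs:label "Number of different language tags"@en ;
	dqv:hasDimension :multilinguality .
```

Both metrics point to the same dimension, which illustrates the "multiple metrics per dimension" case described above.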

The following properties should be used on this class: dqv:hasMetric, dqv:value, qb:dataSet.

Issue 3

Should (and if yes, how) DQV represent multiple/derived values for a metric (e.g., average or normalized value)? ( Issue-222 )

Issue 4

Should (and if yes, how) DQV represent parameters for a metric applied for computing a specific quality measure (e.g., a specific setting of weights)? ( Issue-223 )

RDF Class:	dqv:QualityMeasure
Definition:	A quality measure represents the evaluation of a given dataset (or dataset distribution) against a specific quality metric.
Subclass of:	qb:Observation
Equivalent class:	daq:Observation

4.1.1 Property: hasMetric

RDF Property:	dqv:hasMetric
Definition:	Indicates the metric being observed.
Instance of:	qb:DimensionProperty
Domain:	qb:Observation
Range:	dqv:Metric
Equivalent property:	daq:metric

4.1.2 Property: dataSet

RDF Property:	qb:dataSet
Definition:	Indicates the dataset to which a quality measure (which is an RDF Data Cube observation) belongs.
Domain:	qb:Observation
Range:	qb:DataSet

4.1.3 Property: computedOn

RDF Property:	dqv:computedOn
Definition:	Refers to the resource (e.g., a dataset, a linkset, a graph, a set of triples) on which the quality measurement is performed. In the DQV context, this property is generally expected to be used in statements in which objects are instances of dcat:Dataset and dcat:Distribution .
Instance of:	qb:DimensionProperty
Domain:	dqv:QualityMeasure
Equivalent property:	daq:computedOn
Inverse property:	dqv:hasQualityMeasure

4.1.4 Property: value

RDF Property:	dqv:value
Definition:	Refers to the value computed by the metric.
Instance of:	qb:MeasureProperty , owl:DatatypeProperty
Domain:	dqv:QualityMeasure
Equivalent property:	daq:value

4.2 Class: Metric

The following properties should be used on this class: dqv:hasDimension .

Issue 5

In daQ, the property daq:expectedDataType associates each metric to the expected data type for its observed value. Data types for observed values are restricted to xsd:anySimpleType (e.g. xsd:boolean, xsd:double etc…). Is the current practice of using daq:expectedDataType in daQ appropriate? Isn't the restriction to xsd:anySimpleType too narrow? ( Issue-224 )

RDF Class:	dqv:Metric
Definition:	A standard to measure a quality dimension. An observation (instance of dqv:QualityMeasure) assigns a value in a given unit to a Metric.
Equivalent class	daq:Metric

4.2.1 Property: hasDimension

RDF Property:	dqv:hasDimension
Definition:	Represents the dimension a metric allows a measurement of.
Domain:	dqv:Metric
Range:	dqv:Dimension
Inverse:	daq:hasMetric
Usage note:	Dimensions are meant to systematically organize metrics. The Data Quality Vocabulary defines no specific cardinality constraints for dqv:hasDimension, since distinct quality frameworks might have different perspectives over a metric. A metric may therefore be associated to more than one dimension. However, those who define new quality measures should try to avoid this as much as possible and assign only one dimension to the metrics they define.

4.3 Class: Dimension

The following properties should be used on this class: dqv:hasCategory.

RDF Class:	dqv:Dimension
Definition:	Represents criteria relevant for assessing quality. Each quality dimension must have one or more metrics to measure it. A dimension is linked with a category using the dqv:hasCategory property.
Equivalent class:	daq:Dimension

4.3.1 Property: hasCategory

RDF Property:	dqv:hasCategory
Definition:	Represents the category a dimension is grouped in.
Domain:	dqv:Dimension
Range:	dqv:Category
Inverse:	daq:hasDimension
Usage note:	Categories are meant to systematically organize dimensions. The Data Quality Vocabulary defines no specific cardinality constraints for dqv:hasCategory, since distinct quality frameworks might have different perspectives over a dimension. A dimension may therefore be associated to more than one category. However, those who define new quality measures should try to avoid this as much as possible and assign only one category to the dimensions they define.

4.4 Class: Category

RDF Class:	dqv:Category
Definition:	Represents a group of quality dimensions in which a common type of information is used as quality indicator.
Equivalent class:	daq:Category

4.5 Class: Quality Measure Dataset

RDF Class:	dqv:QualityMeasureDataset
Definition:	Represents a dataset of quality measures, evaluations of a given dataset (or dataset distribution) against a specific quality metric.
Subclass of:	qb:DataSet
Equivalent class	daq:QualityGraph

4.6 Class: Quality Annotation

RDF Class:	dqv:QualityAnnotation
Definition:	Represents quality annotations, including rating, quality certificate, feedback that can be associated to datasets or distributions. Quality annotations must have one oa:motivatedBy statement with an instance of oa:Motivation (and skos:Concept), which reflects a quality assessment purpose. We define this instance as dqv:qualityAssessment.
Subclass of:	oa:Annotation
Equivalent class	EquivalentClasses( dqv:QualityAnnotation ObjectHasValue( oa:motivatedBy dqv:qualityAssessment ) )

Note

To make the document more self-contained, we might consider describing some properties of oa:Annotation, such as hasBody and hasTarget.

4.7 Class: User Quality Feedback

RDF Class:	dqv:UserQualityFeedback
Definition:	Represents feedback users might want to associate to datasets or distributions.
Subclass of:	dqv:QualityAnnotation, duv:UserFeedback

Issue 6

Should we exploit predefined instances of oa:Motivation to further characterize a user's feedback purposes? ( Issue-201 )
Combining the predefined instances of oa:Motivation with dqv:qualityAssessment, we could distinguish different kinds of user feedback, for example:

dqv:qualityAssessment plus oa:editing might indicate a request for a modification or edit, which relates to the quality of the target dataset/distribution
dqv:qualityAssessment plus oa:questioning might express a question issued about specific quality of the dataset/distribution
dqv:qualityAssessment plus oa:classifying might represent the assignment of a classification type, typically from a controlled vocabulary or list, to the target resource(s). For example, it could be used to classify a dataset/distribution against a rating system (e.g., the 5 Stars linked open data rating system).
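As a rough illustration of the combinations listed above, a piece of user feedback could carry two motivations at once. The sketch below is an assumption about how this open issue might be resolved, not an agreed pattern; the resources :feedback1, :feedback2 and their bodies are hypothetical, and oa:hasTarget and oa:hasBody are taken from the Open Annotation vocabulary.

```turtle
@prefix :    <http://example.org/> .
@prefix dqv: <http://www.w3.org/ns/dqv#> .
@prefix oa:  <http://www.w3.org/ns/oa#> .

# Hypothetical feedback asking a question about the quality of a distribution.
:feedback1
	a dqv:UserQualityFeedback ;
	oa:hasTarget :myDatasetDistribution ;
	oa:hasBody :feedback1Question ;
	oa:motivatedBy dqv:qualityAssessment, oa:questioning .

# Hypothetical feedback requesting a correction to a dataset.
:feedback2
	a dqv:UserQualityFeedback ;
	oa:hasTarget :myDataset ;
	oa:hasBody :feedback2EditRequest ;
	oa:motivatedBy dqv:qualityAssessment, oa:editing .
```

Keeping dqv:qualityAssessment on every feedback resource preserves its classification as a quality annotation, while the second motivation refines the purpose.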

4.8 Property: Has Quality Measure

RDF Property:	dqv:hasQualityMeasure
Definition:	Refers to the performed quality measurements. Quality measurements can be performed on any kind of resource (e.g., a dataset, a linkset, a graph, a set of triples). However, in the DQV context, this property is generally expected to be used in statements in which subjects are instances of dcat:Dataset and dcat:Distribution .
Range:	dqv:QualityMeasure
Inverse property:	dqv:computedOn

5. Example Usage

This section is non-normative.

This section shows some examples to illustrate the application of the Data Quality Vocabulary.

Note

This section is still work in progress. Further examples will be provided as soon as some of the pending issues are resolved. We invite the public to contact the editors and submit relevant examples of quality data, even if not yet represented in DQV. We welcome your input!

NB: in the remainder of this section, the prefix " : " refers to http://example.org/

5.1 Express a quality assessment with quality metrics

Let us consider a dataset myDataset, and its distribution myDatasetDistribution:

:myDataset 
	a  dcat:Dataset ;
	dcterms:title "My dataset" ; 
	dcat:distribution :myDatasetDistribution
	.
:myDatasetDistribution
	a  dcat:Distribution ;
	dcat:downloadURL <http://www.example.org/files/mydataset.csv> ;
	dcterms:title "CSV distribution of dataset" ;
	dcat:mediaType "text/csv" ;
	dcat:byteSize "87120"^^xsd:decimal 
	.

An automated quality checker has provided a quality assessment with two (CSV) quality measures for myDatasetDistribution.

:myDatasetDistribution
	dqv:hasQualityMeasure :measure1, :measure2
	.
:measure1 
	a dqv:QualityMeasure ;
	dqv:computedOn :myDatasetDistribution ;
	dqv:hasMetric :csvAvailabilityMetric ;
	dqv:value "1.0"^^xsd:double 
	.
     	
:measure2
	a dqv:QualityMeasure ;
	dqv:computedOn :myDatasetDistribution ;
	dqv:hasMetric :csvConsistencyMetric ;
	dqv:value "0.5"^^xsd:double 
	.

#definition of dimensions and metrics
:availability
	a dqv:Dimension ; 
	dqv:hasCategory :category1
	.
:consistency
	a dqv:Dimension ;
	dqv:hasCategory :category2
	.
	
:csvAvailabilityMetric 
	a dqv:Metric ;
	dqv:hasDimension :availability
	.
:csvConsistencyMetric
	a dqv:Metric ; 
	dqv:hasDimension :consistency
	.

Categories and dimensions might be more extensively defined; see the section 'Dimensions and metrics hints' for further examples. Any quality framework is free to define its own dimensions and categories.

Issue 7

Is there any reason for turning the classes dqv:Dimension, dqv:Metric and dqv:Category as well as the properties dqv:hasDimension and dqv:hasCategory into "abstract" classes and properties as they were defined in daQ (see Section "Extending the daQ" here )? ( Issue-204 )

Issue 8

Should we represent dimensions and categories as instances of skos:Concept ? This would allow publishers of quality frameworks to express (hierarchical) relations between dimensions or categories. It would also make it possible to align with quality-focused categorizations that are less focused on metrics, including the DWBP Best Practices dimensions, or even the parts of DQV about annotations. ( Issue-205 )

5.2 Document the provenance of the quality metadata

The results of metrics obtained in the previous assessment are stored in the myQualityMetadata graph.

# myQualityMetadata is a graph 
# we are assuming dqv:QualityMetadata is an extension of daq:QualityGraph,
# otherwise we should probably define a proper qb:DataStructureDefinition 

:myQualityMetadata {
:myDatasetDistribution
	dqv:hasQualityMeasure :measure1, :measure2
	.
# The graph contains the rest of the statements presented in the previous example.
}
# myQualityMetadata has been created by: qualityChecker and it is the result of the :qualityChecking activity 
:myQualityMetadata 
	a dqv:QualityMetadata ;
	prov:wasAttributedTo :qualityChecker ;
	prov:generatedAtTime "2015-05-27T02:52:02Z"^^xsd:dateTime ;
	prov:wasGeneratedBy :qualityChecking 
	.
# qualityChecker is a service computing some quality metrics 	
:qualityChecker
	a prov:SoftwareAgent ;   
	rdfs:label "a quality assessment service"^^xsd:string
	# We should probably suggest to add more info about the services
	.
# qualityChecking is the activity that has generated myQualityMetadata starting from myDatasetDistribution    
:qualityChecking
	a prov:Activity;
	rdfs:label "the checking of myDatasetDistribution's quality"^^xsd:string;
   	prov:wasAssociatedWith :qualityChecker;
   	prov:used              :myDatasetDistribution;
   	prov:generated         :myQualityMetadata;
   	prov:endedAtTime      "2015-05-27T02:52:02Z"^^xsd:dateTime;
   	prov:startedAtTime     "2015-05-27T00:52:02Z"^^xsd:dateTime
	.

5.3 Document the provenance of a single quality measurement

Note

The group has discussed provenance at different levels of granularity (dqv:QualityMeasure and dqv:QualityMetadata), so we might consider adding an example of provenance for dqv:QualityMeasure.
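Pending a decision by the group, one possible shape for such an example is sketched below. It simply attaches to an individual measure the same PROV statements used for the quality metadata graph; :measure1, :qualityChecker and :qualityChecking are assumed to be the resources from the previous examples, and the timestamp is reused from there for illustration only.

```turtle
@prefix :     <http://example.org/> .
@prefix dqv:  <http://www.w3.org/ns/dqv#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# Provenance stated directly on one quality measure rather than on the graph.
:measure1
	a dqv:QualityMeasure ;
	prov:wasAttributedTo :qualityChecker ;
	prov:wasGeneratedBy :qualityChecking ;
	prov:generatedAtTime "2015-05-27T02:52:02Z"^^xsd:dateTime .
```

This finer granularity may be useful when the measures in one graph come from different tools or runs.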

5.4 Document the provenance of a dataset

Statements similar to the ones applied to the resource myQualityMetadata above can be applied to the resource myDataset to indicate the provenance of the dataset. I.e., a dataset can be generated by a specific software agent, be generated at a certain time, etc. The HCLS Community Profile for describing datasets provides further examples.
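A minimal sketch of such statements follows. The agent :dataExporter, the activity :dataExport and the timestamp are hypothetical and only serve to show the pattern.

```turtle
@prefix :     <http://example.org/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# The dataset itself carries the provenance statements.
:myDataset
	a dcat:Dataset ;
	prov:wasAttributedTo :dataExporter ;
	prov:wasGeneratedBy :dataExport ;
	prov:generatedAtTime "2015-05-20T10:00:00Z"^^xsd:dateTime .

# Hypothetical software agent and activity that produced the dataset.
:dataExporter
	a prov:SoftwareAgent ;
	rdfs:label "a hypothetical data export service"@en .

:dataExport
	a prov:Activity ;
	prov:wasAssociatedWith :dataExporter .
```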

5.5 Express that a dataset received an ODI certificate

Let us express that an ODI certificate for the "City of Raleigh Open Government Data" dataset is available at the URL <https://certificates.theodi.org/en/datasets/393/certificate>.

<https://certificates.theodi.org/en/datasets/393> a dcat:Dataset ;
        dqv:hasQualityAnnotation :myDatasetQA .
:myDatasetQA 
	a dqv:QualityCertificate ;
	oa:hasTarget <https://certificates.theodi.org/en/datasets/393> ;
	oa:hasBody  <https://certificates.theodi.org/en/datasets/393/certificate> ;
        oa:motivatedBy dqv:qualityAssessment .

5.6 Express quality of SKOS concept schemes

Let’s consider myControlledVocabulary, a controlled vocabulary made available on the Web using SKOS [ SKOS-reference ] and DCAT [ vocab-dcat ].

:myControlledVocabulary 
	a dcat:Dataset ;
	dcterms:title "My controlled vocabulary" 
	.
	
:myControlledVocabularyDistribution
	a  dcat:Distribution ;
	dcat:downloadURL <http://www.example.org/files/myControlledVocabulary.csv> ;
       	dcterms:title "SKOS/RDF distribution of my controlled vocabulary" ;
       	dcat:mediaType "text/turtle" ;
       	dcat:byteSize "190120"^^xsd:decimal 
.

qSKOS is an open source tool, which detects quality issues affecting SKOS vocabularies [ qSKOS ]. It considers 26 quality issues including, for example, “Incomplete Language Coverage” and “Label Conflicts” which are grouped in the category “Labeling and Documentation issues”. Quality issues addressed by qSKOS can be considered as DQV quality dimensions, whilst the number of concepts in which a quality issue occurs can be the metric deployed for each quality dimension.

# definition of instances for some of the  metrics, dimensions and categories deployed in qSKOS. 
:numOfConceptsWithLabelConflicts 
	a dqv:Metric;
	rdfs:label "Conflicting concepts"@en ;
	rdfs:comment "Number of concepts having conflicting labels"@en ;
	dqv:hasDimension  :LabelConflicts 
	.
:numOfConceptsWithIncompleteLanguageCoverage
	a dqv:Metric;
	rdfs:label "Language incomplete concepts"@en ;
	rdfs:comment "Number of concepts having an incomplete language coverage"@en ;
	dqv:hasDimension  :incompleteLanguageCoverage .
:LabelConflicts
	a  dqv:Dimension;
	rdfs:label "Label Conflicts"@en ;
	rdfs:comment "Dimension corresponding to the label conflicts quality issue"@en ;
        dqv:hasCategory :labelingDocumentationIssues .
:incompleteLanguageCoverage
	a  dqv:Dimension;
	rdfs:label "Incomplete Language Coverage"@en ;
	rdfs:comment "Dimension corresponding to the incomplete language coverage issue"@en ;
        dqv:hasCategory :labelingDocumentationIssues .
:labelingDocumentationIssues
        a  dqv:Category ;
	rdfs:label "Labeling and Documentation Issues"@en ;
	rdfs:comment "Category grouping labeling and documentation issues"@en ;
.

DQV represents the qSKOS quality assessment on myControlledVocabulary for the dimensions “Incomplete Language Coverage” and “Label Conflicts”.

:myControlledVocabulary
	dqv:hasQualityMeasure :measure1, :measure2
	.

:measure1 
	a dqv:QualityMeasure ;
	dqv:computedOn :myControlledVocabulary ;
	dqv:hasMetric :numOfConceptsWithLabelConflicts ;
	dqv:value "1500"^^xsd:integer  
	.
     	
:measure2
	a dqv:QualityMeasure ;
	dqv:computedOn :myControlledVocabulary ;
	dqv:hasMetric :numOfConceptsWithIncompleteLanguageCoverage ;
	dqv:value "450"^^xsd:integer 
	.

5.7 Express the quality of a linkset

(VoID) linksets are collections of (RDF) links between two datasets. Linksets are as important as datasets when it comes to the joint exploitation of independently served datasets in linked data. The representation of quality for a linkset offers a further example of how DQV can be exploited.

Let’s define three DCAT datasets, including one VoID linkset, which connects the two others:

:myDataset1 
	a dcat:Dataset ;
	dcterms:title "My dataset 1" 

	.

:myDataset2 
	a dcat:Dataset ;
	dcterms:title "My dataset 2"  

	.

:myLinkset 
	a dcat:Dataset, void:Linkset ;
	dcterms:title "A Linkset between My dataset 1 and My dataset 2"; 
	void:linkPredicate skos:exactMatch ;
 	void:target :myDataset1 ;
 	void:target :myDataset2  
	.

We can represent information about the quality of :myLinkset using the “Multilingual importing” [ MultilingualImporting ] linkset quality metric. This metric works on linksets between datasets that include SKOS concepts [ SKOS-reference ]. It quantifies the information gain when adding the preferred labels or the alternative labels of the concepts from a linked dataset to the descriptions of the concepts from the other dataset, to which these concepts have been matched by a skos:exactMatch statement from the linkset. We must first define the proper metric, dimension and category.

# Definition of instances for Metric, Dimension and Category. 
:importingForPropertyPercentage 
	a dqv:Metric;
	dqv:hasDimension  :completeness.
:completeness
	a  dqv:Dimension;
        dqv:hasCategory :complementationGain .
:complementationGain
        a  dqv:Category .

The quality assessment of the "label importing" can be made dependent on two extra parameters: property and language, respectively the SKOS property and the language tag. We extend DQV to represent these parameters.

Issue 9

We need to further evaluate the way we add extra parameters for the metric and extend the DAQ RDF-CUBE data structure (postponed issue)

:language
        a qb:DimensionProperty, owl:DatatypeProperty ;
        rdfs:comment   "language on which label importing is assessed."@en ;
        rdfs:domain    dqv:QualityMeasure;
        rdfs:label     "label import assessment language"@en .
:property
        a qb:DimensionProperty, rdf:Property ;
        rdfs:comment 	"property on which label importing is assessed."@en ;
        rdfs:domain 	dqv:QualityMeasure ;
        rdfs:label  	"label import assessment property"@en ;
        rdfs:range     	rdf:Property .

Let us add actual quality assessments:

:qualityMeasureDataset  a  dqv:QualityMeasureDataset ;
        qb:structure  :dsd .
:importingForPropertyPercentage
   # should dqv:hasObservation  be added as inverse of dqv:hasMetric?
        dqv:hasObservation  		:measure_exactMatchAltLabelItDataset1 , :measure_exactMatchAltLabelItDataset2 , 
			   		:measure_exactMatchAltLabelEnDataset1 , :measure_exactMatchAltLabelEnDataset2 ,
			    		:measure_exactMatchPrefLabelItDataset1 , :measure_exactMatchPrefLabelItDataset2 .
	
#Adding quality observations 
## for Italian alternative labels
:measure_exactMatchAltLabelItDataset1 
       a 		dqv:QualityMeasure;
       dqv:computedOn  	:myLinkset ;
       dqv:value       	"1.0"^^xsd:double ;
       dqv:hasMetric      	:importingForPropertyPercentage ;
       qb:dataSet      	:qualityMeasureDataset;
       :language    	"it" ;
       :property    	skos:altLabel .
:measure_exactMatchAltLabelItDataset2 
       a 		dqv:QualityMeasure;
       dqv:computedOn  	:myLinkset ;
       dqv:value       	"1.0"^^xsd:double ;
       dqv:hasMetric      	:importingForPropertyPercentage ;
       qb:dataSet      	:qualityMeasureDataset;
       :language    	"it" ;
       :property    	skos:altLabel .
## for English alternative labels
:measure_exactMatchAltLabelEnDataset1 
       a 		dqv:QualityMeasure;
       dqv:computedOn  	:myLinkset ;
       dqv:value       	"0.1"^^xsd:double ;
       dqv:hasMetric      	:importingForPropertyPercentage ;
       qb:dataSet      	:qualityMeasureDataset;
       :language    	"en" ;
       :property    	skos:altLabel .
:measure_exactMatchAltLabelEnDataset2  
       a 		dqv:QualityMeasure;
       dqv:computedOn  	:myLinkset ;
       dqv:value       	"1.0"^^xsd:double ;
       dqv:hasMetric      	:importingForPropertyPercentage ;
       qb:dataSet      	:qualityMeasureDataset;
       :language    	"en" ;
       :property    	skos:altLabel .      
## for Italian preferred labels
:measure_exactMatchPrefLabelItDataset1 
       a 		dqv:QualityMeasure;
       dqv:computedOn  	:myLinkset ;
       dqv:value       	"0.5"^^xsd:double ;
       dqv:hasMetric      	:importingForPropertyPercentage ;
       qb:dataSet      	:qualityMeasureDataset;
       :language    	"it" ;
       :property    	skos:prefLabel .
:measure_exactMatchPrefLabelItDataset2  
       a 		dqv:QualityMeasure;
       dqv:computedOn  	:myLinkset ;
       dqv:value       	"0.5"^^xsd:double ;
       dqv:hasMetric      	:importingForPropertyPercentage ;
       qb:dataSet      	:qualityMeasureDataset;
       :language    	"it" ;
       :property    	skos:prefLabel .

Let us specify the RDF Data Cube data structure:

:dsd  a     qb:DataStructureDefinition ;
## Copying the structure of daq:dsq
        qb:component  [ qb:dimension  dqv:hasMetric ;
                        qb:order      1
                      ] ;
        qb:component  [ qb:dimension  dqv:computedOn ;
                        qb:order      2
                      ] ;
        qb:component  [ qb:dimension  <http://purl.org/linked-data/sdmx/2009/dimension#timePeriod> ;
                        qb:order      3
                      ] ;
        qb:component  [ qb:measure  dqv:value ] ;
## Attribute (here: unit of measurement)
        qb:component  [ qb:attribute            sdmx-attribute:unitMeasure ;
                        qb:componentRequired    false ;
                        qb:componentAttachment  qb:DataSet
                      ] ;
## Extending the structure of daq:dsq with two new dimensions
        qb:component  [ qb:dimension  :property ;
                        qb:order      4
                      ] ;
        qb:component  [ qb:dimension  :language ;
                        qb:order      5
                      ] .

5.8 Express the conformance of a dataset's metadata with a standard

It is often desirable to indicate that metadata about datasets in a catalogue are compliant with a metadata standard, or an application profile of an existing metadata standard. A typical example is the GeoDCAT Application Profile [ GeoDCAT-AP ], an extension of the DCAT vocabulary [ vocab-dcat ] to represent metadata for geospatial data portals. GeoDCAT-AP makes it possible to express that a dataset's metadata conforms to an existing standard, following the recommendations of ISO 19115, ISO 19157 and the EU INSPIRE directive. DCAT partly supports the expression of such metadata conformance statements. The following example illustrates how a (DCAT) catalog record can be said to be conformant with the GeoDCAT-AP standard itself.

ex:myDataset a dcat:Dataset .
ex:myDatasetRecord a dcat:CatalogRecord ;
 foaf:primaryTopic ex:myDataset ;
 dcterms:conformsTo ex:geoDCAT-AP .
ex:geoDCAT-AP a dcterms:Standard;
  dcterms:title "GeoDCAT Application Profile" ;
  rdfs:comment "GeoDCAT-AP is developed in the context of the Interoperability Solutions for European Public Administrations (ISA) Programme"@en;
  dcterms:issued "201X-XX-XX"^^xsd:date .

Note that this example does not include the metadata about the dataset ex:myDataset itself. We assume this is present in an RDF data source accessible via the URI ex:myDatasetRecord. We also assume that ex:geoDCAT-AP is a reference URI that denotes the GeoDCAT-AP standard, which can be re-used across many catalog record descriptions, not just a locally introduced URI.

Issue 10

Relation between DQV, ISO 19115/19157 and GeoDCAT-AP: DQV is already able to express the notion of "conformance" to a standard using the property dcterms:conformsTo. However, there were suggestions to be further compatible with ISO 19157:2013 and INSPIRE by adding respectively "Not conformant" and "Not evaluated" as possible properties or values. Should DQV be this expressive? ( Issue-202 )

6. Dimensions and metrics hints

This section is non-normative.

Note

This section will be refined as soon as Issue-204 and Issue-205 are solved. In particular, following the discussion on Issue-200, we plan to align the DQV dimension classification with ISO 25012 [ ISOIEC25012 ] and to provide the classification proposed in Zaveri et al. [ ZaveriEtAl ] as a further example. Suggestions on possible mappings between ISO 25012 and Zaveri et al.'s dimensions as well as any other well-known classification are welcome.

This section gathers relevant quality dimensions and ideas for corresponding metrics, which might be eventually represented as instances of daq:Dimension and daq:Metric . The goal is not to define a normative list of dimensions and metrics, rather, the section provides a set of examples starting from use cases included in the Use Cases & Requirements document and from the following sources:

Issue 11

Are the levels of granularity of dqv:Dimension and dqv:Category well-defined enough and fit for purpose? ( Issue-225 )

6.1 Statistics

The following table gives examples of statistics that can be computed on a dataset and interpreted as quality indicators by the data consumer. Some of them can be relevant for the dimensions listed in the rest of this section. The properties come from the VoID extension created for the Aether tool.

Observation	Suggested term
Number of distinct external resources linked to	http://ldf.fi/void-ext#distinctIRIReferenceObjects
Number of distinct external resources used (including schema terms)	http://ldf.fi/void-ext#distinctIRIReferences
Number of distinct literals	http://ldf.fi/void-ext#distinctLiterals
Number of languages used	http://ldf.fi/void-ext#languages

Note

The Aether VoID extension represents statistics as direct statements that have a dataset as subject and an integer as object. This pattern, which can be expected to be rather common, is different from the pattern that DQV inherits from DAQ. Guidance on how DQV/daQ can work with other quality statistics vocabularies will be provided.
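To make the difference between the two patterns concrete, the sketch below states the same figure both ways. It is not the guidance announced in the note, only an illustration: the value 1200, the metric :distinctLiteralsCount and the measure :measureDistinctLiterals are hypothetical.

```turtle
@prefix :         <http://example.org/> .
@prefix dqv:      <http://www.w3.org/ns/dqv#> .
@prefix void-ext: <http://ldf.fi/void-ext#> .
@prefix xsd:      <http://www.w3.org/2001/XMLSchema#> .

# Direct pattern (Aether VoID extension): dataset as subject, integer as object.
:myDataset void-ext:distinctLiterals "1200"^^xsd:integer .

# Observation pattern (DQV): the same figure expressed as a quality measure
# of a hypothetical metric.
:distinctLiteralsCount
	a dqv:Metric .

:measureDistinctLiterals
	a dqv:QualityMeasure ;
	dqv:computedOn :myDataset ;
	dqv:hasMetric :distinctLiteralsCount ;
	dqv:value "1200"^^xsd:integer .
```

The direct pattern is more compact, while the observation pattern leaves room for attaching a metric definition, a dimension and provenance to each value.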

6.2 Availability

Can the data be accessed now and over time?

Yes/no, maybe with explanation why the data is not available (privacy, security, archived, lost, not yet captured etc.)
Open/restricted/registration, again possibly with explanation
For access/re-use
Indication of persistence and longevity

Since a dcat:Dataset is an abstract thing, it might be available at any point in time, past, present or future. We already have dcterms:issued so two properties come to mind:

dcat:verifiedAvailableOn {date} (the last time someone/something checked that the dataset was accessible; probably applies to a dcat:Distribution, not dcat:Dataset)
dcat:availableUntilAtLeast {date} (potentially a date on which the dataset is expected to be withdrawn)

Other questions that come to mind: how do we indicate that the dataset is expected to be available "for the foreseeable future"?

6.3 Processability

Is the data machine-readable?

Level on the 5-star scale (although there were opinions that it is dangerous to attach value to the linking because the data might be good but link to ‘bad’ data)
Links to metadata standards used and data model/schema to enable automatic processing

6.4 Accuracy

Does the data correctly represent the real-world entity or event?

6.5 Consistency

Is the data free of contradictions?

Can I use it readily in an analysis tool? Can I open the dataset in R and do some statistical manipulations? Can I open it in Tableau and make a visualization without doing a lot of cleaning?

There could be some overlap with accuracy.

6.6 Relevance

Does the dataset include an appropriate amount of data?

It might be useful to include some information about the context (e.g., why was the data created and what purpose is it supposed to serve).

6.7 Completeness

Does the data include all data items representing the entity or event?

6.8 Conformance

Does the data follow accepted standards?

6.9 Credibility

Is the data based on trustworthy sources?

This is described using the provenance vocabulary PROV-O.

6.10 Timeliness

Does the data represent the actual situation, and is it published soon enough?

Abstract

Status of This Document

Table of Contents

1. Introduction

2. Namespaces

3. Vocabulary Overview

4. Vocabulary specification

4.1 Class: Quality Measure

4.1.1 Property: hasMetric

4.1.2 Property: dataSet

4.1.3 Property: computedOn

4.1.4 Property: value

4.2 Class: Metric

4.2.1 Property: hasDimension

4.3 Class: Dimension

4.3.1 Property: hasCategory

4.4 Class: Category

4.5 Class: Quality Measure Dataset

4.6 Class: Quality Annotation

4.7 Class: User Quality Feedback

4.8 Property: Has Quality Measure

5. Example Usage

5.1 Express a quality assessment with quality metrics

5.2 Document the provenance of the quality metadata

5.3 Document the provenance of a single quality measurement

5.4 Document the provenance of a dataset

5.5 Express that a dataset received an ODI certificate

5.6 Express quality of SKOS concept schemes

5.7 Express the quality of a linkset

5.8 Express the conformance of a dataset's metadata with a standard

6. Dimensions and metrics hints

6.1 Statistics

6.2 Availability

6.3 Processability

6.4 Accuracy

6.5 Consistency

6.6 Relevance

6.7 Completeness

6.8 Conformance

6.9 Credibility

6.10 Timeliness

7. Requirements

A. Acknowledgements

B. Change history

C. References

C.1 Informative references