The Data Privacy Vocabulary [[DPV]] enables expressing machine-readable metadata about the processing of personal data and the use of technologies such as AI. It provides a representation of information to support regulatory compliance, such as that for [[[GDPR]]]. This document is the ‘Primer’ for DPV. It introduces fundamental concepts, with examples of use-cases and applications, as a starting point for adopters wanting to understand and use the DPV. The Primer contains:

The DPVCG is currently updating the specifications to v2. This document is a draft and may change as part of this process.

Contributing: The DPVCG welcomes participation to improve the DPV and associated resources, including expansion or refinement of concepts, requesting information and applications, and addressing open issues. See contributing page for further information.

DPV and Related Resources

[[[DPV]]] is the base/core specification for the 'Data Privacy Vocabulary', which is extended for Personal Data [[PD]], Locations [[LOC]], Risk Management [[RISK]], Technology [[TECH]], and [[AI]]. Specific [[LEGAL]] extensions are also provided which model jurisdiction-specific regulations and concepts. To support understanding and applications of [[DPV]], various guides and resources [[GUIDES]] are provided, including a [[PRIMER]].

[[DPV]] and related resources are published on GitHub. For a general overview of the Data Protection Vocabularies and Controls Community Group [[DPVCG]], its history, deliverables, and activities - refer to DPVCG Website. For meetings, see the DPVCG calendar.

The peer-reviewed article “Creating A Vocabulary for Data Privacy” presents a historical overview of the DPVCG and describes the methodology, structure, and creation of the DPV. Open-access versions of the article are available. The article preprint Data Privacy Vocabulary (DPV) - Version 2 describes the changes made in DPV v2.

Introduction

The [[[DPVCG]]] was formed in 2018 through the [[[SPECIAL]]] with the ambition of providing a machine-readable and interoperable vocabulary for representing information about the use and processing of personal data, whilst inviting perspectives and contributions from a diverse set of stakeholders across computer science, IT, law, sociology, philosophy – representing academia, industry, policy-makers, and activists. It identified the following issues through the W3C Workshop on Privacy and Linked Data:

  1. lack of standardised vocabularies to represent concepts related to personal data, and who/how/where it is processed;
  2. lack of taxonomies that describe purposes of processing personal data and that are not restricted to a particular domain or use-case; and
  3. lack of machine-readable representations of concepts that can be used for technical interoperability of information.

Addressing these issues resulted in the creation of the [[[DPV]]], which provides a vocabulary and ontology for expressing information related to processing of personal data, entities involved and their roles, details of technologies utilised, relation to laws and legal justifications permitting its use, and other relevant concepts based on privacy and data protection. While it uses the EU’s [[[GDPR]]] as a guiding source for the creation and interpretation of concepts, the ambition and scope of DPV is to provide a broad, globally useful vocabulary that can be extended to jurisdiction-specific or domain-specific applications.

People, organisations, laws, and use-cases have different perspectives and interpretations of concepts and requirements which cannot be modelled into a single coherent universal vocabulary. The aim of DPV is to provide a foundational framework of ‘common concepts’ that can be extended to represent specific laws, domains, or applications. This lets any two entities agree that a term, for example, PersonalData, refers to the same semantic concept, even though they might apply it differently within their own use-cases.

While most of DPV is focused on processing of personal data, it also supports representing uses of non-personal data and representing technologies such as cloud services and AI. Through these concepts, DPV enables supporting regulations and use-cases which affect use of both personal and non-personal data - such as the [[[DGA]]], and regulations which regulate technologies such as [[[NIS2]]] and [[[AIAct]]].

Using DPV

The motivation of DPV is to provide a 'data model' or an 'ontology' of concepts for interoperable representation and exchange of information about processing of (personal) data and the use of technologies. For this, the DPV specification defines concepts and relationships using the [[RDF]] standard. These can additionally be implemented and applied using technologies appropriate to a use-case's specific requirements.

DPV Serialisations

In addition to being used as a semantic web resource, the DPV can also be used without (or alongside) semantic web by utilising a format such as [[JSON-LD]] that retains the semantics and provides convenience of using JSON, or through other formats such as a CSV or a flat-list of concepts which do not capture 'semantics'. This section provides an overview of such approaches where DPV can be used both with and without semantic web.

The following are four (non-exhaustive) ways DPV can be used based on the requirements of a use-case. For guidance on how to adopt DPV concepts within a use-case, refer to [[[GUIDES]]].

  1. As a taxonomy or collection of concepts: The [[DPV]] specification provides taxonomies of concepts (e.g. Purpose). This is useful where only the 'taxonomy' in DPV is needed, for example to populate forms or annotate information. For this, the default serialisation of [[SKOS]] is useful, or implementers can utilise other formats such as CSVs or JSON while retaining the IRIs for interoperability.
  2. As a 'schema' or 'lightweight-ontology': The [[DPV]] uses [[RDFS]] and [[SKOS]] as its default serialisation to define 'classes' and 'properties' as well as 'taxonomies' that can be used with it (e.g. hasPurpose and instances of Purpose taxonomy). The classes and properties form a 'data model' or 'schema' to represent how information should be structured and organised, and do not contain any complex restrictions (e.g. unions and intersections of concepts). It is suitable for cases where the use-case wants to use DPV as a schema or data model or to describe its activities, and where creating constraints, inferences, or reasoning is either implemented separately (e.g. SWRL or SHACL) or is not required.
  3. As a 'logic-based ontology': The [[DPV-OWL]] is a serialisation of the [[DPV]] specification using the [[OWL]] language that contains the same concepts but is provided under a separate namespace. It enables the use of description logic in [[OWL]] for modelling knowledge and describing desired inferences through a logic-based reasoner. OWL offers more powerful (and complex) features compared to RDFS+SKOS regarding expression of information and its use to produce desired inferences in a coherent manner. It also restricts ways in which DPV concepts can be used - see example showing implications of using SKOS vs OWL. Also see the [[[GUIDE-OWL2]]].
  4. Other Uses: For cases where the above are not suitable or sufficient, an adopter can create their own serialisation of the DPV by implementing the [[DPV]] specification in RDF (or other semantics-aware languages) or for alternate formats and environments such as CSVs, programming APIs, and frameworks. When using DPV in such a manner, it is advised to retain compatibility (and interoperability) by either using the entire IRI (e.g. https://w3id.org/dpv#Purpose) or providing documentation for how the custom implementation aligns with the [[DPV]] specification (e.g. stating MyPurposeConcept is the same as dpv:Purpose). Doing this ensures that the data remains compatible and interoperable with the other uses and applications of DPV.
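
The fourth option can be sketched as follows. This is a minimal illustration, assuming a custom flat list of concepts kept in plain Python. The namespace comes from the example IRI above; the `LOCAL_TO_DPV` table and the helper names `expand` and `to_dpv_iri` are hypothetical and not part of DPV.

```python
# Namespace taken from the example IRI above (https://w3id.org/dpv#Purpose).
DPV_NS = "https://w3id.org/dpv#"

# Hypothetical alignment table: each local term is documented against a DPV concept.
LOCAL_TO_DPV = {
    "MyPurposeConcept": "dpv:Purpose",
}


def expand(term: str) -> str:
    """Expand a 'dpv:'-prefixed name to its full IRI; return other strings unchanged."""
    if term.startswith("dpv:"):
        return DPV_NS + term[len("dpv:"):]
    return term


def to_dpv_iri(local_term: str) -> str:
    """Resolve a local term to the full IRI of the DPV concept it aligns with."""
    return expand(LOCAL_TO_DPV[local_term])


print(to_dpv_iri("MyPurposeConcept"))
```

Keeping the full IRI alongside (or derivable from) every local term is what lets data produced by such a custom implementation be read by other DPV-aware tools.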

Areas of Application

The following is an illustrative, but non-exhaustive list of applications possible with the DPV:

See the community maintained [[[DPV-ADOPTION]]]

Semantics of DPV

DPV defines a broad notion of semantics for providing a conceptual model of concepts and relationships between them. As explained in the [[[#serialisations]]] section, [[DPV]] provides concepts which are represented using [[RDFS]] and [[SKOS]] which permits its use as a taxonomy or as a light-weight ontology. In addition to this, the same concepts are provided with [[OWL]] serialisation in a separate namespace to enable complex ontological reasoning. The following section introduces why we need 'concepts' and 'relationships' and how they are modelled in DPV.

Concepts and Relationships

[[DPV]] is a collection of concepts. Here the term 'concept' is used broadly to mean a term representing any of the following (non-exhaustively): idea, thought, meaning, object, event, relation, class, or category. Thus, in DPV, 'concepts' consist of terms and relationships between them.

A 'concept' in DPV is a 'term' representing information associated with that particular concept. For example, the concept Email refers to information about emails, which may include email addresses, aliases, signatures, and so on. While an intuitive use of Email may be taken to refer only to an email address, concepts within DPV are defined with a strict scope as representing everything that is inherently a part of them. Therefore the concept Email is inclusive of email addresses, aliases, and so on. To refer specifically to an 'email address', the concept EmailAddress should be used, which is 'narrower' or 'more specific' than the concept Email. In terms of sets, EmailAddress is a subset of Email; when representing information as 'classes', EmailAddress is a 'subclass' of Email. We use the term 'subtype' to indicate all such relationships ('broader/narrower', 'superclass/subclass', or 'superset/subset') to enable different semantic interpretations when serialising the concepts using standards such as [[RDFS]], [[SKOS]], and [[OWL]] (e.g. 'is-a' or 'subclass').

Through this interpretation, the DPV is structured as a hierarchy of concepts where each parent (or top, or broader) concept represents a broad set of information and its children (or bottom, or narrower) concepts represent parts of that set. For example, the top concept Data has the more specific subtype Personal Data, which in turn has the subtype Sensitive Data.

In taking this view of concepts and relationships, DPV provides a way to agree upon what a term means and is intended to represent. For example, when two different use-cases use the concept Personal Data using DPV, both refer to the same concept. Similarly, when Email is declared as a subtype of Personal Data, another entity receiving and reading this information must interpret it in the same manner. DPV is thus intended to be a foundational model for terms and relationships when representing and exchanging information.
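
The 'subtype' relationship described above can be illustrated with a small sketch. It assumes a hypothetical fragment of the hierarchy stored as a child-to-parent mapping, using concept names from this section. The `PARENT` table and `is_subtype` helper are illustrative only and do not reflect how DPV itself is published.

```python
# Hypothetical fragment of the concept hierarchy: child -> parent ("is subtype of").
PARENT = {
    "PersonalData": "Data",
    "SensitivePersonalData": "PersonalData",
    "Email": "PersonalData",
    "EmailAddress": "Email",
}


def is_subtype(concept: str, ancestor: str) -> bool:
    """Return True if `concept` is `ancestor` itself or lies anywhere below it."""
    current = concept
    while current is not None:
        if current == ancestor:
            return True
        # Walk one level up; top concepts have no parent and end the loop.
        current = PARENT.get(current)
    return False


print(is_subtype("EmailAddress", "PersonalData"))
print(is_subtype("Email", "EmailAddress"))
```

Because the check walks the whole chain of parents, a receiver that only knows about PersonalData can still interpret data annotated with the narrower EmailAddress.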

DPV as an Ontology

The use of DPV concepts in actual use-cases is often accompanied by additional information and a specific 'serialisation' that make it possible to use DPV in a given technological or theoretical framework. For example, consider the relation hasPersonalData used to indicate association or applicability of PersonalData subtypes/subclasses or instances. While this information about what concepts the relationship is being used with/for can be implicitly understood by humans based on the phrasing 'has personal data', it can also be explicitly declared as machine-readable information so as to: (i) express the inherent logic and interpretation of which concepts are related; (ii) enable verification that the object of the relation is indeed a type of personal data; and (iii) provide hints or suggestions such as a list of personal data concepts in a GUI when using the relation. To express such additional information that defines relations between concepts and constrains their uses, DPV must be specified as an 'ontology' using a serialisation that supports representing this and any other required information.
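
Points (ii) and (iii) above can be sketched in a few lines. This is a minimal illustration, assuming a hypothetical hierarchy fragment and a hypothetical table of expected ranges; the names `PARENT`, `RANGE`, `check_range`, and `suggestions` are not part of DPV.

```python
# Hypothetical hierarchy fragment (child -> parent) and expected range per relation.
PARENT = {"PersonalData": "Data", "NonPersonalData": "Data", "Email": "PersonalData"}
RANGE = {"hasPersonalData": "PersonalData"}


def ancestors(concept):
    """Yield the concept itself, then each broader concept up to the top."""
    while concept is not None:
        yield concept
        concept = PARENT.get(concept)


def check_range(relation: str, obj: str) -> bool:
    """Return True if `obj` is the declared range of `relation` or one of its subtypes."""
    return RANGE[relation] in ancestors(obj)


def suggestions(relation: str) -> list:
    """List known concepts that are valid objects of `relation`, e.g. for a GUI picker."""
    known = set(PARENT) | set(PARENT.values())
    return sorted(c for c in known if check_range(relation, c))


print(check_range("hasPersonalData", "Email"))
print(check_range("hasPersonalData", "NonPersonalData"))
print(suggestions("hasPersonalData"))
```

In practice such checks would more likely be expressed with a constraint language or a reasoner over the chosen serialisation; the sketch only shows what the declared information makes possible.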

One option to represent ontologies is RDF ([[[RDF]]]) which provides a formal method for expressing information or facts, with RDFS ([[[RDFS]]]), SKOS ([[[SKOS]]]), and OWL ([[[OWL]]]) for representing a more detailed and logic-based assertion of the model in terms of relationships and restrictions. While there are other alternatives available to RDF for representing information, and to RDFS, SKOS, and OWL for representing taxonomies and ontologies, the DPVCG uses these to serialise the DPV specification as an ontology based on their status as standards.

Initially, DPV was only provided as an [[OWL]] ontology. This was expanded upon in DPV v1 which used custom [[SKOS]] extensions to define the 'core' vocabulary with serialisations in [[RDFS]]+[[SKOS]] and OWL2. In DPV v2, the custom [[SKOS]] extensions were removed in favour of [[RDFS]]+[[SKOS]] as the default serialisation with [[OWL]] as an alternative serialisation. The [[RDFS]]+[[SKOS]] serialisation defines concepts as [[RDFS]] classes and instances of a top-concept with [[SKOS]] used to represent the hierarchy, whereas the [[OWL]] serialisation uses subclasses to represent the hierarchy.

The table provides an overview of the expression of concepts across DPV serialisations.

| Concept | [[DPV]] | [[DPV-OWL]] |
|---|---|---|
| Conforms with | [[RDFS]], [[SKOS]] | [[OWL]] |
| Concept | rdfs:Class, skos:Concept | owl:Class |
| is subtype of | rdfs:subClassOf or skos:broader | rdfs:subClassOf |
| is instance of | rdf:type | rdf:type |
| has concept | rdf:Property | owl:ObjectProperty |
| relationship subject or domain | rdfs:domain, dcam:domainIncludes, schema:domainIncludes | rdfs:domain, dcam:domainIncludes, schema:domainIncludes |
| relationship object or range | rdfs:range, dcam:rangeIncludes, schema:rangeIncludes | rdfs:range, dcam:rangeIncludes, schema:rangeIncludes |

Extending Concepts for Use-Cases

Most of the concepts within DPV are provided as hierarchies of classes representing categories of information, which are intentionally generic, abstract, or broad so as to permit their application across a diverse and varied landscape of real-world use-cases. In order to accurately reflect the particulars of a use-case, concepts within DPV would (most likely) need to be extended. The specifics of how this should be done depend on the manner in which DPV is utilised. For example, using the default [[DPV]] specification which contains [[RDFS]] and [[SKOS]] semantics, extending is done by declaring a new concept as an instance of the top concept using rdf:type and then using skos:broader to denote where it fits within the hierarchy. In [[DPV-OWL]], which uses [[OWL]] semantics, the rdfs:subClassOf relationship is used to create a hierarchy of sub-classes. Where an exact concept is not present within the DPV and a broader concept exists for representing the same information, one should subtype or extend that broad concept to define the required information.
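
The two extension mechanisms can be compared side by side in a short sketch that builds the corresponding triples as plain tuples. The concept `ex:NewsletterPersonalisation`, the `ex:` and `dpv-owl:` prefixes, and the helper names are assumptions made for illustration. Declaring the new OWL concept as an `owl:Class` is likewise an assumption based on the table above.

```python
def extend_rdfs_skos(new_concept: str, top_concept: str, broader: str):
    """Default serialisation: instance of the top concept, placed via skos:broader."""
    return [
        (new_concept, "rdf:type", top_concept),
        (new_concept, "skos:broader", broader),
    ]


def extend_owl(new_concept: str, parent: str):
    """OWL serialisation: a class placed in the hierarchy via rdfs:subClassOf."""
    return [
        (new_concept, "rdf:type", "owl:Class"),
        (new_concept, "rdfs:subClassOf", parent),
    ]


# Hypothetical use-case concept that refines the Personalisation purpose.
skos_triples = extend_rdfs_skos(
    "ex:NewsletterPersonalisation", "dpv:Purpose", "dpv:Personalisation"
)
owl_triples = extend_owl("ex:NewsletterPersonalisation", "dpv-owl:Personalisation")

for triple in skos_triples + owl_triples:
    print(" ".join(triple), ".")
```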

The mechanism for extending concepts (via both subclasses/subtypes and instances) is useful to align existing concepts or vocabularies with the DPV taxonomies, such as by declaring them as subclasses of a particular concept. This permits the creation of domain or jurisdiction specific extensions, such as [[[EU-GDPR]]] for expressing the legal bases provided by GDPR. Extensions also permit more accurate representations of a use-case by extending from multiple concepts to refine and scope the interpretation. This means each concept can have multiple parents representing the intersection of their respective sets.

It is not necessary to extend concepts unless one wishes to depict use-case specific information. For example, if in a use-case it is sufficient to (only) say some information is collected, then dpv:Collect can be directly used. However, where more specific information is needed, such as also specifying a method of collection (e.g. CollectViaWebForm), then it is recommended to extend the concept, for example as <CollectViaWebForm a dpv:Collect>. If there are lots of forms and they need to be 'grouped' together as collection methods, then one would subtype/subclass Collect as CollectViaWebForm and create instances of it for each form to be represented.

Though this example used a web form as a method of collection by directly mentioning it within the concept as CollectViaWebForm, this may not always be desirable. For example, that same web form may also need to be represented separately for logging purposes. DPV also provides the DataSource and Technology concepts for representing information regarding how concepts are implemented and the use of specific technological artefacts such as web forms, databases, along with their functions such as data storage and retrieval.

Maintaining Interoperability

DPV intends to provide a core or foundational framework for different entities to exchange information and interpret concepts for interoperability. When an adopter (e.g. an organisation using DPV) extends concepts to refine them for their own use-case, the concept is still (weakly) interoperable by relying on DPV’s broad taxonomies to provide a common point of reference.

Core Concepts

Structure of DPV

DPV (depicted as core) vocabulary providing taxonomies of concepts such as data and purpose, and with extensions further extending concepts such as for risk and technologies (tech)

DPV provides hierarchical taxonomies of concepts where each core concept represents the top-most abstract concept in a tree and each of its children provides a less abstract or more concrete concept. For example, consider the concept of PersonalData which is the abstract representation of personal data. It can be further refined or extended as SensitivePersonalData, and further as SpecialCategoryPersonalData and then as GeneticData and so on.

From this perspective, the top-most abstract concepts are collectively referred to as the core vocabulary within DPV. The goal of the DPV is to provide a rich collection of concepts for each of the top concepts so as to enable their application within real-world use-cases. The identification of what constitutes a core concept is based on the need to represent information about it in a modular and independent form, such as that required for legal compliance.

Each core concept is intended to be independent from other core concepts. For example, the Purpose (e.g. Optimisation) refers only to the purpose of why personal data is processed and is independent as a concept from the PersonalData (e.g. Location) or the Processing activities (e.g. Collect, Store) involved to carry out that purpose. Such separation is necessary in order to represent and answer questions such as:

The separation of concepts creates a modular structure for concept hierarchies within DPV, which in turn allows an adopter to use one particular concept taxonomy or module (e.g. list of purposes) independently without reusing the others, or to select only those concepts which are needed for their particular use-case. The separation also permits greater flexibility of representation and usage - such as using different combinations of core concepts as needed in use-cases. For example, a use-case can specify a single concept representing both Purpose and Processing by combining their respective concepts from DPV. The modular design of DPV also makes it possible to define domain and jurisdiction specific concepts in a separate namespace - such as the [[[DPV-NACE]]] purpose taxonomy providing a way for Purpose to indicate sectors using NACE taxonomy, and the [[[EU-GDPR]]] for using LegalBasis to represent the legal bases provided by [[GDPR]].

Overview of Core Concepts

Overview of concepts in DPV - those in red have been added in v2, those in blue have had their scope expanded to include data and technologies

Purpose

Indicating applicable or relevant Purpose

see more information: Primer | DPV spec

Representing the purpose for which personal data is processed, e.g. ‘Personalisation’ as a broad category of purpose. Information about the purpose can be further specified by denoting its interpretation within a particular Sector, such as from standardised authoritative lists (e.g. [[NACE]]), to indicate domain-specific applications and interpretations, or to indicate applicability of sectorial laws.

Data and Personal Data

Indicating applicable or relevant PersonalData

see more information: Primer | DPV spec

‘Personal data’ refers to data about a natural person. ‘Personal data’ is also commonly referred to as ‘personally identifiable information (PII)’. However, the terms should not be used interchangeably: based on definitions (e.g. those in GDPR), ‘personal data’ can be interpreted as a broader term than PII, which may refer only to information that can directly identify a person. DPV’s definition of personal data is based on the broadest possible definition (i.e. from GDPR) as it covers a wider range of information considered ‘personal data’. Personal data can be declared as a category, such as ‘Email’, or an instance, such as ‘x@y.z’.

DPV defines the concept Data which has subtypes NonPersonalData and PersonalData, which are associated using the relation hasData. To specifically indicate involvement of personal data, DPV provides the relation hasPersonalData.

Processing

Indicating applicable or relevant Processing

see more information: Primer | DPV spec

Representing processing as in the actions or operations over personal data, e.g. collect, use, share, store. To indicate the origin or source of data, the concept DataSource along with the relation hasDataSource is provided. For additional contextual information regarding operations or processing, such as whether it includes humans or automation, the concept ProcessingContext is provided, which can be associated using the relation hasContext (a description of Context is provided later in the document). Examples of ProcessingContext include conditions such as profiling, automated decision making, and human involvement.

Processing and Storage Conditions

Indicating applicable or relevant temporal and geo-spatial information

Indicating information about conditions or limitations associated with processing (including storage) of personal data - such as its location, duration, deletion (e.g. erasure mechanisms), or restoration (e.g. backup availability).

Legal Basis

Indicating applicable or relevant LegalBasis

see more information: Primer | DPV spec

A legal basis is a law or a clause in a law that justifies or permits the processing of personal data or use of technologies in the specified manner. It is a jurisdictional concept, given that laws are scoped to specific countries or regions, as well as a domain-specific concept, given that some laws are scoped to particular domains. A law that regulates the use of personal data, such as the GDPR, requires that every processing of personal data be justified with some legal basis to ensure it is lawful, and to further assess its correctness, accountability, and impact based on the obligations applicable. However, what is considered a legal basis varies greatly across cultures, domains, use-cases, and laws themselves. The aim of DPV is therefore to provide an upper-level abstract taxonomy of categories of legal bases, such as consent and contract, that can be customised and applied as needed.

Entities

Indicating applicable or relevant Entities

see more information: Primer | DPV spec

Representing the ‘entities’ or ‘actors’ involved in the processing of personal data. DPV provides a broad categorisation of entities based on their relevance in jurisprudence (i.e. legal roles) as well as categorisation in real-world (e.g. organisation types).

Data Controller

Indicating applicable or relevant DataController

Representing the organisation(s) responsible for processing the personal data.

Data Subject

Indicating applicable or relevant DataSubject

Representing the categories or groups (e.g. Users of a Service), or instances (e.g. Jane Doe) of individual(s) whose personal data is being processed.

Recipient

Indicating applicable or relevant Recipient

Represents the entities that receive personal data, e.g. when it is being collected, or transferred, or shared.

Technical Organisational Measure

Indicating applicable or relevant TechnicalOrganisationalMeasure

see more information: Primer | DPV spec

DPV provides a taxonomy of technical and organisational measures for representing information about how the processing of personal data is technically and organisationally protected, safeguarded, secured, or otherwise managed. This is distinct from what technology is used for carrying out processing, and instead refers to what measures are in place (i.e. what the technology intends to provide in terms of features).

Technical and organisational measures consist of activities, processes, or procedures used in connection with ensuring data protection, carrying out processing in a secure manner, and complying with legal obligations. Such measures are required by regulations depending on the context of processing involving personal data. For example, GDPR (Article 32) requires implementing appropriate measures, taking into account the state of the art, the costs of implementation, and the nature, scope, context, and purposes of processing, as well as the risks to rights and freedoms.

Location and Jurisdiction

Indicating applicable or relevant Jurisdiction

see more information: Primer | DPV spec

Representing the locations associated with entities, processing, data, and other information. Locations are important for determining jurisdictions and, from those, understanding the applicability of laws, the involvement of authorities, and the rights available.

Risk

Indicating applicable or relevant Risk

see more information: Primer | DPV spec

Risk refers to potential negative events. DPV enables representing risk(s) associated with a concept, e.g. the risk of unauthorised data disclosure related to processing, a technical measure, or the vulnerability of data subjects. In addition to the risk, DPV also enables representing the consequences (e.g. denial of service) and their impacts for specific entities (e.g. right violated for data subject).

Technology

Indicating applicable or relevant Technology

Representing the technologies used to implement the processing, or associated with the processing. For example, software products, cloud services, or AI technologies. This also involves specifying who is doing the implementing i.e. a technology and its implementer.

Rights

Indicating applicable or relevant Right

The concept Right represents a normative concept for what is permissible or necessary in accordance with a system such as laws. To associate rights with concepts that are relevant or within which those rights occur, the relation hasRight is used. Rights can be passive, which means they are always applicable without requiring anything to be done, or active where they require some action to be taken to initiate or exercise them. To represent these concepts, DPV uses PassiveRight and ActiveRight respectively. Rights can be applicable to different contexts or entities. To differentiate rights applicable or afforded to data subjects, the concept DataSubjectRight is used.

Rules

Rules are relevant to explicitly denote how a system should implement operations, and enable associating specifics such as requirements, constraints, and other forms of 'rules' that are needed in order to control executions, affect interpretations, or achieve compliance (e.g. with law). DPV defines the concept Rule and the relation hasRule to enable representation of such conditions and requirements, and provides a minimal set of concepts for types of rules, namely Permissions, Prohibitions, and Obligations. DPV does not define additional semantics for rules and limits its scope and focus to providing a simple way to specify common rules associated with personal data and its processing activities, with the recommendation to consider other richer and more mature efforts dedicated to the expression of conditions and rules, such as [[ODRL]], [[SHACL]], and [[RuleML]].

Process

In legal terminology, it is common to refer to all information about how personal data is being processed using the colloquial term processing. This results in confusion between the use of processing as a concept referring to all information (i.e. purposes, personal data, collection, storage, etc.), and processing as a concept referring to (only) the specific actions or operations (e.g. collect, use).

To avoid this ambiguity and enable clarity of information, DPV defines a new concept called Process for representing how the core concepts are combined and applied for a particular use-case. The association of a concept to Process is made using the relationships or properties provided for each concept. For example, to indicate a Process includes personal data, the relationship hasPersonalData is used along with the concept PersonalData.
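
As a rough illustration of how a Process ties the core concepts together, the sketch below models one as a small Python container that emits triples. The class, its field names, the `ex:` identifiers, and the relation name `dpv:hasProcessing` are assumptions made for the example. The relations hasPurpose, hasPersonalData, and hasLegalBasis are the ones named in this document.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical container tying core concepts together for one use-case."""
    iri: str
    has_purpose: list = field(default_factory=list)
    has_personal_data: list = field(default_factory=list)
    has_processing: list = field(default_factory=list)
    has_legal_basis: list = field(default_factory=list)

    def triples(self):
        """Emit (subject, relation, object) triples, one per associated concept."""
        relations = {
            "dpv:hasPurpose": self.has_purpose,
            "dpv:hasPersonalData": self.has_personal_data,
            "dpv:hasProcessing": self.has_processing,  # assumed relation name
            "dpv:hasLegalBasis": self.has_legal_basis,
        }
        out = [(self.iri, "rdf:type", "dpv:Process")]
        for relation, values in relations.items():
            out.extend((self.iri, relation, value) for value in values)
        return out


# Hypothetical use-case: collecting email addresses for marketing.
signup = Process(
    iri="ex:NewsletterSignup",
    has_purpose=["dpv:Marketing"],
    has_personal_data=["ex:EmailAddress"],
    has_processing=["dpv:Collect", "dpv:Store"],
)
for triple in signup.triples():
    print(" ".join(triple), ".")
```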

Nesting Process to express granular models

Instances of Process can be nested, which means one instance can contain other instances, much like a box with several smaller boxes inside. This permits breaking down complex or dense use-cases into more granular ones and representing them in a more precise and modular fashion. Such a representation also facilitates reuse of the granular or modular processes, or in defining 'templates' and 'patterns', for example to craft a single process representing collecting and storing email addresses and using it in different processes for different purposes.

From the earlier example, consider the situation where a single Process instance consists of two additional instances representing: (i) data is stored using a data processor, (ii) data is used for Marketing. While it is certainly possible to represent all of this information within one single instance of Process, the adopter may decide to create separate instances of Process based on requirements such as reflecting similar separations for legal documentation or accountability purposes.
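
The nesting described above can be sketched with a recursive structure. This is only an illustration: the `Process` class, its field names (including `sub_processes`), and the instance names are hypothetical, and the sketch makes no claim about which DPV relation is used to link nested processes.

```python
from dataclasses import dataclass, field


@dataclass
class Process:
    """Hypothetical process record that may contain further processes."""
    name: str
    purposes: list = field(default_factory=list)
    processing: list = field(default_factory=list)
    sub_processes: list = field(default_factory=list)  # nested Process instances


def collect(process: Process, attribute: str) -> list:
    """Gather the values of `attribute` from a process and everything nested in it."""
    values = list(getattr(process, attribute))
    for child in process.sub_processes:
        values.extend(collect(child, attribute))
    return values


# (i) data is stored using a data processor; (ii) data is used for Marketing.
storage = Process("StoreWithProcessor", processing=["Store"])
marketing = Process("UseForMarketing", purposes=["Marketing"], processing=["Use"])
parent = Process("CustomerData", sub_processes=[storage, marketing])

print(collect(parent, "purposes"))
print(collect(parent, "processing"))
```

Keeping `storage` and `marketing` as separate instances means either can be reused inside another parent process without duplicating its details.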

Alternative Models

Process is intended to provide a convenient concept for tying the core concepts together. DPV does not make its use binding, nor does it constrain the relationships to only be defined between Process and the other core concepts. This permits using DPV in alternate or differing models. For example, where a central concept already exists, such as when describing relevant information for a smartphone app, the concept for App can be a replacement for Process based on statements such as <App> hasPurpose <SomePurpose>. Even in such cases, Process can provide granular expression, thereby enabling description of different contexts within which the app uses personal data, such as for registration or complaint resolution. Therefore, unless an alternative is necessary for the use-case, DPV recommends using Process or its subtype/subclass as a central concept for ensuring interoperability.

An example of an adopter using a concept in a way that is not compatible with the Process-centred model is the use of Purpose to indicate that it involves some data, i.e. <SomePurpose hasPersonalData SomePersonalData>, or to indicate which legal basis is used for that purpose via the hasLegalBasis relationship. While this is not explicitly prohibited by DPV, the implication of using Purpose in this manner is that the personal data, processing, and other associated concepts are now strictly tied to the purpose instance (and implementation). Changing any of these would mean changing the purpose. In addition, it is not possible to combine multiple purposes or to have nested purposes with different details in the same manner as with a Process. Therefore, DPVCG recommends the use of Process to ensure compatibility between use-cases as well as to ensure the use of concepts does not create ambiguity or restrict further use-cases from reusing existing information.

When using custom-defined restrictions and data models, it is important to note the consequences such models have on interpretation and interoperability of data defined using DPV. For example, consider a compliance assessment tool that takes DPV data as input. If the tool expects a Process with links to relevant information, using other alternate models and relationships can produce invalid or incorrect results. To avoid this, we recommend:

  1. Documenting alternate models to clearly indicate their interpretation and use of DPV semantics;

  2. Where possible, ensuring and providing mappings between the alternate models and the Process or equivalent concepts within DPV so that the data can be transformed for interoperability;

  3. Contributing your idea or implementation of an alternate model to DPVCG to create a ‘library of models’, which can act as documentation for adopters and provide better understanding of the model's impacts on requirements and interpretation of information specified using DPV. This exercise can also assist in selecting a common model as the 'default' and in providing mechanisms for conversion/interoperability between it and other models.
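The second recommendation, providing a mapping from an alternate model to Process, can be sketched as a small transformation. The following is a minimal illustration only, assuming statements are held as plain (subject, relation, object) tuples. The identifiers ex:App, ex:SmartphoneApp, ex:Registration, ex:Email, ex:AppProcess and the function name to_process_model are hypothetical and not part of DPV.

```python
# Relations named in the text that DPV's core model attaches to a Process.
CORE_RELATIONS = {"dpv:hasPurpose", "dpv:hasPersonalData", "dpv:hasLegalBasis"}


def to_process_model(triples, central, process_id):
    """Move core relations from a custom central concept onto a Process."""
    out = [(process_id, "rdf:type", "dpv:Process")]
    for subject, relation, obj in triples:
        if subject == central and relation in CORE_RELATIONS:
            out.append((process_id, relation, obj))  # re-attached to the Process
        else:
            out.append((subject, relation, obj))     # kept unchanged
    return out


# Hypothetical app-centred description, as in "<App> hasPurpose <SomePurpose>".
app_model = [
    ("ex:App", "rdf:type", "ex:SmartphoneApp"),
    ("ex:App", "dpv:hasPurpose", "ex:Registration"),
    ("ex:App", "dpv:hasPersonalData", "ex:Email"),
]

process_model = to_process_model(app_model, "ex:App", "ex:AppProcess")
for triple in process_model:
    print(triple)
```

A real mapping would also need to record how the app relates to the generated Process. This sketch leaves that out because the appropriate relation depends on the alternate model being mapped.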

Taxonomies of Key Concepts

The following sections provide an overview of the taxonomies (i.e. hierarchies of concepts) provided by DPV for its core concepts.

Purpose

Overview of top-level concepts in Purpose taxonomy

see purpose section in DPV spec

DPV’s taxonomy of purposes is used to represent the reason or justification for processing of personal data. For this, purposes are organised within DPV based on how they relate to the processing of personal data in terms of several factors, such as: management functions related to information (e.g. records, account, finance), fulfilment of objectives (e.g. delivery of goods), providing goods and services (e.g. service provision), intended benefits (e.g. optimisations for service provider or consumer), and legal compliance.

It is important to note the following for real-world implications of Purpose:

  1. There is no universal definition for what constitutes a ’purpose’ or what attributes are associated with it.

  2. There are several distinct ways to model purposes, e.g. as a ‘goal’ such as ‘Delivery of Ordered Goods’; or as a statement explaining the processing of personal data, e.g. ‘Sending newsletters to Email’.

  3. DPV does not define requirements for what is a ‘valid purpose’ as these are defined externally, e.g. in laws such as [[GDPR]] Article.5-1b where purposes are required to be ‘explicit and legitimate’.

  4. Purposes have contextual interpretations within their application and domains (i.e. depending on how they are used in a use-case). For example, ServiceProvision is interpreted differently across the use-cases of an online website, a goods delivery outlet, and a medical centre - even if they use the same terms or phrasing.

Following from the above, practical uses of DPV will likely need to extend one of the concepts within DPV’s purpose taxonomy to ensure its purpose descriptions are specific and understandable within the context of that use-case. We therefore suggest, where possible and appropriate, creating a customised purpose as required within the use-case by extending one or several purposes from the DPV taxonomy, and providing a human readable description to assist in its accurate interpretation (e.g. DPV prefers use of skos:prefLabel for its concepts and dct:title for other documentation).
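As a minimal sketch of this suggestion, the following builds the statements for a use-case specific purpose that extends ServiceProvision. It assumes statements are plain tuples and that extension is expressed with rdfs:subClassOf. The identifier ex:DeliveryOfOrderedGoods, the use of dct:description for the longer text, and the helper extend_purpose are illustrative choices, not requirements of DPV.

```python
def extend_purpose(new_id, parents, label, description):
    """Return triples declaring a use-case specific purpose."""
    # One subclass statement per DPV purpose being extended.
    triples = [(new_id, "rdfs:subClassOf", parent) for parent in parents]
    # Human-readable information to support accurate interpretation.
    triples.append((new_id, "skos:prefLabel", label))
    triples.append((new_id, "dct:description", description))
    return triples


purpose = extend_purpose(
    "ex:DeliveryOfOrderedGoods",
    ["dpv:ServiceProvision"],
    "Delivery of Ordered Goods",
    "Delivering goods ordered through the online shop to the customer's address.",
)
for triple in purpose:
    print(triple)
```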

Sector of Purpose Application

DPV provides the concept Sector to further clarify or indicate how a purpose should be interpreted. Sector, used with the hasSector relation, denotes the sector or domain of application, such as Manufacturing. This can be used alongside existing official sector taxonomies such as [[NACE]] (EU), [[NAICS]] (USA), or [[ISIC]] (UN), as well as commercial industry taxonomies such as [[GICS]] maintained by the organisations MSCI and S&P. Multiple classifications can be used through mappings between sector codes such as the [[[NACE-NAICS]]] provided by the EU.

DPVCG provides an interpretation of the NACE revision 2 codes which uses rdfs:subClassOf to specify the hierarchy between sector concepts. It is available as [[DPV-NACE]]. The NACE codes within this extension have the namespace nace and are represented as nace:NACE-CODE.

We are working on further alignments between the NACE codes and DPV's purpose taxonomy, and welcome contributions for the same.

While the use of Sector for restricting (personal data processing) purposes is an uncommon and undocumented practice in terms of legal enforcement, we provide this feature as the use of sector code can assist with identification and interpretation of information as well as legal or organisational obligations and policies. For example, indicating some purpose is to be implemented within manufacturing or scientific research facilities (e.g. medical centres) can assist in ensuring specific types of access control and policies are defined and implemented.

Data & Personal Data

Personal Data concepts within DPV and their extension in dpv-pd

goto spec: DPV spec DPV-OWL

DPV provides the concept PersonalData and the relation hasPersonalData to indicate categories or instances of personal data that are being processed. [[DPV]] only provides a few concepts for describing personal data, e.g. as being sensitive. For additional specific categories of personal data as required within use-cases, the [[[PD]]] extension provides a rich taxonomy that extends the DPV's personal data taxonomy. This separation is to enable adopters to decide whether the extension's concepts are useful to them, or to use other external vocabularies, or define their own.

Real-world and common usage of personal data is at both an abstract level as well as specific level. For example, consider the sentence "We use your Email information...", which uses "Email" to represent a reference to what personal data is involved. Here, one may interpret Email as representing only the email address, or as a broad set of possible information related to emails, such as email address, email senders and recipients list, email service provider, email usage statistics and so on.

For ensuring clarity and resolving any potential ambiguity, DPV recommends being as specific as possible. This means where there is ambiguity as to what the information may be associated with or within a concept, it is advisable to resolve that ambiguity - either by choosing a more accurate concept from the taxonomy and/or by creating one through extension of an existing concept.

In addition to the above, it is also challenging to accurately represent how concepts function within real-world use in terms of their encapsulation within one another. For example, when establishing the DPV, we discussed the modelling of personal data categories based on the scenario where a picture of a passport is initially collected, and from it various categories are extracted, such as name, address, and photo. For representing this, merely stating the personal data as ‘passport photograph’ would not be entirely accurate as there is additional information within the photograph.

A solution was established whereby the use-case is expected to declare explicitly what information it intends to collect or use with sufficient details and clarity. For the passport photograph scenario, the use-case would use the concept PassportPhoto as the data it collects, and indicate that it extracts or derives Name and Age from it. Or, it directly declares that it collects all three concepts. This is necessary to ensure the interpretation that using PassportPhoto means having access to and using all of its subsequent personal data categories.

While this is one possible solution, other methods exist, such as explicitly declaring the data categories and their encapsulation within one another, such as by reusing hasPersonalData or creating additional properties (e.g. containsData) to indicate a personal data concept, i.e. the passport photo, contains information associated through the relation, i.e. name, age, etc. We welcome discussions regarding both these methods.
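The second method, declaring encapsulation explicitly, can be sketched as follows. This is a minimal illustration that assumes a containsData-style relation stored as a mapping. The ex: identifiers and the helper expand_personal_data are hypothetical.

```python
# Hypothetical encapsulation statements: container -> contained categories,
# i.e. "ex:PassportPhoto containsData ex:Name" and so on.
CONTAINS = {
    "ex:PassportPhoto": ["ex:Name", "ex:Age"],
}


def expand_personal_data(declared, contains=CONTAINS):
    """Return the declared categories plus everything they encapsulate."""
    seen = set()
    stack = list(declared)
    while stack:
        item = stack.pop()
        if item not in seen:
            seen.add(item)
            # Follow nested containment, e.g. a document inside a scan.
            stack.extend(contains.get(item, []))
    return seen


categories = expand_personal_data(["ex:PassportPhoto"])
print(sorted(categories))
```

With such a mapping, a tool that only sees ex:PassportPhoto in a declaration can still determine that name and age are involved.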

PII (Personally Identifiable Information) and Sensitivity of data are common concepts in relation to use of personal data. PII is a term with variable definitions depending on the particular interpretation of personal and identifiable. While ISO standards define PII as a concept closer to the personal data definition within DPV, this term can still result in confusion and ambiguity. DPV therefore defines IdentifyingPersonalData to explicitly denote that some data 'identifies' a person.

DPV provides the SensitivePersonalData concept. To indicate the degree of sensitivity, we recommend using the SensitivityLevel concept and associating it using the relation hasSensitivityLevel.

Non-Personal and Synthetic Data

While the focus of DPV is on Personal Data, there may be a need to represent Non-Personal Data within the same contextual use-cases. For example, if the personal data has been fully, completely, and irreversibly anonymised, then it can no longer be said to be personal data. To enable this, and other representations, DPV provides the concept Data to represent any data, with subtypes PersonalData and NonPersonalData. Using these as annotations can assist in clearly indicating which data should be protected, or protected with more severe measures, or to determine the scope of regulations which only apply over operations involving personal data.

Data is further subtyped as SyntheticData - a new concept that represents generated data intended to mimic personal data within a system so as to aid in development and testing without using actual or real personal data. Since such synthetic data may be used in systems that assume it is personal data, it has not been declared as a specific category of personal or non-personal data to permit its use as either.

Categorisation based on Source

The concept DataSource refers to information associated with processing contexts for indicating how the data is sourced or obtained. In some cases, it may be desirable to directly express this information over the data itself, such as indicating a dataset is "collected personal data", or that a storage policy only applies over "inferred data". To enable such uses, DPV provides the following subtypes of personal data: CollectedPersonalData, DerivedPersonalData, InferredPersonalData, GeneratedPersonalData, and ObservedPersonalData. Here the terms derive and infer relate to creation of additional data based on existing data, whereas generate refers to creation of new data that is not derived or inferred.

Sensitive and Special Categories

For indicating personal data which is sensitive, the concept SensitivePersonalData is provided. For indicating special categories of data, the concept SpecialCategoryPersonalData is provided. In this, the concept sensitive indicates that the data needs additional considerations (and perhaps caution) when processing, such as by increasing its security, reducing usage, or performing impact assessments. Special categories, by contrast, are a 'special' type of sensitive personal data requiring additional considerations or obligations defined in laws (or through other forms) that regulate how they should be used or prohibit their use until specific obligations are met.

DPV currently categorises personal data as sensitive based on existing research and literature, and as special categories based on [[GDPR]] Article 9. Both are subject to expansion in the future based on requirements and technological progress, and we welcome well-formed proposals for the same.

The sensitivity of personal data can be universal, where that data is always sensitive, or contextual, which means a use-case needs to declare it as such. For indicating personal data is sensitive (or special), it is sub-typed or declared as an instance of SensitivePersonalData.
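A minimal sketch of the difference between universal and contextual sensitivity, assuming each piece of data carries a list of types. The identifiers ex:Location, ex:Email, ex:HealthRecord, the contents of UNIVERSALLY_SENSITIVE, and the helper is_sensitive are illustrative assumptions and not DPV's own categorisation.

```python
# Categories this sketch treats as always sensitive (illustrative assumption).
UNIVERSALLY_SENSITIVE = {"ex:HealthRecord"}


def is_sensitive(types):
    """True if the data is declared sensitive or is in an always-sensitive category."""
    types = set(types)
    declared = "dpv:SensitivePersonalData" in types
    return declared or bool(types & UNIVERSALLY_SENSITIVE)


# A use-case declares location data as sensitive in its own context by
# giving it SensitivePersonalData as an additional type.
location_types = ["ex:Location", "dpv:SensitivePersonalData"]
email_types = ["ex:Email"]
health_types = ["ex:HealthRecord"]

for name, types in [("location", location_types),
                    ("email", email_types),
                    ("health", health_types)]:
    print(name, is_sensitive(types))
```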

In using these concepts, it is important to note that DPV's modelling of sensitive and special categories is non-exhaustive and as such should not be taken as an authoritative fact or a 'source of truth'. To assist with better identifying sensitive concepts, work is ongoing within DPV to identify and provide a reference list of (potentially) sensitive and special categories, and we welcome contributions for the same.

Anonymised Data

To specify that data is anonymised, DPV provides two concepts: AnonymisedData, a subtype of NonPersonalData, for when data is completely anonymised and cannot be de-anonymised; and PseudonymisedData, a subtype of PersonalData, for when data has only been partially anonymised or de-anonymisation is possible.

It is important to note that these definitions can be contextually difficult to apply or interpret. For example, consider the case where some data is indicated as being anonymised by itself without any available information to de-anonymise it. Though this can be considered as anonymised data, if there were to exist an external method or dataset that when combined with the anonymised dataset provides de-anonymised information - then this does not fit the definition of anonymised data.

Therefore, when indicating AnonymisedData, the understanding is that it is completely anonymised. Given that regulations targeting PersonalData do not apply over anonymised data, labelling pseudonymised or contextually anonymised data as AnonymisedData may lead to misleading representation and violation of obligations.
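The consequence of this distinction can be sketched with the subtype relations described in this section. The following is an illustration only. The PARENT table restates relations from the text, while the helpers is_a and personal_data_rules_apply are hypothetical names.

```python
# Subtype relations stated in the text: child -> parent.
PARENT = {
    "PersonalData": "Data",
    "NonPersonalData": "Data",
    "SyntheticData": "Data",
    "AnonymisedData": "NonPersonalData",
    "PseudonymisedData": "PersonalData",
}


def is_a(concept, ancestor):
    """Walk up the subtype chain to test whether concept is a kind of ancestor."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def personal_data_rules_apply(concept):
    """Rules scoped to personal data apply only to subtypes of PersonalData."""
    return is_a(concept, "PersonalData")


for concept in ["PseudonymisedData", "AnonymisedData", "SyntheticData"]:
    print(concept, personal_data_rules_apply(concept))
```

Because PseudonymisedData remains under PersonalData, a check like this keeps it in scope, which is why mislabelling such data as AnonymisedData would silently remove it from compliance checks.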

We are exploring the provision of the concept ContextuallyAnonymisedData as a subtype of PseudonymisedData to indicate situations where data is locally or contextually considered anonymised without any guarantees of its anonymity outside of that context.

Processing Operations

Overview of top-concepts in Processing taxonomy

see processing section in DPV spec

DPV’s taxonomy of processing concepts reflects the variety of terms used to denote processing activities or operations involving personal data, such as those from [[GDPR]] Article.4-2 definition of processing. Real-world use of terms associated with processing rarely uses these same terms, except in cases of specific domains and in legal documentation. On the other hand, common terms associated with processing are generally restricted to: collect, use, store, share, and delete.

DPV provides a taxonomy that aligns legal terminology, such as that defined by GDPR, with commonly used terms. For this, concepts are organised based on whether they subsume other concepts, e.g. Use is a broad concept indicating data is used, which DPV extends to define specific processing concepts for Analyse, Consult, Profiling, and Retrieving. Through this mechanism, whenever a use-case indicates it consults some data, it can be inferred that it also uses that data.
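This inference can be sketched in a few lines. The BROADER table below restates only the relations mentioned in the text, and the helper implied_operations is a hypothetical name used for illustration.

```python
# Broader concept for each processing operation, following the text.
BROADER = {
    "Analyse": "Use",
    "Consult": "Use",
    "Profiling": "Use",
    "Retrieving": "Use",
}


def implied_operations(declared):
    """Return the declared operations plus every broader operation they imply."""
    implied = set()
    for operation in declared:
        # Climb the hierarchy until the top is reached or a known node is hit.
        while operation is not None and operation not in implied:
            implied.add(operation)
            operation = BROADER.get(operation)
    return implied


operations = implied_operations(["Consult", "Store"])
print(sorted(operations))
```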

The definitions for describing and interpreting each processing concept are based on the following sources: language dictionaries (predominantly Oxford English), use of the term within legal documents (e.g. GDPR case law), and technology-specific interpretations such as for IT systems. Despite these, there may be distinct interpretations for what a term represents based on differences in practices, culture, language, and domains. In case an adopter or a use-case foresees such ambiguity or confusion, it is advisable to extend the relevant concepts and define them as needed, or create a separate extension.

Processing and Storage Conditions

Indicating conditions associated with Processing and Storage

see processing and storage conditions section in DPV spec

The processing taxonomy uses the concepts Use and Store to indicate data is being used and stored. To specify additional information such as its location, erasure or deletion, the generic concepts and relations associated with processing (i.e. location and duration) can be used. However, to emphasise that information about storage - such as policies, conditions, rules, or documentation - is critical to considerations of data protection and privacy as well as legal compliance, DPV provides specific concepts related to these.

To indicate additional information about processing, such as where it takes place (location), or for how long (duration), DPV provides the ProcessingCondition concept and the relation hasProcessingCondition to associate it. Specific processing conditions are represented as ProcessingLocation (also a subtype of dpv:Location) and ProcessingDuration (also a subtype of dpv:Duration).

The concept StorageCondition is a specific category of ProcessingCondition associated with the storage of data, and is associated using the relation hasStorageCondition. It is specialised to indicate StorageDuration, StorageDeletion, StorageRestoration, and StorageLocation. This enables a document to directly specify information such as: "storage duration is 6 months" or "storage restoration uses 3 geo-distinct backup servers".
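A minimal sketch of how such conditions might be recorded and filtered, assuming a simple in-memory representation. The Condition class, the storage_conditions helper, and the ex:Ireland value are hypothetical. The concept names and the two storage values come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    kind: str   # DPV concept, e.g. "dpv:StorageDuration"
    value: str  # free-text or structured value


# Specialisations of StorageCondition named in the text.
STORAGE_KINDS = {
    "dpv:StorageDuration",
    "dpv:StorageDeletion",
    "dpv:StorageRestoration",
    "dpv:StorageLocation",
}


def storage_conditions(conditions):
    """Select the conditions that are specifically about storage."""
    return [c for c in conditions if c.kind in STORAGE_KINDS]


conditions = [
    Condition("dpv:ProcessingLocation", "ex:Ireland"),
    Condition("dpv:StorageDuration", "6 months"),
    Condition("dpv:StorageRestoration", "3 geo-distinct backup servers"),
]

for condition in storage_conditions(conditions):
    print(condition.kind, "=", condition.value)
```

Since StorageCondition is a category of ProcessingCondition, all three entries would be returned by a query for processing conditions, while only the last two are storage conditions.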

Data Source

see data source section in DPV spec

For declaring the source of data, the DataSource concept along with the hasDataSource relationship is provided to indicate where the data is collected or acquired from. For example, data can be obtained from the data subject directly (e.g. given via forms) or indirectly (e.g. observed from activity, or inferred from existing data), or from another entity such as a third party.

It is important to understand the distinction between a data source and data origin. The source of data refers to the direct or indirect place, entity, or other concept from which the data was collected (in any manner). The origin of data refers to the specific entity or artefact which produced or created the data. For example, consider a company that collects data from a public database that is populated by government bodies who themselves collect that data from people. In this case, the origin of that data is ultimately the people, but the sources of this information are the people, the government bodies, and the public database.

Using two such closely related terms (source and origin) can lead to ambiguity and confusion. Therefore, we suggest using data source to indicate information as contextually required within a use-case. In most cases, this would be the direct source of data (i.e. the public database in the above example). In other cases, it would be relevant to indicate whether data originated from the data subject.
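The distinction between origin and direct source in the public database example can be sketched as follows. The chain representation, the ex: identifiers, and the helper describe_sources are assumptions made for illustration.

```python
# Chain of custody from the example, ordered from creator to direct source.
chain = ["ex:People", "ex:GovernmentBodies", "ex:PublicDatabase"]


def describe_sources(chain):
    """Separate the origin of data from the source it was directly obtained from."""
    return {
        "origin": chain[0],          # who produced or created the data
        "direct_source": chain[-1],  # where the company collected it from
        "all_sources": list(chain),  # every place the data passed through
    }


info = describe_sources(chain)

# In most cases the direct source is what gets declared with hasDataSource.
statement = ("ex:CompanyProcess", "dpv:hasDataSource", info["direct_source"])
print(statement)
```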

Data can be sourced from a public or a non-public source. The distinction is important given that a public source has different implications (and justifications) for the availability of that data as well as how it can be used. To represent these, DPV uses sub-types of data source as PublicDataSource and NonPublicDataSource. Public data sources can be datasets published by authoritative bodies, or census reports, or (public) websites. Non-public data sources are anything that is not publicly available - so data subjects, third parties, etc.

Automation, Human Involvement, and Decision Making

see automation and human involvement section in DPV spec

Automation is a broad concept that refers to automated or reduced human involvement in a process. Most (if not all) processing operations can be considered to be automated given that they are operated by machines and utilise digital information and mediums. However, even within this, specific forms and descriptions of automation are more important than others. For example, if the processing operations are intended to produce an output that will result in prosecution - then information about the automation utilised in this process is needed to understand if the decisions are fair, correct, unbiased, or to understand whether there has been some human oversight or involvement at various stages.

DPV's concepts intentionally refer to "automation" rather than "artificial intelligence", where the former is considered a broader and more inclusive term than the latter. It also avoids delving into investigations of what is and how to define "AI". Given that AI is a form of automation, whether directly or indirectly applied, these terms within the DPV are also intended to supplement use-cases where AI is used, and to represent information regarding the degree of automation and involvement of humans within its processes.

DPV provides AutomationLevel to represent the degree of automation, and the relation hasAutomationLevel to associate it with contextual concepts. The levels of automation, in order from least automated to most, are: NotAutomated, AssistiveAutomation, PartialAutomation, ConditionalAutomation, HighAutomation, FullAutomation, and Autonomous.
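Because the levels are ordered, they can be compared. The following sketch encodes the order given above as an integer enumeration. The numeric values, the needs_review helper, and its default threshold are assumptions for illustration and carry no meaning within DPV.

```python
from enum import IntEnum


class AutomationLevel(IntEnum):
    # Ordered from least automated to most, following the text.
    NotAutomated = 0
    AssistiveAutomation = 1
    PartialAutomation = 2
    ConditionalAutomation = 3
    HighAutomation = 4
    FullAutomation = 5
    Autonomous = 6


def needs_review(level, threshold=AutomationLevel.HighAutomation):
    """Hypothetical policy: flag processes at or above a chosen automation level."""
    return level >= threshold


for level in (AutomationLevel.AssistiveAutomation, AutomationLevel.Autonomous):
    print(level.name, needs_review(level))
```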

To represent how humans are involved, the concept HumanInvolvement and relation hasHumanInvolvement are provided. Examples of human involvement include: HumanInvolvementForControl, HumanInvolvementForDecision, and HumanInvolvementForOversight.

In addition to 'human' involvement, DPV also provides various ways in which 'entities' (e.g. organisations, individuals) can be involved through the concept EntityInvolvement and the relation hasEntityInvolvement. These are categorised as permissive involvement where an entity can do something, and non-permissive involvement where an entity cannot do something.

To indicate the process contains some form of decision making, the processing context concept DecisionMaking is provided, and extended as AutomatedDecisionMaking to represent automation in decision making. To represent information about how the automation works or other relevant information, the concept AlgorithmicLogic is provided. Additionally, the concept EvaluationScoring is provided for indicating the processing evaluates or assigns scores (or metrics), InnovativeUseOfNewTechnologies to indicate there are innovative uses of novel technologies, and SystematicMonitoring to indicate the processing performs a systematic (or systemic) monitoring. These additional concepts are intended to model areas or topics that are considered sensitive or high-risk or require caution such as under [[GDPR]].

Technical and Organisational Measures

Overview of Technical and Organisational Measure concepts in DPV

see technical and organisational measures section in DPV spec

DPV's taxonomy of technical and organisational measures is structured into four groups: TechnicalMeasure for technical solutions, OrganisationalMeasure for organisational solutions, LegalMeasure for legally enforceable solutions, and PhysicalMeasure for material solutions. Each term has a dedicated taxonomy that expands upon the core idea to provide a rich list of measures that are intended to protect personal data (and its associated entities and consequences).

DPV is looking to enrich its taxonomy of technical and organisational measures through adoption of existing standards, best practices, and widely relevant practices. For this, we welcome contributions of concepts from sources such as ISO/IEC standards, ENISA, NIST, IETF, and others.

Technical Measures

Overview of Technical Measures in DPV

Technical Measures are implemented through technological means, such as machine-processing or automation or tools and services that are primarily technological in nature. To distinguish these from organisational measures, consider whether the measure is for human organisation and management (which makes it organisational) or an implementation detail (which makes it technical).

Examples of technical measures include use of specific access control methods, encryption, anonymisation, security protocols, and other similar concepts.

Organisational Measures

Overview of Organisational Measures in DPV

Organisational measures are a corresponding counterpart to technical measures, and are intended to be implemented or realised through human action, whether directly by an individual, teams, or through an organisation's management (hence the term organisational). Implementing such measures may include use of technology or a tool, for example - a security training exercise that is carried out using some software, or to use information systems such as dashboards to keep track of information. However, the concepts themselves are structured as organisational based on who or what has to decide or implement the action. If it is to be performed through a technological means, then it is a technical measure. If it is to be performed through human or organisation management, then it is an organisational measure.

Examples of organisational measures include staff training, policies, notices, and other such concepts that reflect organisational decisions and actions (e.g. privacy notices, policy for how to train new recruits).

Physical Measures

Physical measures are implementations that have a physical material presence or address physical safety and security. Examples include physical authorisation methods such as door locks, network security using physical means such as locked server rooms, and physical device security. Physical measures are represented using the concept PhysicalMeasure and associated using the relation hasPhysicalMeasure.

Policies

A Policy is an organisational measure (given that it is decided and enabled by humans) that can be used to describe procedures or encode actions. It may be implemented manually (e.g. by employees) or technologically (e.g. by software or agents). Policies are an important aspect of personal data processing, and can be associated with a wide variety of concepts - such as processing operations, purposes, specific data categories, or legal bases. To enable such uses, DPV provides the relations hasPolicy and isPolicyFor to link or associate policies with their respective subjects or topics.

DPV does not provide the concept PrivacyPolicy, but instead suggests using the better expressed and less ambiguous term PrivacyNotice. This is to explicitly denote that what is commonly called a "privacy policy" is actually a "notice" intended for end users and other individuals, instead of being an internal policy document for how the company should approach 'privacy'. More information about notices is provided in the next section.

Common policies provided by DPV include: InformationSecurityPolicy for how information is secured or safeguarded, and RiskManagementPolicy for how risks should be managed. In the future, we expect more concepts to be added for dedicated policies as regulations and the general culture of privacy and data protection progress.

Notices

A Notice is an artefact intended to provide information, most commonly to individuals who are viewing, visiting, or otherwise using a service. Legally, a 'notice' is provision of information with the intention of imparting knowledge. DPV represents notices through the concept Notice as a form of Organisational Measure, with the relation hasNotice enabling use or association of notice within some context.

Notices may contain only information, or also have interactive components intended to make decisions, offer choices and controls, or otherwise carry out processes that go beyond mere provision of information. Currently, PrivacyNotice and ConsentNotice are provided as specific forms of notices.

Records

Records, or the storing of information with the intention to use it in the future, are important for several legal as well as other obligations related to data protection and privacy. To represent these, DPV provides the RecordsOfActivities concept for records in general, and DataProcessingRecords for records that relate to the processing of personal data. The concept ROPA, based on [[GDPR]] Art.30, refers to 'records of processing activities' which are like an index of data processing activities. Where consent is used as the legal basis, the concept ConsentRecord refers to records of such consent and its collection / use for processing of personal data.

DPV also contains the Record concept as a type of Processing operation, and RecordManagement as a type of Purpose. The former refers to recording of personal data as a means to obtain it (e.g. record a conversation), while the latter relates to the use of personal data towards creating records and managing them as a purpose (e.g. record consent was given). These are distinct, though relevant to the organisational measures related to record keeping.

Record keeping may require further vocabularies to represent details such as various temporal annotations, provenance, statuses, or other contextual information that is not possible or provided for by DPV's concepts. In such cases, we suggest utilising other standardised vocabularies where applicable.

Security

All technical and organisational measures are intended, by definition, to provide better security and handling of personal data and its associated processing and other activities. In DPV's taxonomy, some measures directly and specifically relate to security as their topic, whilst others provide their intended benefit indirectly. For example, the concept SecurityAssessments is an organisational measure relating to how security is assessed (and thus ultimately improved) - and is directly associated with security as a topic. Whereas a concept such as ProfessionalTraining relates to measures that are not directly tied to security, but can be associated in cases where the training is related to security or specific security measures or risks (e.g. cybersecurity data breach mitigations). The purpose EnforceSecurity provides a common umbrella term for personal data that is utilised for enacting and enforcing security measures, such as for authorisation and authentication.

Technical measures that relate specifically to security include SecurityMethod for providing security, and its subtypes for DocumentSecurity, FileSystemSecurity, HardwareSecurityProtocols, IntrusionDetectionSystem, MobilePlatformSecurity, NetworkSecurityProtocols, OperatingSystemSecurity, WebBrowserSecurity, WebSecurityProtocols, and more. Organisational measures that relate specifically to security include SecurityProcedure, and its subtypes for BackgroundChecks, CybersecurityAssessments, CybersecurityTraining, SecurityAssessments, and more.

Data Processing Agreements

The term Data Processing Agreement refers to a broad concept related to contracts or agreements between entities representing conditions regarding the processing of (personal-)data. This can include ad-hoc 'data handling' policies such as NDAs, embargoes, and enforcement of practices, as well as more formal and legal binding contractual obligations such as those between a Controller and a Processor.

To represent such concepts, DPV provides LegalAgreement, along with subtypes for NDA (Non-disclosure agreements), ContractualTerms, and DataProcessingAgreement. In these, it is important to remember that while a contract can also be a form of legal basis, the concept represented here is not necessarily the same contract as that used to justify the processing of personal data with a data subject. Instead, contracts are a broad category representing contractual terms governing data handling within or with an entity.

For representing specific agreements between entities (other than those with data subjects - which are covered in Legal Basis taxonomy), DPV provides the following types of agreements:

  • ControllerProcessorAgreement: An agreement between a Controller and a Processor, where the Controller instructs the Processor(s) to carry out processing on its behalf.
  • JointControllersAgreement: An agreement between two or more Controllers to act as a 'Joint Controller'.
  • SubProcessorAgreement: An agreement between two or more Processors where one Processor instructs another to carry out processing on its behalf.
  • ThirdPartyAgreement: An agreement between a Data Controller or a Data Processor, and a Third Party. Note that this is a loosely defined concept, as depending on the jurisdiction, this relationship may result in the Third Party being a Data Controller or a Joint Data Controller.

To indicate the entities involved in an agreement, the relation hasEntity can be used, or relations associated with specific roles to indicate contextuality. For example, using hasDataController with a ControllerProcessorAgreement denotes the Data Controller for that agreement.
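
As a minimal sketch of the paragraph above, the following Python snippet models a ControllerProcessorAgreement as plain (subject, predicate, object) tuples and looks up the entities attached to it. The ex: resources and the objects helper are hypothetical names chosen for illustration; only the dpv: terms come from the text.

```python
# Triples as (subject, predicate, object) tuples of prefixed names.
# ex:Agreement01, ex:Acme and ex:CloudHost are hypothetical resources.
agreement = [
    ("ex:Agreement01", "rdf:type", "dpv:ControllerProcessorAgreement"),
    ("ex:Agreement01", "dpv:hasDataController", "ex:Acme"),  # role-specific relation
    ("ex:Agreement01", "dpv:hasEntity", "ex:CloudHost"),     # generic relation
]


def objects(triples, subject, predicate):
    """Return every object recorded for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


controllers = objects(agreement, "ex:Agreement01", "dpv:hasDataController")
entities = objects(agreement, "ex:Agreement01", "dpv:hasEntity")
print(controllers, entities)
```

Using the role-specific relation makes the role of ex:Acme explicit, whereas hasEntity only states that ex:CloudHost is involved in the agreement.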

Data Transfer Safeguards

While all technical and organisational measures are intended to safeguard personal data and its associated activities, there may be contextual or use-case requirements to explicitly indicate safeguards against or for specific criteria. To enable such use, DPV provides the concept Safeguard and its subtype SafeguardForDataTransfer for indicating application when data is being transferred. Through these, it is possible to represent aspects such as policies for data transfers, specific measures such as encryption being applied, and other pertinent information in combination with DPV's concepts from technical and organisational measures.

Impact Assessments

Types of Impact Assessments in DPV

DPV provides the concept Assessment to represent various assessments and related procedures and processes that an organisation or entity may undertake. An important subtype of such assessments is the ImpactAssessment which refers to calculating or determining the likelihood of impact of an existing or proposed process and its involved risks or detriments. This could be inward facing - such as impact to the organisation, or outward facing - regarding impact to stakeholders such as individuals.

To represent privacy related impact assessments, the concept PIA (Privacy Impact Assessment) is provided. Similarly, the concept DPIA is provided for Data Protection Impact Assessment. Without getting into specifics of jurisdictional nomenclature (more specifically GDPR), DPVCG considers PIA and DPIA to be distinct terms based on their topic of focus. The PIA process is based on privacy as its focal point whereas the DPIA process considers the processing of personal data. Both refer to impacts (e.g. individuals affected), and may contain overlapping processes and outcomes. DPVCG suggests using the concept most suitable or applicable for a given use-case, or which matches the terminology of an obligation. For example, the concept DPIA would be more suitable for systems based on GDPR's requirements. It is also possible to utilise both terms to refer to the same process, for example to specify that an assessment satisfies both PIA and DPIA criteria (as suggested by CNIL - the French DPA).

Other assessments represented within DPV include: DataTransferImpactAssessment for impacts arising from data transfers, LegitimateInterestAssessment for determining the suitability of legitimate interest as a lawful basis, and SecurityAssessments to identify gaps, vulnerabilities, risks, and effectiveness of controls.

Location and Jurisdiction

see location and jurisdiction section in DPV spec

To represent location, the concept Location along with the relation hasLocation is provided. For geo-political locations, subtypes such as Country and SupraNationalUnion are provided, along with hasCountry, and ThirdCountry with hasThirdCountry, for convenience in common uses (e.g. data storage, transfers).

To define contextual location concepts, such as there being several locations, or that the location is 'local' to an event, DPV provides two concepts. LocationFixture specifies whether the location is 'fixed' or 'deterministic', with subtypes for fixed single, fixed multiple, and variable locations. LocationLocality specifies whether the location is 'local' within the context, with subtypes for local, remote, within a device, or in cloud.

To represent locations as jurisdictions, the relation hasJurisdiction is provided. The concept Law represents an official or authoritative law or regulation created by a government or an authority. To indicate applicability of laws within a jurisdiction, the relation hasApplicableLaw is provided.

The [[[LOC]]] extension provides taxonomies extending these concepts, such as to represent specific countries and regions, and the [[[LEGAL]]] extension further provides laws, authorities, memberships, adequacy decisions, and other 'legal' information associated with the locations as 'jurisdictions'.
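
The location and jurisdiction relations above could be combined as in the following sketch, which groups triples by subject and prints them as Turtle-style statements. The ex:Storage and ex:SomeLaw resources and the to_turtle helper are hypothetical, prefix declarations are omitted, and attaching hasApplicableLaw directly to the process is a modelling assumption made for brevity.

```python
from collections import defaultdict

# Hypothetical storage process located in Ireland, under EU jurisdiction.
# ex:Storage and ex:SomeLaw are placeholders, not DPV or LEGAL concepts.
triples = [
    ("ex:Storage", "rdf:type", "dpv:Process"),
    ("ex:Storage", "dpv:hasCountry", "loc:IE"),
    ("ex:Storage", "dpv:hasJurisdiction", "loc:EU"),
    ("ex:Storage", "dpv:hasApplicableLaw", "ex:SomeLaw"),
]


def to_turtle(triples):
    """Group triples by subject and emit Turtle-style statements."""
    grouped = defaultdict(list)
    for s, p, o in triples:
        grouped[s].append(f"{p} {o}")
    return "\n".join(
        f"{s} " + " ;\n    ".join(pairs) + " ." for s, pairs in grouped.items()
    )


turtle = to_turtle(triples)
print(turtle)
```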

Contextual Information

see processing and storage conditions section in DPV spec

For indicating additional information regarding how the expressed information should be interpreted, or how it applies within a particular context, the Context concept along with the hasContext relationship can be used. Context refers to a generic collection of concepts that assist in indicating information such as the necessity, importance, environment - which aid in the interpretation or application of other core concepts.

Importance and Necessity

DPV provides two subtypes of concepts to denote contextual Importance and Necessity, which can be applied to specific contexts such as Process, Purpose, PersonalData.

Importance is similar in application to Necessity, and provides a way to indicate how central or significant the indicated operation(s) are to the context (e.g. to the Controller). Subtypes of importance are PrimaryImportance to indicate 'main' or 'central' or 'primary' importance, and SecondaryImportance to indicate 'auxiliary' or 'peripheral' or 'secondary' importance.

Necessity enables specifying whether the contextual information is Required, is Optional, or is NotRequired. These can be used to indicate, for example, which parts of processing operations (e.g. purposes, personal data) are optional, and whether a particular processing operation is required to be carried out.
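
As an illustration, the sketch below annotates a few hypothetical purposes with Necessity and Importance concepts through the generic hasContext relation, and then selects those marked as optional. The ex: purposes and the with_context helper are assumptions made for this example.

```python
# Each hypothetical purpose is annotated with a context concept via hasContext.
annotations = [
    ("ex:ServiceProvision", "dpv:hasContext", "dpv:Required"),
    ("ex:ServiceProvision", "dpv:hasContext", "dpv:PrimaryImportance"),
    ("ex:Marketing", "dpv:hasContext", "dpv:Optional"),
    ("ex:Analytics", "dpv:hasContext", "dpv:Optional"),
]


def with_context(triples, concept):
    """Return the subjects annotated with the given context concept, sorted by name."""
    return sorted(s for s, p, o in triples if p == "dpv:hasContext" and o == concept)


optional_purposes = with_context(annotations, "dpv:Optional")
print(optional_purposes)
```

A consent interface could, for instance, use such a selection to decide which purposes a data subject may decline.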

Duration and Frequency

To express the duration of events or operations, such as how long processing will take or the validity of consent, the concept Duration can be used. Duration is indicated using the relation hasDuration, and has the following subtypes:

  • TemporalDuration - indicating a relative temporal duration, e.g. 6 months.
  • UntilTimeDuration - indicating duration that occurs until the end of specified time, e.g. until 31 DEC 2022.
  • UntilEventDuration - indicating duration that occurs until the end of specified event, e.g. until account is closed.
  • FixedOccurrencesDuration - indicating a duration that is based on number of occurrences, e.g. until you view it 3 times.
  • EndlessDuration - indicating a duration without an end condition or temporal notation.
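
The duration subtypes listed above could be evaluated as in the following sketch, which decides whether a given duration is still running. The dictionary layout, the is_active function, and the choice to treat end points as inclusive are all assumptions of this example rather than anything DPV defines; the 6 month period is approximated as 183 days.

```python
from datetime import date, timedelta


def is_active(duration, today, start=None, occurrences=0, event_happened=False):
    """Decide whether a duration is still running (end points treated as inclusive)."""
    kind, value = duration["type"], duration.get("value")
    if kind == "dpv:EndlessDuration":
        return True
    if kind == "dpv:TemporalDuration":
        return today <= start + value      # value is a timedelta
    if kind == "dpv:UntilTimeDuration":
        return today <= value              # value is a date
    if kind == "dpv:UntilEventDuration":
        return not event_happened          # e.g. the account was closed
    if kind == "dpv:FixedOccurrencesDuration":
        return occurrences < value         # value is a maximum count
    raise ValueError(f"unknown duration type: {kind}")


until_2022 = {"type": "dpv:UntilTimeDuration", "value": date(2022, 12, 31)}
three_views = {"type": "dpv:FixedOccurrencesDuration", "value": 3}
half_year = {"type": "dpv:TemporalDuration", "value": timedelta(days=183)}

print(is_active(until_2022, date(2023, 1, 1)))
print(is_active(three_views, date(2022, 1, 1), occurrences=2))
print(is_active(half_year, date(2022, 3, 1), start=date(2022, 1, 1)))
```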

Frequency indicates how frequently something occurs. Statistically, this can be expressed as the combination of number of occurrences and a time period, which can further be expressed as a probabilistic value or a percentage. For example, for something occurring once every year, the frequency is: 1 or 100% for 1 year. While such quantified representations are important for determining metrics and performing operations, DPV focuses on the qualitative labelling of such representations within a specific context.

The relation hasFrequency associates a frequency with a context, and can be expressed using the following subtypes:

  • ContinuousFrequency - indicates things occurring continuously, e.g. location collection happens continuously.
  • SporadicFrequency - indicates things occurring sporadically or rarely or not often, e.g. collecting system usage logs every month.
  • OftenFrequency - indicates things happen often or regularly or commonly, e.g. online status is reported every 5 mins.
  • SingularFrequency - indicates things happen only once.

Scope and Justification

Scope, associated using the relation hasScope, indicates the extent or range or boundaries associated with(in) a context. For example, where processing only takes place for a specific service or within a jurisdictional framework.

Justification, associated using hasJustification, is another generic concept representing the argument or justification or reason provided to explain or document information within the specific context. For example, where an audit was rejected, the justification for this rejection can be associated. Or, if a decision was made to continue processing despite an assessment showing high-risk criteria, the outcome can express a justification.

Data and Processing Scales

Scale, associated using hasScale, refers to a measurement along some dimension. DPV provides (qualitative) scales for expressing Data Volume, Data Subjects, and Geographical Coverage of processing. Along with these, DPV also provides a Processing Scale to express combinations of these. NOTE: The actual meaning or quantified amounts for each concept are not defined, because their interpretation depends on contextual factors such as legislations, guidelines, domains, and variations across industries.

DataVolume refers to the volume or amount of data in the form of a scale with the following subtypes: HugeDataVolume, LargeDataVolume, MediumDataVolume, SmallDataVolume, SporadicDataVolume, SingularDataVolume, and is associated using hasDataVolume.

DataSubjectScale refers to the volume or amount of data subjects in the form of a scale with the following subtypes: HugeScaleOfDataSubjects, LargeScaleOfDataSubjects, MediumScaleOfDataSubjects, SmallScaleOfDataSubjects, SporadicScaleOfDataSubjects, SingularScaleOfDataSubjects, and is associated using hasDataSubjectScale.

GeographicCoverage refers to the extent of the geographical area covered by the processing in the form of a scale with the following subtypes: GlobalScale, NearlyGlobalScale, MultiNationalScale, NationalScale, RegionalScale, LocalityScale, LocalEnvironmentScale, and is associated using hasGeographicCoverage.

ProcessingScale, also associated using hasScale, represents an interpretation of the other scales to express whether the combination entails a specific threshold for qualifying as 'large scale'. Specific subtypes defined for these are: LargeScaleProcessing, MediumScaleProcessing, SmallScaleProcessing.
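
Because each scale has its own relation and taxonomy, a simple consistency check can catch values attached through the wrong relation. The sketch below is one possible way to do this; the ALLOWED mapping, the invalid_scales helper, and the ex:Proc resource are hypothetical, and accepting any scale for the generic hasScale relation is an assumption of this example.

```python
DATA_VOLUME = {
    "dpv:HugeDataVolume", "dpv:LargeDataVolume", "dpv:MediumDataVolume",
    "dpv:SmallDataVolume", "dpv:SporadicDataVolume", "dpv:SingularDataVolume",
}
DATA_SUBJECT_SCALE = {
    "dpv:HugeScaleOfDataSubjects", "dpv:LargeScaleOfDataSubjects",
    "dpv:MediumScaleOfDataSubjects", "dpv:SmallScaleOfDataSubjects",
    "dpv:SporadicScaleOfDataSubjects", "dpv:SingularScaleOfDataSubjects",
}
PROCESSING_SCALE = {
    "dpv:LargeScaleProcessing", "dpv:MediumScaleProcessing", "dpv:SmallScaleProcessing",
}

ALLOWED = {
    "dpv:hasDataVolume": DATA_VOLUME,
    "dpv:hasDataSubjectScale": DATA_SUBJECT_SCALE,
    # hasScale is the generic relation, so any of the scales above is accepted here.
    "dpv:hasScale": DATA_VOLUME | DATA_SUBJECT_SCALE | PROCESSING_SCALE,
}


def invalid_scales(triples):
    """Return triples whose scale value does not belong to the relation's taxonomy."""
    return [(s, p, o) for s, p, o in triples if p in ALLOWED and o not in ALLOWED[p]]


process = [
    ("ex:Proc", "dpv:hasDataVolume", "dpv:LargeDataVolume"),
    ("ex:Proc", "dpv:hasDataVolume", "dpv:LargeScaleOfDataSubjects"),  # wrong taxonomy
    ("ex:Proc", "dpv:hasScale", "dpv:LargeScaleProcessing"),
]
problems = invalid_scales(process)
print(problems)
```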

Statuses

see statuses section in DPV spec

To assist with expressing the state or status associated with various activities, DPV provides the Status concept that can be associated contextually using the hasStatus relation. Specific subtypes are provided as ActivityStatus, ComplianceStatus including Lawfulness, AuditStatus, ConformanceStatus, and RequestStatus.

ActivityStatus represents a state or status of an activity's operations and lifecycle, which includes ActivityProposed, ActivityOngoing, ActivityHalted, ActivityCompleted, and ActivityNotCompleted.

ComplianceStatus represents status associated with compliance with some norms, objectives, or requirements. Types include Compliant, PartiallyCompliant, NonCompliant, ComplianceViolation, ComplianceUnknown, ComplianceIndeterminate. The association with a law or objective can be specified using hasApplicableLaw or hasPolicy directly for the status or indirectly through the concept whose status is being represented.

Lawfulness represents a special type of ComplianceStatus which relates to legal compliance, or lawfulness, and has types Lawful, Unlawful, and LawfulnessUnkown.

AuditStatus represents the state or status of an audit, where the term audit is loosely defined, and may or may not relate to legal compliance, e.g. for impact assessments, as part of certification, or in organisational quality assurance processes. Types of audit status include AuditApproved, AuditConditionallyApproved, AuditRejected, AuditRequested, AuditNotRequired, and AuditRequired.

ConformanceStatus represents the status of conformance, which is defined distinctly from compliance. Conformance concerns the voluntary association with or following of a guideline, requirement, standard, or policy, whereas compliance relates to the (legal or other systematically defined) conformity of a given system or use-case with rules which may dictate obligations and prohibitions that must be followed. As an illustrative example, conformance with a standard on best practices regarding security may assist in demonstrating compliance with a legal norm requiring organisational security measures. Types of conformance defined are: Conformant and NonConformant.

RequestStatus represents the state or status of requests, which can be between entities such as data subjects and controllers regarding exercising of rights, or between controllers and processors regarding processing operations, or between authorities and controllers regarding compliance related communications. Types of request statuses are: RequestInitiated, RequestAcknowledged, RequestAccepted, RequestRejected, RequestFulfilled, RequestUnfulfilled, RequestRequiresAction, RequestRequiredActionPerformed, RequestActionDelayed, and RequestStatusQuery.
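
The status taxonomy described above forms a hierarchy, for example Lawful is a Lawfulness, which in turn is a ComplianceStatus. A sketch of how an application might test such subtype relationships follows; the PARENT table covers only a handful of the concepts named in this section, and the is_a helper is a hypothetical name.

```python
# Partial parent table built from the subtypes described in this section.
PARENT = {
    "dpv:ActivityStatus": "dpv:Status",
    "dpv:ComplianceStatus": "dpv:Status",
    "dpv:AuditStatus": "dpv:Status",
    "dpv:ConformanceStatus": "dpv:Status",
    "dpv:RequestStatus": "dpv:Status",
    "dpv:Lawfulness": "dpv:ComplianceStatus",
    "dpv:Lawful": "dpv:Lawfulness",
    "dpv:Unlawful": "dpv:Lawfulness",
    "dpv:NonCompliant": "dpv:ComplianceStatus",
    "dpv:AuditApproved": "dpv:AuditStatus",
    "dpv:RequestFulfilled": "dpv:RequestStatus",
}


def is_a(concept, ancestor):
    """Walk up the parent chain and report whether ancestor is reached."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


print(is_a("dpv:Lawful", "dpv:ComplianceStatus"))
print(is_a("dpv:AuditApproved", "dpv:ComplianceStatus"))
```

With such a check, a query for every ComplianceStatus attached through hasStatus would also return Lawfulness values, while audit and request statuses stay separate.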

Risk Assessment

see risk assessment section in DPV spec

For risk management, DPV provides a lightweight risk ontology based on commonly utilised concepts regarding risk mitigation and risk management. While these concepts permit rudimentary association of risks and mitigations within a use-case, it is important to note that DPV (currently) does not provide comprehensive concepts for risk management.

For more developed representations of risk assessment, mitigation, and management vocabularies, we suggest the adoption of relevant standards, such as the ISO/IEC 31000 series, and the use of [[[RISK]]] which further expands on the DPV risk assessment concepts to define incidents, data breaches, their associated reports and notices, risk matrices, and other risk management concepts.

Risk and Mitigation

The central concepts within DPV's risk management vocabulary are Risk (associated using hasRisk) and its mitigation through RiskMitigationMeasure (associated using mitigatesRisk and conversely isMitigatedByMeasure). Through these, risk can be associated with specific concepts (e.g. data categories) or contexts (e.g. process).

To express quantified and qualified attributes associated with risk, such as levels and severity, DPV provides the following concepts: RiskLevel (associated using hasRiskLevel) to indicate the 'level' or 'magnitude' of risk; Severity (associated using hasSeverity) to indicate the magnitude of being unwanted or causing unwanted impacts, and Likelihood (associated using hasLikelihood) to indicate the probability of it taking place.

To express remaining risk after mitigation, the concept ResidualRisk (associated using hasResidualRisk and conversely isResidualRiskOf) is provided. To represent the management of risk and the procedures and methods associated with it, the concept RiskManagementProcess is defined as part of the Technical and Organisational Measures.
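
A rudimentary use of these relations is to list the risks of a process that have no mitigation measure recorded. The following is a minimal sketch under the assumption that risks and measures are stored as simple tuples; every ex: resource, including ex:HighRisk as a risk level, and the unmitigated_risks helper are hypothetical.

```python
triples = [
    ("ex:Process", "dpv:hasRisk", "ex:DataBreach"),
    ("ex:Process", "dpv:hasRisk", "ex:Profiling"),
    ("ex:DataBreach", "dpv:hasRiskLevel", "ex:HighRisk"),
    ("ex:Encryption", "rdf:type", "dpv:RiskMitigationMeasure"),
    ("ex:Encryption", "dpv:mitigatesRisk", "ex:DataBreach"),
    ("ex:DataBreach", "dpv:hasResidualRisk", "ex:DataBreachResidual"),
]


def unmitigated_risks(triples, context):
    """Return risks of the context that no measure is recorded as mitigating."""
    risks = [o for s, p, o in triples if s == context and p == "dpv:hasRisk"]
    mitigated = {o for s, p, o in triples if p == "dpv:mitigatesRisk"}
    return [risk for risk in risks if risk not in mitigated]


open_risks = unmitigated_risks(triples, "ex:Process")
print(open_risks)
```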

Consequences and Impacts

To represent the consequences and impacts of a risk event taking place, DPV provides the following concepts: Consequence arising from the context (e.g. data breach or unauthorised access to data) and the Impact caused (e.g. identity theft).

Consequences are associated using hasConsequence, and subtyped to indicate whether the consequence arose from the event successfully taking place (ConsequenceOfSuccess), from the event failing to complete or not taking place (ConsequenceOfFailure), or as a side-effect (ConsequenceAsSideEffect).

Impacts are associated using hasImpact, with the specific entity being impacted indicated using hasImpactOn. Impacts are subtyped to represent: Benefit, Detriment, Damage (MaterialDamage, NonMaterialDamage), and Harm.
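
To show how these relations can be followed, the sketch below walks from a risk to its consequences, then to their impacts and the entities affected. Chaining hasImpact from a consequence is a modelling assumption of this example, and the ex: resources and the impacts_of helper are hypothetical.

```python
triples = [
    ("ex:DataBreach", "dpv:hasConsequence", "ex:UnauthorisedAccess"),
    ("ex:UnauthorisedAccess", "dpv:hasImpact", "ex:IdentityTheft"),
    ("ex:IdentityTheft", "rdf:type", "dpv:Harm"),
    ("ex:IdentityTheft", "dpv:hasImpactOn", "ex:Customers"),
]


def impacts_of(triples, risk):
    """Follow hasConsequence, then hasImpact, and return (impact, impacted entity) pairs."""

    def objs(subject, predicate):
        return [o for s, p, o in triples if s == subject and p == predicate]

    pairs = []
    for consequence in objs(risk, "dpv:hasConsequence"):
        for impact in objs(consequence, "dpv:hasImpact"):
            for entity in objs(impact, "dpv:hasImpactOn"):
                pairs.append((impact, entity))
    return pairs


affected = impacts_of(triples, "ex:DataBreach")
print(affected)
```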

Rights and Rights Exercise

The concept Right represents a normative concept for what is permissible or necessary in accordance with a system such as laws. To associate rights with concepts that are relevant or within which those rights occur, the relation hasRight is used. Rights can be passive, which means they are always applicable without requiring anything to be done, or active where they require some action to be taken to initiate or exercise them. To represent these concepts, DPV uses PassiveRight and ActiveRight respectively. Rights can be applicable to different contexts or entities. To differentiate rights applicable or afforded to data subjects, the concept DataSubjectRight is used.

The information regarding how to exercise a right is provided through RightExerciseNotice and associated using the isExercisedAt relation. This notice can specify contextual information through use of other concepts, such as a Process with the Purpose IdentityVerification to denote that identity verification is a necessary part of the rights exercise.

A RightExerciseActivity represents a concrete instance of a right being exercised. It can include contextual information such as timestamps, durations, entities, etc. that can be part of record-keeping. An activity can be a single step related to rights exercise -- such as the initial request to exercise that right, or its acknowledgement, or the final step taken to fulfil the right (e.g. provide some information), or it can also be a single activity describing the entire rights exercise process(es). To collate related activities associated with a rights exercise (e.g. associated with a specific data subject or a specific request), the concept RightExerciseRecord is useful. The information provided to describe or in fulfilment of a right exercise is represented by RightFulfilmentNotice and that associated when a right exercise cannot be fulfilled is represented by RightNonFulfilmentNotice.

To indicate contextual information about Right Exercise activities, DPV suggests reuse of existing relations, such as those from DPV itself and [[[DCT]]]. For example, dct:accessRights can be used to specify constraints or requirements regarding access (e.g. log in required), dct:hasPart and dct:isPartOf to express a record and its contents, dct:valid to express validity constraints on the availability of the right exercise, foaf:page to specify the location or provision of a notice, and hasStatus to represent the status of an activity.

When rights require the provision of information which goes beyond a static common notice, for example a document personalised to the individual's information, or a dataset containing the individual's data, DPV recommends using [[[DCAT]]] to model the contents as a dcat:Resource or other relevant concepts from [[DCAT]] and [[DCT]] such as dct:format, dct:accessRights, and dct:valid.
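
For record-keeping, the activities of a single rights exercise could be collated as in the following sketch, where a record holds its activities and reports the most recent status. The Python class and field names mirror the DPV concepts but are otherwise hypothetical, as are the ex:Alice identifier and the timestamps.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class RightExerciseActivity:
    timestamp: datetime
    status: str  # a RequestStatus concept, as would be attached with hasStatus


@dataclass
class RightExerciseRecord:
    data_subject: str
    activities: list = field(default_factory=list)  # contents, cf. dct:hasPart

    def current_status(self):
        """Return the status of the most recent activity, or None if the record is empty."""
        if not self.activities:
            return None
        return max(self.activities, key=lambda a: a.timestamp).status


record = RightExerciseRecord("ex:Alice")
record.activities.append(RightExerciseActivity(datetime(2024, 3, 1, 9, 0), "dpv:RequestInitiated"))
record.activities.append(RightExerciseActivity(datetime(2024, 3, 2, 14, 30), "dpv:RequestAcknowledged"))
record.activities.append(RightExerciseActivity(datetime(2024, 3, 20, 11, 15), "dpv:RequestFulfilled"))
print(record.current_status())
```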

Rules

DPV provides the concept Rule to specify requirements, constraints, and other forms of 'rules' that are associated with specific contexts (e.g., processing activities) using the relation hasRule. DPV provides three forms of Rules to represent Permission, Prohibition and Obligation, and their corresponding relations hasPermission, hasProhibition and hasObligation, to indicate a Rule that specifies whether something is permitted, prohibited or an obligation, respectively. DPV does not define additional semantics for rules and limits its scope and focus to provide a simple way to specify permissions, prohibitions, and obligations as common rules associated with personal data and its processing activities. For a more extensive and richer set of semantics and concepts to represent rules, DPVCG suggests looking towards other languages, such as [[ODRL]], [[SHACL]], and [[RuleML]] that have been developed with the specific goal of representing and applying rules. We welcome contributions for aligning DPV with these, and for providing guidance on how to complement DPV's rule-based concepts with external languages.

In representing Rules, DPV only provides the concept and does not express any inherent semantics on what those rules mean in relation to each other. For example, DPV does not express Permission to be non-compatible or disjoint from Prohibition. This is to separate the interpretation and application of rules based on the necessities of a use-case. For example, in a legal investigation it may be prudent to specify permission and prohibition can never occur together, but this may not be true if there are different legal requirements that allow a prohibition to be resolved or deferred, such as through another permission that overrides the prohibition.

DPV does not specify a 'default' in relation to rules, i.e. it does not provide an interpretation of whether some rules apply automatically unless otherwise declared. For example, in declaring an instance of Process, the assumption is that the activities are modelled for what is happening or what is intended/planned to happen. The explicit annotation using a Permission rule adds information about whether some activity is permitted (and its associated information). Conversely, if the use-case uses DPV only to document activities that are permitted, there is no need to explicitly specify the permissions. Similarly, just because something is happening or planned to happen, it cannot be assumed to be permitted (e.g., from evaluation of legal requirements).

To associate a rule with a specific context, which can be a Process, PersonalData, or Purpose, the relations hasPermission, hasProhibition and hasObligation are provided. Additional types of rules can be added to DPV by extending the Rule concept (e.g., :MyRule rdfs:subClassOf dpv:Rule).
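
The three rule relations can be collected per context as sketched below. The ex: resources and the rules_for helper are hypothetical; rdfs:subClassOf is the standard RDFS property for declaring a custom rule type. In keeping with the text above, the sketch only groups rules and does not interpret how permissions and prohibitions interact.

```python
RULE_RELATIONS = {
    "dpv:hasPermission": "dpv:Permission",
    "dpv:hasProhibition": "dpv:Prohibition",
    "dpv:hasObligation": "dpv:Obligation",
}

triples = [
    ("ex:MyRule", "rdfs:subClassOf", "dpv:Rule"),  # custom rule type
    ("ex:Newsletter", "dpv:hasPermission", "ex:AllowEmail"),
    ("ex:Newsletter", "dpv:hasProhibition", "ex:NoThirdPartySharing"),
    ("ex:Newsletter", "dpv:hasObligation", "ex:DeleteAfterOptOut"),
]


def rules_for(triples, context):
    """Group the rules attached to a context by the kind of rule relation used."""
    found = {kind: [] for kind in RULE_RELATIONS.values()}
    for s, p, o in triples:
        if s == context and p in RULE_RELATIONS:
            found[RULE_RELATIONS[p]].append(o)
    return found


newsletter_rules = rules_for(triples, "ex:Newsletter")
print(newsletter_rules)
```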

Extensions

To supplement the concepts and taxonomies in [[DPV]] for specific applications, use-cases, or to provide separation for better management of terms, we provide several extensions to the DPV.

Personal Data (PD)

[[[PD]]] provides additional concepts that extend the DPV's personal data taxonomy based on an opinionated structure contributed by R. Jason Cronk from EnterPrivacy. This separation is to enable adopters to decide whether the extension's concepts are useful to them, or to use other external vocabularies, or define their own.

Concepts within [[PD]] are broadly structured in top-down fashion by utilising their relevance and origin as:

Locations (LOC)

[[[LOC]]] provides additional concepts regarding locations such as countries and regions based on the ISO 3166 standards. It enables representing information such as that processing takes place within Ireland, represented by loc:IE, or within the European Union (EU), represented by loc:EU. We are working on expanding this list to also specify regions, cities, and other pertinent location details, and welcome participation and contributions for this.

Risk Management (RISK)

[[[RISK]]] builds on top of the lightweight risk framework within DPV by providing the following extensive concepts related to risk assessment and management. We are in the process of identifying additional concepts and taxonomies for the risk extension, such as for risk management procedures and the creation of a risk ontology based on ISO standards.

Technologies (TECH)

[[[TECH]]] extends the DPV's terms to represent further specific details regarding technologies, their management, and relevance to actual real-world tools and systems. It provides concepts for the following:

The intention and aim of developing the TECH extension is to describe real-world tools and services, such as a specific cloud storage provider, and provide categorisation and metadata to connect it to DPV's concepts, such as to indicate the cloud storage instance features encryption at rest as a technical measure. Through these, the management and documentation of use-cases can be made easier by providing the relationships between tools/services and technical measures as a 'knowledge graph'.

Artificial Intelligence (AI)

[[[AI]]] is an extension under development which will further extend the [[TECH]] extension to represent concepts associated with AI. These will include representation of:

Justifications

[[[JUSTIFICATIONS]]] provides concepts for use as 'justifications' with DPV. For example, where a right cannot be fulfilled, a justification such as 'identity could not be verified' is represented using a specific concept.

Notes

This document is based on inspiration from the following:

Funding Acknowledgements

Funding Sponsors

The DPVCG was established as part of the SPECIAL H2020 Project, which received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 731601 from 2017 to 2019.

Harshvardhan J. Pandit was funded to work on DPV from 2020 to 2022 by the Irish Research Council's Government of Ireland Postdoctoral Fellowship Grant#GOIPD/2020/790.

The ADAPT SFI Centre for Digital Media Technology is funded by Science Foundation Ireland through the SFI Research Centres Programme and is co-funded under the European Regional Development Fund (ERDF) through Grant#13/RC/2106 (2018 to 2020) and Grant#13/RC/2106_P2 (2021 onwards).

Funding Acknowledgements for Contributors

The contributions of Beatriz Esteves have received funding through the PROTECT ITN Project from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 813497.

The contributions of Harshvardhan J. Pandit have been made with the financial support of Science Foundation Ireland under Grant Agreement No. 13/RC/2106_P2 at the ADAPT SFI Research Centre.