Primer

The Data Privacy Vocabulary [[DPV]] enables expressing machine-readable metadata about the processing of personal data and the use of technologies such as AI. It provides a representation of information to support regulatory compliance, such as that required by [[[GDPR]]]. This document is the ‘Primer’ for DPV. It introduces fundamental concepts with examples of use-cases and applications, as a starting point for adopters wanting to understand and use the DPV. The Primer contains:

A high-level conceptual explanation of the DPV and its modelling of concepts;
Self-contained examples that illustrate how the concepts and data models provided by DPV can represent information associated with processing of personal data and use of technologies; and
Guidance towards application of DPV in use-cases and technologies.

[[PRIMER-concise]] is a shorter version (2 pages) of the primer intended for a quick introduction.

This document is provided as part of the BETA release. We welcome reviews, comments, and other feedback till 31st July. If there are no major issues, we will continue with the planned release.

Contributing: The DPVCG welcomes participation to improve the DPV and associated resources, including expansion or refinement of concepts, requesting information and applications, and addressing open issues. See contributing guide for further information.

DPV and Related Resources

[[[DPV]]] is the base/core specification for the 'Data Privacy Vocabulary', which is extended for Personal Data [[PD]], Locations [[LOC]], Risk Management [[RISK]], Technology [[TECH]], and [[AI]]. Specific [[LEGAL]] extensions are also provided which model jurisdiction-specific regulations and concepts. To support understanding and applications of [[DPV]], various guides and resources [[GUIDES]] are provided, including a [[PRIMER]]. A Search Index of all concepts from DPV and extensions is available.

[[DPV]] and related resources are published on GitHub. For a general overview of the Data Protection Vocabularies and Controls Community Group [[DPVCG]], its history, deliverables, and activities - refer to DPVCG Website. For meetings, see the DPVCG calendar.

The peer-reviewed article “Creating A Vocabulary for Data Privacy” presents a historical overview of the DPVCG, and describes the methodology, structure, and creation of the DPV. Open-access versions of the article are also available. The article preprint “Data Privacy Vocabulary (DPV) - Version 2” describes the changes made in DPV v2.

DPV Specification

Structure of DPV

DPV provides hierarchical taxonomies of concepts where each core concept represents the top-most abstract concept in a tree and each of its children provides a less abstract or more concrete concept. For example, consider the concept of PersonalData which is the abstract representation of personal data. It can be further refined or extended as SensitivePersonalData, and further as SpecialCategoryPersonalData and then as GeneticData and so on.
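The refinement chain described above can be sketched in Turtle. This is an illustration only, not the normative definition: the prefix IRIs are assumed to be the published DPV namespaces, skos:broader is used as one possible way of stating the hierarchy, and ex:GeneticData is a hypothetical placeholder for whichever genetic data concept an adopter uses.

```turtle
@prefix dpv:  <https://w3id.org/dpv#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.com#> .

# Each concept points to its broader (more abstract) parent
dpv:SensitivePersonalData skos:broader dpv:PersonalData .
dpv:SpecialCategoryPersonalData skos:broader dpv:SensitivePersonalData .

# Hypothetical placeholder for the most concrete concept in the chain
ex:GeneticData a dpv:PersonalData ;
    skos:broader dpv:SpecialCategoryPersonalData .
```

Reading the statements bottom-up gives the path from the most concrete concept to the core concept PersonalData.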

From this perspective, the top-most abstract concepts are collectively referred to as the core vocabulary within DPV. The goal of the DPV is to provide a rich collection of concepts for each of the top concepts so as to enable their application within real-world use-cases. The identification of what constitutes a core concept is based on the need to represent information about it in a modular and independent form, such as that required for legal compliance.

Each core concept is intended to be independent from other core concepts. For example, the Purpose (e.g. Optimisation) refers only to the purpose of why personal data is processed and is independent as a concept from the PersonalData (e.g. Location) or the Processing activities (e.g. Collect, Store) involved to carry out that purpose. Such separation is necessary in order to represent and answer questions such as:

Q: What data is being processed?
Ans: dpv:PersonalData → dpv:Email
Q: Why is the data being processed?
Ans: dpv:Purpose → dpv:Marketing
Q: What operations are done with the data?
Ans: dpv:Processing → dpv:Collect, dpv:Store
Q: What justification is used to do the processing?
Ans: dpv:LegalBasis → dpv:Consent, dpv:Contract
Q: How is the data directly or indirectly being protected?
Ans: dpv:TechnicalOrganisationalMeasure → dpv:AccessControlMethod, dpv:PrivacyByDesign
Q: What rights are associated with the processing of data?
Ans: dpv:Right → dpv:ActiveRight, dpv:PassiveRight
Q: What are the risks associated with processing of data?
Ans: dpv:Risk → dpv:RiskLevel, dpv:Severity
Q: What is the contextual information associated with processing?
Ans: dpv:Context → dpv:Duration, dpv:Scale

The separation of concepts creates a modular structure for concept hierarchies within DPV, which in turn allows an adopter to use one particular concept taxonomy or module (e.g. list of purposes) independently without reusing the others, or to select only those concepts which are needed for their particular use-case. The separation also permits greater flexibility of representation and usage, such as using different combinations of core concepts as needed in use-cases. For example, a use-case can specify a single concept representing both Purpose and Processing by combining their respective concepts from DPV. The modular design of DPV also makes it possible to define domain-specific and jurisdiction-specific concepts in a separate namespace, such as the [[[DPV-NACE]]] purpose taxonomy, which provides a way for Purpose to indicate sectors using the NACE taxonomy, and the [[[EU-GDPR]]] extension, which uses LegalBasis to represent the legal bases provided by [[GDPR]].
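As a minimal sketch of the combination mentioned above, a use-case could declare one concept that is both a purpose and a processing operation. The concept ex:CollectForMarketing is hypothetical, and the pattern of typing it with both core concepts is an assumption about how an adopter might model this rather than a prescribed DPV practice.

```turtle
@prefix dpv:  <https://w3id.org/dpv#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.com#> .

# Hypothetical use-case concept combining a Purpose and a Processing operation
ex:CollectForMarketing a dpv:Purpose, dpv:Processing ;
    skos:broader dpv:Marketing, dpv:Collect ;
    skos:prefLabel "Collecting data for marketing"@en .
```

Keeping the two parents explicit means a consumer that only understands purposes, or only processing operations, can still place the concept in the taxonomy it knows.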

Overview of Core Concepts

Overview of concepts in DPV 2.0 - red indicates new concepts and blue indicates expansion of scope to include (non-personal) data and technologies

Purpose

Representing the purpose for which personal data is processed, e.g. ‘Personalisation’ as a broad category of purpose. Information about the purpose can be further specified by denoting its interpretation within a particular Sector, such as from standardised authoritative lists, e.g. [[NACE]], to indicate domain-specific applications and interpretations, or to indicate applicability of sectorial laws.
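A hedged sketch of a purpose scoped to a sector follows. The names dpv:hasSector and dpv:Sector are assumed here and should be checked against the DPV specification; ex:RetailSector is a hypothetical placeholder standing in for an entry from an authoritative list such as NACE.

```turtle
@prefix dpv:  <https://w3id.org/dpv#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.com#> .

# Hypothetical sector, standing in for an entry from a standardised list
ex:RetailSector a dpv:Sector ;
    skos:prefLabel "Retail trade"@en .

# Use-case purpose refined from Personalisation and scoped to that sector
ex:RecommendProducts a dpv:Purpose ;
    skos:broader dpv:Personalisation ;
    dpv:hasSector ex:RetailSector .
```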

Data and Personal Data

‘Personal data’ refers to data about a natural person. It is also commonly referred to as ‘personally identifiable information (PII)’. However, the terms should not be used interchangeably: based on definitions such as those in GDPR, ‘personal data’ can be interpreted as a broader term than PII, where PII may refer only to information that can directly identify a person. DPV’s definition of personal data is based on the broadest possible definition (i.e. from GDPR) as it covers a wider range of information considered ‘personal data’. Personal data can be declared as a category, such as ‘Email’, or an instance, such as ‘x@y.z’.

DPV defines the concept Data which has subtypes NonPersonalData and PersonalData, which are associated using the relation hasData. To specifically indicate involvement of personal data, DPV provides the relation hasPersonalData.
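The difference between declaring a category and an instance, and between hasPersonalData and hasData, can be sketched as follows. All ex: names are hypothetical, and the use of rdf:value to hold the literal is an assumption made for illustration.

```turtle
@prefix dpv: <https://w3id.org/dpv#> .
@prefix pd:  <https://w3id.org/dpv/pd#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix ex:  <http://example.com#> .

# Category: the process involves email as a kind of personal data
ex:NewsletterSignup a dpv:Process ;
    dpv:hasPersonalData pd:Email .

# Instance: one specific piece of personal data
ex:JaneEmail a pd:Email ;
    rdf:value "x@y.z" .

# Non-personal data is associated using the general relation hasData
ex:ServerMetrics a dpv:NonPersonalData .
ex:CapacityPlanning a dpv:Process ;
    dpv:hasData ex:ServerMetrics .
```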

Processing

Representing processing as the actions or operations over personal data, e.g. collect, use, share, store. To indicate the origin or source of data, the concept DataSource along with the relation hasDataSource is provided. For additional contextual information regarding operations or processing, such as whether it includes humans or automation, the concept ProcessingContext is provided, which can be associated using the relation hasContext (a description of Context is provided later in the document). Examples of ProcessingContext include conditions such as profiling, automated decision making, and human involvement.
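A minimal sketch of these relations, assuming a hypothetical sign-up form as the data source and a hypothetical manual review step as the processing context (neither ex: concept comes from DPV):

```turtle
@prefix dpv: <https://w3id.org/dpv#> .
@prefix ex:  <http://example.com#> .

# Hypothetical source from which the data originates
ex:SignupForm a dpv:DataSource .

# Hypothetical context stating that a human is involved
ex:ManualReview a dpv:ProcessingContext .

ex:AccountRegistration a dpv:Process ;
    dpv:hasProcessing dpv:Collect, dpv:Store ;
    dpv:hasDataSource ex:SignupForm ;
    dpv:hasContext ex:ManualReview .
```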

Processing and Storage Conditions

Specifying temporal and geo-spatial context associated with processing and storage using DPV

Indicating information about conditions or limitations associated with processing (including storage) of personal data - such as its location, duration, deletion (e.g. erasure mechanisms), or restoration (e.g. backup availability).
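As a rough sketch of such conditions, the following assumes that DPV provides dpv:hasStorageCondition together with dpv:StorageDuration and dpv:StorageLocation. These names are assumptions that should be verified against the specification, and the ex: instances and their labels are hypothetical.

```turtle
@prefix dpv:  <https://w3id.org/dpv#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.com#> .

# Hypothetical duration condition (term name assumed)
ex:EmailRetention a dpv:StorageDuration ;
    skos:prefLabel "Kept for 12 months after account closure"@en .

# Hypothetical location condition (term name assumed)
ex:PrimaryDataCentre a dpv:StorageLocation ;
    skos:prefLabel "Primary data centre"@en .

ex:EmailArchive a dpv:Process ;
    dpv:hasProcessing dpv:Store ;
    dpv:hasStorageCondition ex:EmailRetention, ex:PrimaryDataCentre .
```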

Legal Basis

A legal basis is a law or a clause in a law that justifies or permits the processing of personal data or use of technologies in the specified manner. It is a jurisdictional concept given the scoping of laws to specified countries or regions, as well as a domain-specific concept given the specific laws enacted scoped to particular domains. A law, such as the GDPR, that regulates the use of personal data requires that every processing of personal data must be justified with some legal basis to ensure it is lawful, and to further assess its correctness, accountability, and impact based on the obligations applicable. However, what is considered a legal basis varies greatly across cultures, domains, use-cases, and laws themselves. The aim of DPV is therefore to provide an upper-level abstract taxonomy of categories of legal bases, such as consent and contract, that can be customised and applied as needed.
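Customising the abstract taxonomy can be sketched as follows, where the hypothetical ex:NewsletterConsent narrows the generic Consent concept for one use-case. The pattern follows the RDFS+SKOS style used elsewhere in this document and is illustrative rather than prescriptive.

```turtle
@prefix dpv:  <https://w3id.org/dpv#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.com#> .

# Hypothetical use-case specific legal basis, narrower than dpv:Consent
ex:NewsletterConsent a dpv:LegalBasis ;
    skos:broader dpv:Consent ;
    skos:prefLabel "Consent given when subscribing to the newsletter"@en .

ex:Newsletter a dpv:Process ;
    dpv:hasPurpose dpv:Marketing ;
    dpv:hasLegalBasis ex:NewsletterConsent .
```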

Entities

Representing the ‘entities’ or ‘actors’ involved in the processing of personal data. DPV provides a broad categorisation of entities based on their relevance in jurisprudence (i.e. legal roles) as well as categorisation in real-world (e.g. organisation types).

Data Controller

Representing the organisation(s) responsible for processing the personal data.

Data Subject

Representing the categories or groups (e.g. Users of a Service), or instances (e.g. Jane Doe) of individual(s) whose personal data is being processed.

Recipient

Represents the entities that receive personal data, e.g. when it is being collected, or transferred, or shared.
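The three roles above can be tied to a single activity as in this sketch. The relations dpv:hasDataController, dpv:hasDataSubject, and dpv:hasRecipient are assumed by analogy with dpv:hasDataProcessor, and all ex: names are hypothetical.

```turtle
@prefix dpv:  <https://w3id.org/dpv#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.com#> .

ex:Acme a dpv:DataController .

# A category (group) of data subjects rather than a named individual
ex:ServiceUsers a dpv:DataSubject ;
    skos:prefLabel "Users of a Service"@en .

ex:PaymentProvider a dpv:Recipient .

ex:Checkout a dpv:Process ;
    dpv:hasDataController ex:Acme ;
    dpv:hasDataSubject ex:ServiceUsers ;
    dpv:hasRecipient ex:PaymentProvider .
```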

Technical Organisational Measure

Specifying Technical, Organisational, Legal, and Physical measures using DPV

DPV provides a taxonomy of technical and organisational measures for representing information about how the processing of personal data is technically and organisationally protected, safeguarded, secured, or otherwise managed. This is distinct from what technology is used for carrying out processing, and instead refers to what measures are in place (i.e. what the technology intends to provide in terms of features).

Technical and Organisational measures consist of activities, processes, or procedures used in connection with ensuring data protection, carrying out processing in a secure manner, and complying with legal obligations. Such measures are required by regulations depending on the context of processing involving personal data. For example, GDPR (Article 32) requires implementing appropriate measures by taking into account the state of the art, the costs of implementation, and the nature, scope, context and purposes of processing, as well as the risks to the rights and freedoms of natural persons.
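A small sketch of attaching measures to an activity is shown below. It assumes the relation is named dpv:hasTechnicalOrganisationalMeasure, and it uses the DPV concepts directly rather than creating instances of them; ex:CustomerDatabase is hypothetical.

```turtle
@prefix dpv: <https://w3id.org/dpv#> .
@prefix ex:  <http://example.com#> .

# The measures describe how the storage is protected,
# not which product or technology implements it
ex:CustomerDatabase a dpv:Process ;
    dpv:hasProcessing dpv:Store ;
    dpv:hasTechnicalOrganisationalMeasure dpv:AccessControlMethod,
                                          dpv:PrivacyByDesign .
```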

Location and Jurisdiction

Representing the locations associated with entities, processing, data, and other information. Location is important for determining jurisdictions and, from that, understanding the applicability of laws, the involvement of authorities, and the rights that apply.
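As a hedged sketch, a location and a jurisdiction could be associated with an activity as follows. The names dpv:Location, dpv:hasLocation, and dpv:hasJurisdiction are assumptions to be checked against the specification, and the ex: places are hypothetical placeholders for concepts that an extension such as [[LOC]] would provide.

```turtle
@prefix dpv:  <https://w3id.org/dpv#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.com#> .

# Hypothetical places; a real use-case would reuse concepts from an extension
ex:DublinDataCentre a dpv:Location ;
    skos:prefLabel "Data centre in Dublin"@en .
ex:Ireland a dpv:Location ;
    skos:prefLabel "Ireland"@en .

ex:Hosting a dpv:Process ;
    dpv:hasProcessing dpv:Store ;
    dpv:hasLocation ex:DublinDataCentre ;
    dpv:hasJurisdiction ex:Ireland .
```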

Risk

Risk refers to potential negative events. DPV enables representing risk(s) associated with a concept, e.g. the risk of unauthorised data disclosure related to processing, a technical measure, or the vulnerability of data subjects. In addition to the risk, DPV also enables representing the consequences (e.g. denial of service) and their impacts for specific entities (e.g. right violated for data subject).
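The chain from risk to consequence to impact can be sketched as below. The relation names dpv:hasRisk, dpv:hasConsequence, and dpv:hasImpact and the classes dpv:Consequence and dpv:Impact are assumptions that may differ from the published vocabulary (some may live in the [[RISK]] extension), and every ex: concept is hypothetical.

```turtle
@prefix dpv:  <https://w3id.org/dpv#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.com#> .

# Hypothetical risk associated with a sharing activity
ex:UnauthorisedDisclosure a dpv:Risk ;
    skos:prefLabel "Unauthorised data disclosure"@en ;
    dpv:hasConsequence ex:ServiceOutage .

# Hypothetical consequence and its impact on data subjects
ex:ServiceOutage a dpv:Consequence ;
    skos:prefLabel "Denial of service"@en ;
    dpv:hasImpact ex:RightViolated .
ex:RightViolated a dpv:Impact ;
    skos:prefLabel "Right violated for data subject"@en .

ex:PartnerSharing a dpv:Process ;
    dpv:hasProcessing dpv:Share ;
    dpv:hasRisk ex:UnauthorisedDisclosure .
```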

Technology

Representing the technologies used to implement the processing, or associated with the processing. For example, software products, cloud services, or AI technologies. This also involves specifying who is doing the implementing i.e. a technology and its implementer.

Rights

The concept Right represents a normative concept for what is permissible or necessary in accordance with a system such as laws. To associate rights with concepts that are relevant or within which those rights occur, the relation hasRight is used. Rights can be passive, which means they are always applicable without requiring anything to be done, or active where they require some action to be taken to initiate or exercise them. To represent these concepts, DPV uses PassiveRight and ActiveRight respectively. Rights can be applicable to different contexts or entities. To differentiate rights applicable or afforded to data subjects, the concept DataSubjectRight is used.
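A minimal sketch of the active and passive distinction follows. The two rights are hypothetical examples invented for illustration and do not correspond to specific legal provisions.

```turtle
@prefix dpv:  <https://w3id.org/dpv#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.com#> .

# Active: the data subject must make a request to exercise it
ex:AccessOnRequest a dpv:DataSubjectRight, dpv:ActiveRight ;
    skos:prefLabel "Right to obtain a copy of data on request"@en .

# Passive: applies without the data subject doing anything
ex:NoticeRight a dpv:DataSubjectRight, dpv:PassiveRight ;
    skos:prefLabel "Right to receive a privacy notice"@en .

ex:AccountService a dpv:Process ;
    dpv:hasRight ex:AccessOnRequest, ex:NoticeRight .
```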

Rules

Rules are relevant to explicitly denote how a system should implement operations, and enable associating specifics such as requirements, constraints, and other forms of 'rules' that are needed in order to control executions, affect interpretations, or achieve compliance (e.g. with law). DPV defines the concept Rule and the relation hasRule to enable representation of such conditions and requirements, and provides a minimal set of concepts for types of rules, namely Permissions, Prohibitions, and Obligations. DPV does not define additional semantics for rules and limits its scope and focus to providing a simple way to specify common rules associated with personal data and its processing activities, with the recommendation to consider other richer and more mature efforts dedicated to the expression of conditions and rules, such as [[ODRL]], [[SHACL]], and [[RuleML]].
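The three rule types can be sketched as follows, assuming the classes are named dpv:Permission, dpv:Prohibition, and dpv:Obligation. The rules themselves are hypothetical and described only by labels, since DPV does not define richer rule semantics.

```turtle
@prefix dpv:  <https://w3id.org/dpv#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.com#> .

ex:AnalyticsAllowed a dpv:Permission ;
    skos:prefLabel "Data may be used for internal analytics"@en .
ex:NoExternalSharing a dpv:Prohibition ;
    skos:prefLabel "Data must not be shared outside the organisation"@en .
ex:DeleteOnClosure a dpv:Obligation ;
    skos:prefLabel "Data must be deleted when the account is closed"@en .

ex:CustomerRecords a dpv:Process ;
    dpv:hasRule ex:AnalyticsAllowed, ex:NoExternalSharing, ex:DeleteOnClosure .
```

Where the conditions of a rule need to be machine-evaluated, one of the dedicated rule languages mentioned above is likely a better fit than labels.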

Process

In legal terminology, it is common to refer to all information about how personal data is being processed using the colloquial term processing. This results in confusion between the use of processing as a concept referring to all information (i.e. purposes, personal data, collection, storage, etc.), and processing as a concept referring to (only) the specific actions or operations (e.g. collect, use).

To avoid this ambiguity and enable clarity of information, DPV defines a new concept called Process for representing how the core concepts are combined and applied for a particular use-case. The association of a concept to Process is made using the relationships or properties provided for each concept. For example, to indicate a Process includes personal data, the relationship hasPersonalData is used along with the concept PersonalData.

Nesting `Process` to express granular models

Instances of Process can be nested, which means one instance can contain other instances, much like a box with several smaller boxes inside. This permits breaking down complex or dense use-cases into more granular ones and representing them in a more precise and modular fashion. Such a representation also facilitates reuse of the granular or modular processes, or in defining 'templates' and 'patterns', for example to craft a single process representing collecting and storing email addresses and using it in different processes for different purposes.

From the earlier example, consider the situation where a single Process instance consists of two additional instances representing: (i) data is stored using a data processor, (ii) data is used for Marketing. While it is certainly possible to represent all of this information within one single instance of Process, the adopter may decide to create separate instances of Process based on requirements such as reflecting similar separations for legal documentation or accountability purposes.

Consider the example where Acme, as a DataController, maintains records of its processing activities using Process to represent one of its services. In this, it collects email, uses it for internal analyses based on Legitimate Interests, and also sends marketing information by using a processor based on the data subject's consent. Using nesting of processes, the information can be expressed at a granular level representing service, individual purposes, and so on.

ex:Acme rdf:type dpv:DataController .
ex:AcmeMarketing rdf:type dpv:Process ;
    dpv:hasProcess ex:InternalAnalytics ;
    dpv:hasProcess ex:SendingNewsletters .

ex:InternalAnalytics rdf:type dpv:Process ;
    dpv:hasPersonalData pd:Email ;
    dpv:hasProcessing dpv:Collect, dpv:Store ;
    dpv:hasPurpose dpv:InternalResourceOptimisation ;
    dpv:hasLegalBasis dpv:LegitimateInterestOfController .

ex:FooTech rdf:type dpv:DataProcessor .
ex:SendingNewsletters rdf:type dpv:Process ;
    dpv:hasPersonalData pd:Email ;
    dpv:hasProcessing dpv:Share ;
    dpv:hasPurpose dpv:Marketing ;
    dpv:hasDataProcessor ex:FooTech ;
    dpv:hasLegalBasis dpv:Consent .

(META Example ID: E0006; link: source file; contributed by Harshvardhan J. Pandit on 2024-06-10)

Alternative Models

Process is intended to provide a convenient concept for tying the core concepts together, and DPV does not make its use binding, nor does it constrain the relationships to only be defined between Process and the other core concepts. This permits using DPV in alternate or differing models. For example, where a central concept already exists, such as when describing relevant information for a smartphone app, the concept for App can be a replacement for Process based on statements such as <App> hasPurpose <SomePurpose>. Even in such cases, Process can provide granular expression, thereby enabling description of different contexts within which the app uses personal data, such as for registration or complaint resolution. Therefore, unless the use-case requires otherwise, DPV recommends using Process or its subtype/subclass as a central concept for ensuring interoperability.
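One way to keep an existing central concept while remaining interoperable is sketched below: the hypothetical ex:App class is declared as a subclass of Process, and the contexts in which the app uses personal data are expressed as nested processes. All ex: names, including the purposes, are assumptions made for illustration.

```turtle
@prefix dpv:  <https://w3id.org/dpv#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.com#> .

# Hypothetical central concept of the adopter's own model
ex:App rdfs:subClassOf dpv:Process .

# Hypothetical purposes for the two contexts
ex:AccountRegistration a dpv:Purpose .
ex:ComplaintHandling a dpv:Purpose .

ex:PhotoApp a ex:App ;
    dpv:hasProcess ex:Registration, ex:ComplaintResolution .

ex:Registration a dpv:Process ;
    dpv:hasPurpose ex:AccountRegistration .
ex:ComplaintResolution a dpv:Process ;
    dpv:hasPurpose ex:ComplaintHandling .
```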

An example of an adopter or use-case using a concept in a way that is not compatible with its intended role is the use of Purpose to indicate that it involves some data, i.e. <SomePurpose hasPersonalData SomePersonalData>, or to indicate which legal basis is used for that purpose by using the hasLegalBasis relationship. While not explicitly prohibited by DPV, the implication of using Purpose in this manner is that the personal data, processing, and other associated concepts are now strictly tied to the purpose instance (and implementation). Changing any of these would mean changing the purpose. In addition, it is not possible to combine multiple purposes together or have nested purposes with different details in the same manner as with a Process. Therefore, DPVCG recommends the use of Process to ensure compatibility between use-cases, as well as to ensure the use of concepts does not create ambiguity or restrict further use-cases from reusing existing information.

When using custom-defined restrictions and data models, it is important to note the consequences such models have on interpretation and interoperability of data defined using DPV. For example, consider a compliance assessment tool that takes DPV data as input. If the tool expects a Process with links to relevant information, using other alternate models and relationships can produce invalid or incorrect results. To avoid this, we recommend:

Documenting alternate models to clearly indicate their interpretation and use of DPV semantics;
Where possible, ensuring and providing mappings between the alternate models and the Process or equivalent concepts within DPV so that the data can be transformed for interoperability;
Contributing the idea or implementation of an alternate model to DPVCG to create a ‘library of models’, which can act as documentation for adopters and provide a better understanding of the model's impacts on requirements and interpretation of information specified using DPV. This exercise can also assist in selecting a common model as the 'default' and in providing mechanisms for conversion/interoperability between it and other models.

Semantics of DPV

DPV defines a broad notion of semantics for providing a conceptual model of concepts and relationships between them. As explained in the [[[#serialisations]]] section, [[DPV]] provides concepts which are represented using [[RDFS]] and [[SKOS]] which permits its use as a taxonomy or as a light-weight ontology. In addition to this, the same concepts are provided with [[OWL]] serialisation in a separate namespace to enable complex ontological reasoning. The following section introduces why we need 'concepts' and 'relationships' and how they are modelled in DPV.

Concepts and Relationships

[[DPV]] is a collection of concepts. Here the term 'concept' is broadly used as consisting of a term non-exhaustively representing any of the following: idea, thought, meaning, object, event, relations, class, or category. Thus, in DPV, 'concepts' consist of terms and relationships between them.

Semantic relationships between concepts used in DPV - generalisation and specialisation (arrowhead), instantiation, and association

Consider the example scenario where we want to express the following: Alpha is a DataController that collects Email as PersonalData, stores it, and shares it with BetaInc (a Data Processor), all for official correspondence.

In this, the 'core' concepts are: Data Controller, Personal Data, Processing, Data Processor, and Purpose. Email is another concept that is a specific 'type' or 'subset' or 'category' of Personal Data. Similarly, Processing has 'collects', 'stores', and 'shares'. Alpha is a specific 'instance' of a Data Controller, and similarly BetaInc of a Data Processor.

Here the difference between 'Email' and 'Alpha' is that the former could be further described in terms of the same concept, e.g. Email Address is a specific part of Email as Personal Data; while the latter is 'final' in the sense that it cannot be 'extended' further without losing its meaning. For example, if 'Alpha' has a department or subsidiary, then neither of those is automatically a 'Data Controller'.

In addition to concepts, the above example also requires expressing these relationships: (i) a concept is a 'subtype' or 'subset' of another concept; (ii) a concept is an 'instance' of another concept; (iii) indicating a concept is applicable or involved. These relationships are expressed as is subtype of, is instance of, and has concept respectively.

Combining all of these together, we say the following in DPV:

PersonalData, Purpose, Processing, Data Controller, Data Processor are instances of Concept
Email is a subtype of Personal Data. Collect, Store, and Share are subtypes of Processing. Official Correspondence is a subtype of Purpose.
Alpha is an instance of a Data Controller. BetaInc is an instance of a Data Processor.
The use-case "has personal data := Email"; "has processing := Collect, Store, Share" ; and so on...
has personal data is a relation linking to Personal Data, has processing is a relation linking to a Processing, and so on ...
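The statements in the list above can be sketched in Turtle. The process ex:Correspondence and the purpose ex:OfficialCorrespondence are hypothetical names introduced for this illustration, and dpv:hasDataController is assumed by analogy with dpv:hasDataProcessor.

```turtle
@prefix dpv:  <https://w3id.org/dpv#> .
@prefix pd:   <https://w3id.org/dpv/pd#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex:   <http://example.com#> .

# "is instance of": Alpha and BetaInc are instances of entity concepts
ex:Alpha a dpv:DataController .
ex:BetaInc a dpv:DataProcessor .

# Hypothetical purpose declared following the RDFS+SKOS pattern
ex:OfficialCorrespondence a dpv:Purpose ;
    skos:prefLabel "Official Correspondence"@en .

# "has concept": relations linking the use-case to each concept
ex:Correspondence a dpv:Process ;
    dpv:hasDataController ex:Alpha ;
    dpv:hasDataProcessor ex:BetaInc ;
    dpv:hasPersonalData pd:Email ;
    dpv:hasProcessing dpv:Collect, dpv:Store, dpv:Share ;
    dpv:hasPurpose ex:OfficialCorrespondence .
```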

A 'concept' in DPV is a 'term' representing information associated with that particular concept. For example, the concept Email refers to information about emails. This information may contain email addresses, aliases, signatures, and so on. While an intuitive use of Email may be taken to only refer to email address, within DPV each concept is defined with a strict scope as representing all concepts that are inherently a part of it. Therefore, for emails, the concept Email is inclusive of the email addresses, aliases, and so on listed above. To specifically refer to 'email address', the concept Email Address should be used, which is 'narrower' or 'more specific' than the concept Email. In terms of sets, EmailAddress is a subset of Email; if representing information as 'classes', EmailAddress is a 'subclass' of Email. We use the term 'subtype' to indicate all such relationships consisting of 'broader/narrower' or 'superclass/subclass' or 'superset/subset' to enable different semantic interpretations when serialising the concepts using standards such as [[RDFS]], [[SKOS]], and [[OWL]] (e.g. 'is-a' or 'subclass').

Through this interpretation, the DPV is structured as a hierarchy of concepts where each parent or top or broader concept represents a broad set of information and its children or bottom or narrower concepts represent parts of that set. For example, the top concept Data has the more specific subtype Personal Data, which has a further subtype Sensitive Personal Data.

In taking this view of concepts and relationships, DPV provides a way to agree upon what a term means and is intended to represent. For example, when two different use-cases use the concept Personal Data using DPV, both refer to the same concept. Similarly, when Email is declared as a subtype of Personal Data, another entity receiving and reading this information must interpret it in the same manner. DPV is thus intended to be a foundational model for terms and relationships when representing and exchanging information.

DPV as an Ontology

The use of DPV concepts in actual use-cases is often accompanied by additional information and a specific 'serialisation' that make it possible to use DPV in a given technological or theoretical framework. For example, consider the relation hasPersonalData used to indicate association or applicability of PersonalData subtypes/subclasses or instances. While this information about what concepts the relationship is being used with/for can be implicitly understood by humans based on the phrasing 'has personal data', it can also be explicitly declared as machine-readable information so as to: (i) express the inherent logic and interpretation of which concepts are related; (ii) enable verification that the object of the relation is indeed a type of personal data; and (iii) provide hints or suggestions such as a list of personal data concepts in a GUI when using the relation. To express such additional information that defines relations between concepts and constrains their use, DPV must be specified as an 'ontology' using a serialisation that supports representing this and any other required information.
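As an illustration of what such a machine-readable declaration could look like, the sketch below states that the objects of hasPersonalData are expected to be personal data. It is not the normative definition from the specification: the choice of dcam:rangeIncludes and the label are assumptions made for this example.

```turtle
@prefix dpv:  <https://w3id.org/dpv#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcam: <http://purl.org/dc/dcam/> .

# Illustrative declaration: the relation and the kind of object it expects
dpv:hasPersonalData a rdf:Property ;
    rdfs:label "has personal data"@en ;
    dcam:rangeIncludes dpv:PersonalData .
```

A validator or editing tool reading this declaration could flag an object that is not a kind of personal data, or offer the known personal data concepts as suggestions.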

One option to represent ontologies is RDF ([[[RDF]]]) which provides a formal method for expressing information or facts, with RDFS ([[[RDFS]]]), SKOS ([[[SKOS]]]), and OWL ([[[OWL]]]) for representing a more detailed and logic-based assertion of the model in terms of relationships and restrictions. While there are other alternatives available to RDF for representing information, and to RDFS, SKOS, and OWL for representing taxonomies and ontologies, the DPVCG uses these to serialise the DPV specification as an ontology based on their status as standards.

Initially, DPV was only provided as an [[OWL]] ontology. This was expanded upon in DPV v1 which used custom [[SKOS]] extensions to define the 'core' vocabulary with serialisations in [[RDFS]]+[[SKOS]] and OWL2. In DPV v2, the custom [[SKOS]] extensions were removed in favour of [[RDFS]]+[[SKOS]] as the default serialisation with [[OWL]] as an alternative serialisation. The [[RDFS]]+[[SKOS]] serialisation defines concepts as [[RDFS]] classes and instances of a top-concept with [[SKOS]] used to represent the hierarchy, whereas the [[OWL]] serialisation uses subclasses to represent the hierarchy.

The table provides an overview of the expression of concepts across DPV serialisations.

Concept	[[DPV]]	[[DPV-OWL]]
Conforms with	[[RDFS]], [[SKOS]]	[[OWL]]
Concept	`rdfs:Class, skos:Concept`	`owl:Class`
is subtype of	`rdfs:subClassOf` or `skos:broader`	`rdfs:subClassOf`
is instance of	`rdf:type`	`rdf:type`
has concept	`rdf:Property`	`owl:ObjectProperty`
relationship subject or domain	`rdfs:domain, dcam:domainIncludes, schema:domainIncludes`	`rdfs:domain, dcam:domainIncludes, schema:domainIncludes`
relationship object or range	`rdfs:range, dcam:rangeIncludes, schema:rangeIncludes`	`rdfs:range, dcam:rangeIncludes, schema:rangeIncludes`

The example at the end of this section highlights the implications of using DPV with RDFS+SKOS and OWL semantics. The RDFS+SKOS model is suitable for when DPV concepts have to be used 'directly', e.g. a use-case wants to use 'Encryption' from DPV as a technical measure without creating a specific 'instance' representing its implementation of the encryption. The RDFS+SKOS model is also suitable for when a use-case wants to further create a hierarchical taxonomy of the different 'Encryption' concepts it is using. Such use of DPV concepts covers both T-box and A-box, and does not restrict or require the underlying ontology/schema to be constantly changed as new concepts are identified in a use-case.

In comparison, the OWL2 model defines concepts as classes and instances with a clear separation between the two, i.e. the T-box and A-box are separate. This means the concept 'Encryption' cannot be directly used with OWL2 semantics and an instance of it must first be created. If instead DPV had provided 'Encryption' as an instance (of Technical Measure), then the further expansion and creation of a hierarchy of encryption concepts described earlier would not be feasible in OWL2, as it results in a complex graph which is not efficient for reasoning. Further, making changes such as promoting 'Encryption' from instance to class requires changing the ontology/schema, which may not always be possible or feasible in use-cases.

Thus, the relative strengths and weaknesses of RDFS+SKOS and OWL2 serialisations dictate which should be used. RDFS+SKOS is 'lightweight' and better if the use-case only requires rudimentary or simple reasoning. OWL2 is better suited for semantic reasoners that can perform complex discovery and validation processes but require more strict and restricted use of concepts. By providing both serialisations, DPV enables the adopters to choose the most suitable serialisation that supports their use-case and/or existing implementations, and retains semantic interoperability based on converting between SKOS and OWL2 (see Using OWL and SKOS (2008)).

# Example: A Company is running 'Direct Marketing' campaigns for
# a purpose (CampaignA), within which it runs another special
# campaign (CampaignB). This may evolve into another campaign C.
# RDFS+SKOS modelling
dpv:Marketing a rdfs:Class, dpv:Purpose .
dpv:DirectMarketing a dpv:Purpose; skos:broader dpv:Marketing .
ex:CampaignA a dpv:Purpose; skos:broader dpv:DirectMarketing .
ex:CampaignB a dpv:Purpose; skos:broader ex:CampaignA .
# valid statements as all objects are instances of dpv:Purpose
ex:Documentation a dpv:Process; dpv:hasPurpose dpv:DirectMarketing .

# OWL2 modelling
dpv:Marketing a owl:Class; rdfs:subClassOf dpv:Purpose .
dpv:DirectMarketing a owl:Class; rdfs:subClassOf dpv:Marketing .
ex:CampaignA a dpv:DirectMarketing .
# problem: instance of instance
ex:CampaignB a ex:CampaignA .
# solution: change CampaignA to subclass 
ex:CampaignA rdfs:subClassOf dpv:DirectMarketing .
# problem: DirectMarketing is not an instance of dpv:Purpose
ex:Documentation a dpv:Process; dpv:hasPurpose dpv:DirectMarketing .
# solution: blank nodes, but not a good solution
ex:Documentation a dpv:Process; dpv:hasPurpose [ a dpv:DirectMarketing ] .
# solution: use the instances rather than the class
ex:Documentation a dpv:Process; dpv:hasPurpose ex:CampaignA .
# problem: does not reflect the terminology used by Company/Law

(META Example ID: E0001; link: source file; contributed by Harshvardhan J. Pandit on 2024-06-10)

Extending Concepts for Use-Cases

Most of the concepts within DPV are provided as hierarchies of classes representing categories of information, which are intentionally generic or abstract or broad so as to permit their application across a diverse and varied landscape of real-world use-cases. In order to accurately reflect the particulars of a use-case, concepts within DPV would (most likely) need to be extended. The specifics of how this should be done depend on the manner in which DPV is utilised. For example, using the default [[DPV]] specification which contains [[RDFS]] and [[SKOS]] semantics, extending is done by declaring a new concept as an instance of the top concept using rdf:type and then using skos:broader to denote where it fits within the hierarchy. In [[DPV-OWL]], which uses [[OWL]] semantics, the rdfs:subClassOf relationship is used to create a hierarchy of sub-classes. Where an exact concept is not present within the DPV and a broader concept exists for representing the same information, one should subtype or extend that broad concept to define the required information.

DPV defines the (broad) concept of Marketing in its Purpose hierarchy to represent information about (purposes related to) marketing activities and topics. For a use-case which requires representing purposes (note: plural) related to marketing of new products, the broad Marketing concept is extended with a child or subclass concept that represents the intended purpose, e.g. MarketingOfNewProducts.

Note: Here the prefix ex: represents http://example.com# - which is a convention for representing additional examples or concepts, such as those created as part of the use-case. The rest of the document follows this nomenclature and convention.

# Example using DPV (RDFS+SKOS)
# Case1: Where further categories are required to 'group' related purposes
# creating a new subclass or category of Marketing for use-case
ex:MarketingOfNewProducts a dpv:Purpose ;
    skos:broader dpv:Marketing ;
    skos:prefLabel "Marketing of New Products" .

# more specific purposes under group ‘Marketing of New Products’
ex:NewslettersOffers a dpv:Purpose ;
    skos:broader ex:MarketingOfNewProducts ;
    skos:prefLabel "Newsletters about Offers" .
ex:EmailsSeasonalOffers a dpv:Purpose ;
    skos:broader ex:MarketingOfNewProducts ;
    skos:prefLabel "Emails about Seasonal Offers" .

# Case2: A single final and definite purpose within EmailsSeasonalOffers
ex:MarketingSeasonalOffer2021 a dpv:Purpose ;
    skos:broader dpv:Marketing ;
    skos:prefLabel "Sending Email Newsletters with Seasonal Offers" .

# Example using DPV-OWL (OWL2)
# Case1: Where further categories are required to 'group' related purposes
# creating a new subclass or category of Marketing for use-case
ex:MarketingOfNewProducts rdfs:subClassOf dpv:Marketing ;
    skos:prefLabel "Marketing of New Products" .

# more specific categories of group ‘Marketing of New Products’
ex:NewslettersOffers rdfs:subClassOf ex:MarketingOfNewProducts ;
    skos:prefLabel "Newsletters about Offers" .
ex:EmailsSeasonalOffers rdfs:subClassOf ex:MarketingOfNewProducts ;
    skos:prefLabel "Emails about Seasonal Offers" .

# Case2: A single final and definite purpose within EmailsSeasonalOffers
ex:MarketingSeasonalOffer2021 rdf:type dpv:Marketing ;
    skos:prefLabel "Sending Email Newsletters with Seasonal Offers" .

(META Example ID: E0002; link: source file; contributed by Harshvardhan J. Pandit on 2024-06-10)

The mechanism for extending concepts (via both subclasses/subtypes and instances) is useful to align existing concepts or vocabularies with the DPV taxonomies, such as by declaring them as subclasses of a particular concept. This permits the creation of domain or jurisdiction specific extensions, such as [[[EU-GDPR]]] for expressing the legal bases provided by GDPR. Extensions also permit more accurate representations of a use-case by extending from multiple concepts to refine and scope the interpretation. This means each concept can have multiple parents representing the intersection of their respective sets.

For example, two TV companies (AliceCo and BobCo) extend the concept Optimisation to reflect their respective purposes. When they exchange information about their use-cases with each other (or with a third party), following the chain of use-case specific concepts makes it possible to deduce that both AliceCo and BobCo are doing optimisations for consumers. A common language or interface can thus be developed by using DPV as a point of interoperability and commonality, which adopters then use to define the specifics of their use-case. For example, in this use-case a common notice generation algorithm could be created and used to inform users of both services of the purposes for which each company uses data.

# Example in DPV (RDFS+SKOS)
# Method 1: Ambiguous regarding independence of Collect and Use
ex:ActivityA a dpv:Process ;
    dpv:hasProcessing dpv:Collect, dpv:Use .

# Method 2: Accurate regarding combination of Collect and Use
ex:CollectAndUse a dpv:Processing ;
    # rdfs:subClassOf dpv:Collect, dpv:Use ; # -- RDFS+SKOS permits this
    skos:broader dpv:Collect, dpv:Use ;
    skos:prefLabel "Collect and Use data using User Device" .
ex:Activity a dpv:Process ;
    dpv:hasProcessing ex:CollectAndUse . # -- property range is correct

# Example in DPV-OWL
ex:CollectAndUse rdfs:subClassOf dpv:Collect, dpv:Use ;
    skos:prefLabel "Collect and Use data using User Device" .
ex:Activity a dpv:Process ;
    # dpv:hasProcessing ex:CollectAndUse  # -- property range is incorrect
    dpv:hasProcessing ex:CU1 . # hence, an instance is needed for OWL2
ex:CU1 a ex:CollectAndUse .

(META Example ID: E0003; link: source file; contributed by Harshvardhan J. Pandit on 2024-06-10)

It is not necessary to extend concepts unless one wishes to depict use-case specific information. For example, if in a use-case it is sufficient to (only) say some information is collected, then dpv:Collect can be directly used. However, where more specific information is needed, such as also specifying a method of collection (e.g. CollectViaWebForm), then it is recommended to extend the concept, for example as <CollectViaWebForm a dpv:Collect>. If there are lots of forms and they need to be 'grouped' together as collection methods, then one would subtype/subclass Collect as CollectViaWebForm and create instances of it for each form to be represented.
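The options described above can be sketched in the RDFS+SKOS serialisation as follows. This is a minimal illustration rather than a normative pattern: it follows the rdf:type plus skos:broader convention used in the earlier examples, and the names ex:CollectViaWebForm, ex:CollectViaSignupForm, ex:ActivityA, and ex:ActivityB are hypothetical.

```turtle
@prefix dpv: <https://w3id.org/dpv#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex: <http://example.com#> .

# Option 1: no extension needed, dpv:Collect is used directly
ex:ActivityA a dpv:Process ;
    dpv:hasProcessing dpv:Collect .

# Option 2: a use-case specific category for collection via web forms
ex:CollectViaWebForm a dpv:Processing ;
    skos:broader dpv:Collect ;
    skos:prefLabel "Collect via Web Form" .

# A specific form grouped under the new category (hypothetical name)
ex:CollectViaSignupForm a dpv:Processing ;
    skos:broader ex:CollectViaWebForm ;
    skos:prefLabel "Collect via Signup Form" .

ex:ActivityB a dpv:Process ;
    dpv:hasProcessing ex:CollectViaSignupForm .
```

Because ex:CollectViaSignupForm is linked to dpv:Collect through its chain of skos:broader relations, a recipient that only understands DPV can still interpret ex:ActivityB as involving collection.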

Though this example used a web form as a method of collection by directly mentioning it within the concept as CollectViaWebForm, this may not always be desirable. For example, that same web form may also need to be represented separately for logging purposes. DPV also provides the DataSource and Technology concepts for representing information regarding how concepts are implemented and the use of specific technological artefacts such as web forms, databases, along with their functions such as data storage and retrieval.

Maintaining Interoperability

DPV intends to provide a core or foundational framework for different entities to exchange information and interpret concepts for interoperability. When an adopter (e.g. an organisation using DPV) extends concepts to refine them for their own use-case, the concept is still (weakly) interoperable by relying on DPV’s broad taxonomies to provide a common point of reference.

For example, two TV companies (AliceCo and BobCo) extend the concept Optimisation to reflect their respective purposes as follows:

# Example in DPV (RDFS+SKOS)
# AliceCo’s optimisation related to better services for users’ infrastructure
exA:TVServiceOptimisation a dpv:Purpose;
    skos:broader dpv:OptimisationForConsumer ;
    skos:prefLabel "Optimise Service for Users’ Infrastructure" .

# BobCo’s optimisation related to more efficient signals for users’ TV sets
exB:TVSignalOptimisation a dpv:Purpose;
    skos:broader dpv:OptimisationForConsumer ;
    skos:prefLabel "Optimise Signal for Consumer TV Set" .

# Example in DPV-OWL
# the common ancestor is dpv:OptimisationForConsumer
# Using this as context to compare:
# (either manually, or based on data used, etc.)

# 1: BobCo's optimisations are found to be broader than AliceCo's
exA:TVServiceOptimisation skos:broader exB:TVSignalOptimisation .

# 2: BobCo's optimisations are found to be the same as AliceCo's
exA:TVServiceOptimisation skos:exactMatch exB:TVSignalOptimisation .

# 3: BobCo's optimisations are found to be similar to AliceCo's
exA:TVServiceOptimisation skos:closeMatch exB:TVSignalOptimisation .

(META Example ID: E0004; link: source file; contributed by Harshvardhan J. Pandit on 2024-06-10)

Extensions

To supplement the concepts and taxonomies in [[DPV]] for specific applications, use-cases, or to provide separation for better management of terms, we provide several extensions to the DPV.

Personal Data (PD)

[[[PD]]] provides additional concepts that extend the DPV's personal data taxonomy based on an opinionated structure contributed by R. Jason Cronk from EnterPrivacy. This separation is to enable adopters to decide whether the extension's concepts are useful to them, or to use other external vocabularies, or define their own.

Concepts within [[PD]] are broadly structured in top-down fashion by utilising their relevance and origin as:

Internal (within the person): e.g. Preferences, Knowledge, Beliefs
External (visible to others): e.g. Behavioural, Demographics, Physical, Sexual, Identifying
Household: e.g. personal or household activities
Social: e.g. Family, Friends, Professional, Public Life, Communication
Financial: e.g. Transactional, Ownership, Financial Account
Tracking: e.g. Location, Device based, Contact
Historical: e.g. Life History

Locations (LOC)

[[[LOC]]] provides additional concepts regarding locations such as countries and regions based on the ISO 3166 standards. It enables representing information such as that processing takes place within Ireland, represented by loc:IE, or within the European Union (EU), represented by loc:EU. We are working on expanding this list to also specify regions, cities, and other pertinent location details, and welcome participation and contributions for this.
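As a minimal sketch, the two cases mentioned above could be expressed as shown here. The loc: namespace IRI and the use of dpv:hasLocation are assumptions to be checked against the published [[LOC]] and [[DPV]] specifications, and the ex: names are hypothetical.

```turtle
@prefix dpv: <https://w3id.org/dpv#> .
@prefix loc: <https://w3id.org/dpv/loc#> .
@prefix ex: <http://example.com#> .

# Processing that takes place within Ireland
ex:ActivityIE a dpv:Process ;
    dpv:hasProcessing dpv:Store ;
    dpv:hasLocation loc:IE .

# Processing that takes place within the European Union
ex:ActivityEU a dpv:Process ;
    dpv:hasProcessing dpv:Store ;
    dpv:hasLocation loc:EU .
```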

Risk Management (RISK)

[[[RISK]]] builds on the lightweight risk framework within DPV by providing more extensive concepts related to risk assessment and management, which are listed below. We are in the process of identifying additional concepts and taxonomies for the risk extension, such as for risk management procedures and the creation of a risk ontology based on ISO standards.

Risk Controls - categories of measures such as those related to risk source, likelihood, consequence, vulnerability, as well as the intended effect in terms of monitoring, controlling, halting, removing, or reducing.
Consequences and Impacts - list of consequences such as data breaches, costs, identity theft and several others that are categorised based on DPV's impact framework i.e. damage, harm, or detriment.
Scale for Risk Levels, Severity, and Likelihood - a 7 point qualitative scale to express concepts associated with levels, severity, and likelihood of risk and its consequences.
Risk Matrix - an encoded form of risk matrices based on combinations of severity and likelihood along with the resulting risk level. Risk matrix nodes and values are provided for dimensions 3x3, 5x5, and 7x7.
Incidents, Reports, and Notices - specifying incidents such as security incidents or data breaches, documenting information about them, and notices used to communicate with other relevant entities such as authorities and data subjects.
Risk Management - risk management concepts based on ISO 31000 series.
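To illustrate how these concepts may be combined, the following sketch annotates a process with a risk, qualitative values from the scales described above, and a mitigating measure. The specific terms used (dpv:hasRisk, dpv:hasLikelihood, dpv:hasSeverity, dpv:hasRiskLevel, dpv:isMitigatedByMeasure, and the risk: scale values) and the risk: namespace IRI are assumptions that should be verified against the [[DPV]] and [[RISK]] specifications. The ex: names are hypothetical.

```turtle
@prefix dpv: <https://w3id.org/dpv#> .
@prefix risk: <https://w3id.org/dpv/risk#> .
@prefix ex: <http://example.com#> .

# A process with an associated risk (hypothetical use-case)
ex:ActivityA a dpv:Process ;
    dpv:hasRisk ex:BreachRisk .

# The risk, described using the qualitative scales
ex:BreachRisk a dpv:Risk ;
    dpv:hasLikelihood risk:LowLikelihood ;
    dpv:hasSeverity risk:HighSeverity ;
    dpv:hasRiskLevel risk:ModerateRisk ;
    dpv:isMitigatedByMeasure dpv:EncryptionAtRest .
```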

Technologies (TECH)

[[[TECH]]] extends the DPV's terms to represent further specific details regarding technologies, their management, and relevance to actual real-world tools and systems. It provides concepts for the following:

Provision method: System, Component, Algorithm, Service, Goods, Product, Subscription, Fixed Use
Communication method: WiFi, Bluetooth, GPS, Cellular Network
Actors: Developer, Provider, User, Subject, etc.
Intended Use: what the technology was/is intended to be used for
Documentation: technical and user manuals and other documentation
Status: whether the technology has been released, has been provided, and other statuses
Tools: databases, cookies, etc.

The intention and aim of developing the TECH extension is to describe real-world tools and services, such as a specific cloud storage provider, and provide categorisation and metadata to connect it to DPV's concepts, such as to indicate the cloud storage instance features encryption at rest as a technical measure. Through these, the management and documentation of use-cases can be made easier by providing the relationships between tools/services and technical measures as a 'knowledge graph'.
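The cloud storage scenario above might be sketched as follows. This uses only core DPV terms (dpv:Technology, dpv:hasTechnicalMeasure, dpv:isImplementedUsingTechnology, dpv:EncryptionAtRest), which are assumed here and should be checked against the [[DPV]] specification. Concepts from [[TECH]] itself are not shown, and ex:CloudStorageX is a hypothetical service.

```turtle
@prefix dpv: <https://w3id.org/dpv#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ex: <http://example.com#> .

# A specific cloud storage service and the technical measure it features
ex:CloudStorageX a dpv:Technology ;
    skos:prefLabel "Cloud Storage X" ;
    dpv:hasTechnicalMeasure dpv:EncryptionAtRest .

# A process that stores data using that service
ex:ActivityA a dpv:Process ;
    dpv:hasProcessing dpv:Store ;
    dpv:isImplementedUsingTechnology ex:CloudStorageX .
```

Describing the service once and linking to it from each process keeps the relationship between tools and technical measures in one place, which is the 'knowledge graph' use described above.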

Artificial Intelligence (AI)

[[[AI]]] is an extension under development which will further extend the [[TECH]] extension to represent concepts associated with AI. These will include representation of:

Techniques such as machine learning and natural language processing
Capabilities such as image recognition and text generation
Lifecycle such as data collection, training, fine-tuning, etc.
Risks such as data poisoning, statistical noise and bias, etc.
Risk Measures to address the AI specific risks
Documentation such as Data Sheets and Model Cards
Actors such as AI Developer and AI Deployer
Status associated with AI development

Justifications

[[[JUSTIFICATIONS]]] provides concepts for use as 'justifications' with DPV. For example, where a right cannot be fulfilled, a justification such as 'identity could not be verified' is represented using a specific concept.

Legal Concepts (LEGAL)

[[[LEGAL]]] provides concepts to represent laws, authorities, and other legal concepts in various jurisdictions. It is structured to create a separate namespace for each country or jurisdiction by using its ISO 3166-1 alpha-2 code, for example IE represents Ireland and EU represents the European Union. Within this namespace, the specific laws and authorities for that jurisdiction are defined.

At the moment, the following jurisdictions are defined:

[[[LEGAL-DE]]]
[[[LEGAL-EU]]]
[[[LEGAL-GB]]]
[[[LEGAL-IE]]]
[[[LEGAL-IN]]]
[[[LEGAL-US]]]

At the moment, the following EU laws are defined:

EU DGA

The [[[EU-DGA]]] extension provides concepts for the [[[DGA]]].

EU AI Act

The [[[EU-AIAct]]] extension provides concepts for the [[[AIAct]]].

EU NIS2

The [[[EU-NIS2]]] extension provides concepts for the [[[NIS2]]].

Notes

This document draws inspiration from the following:

RDF 1.1 Primer https://www.w3.org/TR/rdf11-primer/
OWL 2 Primer https://www.w3.org/TR/owl2-primer/
PROV Model Primer https://www.w3.org/TR/prov-primer/

Funding Acknowledgements

Funding Sponsors

The DPVCG was established as part of the SPECIAL H2020 Project, which received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 731601 from 2017 to 2019.

Harshvardhan J. Pandit was funded to work on DPV from 2020 to 2022 by the Irish Research Council's Government of Ireland Postdoctoral Fellowship Grant#GOIPD/2020/790.

The ADAPT SFI Centre for Digital Media Technology is funded by Science Foundation Ireland through the SFI Research Centres Programme and is co-funded under the European Regional Development Fund (ERDF) through Grant#13/RC/2106 (2018 to 2020) and Grant#13/RC/2106_P2 (2021 onwards).

Funding Acknowledgements for Contributors

The contributions of Beatriz Esteves have received funding through the PROTECT ITN Project from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 813497.

The contributions of Harshvardhan J. Pandit have been made with the financial support of Science Foundation Ireland under Grant Agreement No. 13/RC/2106_P2 at the ADAPT SFI Research Centre.