This document is a guide showing how [[[DPV]]], through its OWL2 encoding (i.e. [[DPV-OWL]]), can be used as an OWL2 vocabulary, and how it can be easily encoded in a low-complexity profile of OWL2 called OWL2-PL.
Contributing: The DPVCG welcomes participation in improving the DPV and associated resources, including expanding or refining concepts, requesting information and applications, and addressing open issues. See the contributing guide for further information.
[[[DPV]]] is the base/core specification of the 'Data Privacy Vocabulary', which is extended for Personal Data [[PD]], Locations [[LOC]], Risk Management [[RISK]], Technology [[TECH]], and [[AI]]. Specific [[LEGAL]] extensions are also provided, which model jurisdiction-specific regulations and concepts. To support understanding and applications of [[DPV]], various guides and resources [[GUIDES]] are provided, including a [[PRIMER]]. A Search Index of all concepts from DPV and its extensions is available.
[[DPV]] and related resources are published on GitHub. For a general overview of the Data Protection Vocabularies and Controls Community Group [[DPVCG]], its history, deliverables, and activities, refer to the DPVCG Website. For meetings, see the DPVCG calendar.
The peer-reviewed article “Creating A Vocabulary for Data Privacy” presents a historical overview of the DPVCG and describes the methodology and structure of the DPV, along with how it was created. Open-access versions of the article are also available. The article “Data Privacy Vocabulary (DPV) - Version 2”, accepted for presentation at the 23rd International Semantic Web Conference (ISWC 2024), describes the changes made in DPV v2.
DPV’s concepts are treated as OWL2 classes, and DPV’s properties as OWL2 object properties. For example, the term Location represents the class of all points on the Earth’s surface; the term Collect denotes the class of all possible data collection operations; and the term Marketing represents the class of all possible purposes related to marketing, including all kinds of DirectMarketing, Advertising, etc.
The reason for encoding DPV concepts as classes is simple: they are meant to be extended and refined for specific domains and use cases, as described in section 4 of the primer. For example, the purpose AcademicResearch may be specialized by introducing two subclasses, one for medical research and one for AI research. The processing category Collect, which denotes all personal data collection operations, can be specialized by two subclasses: one for direct collection from the data subject, and one for the import of personal data from a third party, such as an Open-ID account or a Solid repository. The new subclasses can, in turn, be further specialized, supporting extreme extensibility and granularity. For example, the class Location can be refined by a class EULocation representing all the coordinates that fall within the borders of the European Union; EULocation can in turn be refined by further terms, one for each country belonging to the Union; and these terms can be further refined by classes that represent increasingly specific areas, such as regions, cities, streets, buildings, and even rooms in a building.
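The refinement chain described above can be pictured as a plain taxonomy. The Python sketch below is only an illustration, not part of DPV: it stores each class's direct superclasses in a dictionary and checks whether one class is (transitively) a subclass of another. Location and AcademicResearch are DPV terms; EULocation, ItalyLocation, NaplesLocation, MedicalResearch, and AIResearch are hypothetical extension classes, and any intermediate DPV classes are omitted.

```python
from collections import deque

# Direct superclasses of each class. Only Location and AcademicResearch are
# DPV terms; the other names are hypothetical extension classes.
SUBCLASS_OF: dict[str, set[str]] = {
    "EULocation": {"Location"},
    "ItalyLocation": {"EULocation"},
    "NaplesLocation": {"ItalyLocation"},
    "MedicalResearch": {"AcademicResearch"},
    "AIResearch": {"AcademicResearch"},
}


def is_subclass(sub: str, sup: str) -> bool:
    """True if `sub` equals `sup` or reaches it through subclass links."""
    seen = {sub}
    queue = deque([sub])
    while queue:
        current = queue.popleft()
        if current == sup:
            return True
        for parent in SUBCLASS_OF.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return False


print(is_subclass("NaplesLocation", "Location"))     # True
print(is_subclass("AIResearch", "MedicalResearch"))  # False
```

Because every refinement is just another subclass link, deeper levels (regions, cities, buildings, rooms) can be added without changing any existing entry.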
OWL2 provides strong interoperability guarantees: its direct semantics constitutes a crisp semantic specification, which guarantees that all compliant software agents understand and process DPV and its extensions in the same way.
The formal semantics of OWL2 also provides other guarantees; for example, it can be formally proved that the compliance checking algorithms never return false positives or false negatives. This robustness, together with flexibility, extensibility, and interoperability, makes OWL2 the recommended machine-understandable encoding for automated compliance checking, that is, checking privacy policies or records of processing against data protection regulations and against the data subjects’ consent to personal data processing.
Using OWL2 does not imply poor performance: for example, with the specialized Java compliance checker developed in SPECIAL, a compliance check takes only a few hundred microseconds.
It is not necessary to master OWL2 in its full complexity to encode DPV in OWL2; a few simple features suffice, namely:
subclass assertions (e.g. to state that DataSubjectRight is a subclass of Right)
property range assertions (e.g. to state that the values of the hasPurpose property are instances of Purpose)
No complex classes are needed, only class names. Optionally, DPV’s semantics can be refined by asserting formally that two classes are disjoint:
disjoint class assertions (e.g. to state that Purpose and Recipient have no instance in common)
Such disjointness assertions may help in identifying incorrect uses of DPV (such as assigning a recipient as the value of the hasPurpose property).
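To see how a range assertion and a disjointness assertion work together to expose such a misuse, consider the following Python sketch. It is an illustration under simplifying assumptions, not an OWL2 reasoner: when a value is assigned to a property, the range axiom adds the range class to the value's types, and if the value then belongs to two classes declared disjoint, the assignment is flagged. The taxonomy fragment is abridged, and the range given here for hasRecipient is an assumption made for the example.

```python
# Abridged taxonomy: direct superclasses only (intermediate classes omitted).
SUBCLASS_OF = {
    "Advertising": {"Marketing"},
    "Marketing": {"Purpose"},
    "ThirdParty": {"Recipient"},
}
# Range assertions (the hasRecipient range is assumed for this example).
RANGE = {"hasPurpose": "Purpose", "hasRecipient": "Recipient"}
# Disjoint class assertions.
DISJOINT = [frozenset({"Purpose", "Recipient"})]


def superclasses(cls: str) -> set[str]:
    """All superclasses of `cls`, including `cls` itself."""
    result, stack = {cls}, [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), set()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def misuse(prop: str, value_type: str) -> list[tuple[str, ...]]:
    """Disjoint pairs violated when a value of type `value_type` is used with `prop`."""
    # The value is an instance of its own type and, via the range axiom,
    # of the property's range; collect every implied type.
    types = superclasses(value_type) | superclasses(RANGE[prop])
    return [tuple(sorted(pair)) for pair in DISJOINT if pair <= types]


print(misuse("hasPurpose", "Advertising"))  # []
print(misuse("hasPurpose", "ThirdParty"))   # [('Purpose', 'Recipient')]
```

Without the disjointness assertion, the second call would return an empty list: the range axiom alone would simply make the recipient a purpose as well, silently accepting the mistake.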
To give a concrete example, let us show how the term MedicalHealth is encoded in OWL2. In the following examples, we use the Manchester syntax of OWL2.
In a few cases, it may be appropriate to introduce new terms that are instances of a class, in particular when the new term represents:
a single, specific recipient or controller,
a single location, such as those specified by GPS coordinates,
other single data values, such as specific email addresses (like “jane.doe@provider.org”), specific social security numbers, and the like.
The rule of thumb for deciding whether a term should be a class or an instance is to ask: can the new term possibly be further refined? If the answer is “yes”, make it a class; otherwise, it can be made an instance. In case of doubt, making the new term a class is the safest option.
Instances can be added to the ontology with:
instance assertions (e.g. stating that the ACME company is an instance of Recipient).
The OWL2 keyword for declaring instances is “Individual”. Its use is illustrated in the following example:
The low-complexity profile OWL2-PL allows properties and classes to be defined with little more than the OWL2 keywords illustrated in the previous section. The complete list of keywords that can be used for this purpose is the following:
No other OWL2 keywords are accepted in the OWL2-PL ontologies that formalize terminologies such as the DPV and its extensions.
In the above list, only the last feature has not yet been explained. It is used in the use cases related to policies and compliance to define functional properties of the data processing operations (that is, properties that may have only one value). This task is the responsibility of the policy language designer; users who extend the DPV, as illustrated in the previous section, do not need to use this feature.
The OWL2 encoding of DPV can be used to encode data usage policies. A data usage policy can be:
the privacy policy of a company or organization,
the record of processing maintained by a company or organization,
the consent statement of a data subject,
the objective part of a personal data protection regulation such as the GDPR.
Data usage policies are modeled as classes. More precisely, a policy is nothing but the class of permitted data processing operations. A policy P1 (like the privacy policy of a company) complies with a policy P2 (like the consent policy of a data subject, or the objective part of the GDPR) if all the operations permitted by P1 are also permitted by P2, that is,
P1 complies with P2 if, and only if, P1 is a subclass of P2.
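Stated in standard description-logic notation, writing O for the ontology that contains DPV, its extensions, and the policy definitions, the criterion is the usual model-theoretic reading of subsumption (given here as a sketch of the definition, not as the formal specification of OWL2-PL):

```latex
P_1 \text{ complies with } P_2
  \iff \mathcal{O} \models P_1 \sqsubseteq P_2
  \iff P_1^{\mathcal{I}} \subseteq P_2^{\mathcal{I}}
       \text{ for every model } \mathcal{I} \text{ of } \mathcal{O}
```

Here P^I is the set of operations that an interpretation I places in policy P, so compliance holds exactly when no model of the ontology contains an operation that is permitted by P1 but not by P2.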
A policy can be specified in a compact way by describing the properties of its permitted operations.
In a personal data protection use case, the relevant properties of such operations include, but are not limited to, the category of data being processed, the purpose of the processing, the kind of processing (e.g. data collection, data analysis, etc.), the recipients to which the data is transferred, other properties related to how data is stored and protected, and the legal basis of the processing.
For example, let Policy1 be a privacy policy that permits the collection of email addresses for advertising purposes, based on the consent of the data subjects. This policy, seen as a class of operations, contains all the possible operations with the following DPV features:
their hasProcessing attribute is some kind of data collection (i.e. an instance of Collect)
their hasPersonalData attribute is some email address
their hasPurpose attribute is some kind of advertising (i.e. an instance of Advertising)
their hasLegalBasis attribute is some kind of consent (either explicit or implicit)
This class of operations is encoded in OWL2-PL as follows:
where A6-1-a-consent is a superclass of A6-1-a-explicit-consent and A6-1-a-non-explicit-consent. In a similar way, one can encode a consent statement that permits all the operations that collect contact information for marketing purposes:
Now, since EmailAddress is a subclass of Contact and Advertising is a subclass of Marketing, all the operations that belong to Policy1 also satisfy the requirements for belonging to Policy2. Hence Policy1 is a subclass of Policy2, that is, the privacy policy complies with the data subject’s consent.
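The reasoning above can be reproduced with a small structural check. The Python sketch below is a simplification, not the SPECIAL checker: it assumes every policy has the shape "p1 some C1 and p2 some C2 and ..." with class names as fillers, and that the ontology contains only subclass assertions between class names. Under these assumptions, P1 is a subclass of P2 exactly when each restriction of P2 is matched by a restriction of P1 on the same property whose filler is a subclass of P2's filler. Range, disjointness, and functional-property axioms are ignored, although a real OWL2 reasoner would take them into account. The restrictions used for Policy2 are an assumption based on the description above (collection of contact information for marketing purposes).

```python
# Abridged taxonomy: direct superclasses only (illustrative fragment).
SUBCLASS_OF = {
    "EmailAddress": {"Contact"},
    "Advertising": {"Marketing"},
    "A6-1-a-explicit-consent": {"A6-1-a-consent"},
    "A6-1-a-non-explicit-consent": {"A6-1-a-consent"},
}

# A policy is a list of (property, filler class) pairs, read as a conjunction
# of existential restrictions: "property some FillerClass and ...".
Policy = list[tuple[str, str]]


def is_subclass(sub: str, sup: str) -> bool:
    """True if `sub` equals `sup` or reaches it through subclass links."""
    seen, stack = {sub}, [sub]
    while stack:
        current = stack.pop()
        if current == sup:
            return True
        for parent in SUBCLASS_OF.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def complies(p1: Policy, p2: Policy) -> bool:
    """P1 is a subclass of P2 iff every restriction of P2 is implied by one of P1."""
    return all(any(prop1 == prop2 and is_subclass(c1, c2) for prop1, c1 in p1)
               for prop2, c2 in p2)


policy1: Policy = [("hasProcessing", "Collect"), ("hasPersonalData", "EmailAddress"),
                   ("hasPurpose", "Advertising"), ("hasLegalBasis", "A6-1-a-consent")]
policy2: Policy = [("hasProcessing", "Collect"), ("hasPersonalData", "Contact"),
                   ("hasPurpose", "Marketing")]

print(complies(policy1, policy2))  # True
print(complies(policy2, policy1))  # False
```

The check is not symmetric: the consent (Policy2) permits, for example, collecting phone numbers for direct marketing, which Policy1 does not, so Policy2 does not comply with Policy1.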
It is also easy to state that some properties of a policy must have a single value (an instance). For example, the policy Policy3, stating that the email address “user@provider.com” (and only that email address) can be transferred to the ACME company (and only that company) for any marketing purpose, is encoded in OWL2-PL as follows:
The expressions {user@provider.com} and {myext:ACME} denote the classes that contain only user@provider.com and ACME, respectively. Therefore, all the operations permitted by Policy3 process only the address “user@provider.com” and share it only with ACME, as required. Any other operation is forbidden.
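Singleton classes fit into the same kind of structural check. In the Python sketch below, a filler is either a class name or a one-element frozenset standing for a singleton class such as {myext:ACME}. A singleton {a} is contained in a class C when one of the asserted types of a is a subclass of C, and two singletons are compared by name. This assumes that distinct names denote distinct individuals, which OWL2 itself does not presume. The property and class names used for Policy3 (hasProcessing with Transfer, hasRecipient), the type assertions, and the two comparison policies are all assumptions made for illustration.

```python
# Abridged taxonomy and instance (type) assertions, illustrative only.
SUBCLASS_OF = {"EmailAddress": {"Contact"}}
TYPES = {"user@provider.com": {"EmailAddress"}, "myext:ACME": {"Recipient"}}


def is_subclass(sub: str, sup: str) -> bool:
    """True if `sub` equals `sup` or reaches it through subclass links."""
    seen, stack = {sub}, [sub]
    while stack:
        current = stack.pop()
        if current == sup:
            return True
        for parent in SUBCLASS_OF.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def filler_subsumed(f1, f2) -> bool:
    """Fillers are class names (str) or singleton classes (frozenset of one name)."""
    if isinstance(f1, frozenset):
        (individual,) = f1
        if isinstance(f2, frozenset):
            return f1 == f2  # assumes distinct names denote distinct individuals
        # {a} is contained in class C if one of a's asserted types is a subclass of C.
        return any(is_subclass(t, f2) for t in TYPES.get(individual, set()))
    if isinstance(f2, frozenset):
        return False  # a named class is not, in general, contained in a singleton
    return is_subclass(f1, f2)


def complies(p1, p2) -> bool:
    """Each restriction of p2 must be implied by a restriction of p1 on the same property."""
    return all(any(q1 == q2 and filler_subsumed(f1, f2) for q1, f1 in p1)
               for q2, f2 in p2)


policy3 = [("hasProcessing", "Transfer"),
           ("hasPersonalData", frozenset({"user@provider.com"})),
           ("hasRecipient", frozenset({"myext:ACME"})),
           ("hasPurpose", "Marketing")]
# Hypothetical consent: transfer of contact data to recipients for marketing.
consent = [("hasProcessing", "Transfer"), ("hasPersonalData", "Contact"),
           ("hasRecipient", "Recipient"), ("hasPurpose", "Marketing")]
# Hypothetical policy that transfers any email address to ACME.
any_email = [("hasProcessing", "Transfer"), ("hasPersonalData", "EmailAddress"),
             ("hasRecipient", frozenset({"myext:ACME"})), ("hasPurpose", "Marketing")]

print(complies(policy3, consent))    # True
print(complies(any_email, policy3))  # False
```

The second result mirrors the text: an operation on an arbitrary email address is not among the operations permitted by Policy3, which only admits “user@provider.com”.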
Interestingly, every OWL2 reasoner can be used to check compliance, because they are all able to check whether a class is a subclass of another class. Of course, the reasoners specialized for OWL2-PL are in general more efficient than generic OWL2 reasoners. The part of OWL2-PL aimed at encoding policies supports the above syntax, that is, policies can be defined using the keywords “EquivalentTo”, “and”, and “some” as shown above. Each policy must have a unique definition.
This kind of automated compliance checking has several interesting applications, including the following:
Consent re-use: when a new application or business line is deployed, its data usage policy can be automatically checked for compliance with the consent statements that are already available; in this way, the users whose consent already covers the new processing need not be asked again for consent.
Data re-purposing: an organization wants to use its data for a purpose different from the one declared when the data was collected. As with consent re-use, it can be checked automatically which data can be used for the new purpose, based on the available consent.
Data sharing: the personal information contained in a container such as a Solid pod or an Open-ID account can be labeled with a consent policy that determines whether a third party can access it, based on the data usage policy of the requester. This method improves on existing access control languages by taking into account additional relevant information, such as purposes and transfers to third parties.
Dynamic query control: when an employee queries personal data, the purpose of the query and the data categories involved in the query can be checked on-the-fly for compliance with the available consent.
Compliance with the regulations: the personal data usage policies of a company can be checked for compliance with the objective part of the data protection regulations. Each compliance check verifies a number of restrictions, including but not limited to: (i) verifying whether international data transfers comply with the GDPR’s restrictions; (ii) verifying whether special categories of data are protected with the technical and organizational measures prescribed by the GDPR.
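Consent re-use, for instance, amounts to running the subclass test once per stored consent. The Python sketch below uses a simplified structural subclass test that assumes policies are conjunctions of "property some ClassName" restrictions and an ontology with only subclass assertions between class names; it is not the SPECIAL or TRAPEZE implementation. It selects the data subjects whose existing consent already covers a new policy. All users, policies, and the taxonomy fragment are hypothetical.

```python
# Abridged taxonomy: direct superclasses only (illustrative fragment).
SUBCLASS_OF = {
    "EmailAddress": {"Contact"},
    "Advertising": {"Marketing"},
}


def is_subclass(sub: str, sup: str) -> bool:
    """True if `sub` equals `sup` or reaches it through subclass links."""
    seen, stack = {sub}, [sub]
    while stack:
        current = stack.pop()
        if current == sup:
            return True
        for parent in SUBCLASS_OF.get(current, set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return False


def complies(p1: list[tuple[str, str]], p2: list[tuple[str, str]]) -> bool:
    """P1 is a subclass of P2 iff every restriction of P2 is implied by one of P1."""
    return all(any(q1 == q2 and is_subclass(c1, c2) for q1, c1 in p1)
               for q2, c2 in p2)


# Hypothetical consent statements already on record, keyed by data subject.
consents = {
    "alice": [("hasPersonalData", "Contact"), ("hasPurpose", "Marketing")],
    "bob": [("hasPersonalData", "EmailAddress"), ("hasPurpose", "ServiceProvision")],
}
# Data usage policy of a hypothetical new business line.
new_policy = [("hasProcessing", "Collect"), ("hasPersonalData", "EmailAddress"),
              ("hasPurpose", "Advertising")]

# Data subjects whose consent already covers the new processing.
covered = [user for user, consent in consents.items() if complies(new_policy, consent)]
print(covered)  # ['alice']
```

Only the remaining data subjects (here, bob) would need to be asked for consent again; the same loop, run with a new purpose in place of the new policy, sketches the data re-purposing check.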
As expected, modeling the GDPR is more complex, but encoding it into OWL2 and OWL2-PL is the responsibility of knowledge engineers and does not concern end users. Interested readers can find more details about the OWL2 encoding of the GDPR in several papers and documents, such as [1] and [2].
The complete grammar of the OWL2-PL policy language can be found in [3], together with a corresponding JSON version designed for developers who are not familiar with OWL2.
The DPVCG was established as part of the SPECIAL H2020 Project, which received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 731601 from 2017 to 2019.
Harshvardhan J. Pandit was funded to work on DPV from 2020 to 2022 by the Irish Research Council's Government of Ireland Postdoctoral Fellowship Grant#GOIPD/2020/790.
The ADAPT SFI Centre for Digital Media Technology is funded by Science Foundation Ireland through the SFI Research Centres Programme and is co-funded under the European Regional Development Fund (ERDF) through Grant#13/RC/2106 (2018 to 2020) and Grant#13/RC/2106_P2 (2021 onwards).
OWL2-PL and the semantic policy framework have been developed by the H2020 projects SPECIAL and TRAPEZE, funded by the European Union under grants n. 731601 and 883464, respectively.
[1] P.A. Bonatti, S. Kirrane, I.M. Petrova, L. Sauro: Machine Understandable Policies and GDPR Compliance Checking. Künstliche Intelligenz 34(3): 303-315 (2020)
[2] P.A. Bonatti, L. Ioffredo, G. Luongo, S. Mosi, I.M. Petrova, L. Sauro: Formalization of the GDPR and of the Pilots’ Policies. SPECIAL Report D5. Available at: https://specialprivacy.ercim.eu/images/documents/report-D5.pdf
[3] P.A. Bonatti, J. Langens, L. Sauro: Policy Language – v1. TRAPEZE Report D2.1. Available at: https://trapeze-project.eu/resources/