This document is a guide to how [[[DPV]]], through its OWL2 encoding (i.e. [[DPV-OWL]]), can be used as an OWL2 vocabulary by encoding it in a low-complexity profile of OWL2 called OWL2-PL.

This document is published by the Data Privacy Vocabularies and Controls Community Group (DPVCG) as a deliverable and report of its work in creating and maintaining the Data Privacy Vocabulary (DPV).

Contributing to the DPV and its extensions

The DPVCG welcomes participation regarding the DPV, including expansion or refinement of its terms, work on open issues, and suggestions for their resolution or mitigation. For further information, please see the contribution section.

Introduction

DPV’s concepts are treated as OWL2 classes, and DPV’s properties as OWL2 object properties. For example, the term Location represents the class of all points on the earth’s surface; the term Collect denotes the class of all possible data collection operations. The term Marketing represents the class of all possible varieties of purposes related to marketing, including all kinds of DirectMarketing, Advertising, etc.

The reason for encoding DPV concepts as classes is simple: they are meant to be extended and refined for specific domains and use cases, as described in section 4 of the primer. For example, the purpose AcademicResearch may be specialized by introducing two subclasses, one for medical research and one for AI research. The processing category Collect, which denotes all personal data collection operations, can be specialized by two subclasses, one for direct collection from the data subject, and one for the import of personal data from a third party, such as Open-ID or a Solid repository. The new subclasses, in turn, can be further specialized to support a high degree of extensibility and granularity. For example, the class Location can be refined by a class EULocation representing all the coordinates that fall within the borders of the European Union; in turn, EULocation can be refined by further terms, one for each country belonging to the Union; such terms can be further refined by classes that represent increasingly specific areas, such as regions, cities, streets, buildings, and even rooms in a building.
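The refinement chain described above can be pictured with a small Python sketch. It is not part of DPV or OWL2-PL: it only models a class hierarchy as a map from each class to its direct superclass, and checks subclass relationships by walking that map. Location, EULocation, AcademicResearch and Collect come from the text; every other class name is a hypothetical extension term chosen for illustration.

```python
# Hypothetical fragment of a DPV-style hierarchy: class -> direct superclass.
# Names other than Location, EULocation, AcademicResearch and Collect are
# illustrative assumptions, not DPV terms.
SUPERCLASS = {
    "MedicalResearch": "AcademicResearch",
    "AIResearch": "AcademicResearch",
    "CollectFromDataSubject": "Collect",
    "CollectFromThirdParty": "Collect",
    "EULocation": "Location",
    "Italy": "EULocation",
    "Naples": "Italy",
}


def is_subclass(sub, sup):
    """Return True if `sub` is `sup` or a direct or indirect subclass of it."""
    current = sub
    while current is not None:
        if current == sup:
            return True
        # Move one step up; top-level classes have no entry and yield None.
        current = SUPERCLASS.get(current)
    return False


# A city-level term is still a Location, but not the other way round.
print(is_subclass("Naples", "Location"))      # refinement chain of length 3
print(is_subclass("Location", "EULocation"))  # the superclass is not a subclass
```

Because every new term is itself a class, it can be refined again simply by adding another entry to the map, which is the extensibility argument made above.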

OWL2 provides strong interoperability guarantees: its direct semantics constitutes a crisp semantic specification, which guarantees that all compliant software agents understand and process DPV and its extensions in the same way.

The formal semantics of OWL2 also provides other guarantees; for example, it can be formally proved that the compliance checking algorithms never return false positives or false negatives. This robustness, together with flexibility, extensibility and interoperability, makes OWL2 the recommended machine-understandable encoding for automated compliance checking (of privacy policies, or records of processing, with respect to data protection regulations and the data subjects’ consent to personal data processing).

Using OWL2 does not imply poor performance: for example, with the specialized Java compliance checker developed in SPECIAL, a compliance check takes only a few hundred microseconds.

Modeling and extending DPV with OWL2-PL

It is not necessary to master OWL2 in its full complexity to encode DPV in OWL2; a few simple features suffice, namely:

No complex classes are needed, only class names. Optionally, DPV’s semantics can be refined by asserting formally that two classes are disjoint:

Such disjointness assertions may help in identifying wrong uses of DPV (like the assignment of a recipient to property hasPurpose).
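To illustrate how a disjointness assertion exposes such a mistake, here is a minimal Python sketch. It is a simplification and not an OWL2 reasoner: the hierarchy, the property ranges and the single disjointness axiom below are assumptions made for the example, and the check only looks at named superclasses.

```python
# Assumed toy hierarchy, ranges and disjointness axiom (illustrative only).
SUPERCLASS = {
    "Marketing": "Purpose",
    "Advertising": "Marketing",
    "ThirdParty": "Recipient",
}
RANGE = {"hasPurpose": "Purpose", "hasRecipient": "Recipient"}
DISJOINT = {frozenset({"Purpose", "Recipient"})}


def ancestors(cls):
    """Return the set containing `cls` and all of its superclasses."""
    result = set()
    while cls is not None:
        result.add(cls)
        cls = SUPERCLASS.get(cls)
    return result


def violates_disjointness(prop, filler):
    """True if `filler` falls under a class declared disjoint with the range of `prop`."""
    expected = ancestors(RANGE[prop])
    return any(
        frozenset({a, b}) in DISJOINT
        for a in ancestors(filler)
        for b in expected
    )


# Assigning a recipient to hasPurpose is flagged; a genuine purpose is not.
print(violates_disjointness("hasPurpose", "ThirdParty"))
print(violates_disjointness("hasPurpose", "Advertising"))
```

In an actual OWL2 setting the same mistake would surface as an unsatisfiable class or an inconsistent ontology reported by the reasoner, rather than through a hand-written check like this one.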

To give a concrete example, let us show how the term MedicalHealth is encoded in OWL2. In the following examples we use the Manchester syntax of OWL2.

In a few cases, it may be appropriate to introduce new terms that are instances of a class, in particular when the new term represents:

The rule of thumb to decide whether a term should be a class or an instance is based on the question: can the new term possibly be further refined? If the answer is “yes”, then make it a class, otherwise it can be made an instance. In case of doubt, making the new term a class is the safest option.

Instances can be added to the ontology with:

The OWL2 keyword for declaring instances is “Individual”. Its use is illustrated in the following example:

A complete definition of OWL2-PL’s ontology language

The low-complexity profile OWL2-PL allows properties and classes to be defined with little more than the OWL2 keywords illustrated in the previous section. The complete list of keywords that can be used for this purpose is the following:

No other OWL2 keywords are accepted in the OWL2-PL ontologies that formalize terminologies such as the DPV and its extensions.

In the above list, only the last feature has not yet been explained. It is applied in the use cases related to policies and compliance in order to define functional properties of the data processing operations (that is, properties that may have only one value). This task is the responsibility of the policy language designer. Users who extend the DPV, as illustrated in the previous section, do not need to use this feature.

Use Case: Automated Compliance Checking

The OWL2 encoding of DPV can be used to encode data usage policies. A data usage policy can be:

Data usage policies are modeled as classes. More precisely, a policy is nothing but the class of permitted data processing operations. A policy P1 (like the privacy policy of a company) complies with a policy P2 (like the consent policy of a data subject, or the objective part of the GDPR) if all the operations permitted by P1 are also permitted by P2, that is,

P1 complies with P2 if, and only if, P1 is a subclass of P2.

A policy can be specified in a compact way by describing the properties of its permitted operations.

In a personal data protection use case, the relevant properties of such operations include (not exclusively) the category of data being processed, the purpose of the processing, the kind of processing (e.g. data collection, data analysis, etc.), the recipients to which the data is transferred, and other properties related to how data is stored and protected, and the legal basis of the processing.

For example, let Policy1 be a privacy policy that permits the collection of email addresses for advertising purposes, based on the consent of the data subjects. This policy – seen as a class of operations – contains all the possible operations with the following DPV features:

This class of operations is encoded in OWL2-PL as follows:

where A6-1-a-consent is a superclass of A6-1-a-explicit-consent and A6-1-a-non-explicit-consent. In a similar way, one can encode a consent statement that permits all the operations that collect contact information for marketing purposes:

Now, since EmailAddress is a subclass of Contact and Advertising is a subclass of Marketing, all the operations that belong to Policy1 also satisfy the requirements for being in Policy2. Hence Policy1 is a subclass of Policy2, that is, the privacy policy complies with the data subject’s consent.
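The reasoning in this paragraph can be mimicked with a short Python sketch. It is an illustration under simplifying assumptions, not the OWL2-PL compliance checker: each policy is reduced to a map from property names to a single class, the hierarchy is a plain child-to-parent map, and the property names (hasPersonalData, hasProcessing, hasPurpose, hasLegalBasis) as well as the exact contents of the two policies are assumed for the example.

```python
# Assumed class hierarchy: only the two subclass facts used in the text.
SUPERCLASS = {
    "EmailAddress": "Contact",
    "Advertising": "Marketing",
}


def is_subclass(sub, sup):
    """Return True if `sub` is `sup` or a direct or indirect subclass of it."""
    while sub is not None:
        if sub == sup:
            return True
        sub = SUPERCLASS.get(sub)
    return False


# Each policy is read as a conjunction of "property some Class" restrictions.
POLICY1 = {  # privacy policy of the company (assumed encoding)
    "hasPersonalData": "EmailAddress",
    "hasProcessing": "Collect",
    "hasPurpose": "Advertising",
    "hasLegalBasis": "A6-1-a-consent",
}
POLICY2 = {  # consent of the data subject (assumed encoding)
    "hasPersonalData": "Contact",
    "hasProcessing": "Collect",
    "hasPurpose": "Marketing",
}


def complies(p1, p2):
    """P1 complies with P2 if every restriction of P2 is implied by one of P1."""
    return all(
        prop in p1 and is_subclass(p1[prop], filler)
        for prop, filler in p2.items()
    )


print(complies(POLICY1, POLICY2))  # Policy1 is at least as restrictive as Policy2
print(complies(POLICY2, POLICY1))  # the converse does not hold
```

The check is asymmetric by design: Policy1 may add restrictions that Policy2 does not mention (here the legal basis), but it may not drop or widen any restriction that Policy2 imposes.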

It is also easy to say that some properties of a policy must have a single value (an instance). For example, the policy Policy3 stating that the email address “user@provider.com” (and only that email address) can be transferred to the ACME company (and only that company) for any marketing purpose is encoded in OWL2-PL as follows:

The expressions {user@provider.com} and {myext:ACME} denote the classes that contain only user@provider.com and ACME, respectively. Therefore, all the operations permitted by Policy3 process only the address “user@provider.com” and share it only with ACME, as required. Any other operation is forbidden.
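The effect of the singleton classes can be sketched in Python as follows. This is a hedged illustration, not the OWL2-PL encoding itself: a Python set stands in for a class such as {user@provider.com}, a string stands in for a named class, the property names hasPersonalData and hasRecipient are assumed, and the kind of processing is left out of the policy for brevity.

```python
# Assumed hierarchy of purposes used in the example.
SUPERCLASS = {"Advertising": "Marketing", "DirectMarketing": "Marketing"}


def is_subclass(sub, sup):
    """Return True if `sub` is `sup` or a direct or indirect subclass of it."""
    while sub is not None:
        if sub == sup:
            return True
        sub = SUPERCLASS.get(sub)
    return False


POLICY3 = {
    "hasPersonalData": {"user@provider.com"},  # singleton: exactly this address
    "hasRecipient": {"myext:ACME"},            # singleton: exactly this company
    "hasPurpose": "Marketing",                 # named class: any marketing purpose
}


def permits(policy, operation):
    """Check whether a concrete operation falls within the policy."""
    for prop, allowed in policy.items():
        value = operation.get(prop)
        if value is None:
            return False
        if isinstance(allowed, set):
            # Enumerated class: the value must be one of the listed individuals.
            if value not in allowed:
                return False
        elif not is_subclass(value, allowed):
            return False
    return True


ok = {"hasPersonalData": "user@provider.com",
      "hasRecipient": "myext:ACME",
      "hasPurpose": "Advertising"}
other_address = dict(ok, hasPersonalData="someone.else@provider.com")

print(permits(POLICY3, ok))
print(permits(POLICY3, other_address))
```

Any operation that touches a different address or a different recipient fails the membership test, which mirrors the statement that every other operation is forbidden.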

Interestingly, every OWL2 reasoner can be used to check compliance, because they are all able to check whether a class is a subclass of another class. Of course, the reasoners specialized for OWL2-PL are in general more efficient than generic OWL2 reasoners. The part of OWL2-PL aimed at encoding policies supports the above syntax, that is, policies can be defined using the keywords “EquivalentTo”, “and”, and “some” as shown above. Each policy must have a unique definition.

This kind of automated compliance checking has several interesting applications, including the following:

As should be expected, modeling the GDPR is more complex, but its encoding into OWL2 and OWL2-PL is the responsibility of knowledge engineers, and it does not concern end users. The interested reader may find more details about the OWL2 encoding of the GDPR in several papers and documents, such as [1] and [2].

The complete grammar of the policy language of OWL2-PL can be found in [3], together with a corresponding JSON version designed for developers who are not familiar with OWL2.

Acknowledgments

OWL2-PL and the semantic policy framework have been developed by the H2020 projects SPECIAL and TRAPEZE, funded by the European Union under grants n. 731601 and 883464, respectively.

[1] Piero A. Bonatti, Sabrina Kirrane, Iliana M. Petrova, Luigi Sauro: Machine Understandable Policies and GDPR Compliance Checking. Künstliche Intell. 34(3): 303-315 (2020)

[2] P.A. Bonatti, L. Ioffredo, G. Luongo, S. Mosi, I.M. Petrova, L. Sauro: Formalization of the GDPR and of the Pilots’ Policies. SPECIAL Report D5. Available at: https://specialprivacy.ercim.eu/images/documents/report-D5.pdf

[3] P.A. Bonatti, J. Langens, L. Sauro: Policy Language – v1. TRAPEZE Report D2.1. Available at: https://trapeze-project.eu/resources/