This document provides guidance for representing information associated with data breaches using the [[[DPV]]]. Such information includes details about the breach event, its detection, preliminary and subsequent investigations, notifications and communications, risk and impact assessments, and mitigations.
Contributing: The DPVCG welcomes participation to improve the DPV and associated resources, including expansion or refinement of concepts, requesting information and applications, and addressing open issues. See contributing guide for further information.
[[[DPV]]]: is the base/core specification for the 'Data Privacy Vocabulary', which is extended for Personal Data [[PD]], Locations [[LOC]], Risk Management [[RISK]], Technology [[TECH]], and [[AI]]. Specific [[LEGAL]] extensions are also provided which model jurisdiction specific regulations and concepts - see the complete list of extensions. To support understanding and applications of [[DPV]], various guides and resources [[GUIDES]] are provided, including a [[PRIMER]]. A Search Index of all concepts from DPV and extensions is available.
[[DPV]] and related resources are published on GitHub. For a general overview of the Data Protection Vocabularies and Controls Community Group [[DPVCG]], its history, deliverables, and activities - refer to DPVCG Website. For meetings, see the DPVCG calendar.
The peer-reviewed article “Creating A Vocabulary for Data Privacy” presents a historical overview of the DPVCG and describes the methodology, structure, and creation of the DPV. Open-access versions of the article are also available. The article “Data Privacy Vocabulary (DPV) - Version 2”, accepted for presentation at the 23rd International Semantic Web Conference (ISWC 2024), describes the changes made in DPV v2.
A data breach is an event and has temporal and causal properties to indicate when and why it took place. Some of this information may or may not be known - for example, when a breach is first detected the start and end times may be unknown, with subsequent investigations identifying this information. In addition, information may not be accurate or may be incorrect - for example, a preliminary investigation may state that the affected data was encrypted whereas subsequent investigation may discover that it was unencrypted. Therefore, the information about a breach continues to change as the investigations progress.
How the changes themselves should be recorded is outside the scope of this guidance since it relates to the organisation's governance practices. For example, the organisation may keep a single record of the data breach and update it as new information is available, or alternatively the organisation may create distinct records of the data breach to represent discrete information available at different stages of investigation. Similarly, the investigation of a breach's impact is also separate to the analysis of the cause of the breach and may be carried out at various points based on the availability of information. For example, availability of new information may trigger an additional requirement to conduct an impact assessment based on the perceived changes in risk and consequences.
This guide therefore is limited to the following information about data breaches:
While the [[DPV]] and this guide are written in a jurisdiction agnostic manner, it is heavily based on the [[GDPR]] and [[NIS2]] definitions and requirements regarding data breaches and incidents. Where relevant, GDPR and NIS2 specific information and requirements are provided separately within the respective sections.
The following sources were used in the development of this document:
DataBreachHandlingProcedure: to be added to Organisational Measures.
The following namespaces are used throughout this document:
prefix | URI |
base (default) | https://w3id.org/dpv/databreach# |
dpv | https://w3id.org/dpv# |
dpv-gdpr | https://w3id.org/dpv/dpv-gdpr# |
risk | https://w3id.org/dpv/risk# |
dct | http://purl.org/dc/terms/ |
ex | https://example.com/ |
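As an illustration of how the prefixes above resolve, the following minimal Python sketch expands a prefixed name into a full IRI. The `expand` helper is hypothetical and not part of DPV; treating an unprefixed name as belonging to the base namespace is an assumption based on the "base (default)" row.

```python
# Prefix table copied from the namespace list above.
PREFIXES = {
    "": "https://w3id.org/dpv/databreach#",  # base (default)
    "dpv": "https://w3id.org/dpv#",
    "dpv-gdpr": "https://w3id.org/dpv/dpv-gdpr#",
    "risk": "https://w3id.org/dpv/risk#",
    "dct": "http://purl.org/dc/terms/",
    "ex": "https://example.com/",
}


def expand(name: str) -> str:
    """Expand 'prefix:local' to a full IRI (hypothetical helper).

    Assumption: a name without a prefix belongs to the base namespace.
    """
    prefix, sep, local = name.partition(":")
    if not sep:
        prefix, local = "", name
    if prefix not in PREFIXES:
        raise KeyError(f"unknown prefix: {prefix!r}")
    return PREFIXES[prefix] + local


print(expand("risk:DataBreach"))
print(expand("DataBreachReport"))
```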
A data breach incident starts from when a breach is suspected or detected, and ends with a final report detailing all impacts and mitigation measures taken in connection with it. In addition to these, organisational processes may involve unclear or ambiguous information, for example - where sufficient information is not available to state whether a breach exists, or where a breach is suspected but not yet confirmed.
Between the suspicion or detection of a breach and the end of handling it, there are various stages that correlate with: the availability of information, carrying out investigations based on available information, communicating with other entities such as authorities and data subjects in connection with the breach, and carrying out assessments as part of investigations regarding risk management and impacts on data subjects. Each of these stages may require record-keeping and documentation outlining the facts at that point in time.
To represent these stages and their associated information, the following are the lifecycle stages as modelled within the [[RISK]] extension regarding incidents and data breaches:
In the above, the breach incident is separated from its investigation or 'breach handling process', which can occur at any stage depending on when the breach has been detected as well as the effectiveness of the response to it. Based on the breach, the following documented reports are typically expected by the [[GDPR]] and [[NIS2]] in connection with a data breach and are used to communicate the breach to entities e.g. to authorities.
In the process of handling a data breach, various information will be communicated - for example to notify the authority or controller about a breach detection, or to indicate to data subjects how they should safeguard themselves. Such communications are termed as Data Breach Notices based on the relevant phrase Data Breach Notification. Categories of notifications communicated throughout the lifecycle of the breach are as below.
The structure and form of information within notices varies based on the actors involved and the availability of information to them. For example, notices sent between Processors and Controllers have differences in terms of the information they communicate and its use in assessments. The processor's data breach assessment may be unaware about the exact nature of data or the extent of data subjects affected, which also limits the identification of impacts. Similarly, notices sent to the data subjects do not have detailed assessments about the breach, but instead will only provide a subset of the total information, and will typically include contextual information such as what measures the data subjects should take in order to safeguard themselves.
The concept risk:DataBreach represents the event of a data breach. It is a subtype of dpv:Risk, reflecting the common phrase "risk of data breach". This allows risk assessment and management concepts to be applied to data breaches, and consequences and impacts arising from a breach to be associated with it.
The temporal properties associated with a breach include its occurrence, i.e. when the breach started and ended, expressed as its duration. This is indicated using the relation dct:temporal to specify the start (if known) and end (if known). If neither is available, then dpv:NotAvailable is used as the value. There can be more than one duration for the same breach, for example to represent three different non-continuous events considered part of a single data breach incident. Whether to represent this as a single breach or as three different breaches is a contextual decision. For example, where the three events accessed different data or systems, or need different mitigation measures, it may be sensible to treat them as separate breaches.
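The handling of known and unknown start and end times can be sketched as follows. This is a minimal illustration, not a normative encoding: writing the duration as a DCMI Period style string (`start=...; end=...`) is an assumption, and the function names and the `ex:` identifiers are hypothetical.

```python
from typing import List, Optional, Tuple


def period(start: Optional[str], end: Optional[str]) -> str:
    """Value for dct:temporal: a period string, or dpv:NotAvailable.

    Assumption: the period is written as a quoted DCMI Period style literal.
    """
    if start is None and end is None:
        return "dpv:NotAvailable"
    parts = []
    if start is not None:
        parts.append(f"start={start}")
    if end is not None:
        parts.append(f"end={end}")
    return '"' + "; ".join(parts) + '"'


def temporal_triples(breach: str,
                     durations: List[Tuple[Optional[str], Optional[str]]]):
    """One dct:temporal triple per (start, end) pair of the same breach."""
    return [(breach, "dct:temporal", period(s, e)) for s, e in durations]


# Three non-continuous events treated as a single breach (hypothetical data).
for triple in temporal_triples("ex:Breach1", [
    ("2024-03-01", "2024-03-02"),
    ("2024-03-05", None),  # end not yet known
    (None, None),          # nothing known yet
]):
    print(triple)
```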
The concepts for representing risk assessment in the context of a data breach are currently being discussed. The outcomes need to be integrated in this document in terms of the specified concepts and their usage in examples.
The 'source' of a breach is the event which led to the breach taking place. It is represented using the concept dpv:RiskSource and associated with the breach using the dpv:hasRiskSource relation. An example of a source of a data breach is an employee leaving their office computer unattended without locking it. The lack of measures for securing office machines is a vulnerability, represented using the concept risk:Vulnerability and associated using the relation risk:hasVulnerability. Vulnerabilities represent a weakness or limitation that was exploited in order to create the risk source event which then leads to the data breach. A vulnerability is not necessary for a risk source to materialise, for example an accident can occur without a vulnerability.
The 'cause' of a breach is the actor or agent that caused the breach to take place, whether through intentional or unintentional means, and regardless of malicious intent or accident. It is represented by risk:ThreatActor and associated using risk:hasThreatActor. In the above case of an unattended office computer, the threat actor is whoever accessed the system after the employee left - this could be another employee, a cleaner, or an external person entering the premises. The existence of an actor is not a necessity for a breach to take place. For example, a disk drive containing sensitive data being thrown away is considered a data breach regardless of whether anyone actually accesses the data on it.
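The unattended-computer example can be written down as a handful of triples. The sketch below is illustrative only: all `ex:` names are hypothetical, and the choice of which node carries risk:hasVulnerability and risk:hasThreatActor is an assumption, since the text does not fix the subject of these relations.

```python
# Triples as (subject, predicate, object) using the prefixes of this document.
# All ex: names are hypothetical.
TRIPLES = [
    ("ex:Breach1", "a", "risk:DataBreach"),
    ("ex:Breach1", "dpv:hasRiskSource", "ex:UnattendedComputer"),
    ("ex:UnattendedComputer", "a", "dpv:RiskSource"),
    # Assumption: the vulnerability is attached to the risk source it enabled.
    ("ex:UnattendedComputer", "risk:hasVulnerability", "ex:NoScreenLockPolicy"),
    ("ex:NoScreenLockPolicy", "a", "risk:Vulnerability"),
    # Assumption: the threat actor is attached to the breach itself.
    ("ex:Breach1", "risk:hasThreatActor", "ex:UnknownVisitor"),
    ("ex:UnknownVisitor", "a", "risk:ThreatActor"),
]


def to_turtle(triples):
    """Serialise triples one per line in Turtle-like syntax.

    Prefix declarations are omitted for brevity.
    """
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(to_turtle(TRIPLES))
```

For a breach without an actor, such as the discarded disk drive, the last two triples would simply be left out.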
The 'status' of a data breach refers to whether the breach has been concluded or is still ongoing. This is represented by DataBreachStatus and associated using dpv:hasStatus. The following statuses are defined to represent commonly encountered stages:
DataBreachStatusUnknown: the status of a data breach is unknown
DataBreachSuspected: the state where a data breach is suspected, but has not yet been confirmed. This can be due to lack of information, or because the process of detection and investigation is still ongoing.
DataBreachOngoing: the data breach is ongoing i.e. still active
DataBreachHalted: the data breach has halted or paused with a high likelihood of resuming or recurring
DataBreachConcluded: the data breach has stopped or finished or concluded without any active mitigation and with a low likelihood of resuming or recurring
DataBreachTerminated: the data breach has been stopped or terminated through the use of a mitigation or deterrent measure with a low likelihood of resuming or recurring
DataBreachMitigated: the data breach has been mitigated against future recurrences i.e. a measure has been applied to prevent the same or similar data breach from recurring
In these statuses, the use of 'halted' refers to a breach being stopped but with uncertainty regarding it being resumed - where if it resumes the 'ongoing' status is applicable again. The status 'concluded' refers to situations where the breach has stopped without active effort from the breached entity, whereas 'terminated' means the actions of the breached entity have caused it to be stopped, and 'mitigated' means it has also been prevented from happening again. These distinctions are relevant as the measures used in termination can be found ineffective, or may be effective only for a short duration or in limited cases, whereas the concluded breach has no mitigation associated with it being stopped - and hence there is a risk of the breach happening again. Mitigated is distinct from these as it also refers to future breaches of the same type being prevented from occurring. The measures used to stop an ongoing breach (termination) may be distinct from those used to prevent it in the future (mitigation).
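The distinction between the four 'stopped' statuses can be expressed as a small decision function. This is one possible reading of the definitions above rather than a normative rule; the function name and its three boolean parameters are hypothetical.

```python
from enum import Enum


class BreachStatus(Enum):
    UNKNOWN = "DataBreachStatusUnknown"
    SUSPECTED = "DataBreachSuspected"
    ONGOING = "DataBreachOngoing"
    HALTED = "DataBreachHalted"
    CONCLUDED = "DataBreachConcluded"
    TERMINATED = "DataBreachTerminated"
    MITIGATED = "DataBreachMitigated"


def stopped_status(likely_to_resume: bool,
                   stopped_by_measure: bool,
                   recurrence_prevented: bool) -> BreachStatus:
    """Pick a status for a breach that is no longer active (hypothetical helper)."""
    if likely_to_resume:
        # Stopped, but with a high likelihood of resuming or recurring.
        return BreachStatus.HALTED
    if recurrence_prevented:
        # A measure prevents the same or a similar breach from recurring.
        return BreachStatus.MITIGATED
    if stopped_by_measure:
        # Stopped through the breached entity's own measure.
        return BreachStatus.TERMINATED
    # Stopped without active effort from the breached entity.
    return BreachStatus.CONCLUDED


print(stopped_status(False, True, False).value)
```

A halted breach that resumes would be recorded as `BreachStatus.ONGOING` again.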
The 'type' of a data breach refers to categorisation based on the effect of the breach on the affected data. The Data Protection Authorities suggest three categories - 'confidentiality breach' where there is unauthorised or accidental disclosure of or access to data; 'integrity breach' where there is unauthorised or accidental alteration or modification of data; and 'availability breach' where there is unauthorised or accidental loss of access or loss of data (e.g. via destruction or transformation). These explanations are simplified, and their origin is in the 'CIA model' used in information security to categorise incidents using these three categories. Specifying the type of breach is important as it determines which 'definition' of a breach applies and informs the follow-up investigations and reporting. The typical notion of a data breach only occurring when someone else has access to data corresponds to only one of these definitions (confidentiality). The type of breach is indicated using rdf:type with values dpv-breach:ConfidentialityBreach, dpv-breach:AvailabilityBreach, and dpv-breach:IntegrityBreach - which are defined as subtypes of dpv-breach:DataBreach. A single data breach incident can have one or more types i.e. these types are not mutually exclusive.
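Because the three types are not mutually exclusive, a breach can be typed from the set of effects observed on the data. The sketch below assumes a simplified vocabulary of effect keywords (`disclosure`, `access`, `alteration`, `loss`) taken from the descriptions above; the keywords and the function name are hypothetical.

```python
# Simplified mapping from an observed effect to a breach type.
# The effect keywords are assumptions derived from the prose definitions.
EFFECT_TO_TYPE = {
    "disclosure": "dpv-breach:ConfidentialityBreach",
    "access": "dpv-breach:ConfidentialityBreach",
    "alteration": "dpv-breach:IntegrityBreach",
    "loss": "dpv-breach:AvailabilityBreach",
}


def breach_types(effects):
    """Return the sorted, de-duplicated rdf:type values for a breach."""
    return sorted({EFFECT_TO_TYPE[effect] for effect in effects})


def type_triples(breach, effects):
    """One rdf:type triple per applicable breach type."""
    return [(breach, "rdf:type", t) for t in breach_types(effects)]


# Data was both read by an unauthorised party and destroyed (hypothetical).
for triple in type_triples("ex:Breach1", ["access", "loss", "disclosure"]):
    print(triple)
```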
A data breach can have several identifiers - for example those within an organisation, across organisations, or in correspondence with authorities. To indicate these, the concept dpv-breach:DataBreachIdentifier
is used and associated with using dpv:hasIdentifier
. The distinct concept allows further categorisation of identifiers, and more importantly to distinguish who has provided which identifier, e.g. using dct:creator
and dct:publisher
.
Reports about a data breach document information about the breach as well as its investigation and handling activities. They are represented by DataBreachReport
, and are separated into types to represent detection of a breach and stage of investigation/breach handling - preliminary, ongoing, concluded. These are represented by DataBreachPreliminaryReport
, DataBreachOngoingReport
and DataBreachConcludingReport
respectively. Reports can be embedded or linked to each other. For example, a single larger report as an instance of DataBreachReport
can reference instances representing the different stages through dct:hasPart
, for example to represent a comprehensive or grouped reporting of data breach when communicating with authorities.
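A grouped report of this kind can be sketched as follows, with one parent report referencing stage-specific reports through dct:hasPart. The stage keywords, the function name, and the `ex:` identifiers are hypothetical.

```python
# Report classes for each investigation stage, as named in the text above.
STAGE_CLASSES = {
    "preliminary": "DataBreachPreliminaryReport",
    "ongoing": "DataBreachOngoingReport",
    "concluding": "DataBreachConcludingReport",
}


def grouped_report(report, parts):
    """Build triples for a parent report that links stage reports.

    `parts` is a list of (report IRI, stage keyword) pairs.
    """
    triples = [(report, "a", "DataBreachReport")]
    for part, stage in parts:
        triples.append((part, "a", STAGE_CLASSES[stage]))
        triples.append((report, "dct:hasPart", part))
    return triples


for triple in grouped_report("ex:FullReport", [
    ("ex:Report-2024-03-02", "preliminary"),
    ("ex:Report-2024-04-15", "concluding"),
]):
    print(triple)
```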
A breach detection report is the documentation that records the detection of a data breach, along with any pertinent details about the detection itself. It is meant to be generated at specific points in time for record-keeping purposes, and to be communicated to other entities where needed for compliance purposes. The detection information needs to be kept separate from the information about the data breach (earlier section) in order to indicate contextual information - for example to denote when an entity became 'aware' of the breach as separate from the temporal properties of the breach itself. As there can be multiple entities involved in a breach, e.g. processor and controller, they will each have their own detection report.
To represent the breach detection report, the concept dpv-breach:DataBreachDetectionReport is to be used. The usual DCMI properties are to be utilised here, e.g. dct:subject to indicate which data breach is the subject of this report, dct:created to indicate when the report was created - and hence when the breach was first 'detected', and dct:creator to indicate who created the report. To record later updates to the report, dct:modified is available.
To indicate the source of information, for example in connection with who reported the breach, the property dpv:hasDataSource
should be used, e.g. with values Employees, specific Data Subject, link to a news item, etc.
To specify any communications providing information about the breach (detection), the property dpv:hasNotice
should be used. This can be incoming information (entity is recipient) or outgoing (entity is sender). To specify information contents of a notice as a form of communication, schema:Message
should be used to document details such as sender and recipient. To report on the status of detection (as a form of investigation), the existing dpv:ActivityStatus
concepts should be used.
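The properties described for a detection report can be collected in a small record type. This is a sketch under assumptions: the class and field names are hypothetical, timestamps are kept as plain quoted strings, and only a subset of the properties mentioned above is covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DetectionReport:
    iri: str                            # hypothetical identifier of the report
    breach: str                         # the breach this report is about
    created: str                        # when the breach was first 'detected'
    creator: str                        # who created the report
    data_source: Optional[str] = None   # who or what reported the breach
    modified: Optional[str] = None      # last update, if any

    def triples(self):
        """Express the record using the properties named in the text."""
        out = [
            (self.iri, "a", "dpv-breach:DataBreachDetectionReport"),
            (self.iri, "dct:subject", self.breach),
            (self.iri, "dct:created", f'"{self.created}"'),
            (self.iri, "dct:creator", self.creator),
        ]
        if self.modified is not None:
            out.append((self.iri, "dct:modified", f'"{self.modified}"'))
        if self.data_source is not None:
            out.append((self.iri, "dpv:hasDataSource", self.data_source))
        return out


report = DetectionReport(
    iri="ex:DetectionReport1",
    breach="ex:Breach1",
    created="2024-03-02T08:15:00Z",
    creator="ex:ProcessorA",
    data_source="ex:EmployeeReport",
)
for triple in report.triples():
    print(triple)
```

Since each involved entity has its own detection report, a processor and a controller would each hold a separate instance pointing at the same breach.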
Following from detection, a preliminary investigation report needs to be drafted for cases when the breach has to be reported, e.g. within 72 hours. This is represented by dpv-breach:DataBreachPreliminaryReport
, which is a subclass of dpv-breach:DataBreachReport
. As with dpv-breach:DataBreachDetectionReport
, the properties of dct:subject
, dct:created
, and dct:creator
are applicable here. In a preliminary investigation report, more details are expected to be present than at the time of detection.
To indicate what personal data has been affected, the relation dpv:hasPersonalData
is to be used, with values of type dpv:PersonalData
or its categories such as dpv:SpecialCategoryPersonalData
. To indicate the scale of data, dpv:hasDataVolume
is to be used. Qualifiers can reuse the existing DPV concepts, e.g. dpv:HugeDataVolume
, and quantifiers where indicating the actual number of data records affected is necessary can directly specify the value or use an instance or subtype of dpv:DataVolume
with rdf:value
to denote the quantity.
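The difference between a qualifier and a quantifier for data volume can be sketched as below. The function name and the blank-node label are hypothetical, and representing the quantity through an instance of dpv:DataVolume with rdf:value is only one of the options the text allows.

```python
def data_volume_triples(subject, volume):
    """Triples for dpv:hasDataVolume (hypothetical helper).

    An int is treated as a quantifier (number of records); a string is
    treated as a qualifier concept such as dpv:HugeDataVolume.
    """
    if isinstance(volume, int) and not isinstance(volume, bool):
        node = "_:volume"  # blank node label chosen for illustration
        return [
            (subject, "dpv:hasDataVolume", node),
            (node, "a", "dpv:DataVolume"),
            (node, "rdf:value", str(volume)),
        ]
    return [(subject, "dpv:hasDataVolume", volume)]


print(data_volume_triples("ex:Report1", "dpv:HugeDataVolume"))
print(data_volume_triples("ex:Report1", 12500))
```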
To indicate data subjects affected, the relation dpv:hasDataSubject is to be used to denote the category of data subjects, and the relation dpv:hasDataSubjectScale is to be used to denote their scale. Similar to how Data Volume can be a qualifier or a quantifier, the scale of data subjects can be based on dpv:DataSubjectScale or a quantity.
To indicate processing activities, the relation dpv:hasProcessing is to be used to denote the specific types of processing taking place, and the relation dpv:hasProcessingScale is to be used to indicate its scale - with dpv:ProcessingScale or a quantifier to indicate the number of data records. To indicate the geographic coverage, the relation dpv:hasJurisdiction is to be used to indicate the affected jurisdictions. To only indicate locations without associating a jurisdiction, the relation dpv:hasLocation can be used.
To specify the technology involved, the relation dpv:isImplementedUsingTechnology is to be used, and to specify who implemented it, the relation dpv:isImplementedByEntity is to be used.
In addition to the scale of data, data subjects, and processing, the Data Protection Authorities also require reporting whether there is any cross-border context to the data breach, including data subjects or processing activities in multiple jurisdictions. While this information can be derived from the values of dpv:hasJurisdiction, the concept CrossBorderDataBreach is provided as a subtype of DataBreach to explicitly indicate this information. Here, 'cross-border' can mean one or more of the following: data subjects from multiple jurisdictions are involved, processing (including storage and transfer) takes place in multiple jurisdictions, or the organisation is based in or operates in multiple jurisdictions. To distinguish between these, no further subtypes are defined; instead, the information is to be represented directly in its appropriate context, e.g. through dpv:hasJurisdiction to directly indicate affected jurisdictions, or by indicating jurisdictions in connection with values of dpv:hasDataSubject, dpv:hasProcessing, or dpv:ProcessingContext, and by associating jurisdictions with an entity.
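Deriving the cross-border flag from recorded jurisdictions can be sketched as follows. The function checks only the three conditions listed above; its name, its parameters, and the use of country codes as jurisdiction values are hypothetical.

```python
def is_cross_border(data_subject_jurisdictions,
                    processing_jurisdictions,
                    organisation_jurisdictions):
    """True if any one context spans more than one jurisdiction.

    Mirrors the three listed meanings of 'cross-border': data subjects,
    processing, or the organisation involve multiple jurisdictions.
    """
    contexts = (
        data_subject_jurisdictions,
        processing_jurisdictions,
        organisation_jurisdictions,
    )
    return any(len(set(jurisdictions)) > 1 for jurisdictions in contexts)


def breach_classes(*contexts):
    """rdf:type values for the breach, adding the explicit subtype if needed."""
    classes = ["DataBreach"]
    if is_cross_border(*contexts):
        classes.append("CrossBorderDataBreach")
    return classes


# Data subjects only in IE, but processing takes place in IE and DE.
print(breach_classes(["IE"], ["IE", "DE"], ["IE"]))
```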
The personal data, data subjects, processing, and other details can be grouped using dpv:PersonalDataHandling
to indicate separation, such as for jurisdictions affected, or technologies affected. The granularity of this information is unbound as the 'graph' of what was affected can be as large or small as required. From the data breach reporting forms present on authority websites, in most cases it is sufficient to indicate the abstract or summary information at the preliminary stage.
To indicate risks the relation dpv:hasRisk
is to be used, for consequence the relation dpv:hasConsequence
is to be used, and for impact the relation dpv:hasImpact
is to be used. The [[RISK]] extension provides taxonomies that can be used with these. The risks, consequences, and impacts can be specified directly in connection with the breach or the report, or through an impact assessment. The impact assessment associated with a data breach is represented by DataBreachImpactAssessment, a subtype of dpv:ImpactAssessment, and is associated using dpv:hasImpactAssessment. Note that the concept of impact assessment here refers to impact on or for data subjects - the internal assessment of impact (e.g. loss of business) is not part of this impact assessment and could be represented separately through a risk assessment.
Data breach reporting requires explicitly indicating whether there will be an impact on fundamental rights - most commonly as a Yes / No option. To indicate this, the concept risk:ImpactOnFundamentalRights from [[RISK]] is to be used with the relation dpv:hasImpact. To express whether risks or impacts will materialise, i.e. take place or not, the relation risk:hasLikelihood is to be used. Here, rather than specifying the likelihood as impossible or 0, it is better to specify the lowest value as the likelihood being extremely unlikely - which in [[RISK]] is modelled as the value 0.01. If any information is not available, then dpv:NotAvailable should be used.
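The two conventions for likelihood values, flooring at 0.01 instead of 0 and falling back to dpv:NotAvailable, can be sketched as below. The function name is hypothetical, and treating likelihood as a number between 0 and 1 is an assumption.

```python
from typing import Optional, Union

# Lowest likelihood value, as stated in the text for 'extremely unlikely'.
MIN_LIKELIHOOD = 0.01


def likelihood_value(p: Optional[float]) -> Union[float, str]:
    """Value to record with risk:hasLikelihood (hypothetical helper).

    Assumption: likelihood is expressed as a number in the range 0..1.
    """
    if p is None:
        return "dpv:NotAvailable"
    if not 0 <= p <= 1:
        raise ValueError("likelihood must be between 0 and 1")
    # Avoid stating 'impossible': never record a value below the floor.
    return max(p, MIN_LIKELIHOOD)


print(likelihood_value(0))
print(likelihood_value(0.4))
print(likelihood_value(None))
```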
Data breach reporting also requires information on what technical and organisational measures (TOMs) were in place before the breach, deficiencies identified, and any changes or additional measures taken to address the breach. To specify the affected TOMs, the relations dpv:hasTechnicalMeasure and dpv:hasOrganisationalMeasure are to be used. To specify their limitations or failures, the relation risk:hasVulnerability is to be used, with dpv:isMitigatedByMeasure and dpv-breach:DataBreachMitigationMeasure to indicate how the data breach has been addressed.
The earlier examples showed messages being passed to inform other entities about data breaches, for example from Processor to Controller, and from Controller to the DPA. The message contents are expected to be accompanied by a report of appropriate form, e.g. for detection, preliminary or final investigation, or containing the impact assessment. Notifications to data subjects are of a similar form but with different contents - they may include information on steps the data subjects can take to mitigate impacts.
To distinguish between these different types of communications, the following subtypes of dpv:Notice are defined:
AuthorityDataBreachNotice: A Notice sent to an Authority from a Processor or Controller or an Authority or Data Subject.
ProcessorDataBreachNotice: A Notice sent to a Processor from another Processor or a Controller or an Authority.
ControllerDataBreachNotice: A Notice sent to a Controller from a Processor or another Controller or an Authority.
DataSubjectBreachNotice: A Notice sent to one or more Data Subjects from a Controller.
The contents of notices between Controllers, Processors, and Authorities will typically include information about the breach, which is represented here as various reports. The difference in notifications to the data subjects is that there may be actions to be taken, for example to safeguard themselves against adverse impacts. To indicate actions to be taken by the data subjects, the simplest representation would be to directly indicate dpv:hasRisk or dpv:hasImpact, along with dpv:Likelihood and dpv:Severity, and then using dpv:isMitigatedByMeasure to indicate what can be done.
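The sender and recipient combinations in the notice definitions above can be captured in a lookup table. The role keywords and the function name below are hypothetical; the allowed senders are transcribed from the four definitions.

```python
# recipient role -> (notice concept, allowed sender roles)
NOTICE_RULES = {
    "Authority": ("AuthorityDataBreachNotice",
                  {"Processor", "Controller", "Authority", "DataSubject"}),
    "Processor": ("ProcessorDataBreachNotice",
                  {"Processor", "Controller", "Authority"}),
    "Controller": ("ControllerDataBreachNotice",
                   {"Processor", "Controller", "Authority"}),
    "DataSubject": ("DataSubjectBreachNotice", {"Controller"}),
}


def notice_type(sender: str, recipient: str) -> str:
    """Return the notice concept for a sender/recipient pair of roles."""
    if recipient not in NOTICE_RULES:
        raise ValueError(f"unknown recipient role: {recipient}")
    concept, allowed_senders = NOTICE_RULES[recipient]
    if sender not in allowed_senders:
        raise ValueError(f"{sender} -> {recipient} is not covered by {concept}")
    return concept


print(notice_type("Processor", "Controller"))
print(notice_type("Controller", "DataSubject"))
```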
Metadata about the notices, as forms of communication, is to be represented using the DCMI and Schema.org vocabularies. For example, to indicate the medium of the notification, dct:medium
should be used, with dct:format
, dct:instructionalMethod
, and dct:language
providing further information about the contents. Similarly, schema.org is to be used to denote the correspondence of notices as instances of schema:Message
. This enables use of schema:dateSent
and schema:dateReceived
to indicate the timestamps associated with the notice, schema:sender
and schema:recipient
to record entities involved, and schema:messageAttachment
to link to information or documents exchanged.
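The notice metadata described above can be sketched as a JSON-LD style document built with Python's standard `json` module. This is illustrative only: the `@context` is omitted, all `ex:` identifiers and timestamps are hypothetical, and the selection of properties is a subset of those mentioned in the text.

```python
import json

# A notice typed both as a breach notice and as a schema:Message.
notice = {
    "@id": "ex:Notice1",
    "@type": ["ControllerDataBreachNotice", "schema:Message"],
    "dct:language": "en",
    "dct:format": "text/html",
    "schema:sender": {"@id": "ex:ProcessorA"},
    "schema:recipient": {"@id": "ex:ControllerB"},
    "schema:dateSent": "2024-03-02T09:30:00Z",
    "schema:dateReceived": "2024-03-02T09:31:00Z",
    # The accompanying report is linked as an attachment.
    "schema:messageAttachment": {"@id": "ex:DetectionReport1"},
}


def to_json(document: dict) -> str:
    """Serialise the notice with stable key order for record-keeping."""
    return json.dumps(document, indent=2, sort_keys=True)


print(to_json(notice))
```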
To indicate whether notifications are being planned, are ongoing, or have been completed, dpv:ActivityStatus is to be used. Differentiating between planned activities and those that have been completed is relevant for demonstrating compliance with legal obligations.
Reporting a breach requires information about the organisation. This information should also be represented using the same vocabulary. For example, the Irish Data Protection Commission's data breach reporting form asks for name (dpv:hasName
), address (dpv:hasAddress
), whether there are EU/EEA establishments (see next para), sector (one of Public, Private, Charity, Voluntary - use entity types from DPV), sub-sector (using dpv:hasSector
and the NACE taxonomy), and an internal ID for breach (using dpv:hasIdentifier
with DataBreachIdentifier
).
For EU/EEA establishments, DPV contains the concept dpv:Establishment
which is to be used along with the relation dpv:isMainEstablishmentFor
to indicate the main establishment for a given jurisdiction, and the relation dpv:isEstablishmentOf
to indicate parent company.
A Data Breach Impact Assessment (DBIA) is similar to a Data Protection Impact Assessment (DPIA) in that it is undertaken to assess the risks and impacts to data subjects, where the DPIA is intended to address a planned process and the DBIA relates to a data breach that has taken place. A DBIA is essential in order to determine whether to notify authorities and data subjects regarding the breach, and to plan the necessary measures to handle the breach. If the DBIA indicates that the breach is likely to result in a risk to the rights and freedoms of individuals - then the authorities have to be notified. If the DBIA indicates a high risk to the rights and freedoms of individuals - then both the authorities and data subjects have to be notified.
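The notification rule described above can be sketched as a small lookup. The three level keywords (`none`, `risk`, `high`) and the function name are hypothetical simplifications; an actual assessment involves more than a single level.

```python
# Who has to be notified for each assessed level of risk to the rights
# and freedoms of individuals (level keywords are assumptions).
NOTIFY = {
    "none": frozenset(),
    "risk": frozenset({"authorities"}),
    "high": frozenset({"authorities", "data_subjects"}),
}


def who_to_notify(risk_level: str) -> frozenset:
    """Return the parties to notify for the DBIA outcome."""
    if risk_level not in NOTIFY:
        raise ValueError(f"unknown risk level: {risk_level}")
    return NOTIFY[risk_level]


for level in ("none", "risk", "high"):
    print(level, sorted(who_to_notify(level)))
```

Because a DBIA can be repeated as information changes, the outcome of this check may differ between the preliminary and final stages of the same breach.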
A DBIA is based on the information available at a given time. Therefore, it can be undertaken at any stage in the Data Breach handling process - from preliminary, where less information is known, to final, where all information is available. As such, the same DBIA (i.e. addressing the same data breach) can evolve over time in terms of its identified risks and impacts based on the corresponding changes in information.
To represent a DBIA, the concept DataBreachImpactAssessment
is used. The DCMI metadata terms are available for annotating it with relevant information and descriptions (e.g. title, dates, provenance) as well as to associate it with a specific data breach using dct:subject
.
A Data Breach Impact Assessment (DBIA) typically contains the following information, which here is categorised into three distinct categories:
The information about the breach in terms of what data or individuals have been affected is the same as that present in a Data Breach Report, and therefore can be included through a link to the report itself or explicitly added to the assessment in a manner similar to that of the report.
Information about the risk assessment includes the specifics of what risks are applicable, the consequences and impacts arising from it, their severity and likelihood, and the impacted stakeholders. In these, the specific risks and impacts to the rights and freedoms of individuals are an important consideration that must be present. Additionally, specific risks commonly associated with data breaches should also be mentioned. Such risks include (re-)identification of individuals through breached data e.g. because the data contained identifying information, or could be combined with other available data; or the risk of breached data not being safeguarded sufficiently (e.g. weak encryption) which could lead to its potential misuses.
A typical risk assessment undertaken for data breach incidents informs about risks associated with data (e.g. unauthorised access, restoration) and individuals (e.g. identifiability, harms). To represent these, we use [[[RISK]]] concepts. The following examples demonstrate risk assessments based on data breach incidents.
Examples
How to express risk to fundamental rights - present, not present
Examples
How to show mitigations
Examples
How to document that a data breach has been prevented - using risk assessment
Risk of data breach - mitigated by measure
Examples
Common causes of data breaches - guidance and mitigations
Examples
What is a Data Breach Register? Incident record. Why to maintain it? How to maintain it?
What information is present in a breach register?
How to record breaches? status? outcomes?
Examples
The DPVCG was established as part of the SPECIAL H2020 Project, which received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 731601 from 2017 to 2019.
Harshvardhan J. Pandit was funded to work on DPV from 2020 to 2022 by the Irish Research Council's Government of Ireland Postdoctoral Fellowship Grant#GOIPD/2020/790.
The ADAPT SFI Centre for Digital Media Technology is funded by Science Foundation Ireland through the SFI Research Centres Programme and is co-funded under the European Regional Development Fund (ERDF) through Grant#13/RC/2106 (2018 to 2020) and Grant#13/RC/2106_P2 (2021 onwards).
The contributions of Harshvardhan J. Pandit and Rob Brennan have been made with the financial support of Science Foundation Ireland under Grant Agreement No. 13/RC/2106_P2 at the ADAPT SFI Research Centre.