This note highlights some of the unique characteristics of spatial data within the broader realm of ethical use of data. A brief analysis of the relationship between law and ethics explains that responsible use is not mandatory. Nevertheless, both legal and ethical frameworks play an important role in shaping what can be considered “responsible”, as do the perspectives of those who interact closely with spatial data: the developers, the users and the regulators. Therefore this note not only provides an insight into the relevant legislation and ethics guidelines, but also considers the principles of ethical data sharing from each of these three perspectives. The principles are made practical by providing concrete communication guidelines and showing examples of good practice.

This draft document is not intended to be normative, but rather aims to promote discussion of the responsible use of spatial data. Comments regarding the document are most welcome. Please file issues directly on GitHub, or send them to our mailing list (subscribe, archives).

Introduction

The purpose of this document is to raise awareness of the ethical responsibilities of both providers and users of spatial data on the web. While there is considerable discussion of data ethics in general, this document illustrates the issues specifically associated with the nature of spatial data and both the benefits and risks of sharing this information implicitly and explicitly on the web.

Spatial data may be seen as a fingerprint: for an individual, every combination of their location in space, time, and theme is unique. The collection and sharing of individuals' spatial data can lead to beneficial insights and services, but it can also compromise citizens' privacy. This, in turn, may make them vulnerable to governmental overreach, tracking, discrimination, unwanted advertising, and so forth. Hence, spatial data must be handled with due care. But what is careful, and what is careless? Let's discuss this.

Context

The use of data is accelerating, not only owing to increasing technical possibilities like AI and earth observation, but also as a result of crises such as COVID-19 and climate change, which accelerate the deployment of data and technology. This is happening on a small and local scale, as well as on a large and global one. Precisely because these data are potentially personal, and their use is becoming commonplace, it is urgent to internalize shared principles for the responsible use of data to achieve greater common value, better data and better products. These are preferably intrinsic principles that guarantee the safety and privacy of people, our social values and human dignity.

This note is a living document, ready to be enriched with ethical wisdom from across the world. Current ideas presented in the note are framed by a western perspective, so further international collaboration is desired.

Too often, data ethics is presented as a solution to avoid the unacceptable consequences of data misuse. This note aims to demonstrate that this conversation is necessary not only out of fear of misuse but, more importantly, to unlock the full potential of spatial data. Users will only contribute their location data if they trust the systems collecting these data and drawing inferences from them. These data may, in turn, improve the well-being and sustainability of our societies.

Data Ethics

The interest and concerns around data ethics have grown over the past few years. Universities offer data ethics courses; companies and organizations design and conduct data ethics codes; government institutions implement data ethics governance. But what does data ethics mean? A simple search provides a myriad of data ethics frameworks, codes and guidelines. They all have the same purpose: to safeguard the responsible use of data.

Not all data are the same: there are varying degrees of sensitivity. In most jurisdictions, the law accounts for the protection of sensitive data, particularly personal data. But can we rely on privacy laws to ensure responsible use of data? Perhaps more importantly, does the law determine what is considered responsible?

During the first wave of the COVID-19 pandemic, governments across the world turned to data specialists for help. Because COVID-19 is highly contagious, scientists advised policies that ensure (social) isolation of COVID-19 patients. To control the spread of the virus, many countries prioritized the identification and location of these patients. The approach towards this policy varied. Some governments, e.g., in China, Israel, Poland, Singapore, South Korea, and Japan, invited or legally obliged citizens to use track and trace COVID-19 apps. The apps give government institutions (indeterminate) access to the users’ personal location data and record their potential COVID-19 symptoms. The level of intrusion varies per app, but all of them hold sensitive information, as personal location data are, by definition, sensitive.

Not only governments are relying on these tracking apps. Private entities have been acquiring both real time and non-real time location information for years. However, as the technology improves, the relationship between its application and the societal impact changes. Location data have become a commodity. They are in high demand and profitable, but at what cost? And who pays the price? A commoditization of location data has the potential to benefit society, but that requires strong legal or communal pressure. Pressure to collect, process, visualize, use, and remove the data in a responsible way. So, where should this pressure come from? Legislation? Companies relying on location data? Ethical officers employed by these companies?

Every role or function that interacts with location data has a responsibility to show ethical leadership. That is the only way to incorporate ethics by design into the entire project/process. Based on panel discussions, webinars, and workshops about this topic, it is evident that the spatial community is eager to show ethical leadership. The only question left is: How?

Spatial is Special

All domains are special. Space and time, however, are special in how they cut across domains. They are the glue that helps us organize knowledge about the world around us. We may be interested in events, such as landslides, that appeared within a region to mitigate future risks, or we may be interested in events that happen during the same time but in a different region, such as country-specific social distancing measures to slow the spread of COVID-19. In the first case, space remains invariant; in the second case, time is invariant while we are interested in spatial differences.

Spatial data are special in many regards. The regional variability in the examples above illustrates two competing properties of spatial phenomena. First, spatial heterogeneity describes the challenge of finding observations that are prototypical for a particular region. For instance, COVID-19 transmission certainly follows specific patterns, but a region may contain an uneven distribution of COVID-19 hotspots. In terms of location privacy, a region may display an uneven distribution of characteristics that lead to different needs for privacy and require different degrees of obfuscation. Second, spatial dependence between observations violates the independence assumption of traditional statistical theory. Put differently, samples are not independent; they covary as a function of their distance. For instance, nearby sensors are likely to observe similar amounts of rainfall. With regard to privacy, this implies that revealing information about one individual or one location also reveals information about unsampled nearby people and locations.

These two competing properties simultaneously drive the abilities and risks associated with spatial analytics, e.g., with regard to human movement. We assume that people frequenting the same places (or place types) may have other similarities, such as common interests or demographic characteristics. However, great care has to be taken when trying to assign properties averaged over a region to individuals, a problem known as the ecological fallacy. This is particularly critical as modern information retrieval and recommender systems utilize a person's location history as a predictive feature. Frequently visiting a health care facility is not only indicative of health issues but also of being a health care worker, and those two cases may be difficult to distinguish.

Location information also increasingly triggers actions in the physical world around us, e.g., via geo-fencing, or determines which information we receive, creating local information bubbles known as hyperlocal media. Unfortunately, masking or perturbing location information comes with its own associated risks. For example, it may place a user into a geographic area associated with another person. Similarly, users who share their location, e.g., a bike trajectory, may be mistaken for another person from the substantially larger population of people who do not share their location in real time, such as in a recent case involving a racially motivated attack by a biker. Finally, the value of spatial and temporal information does not scale linearly with size: once a certain spatial and temporal resolution has been achieved, it becomes possible to monitor people and infer their future behavior on a very fine-grained level. This includes positive aspects such as finding missing people or preventing crime, but also highly problematic cases such as targeting individuals, e.g., for political or cultural reasons.

The Nature of Spatio-Temporal Data

Spatial data, that is, data about the physical nature of the world and about the social and economic activities that take place in it, may be described as spatio-temporal data, in that they reflect the nature of the world at particular locations at particular times. In simple terms, an atlas of Europe published in 1912 would represent now historic national boundaries, place names, etc., in the same way that a smartphone map of traffic conditions for San Francisco represents the amount of congestion on roads a few seconds previously.

With the advent of smartphones, the ability to collect and share information about individuals' locations has become mainstream. It is the ability to place someone at a specific location at a specific time that illustrates both the value and the potential risks of such information. It is useful to think of time and space as two sides of a coin: an individual can be at only one location at a specific time, and vice versa at a specific location at a particular time. In general terms, if these data are abstracted or reduced in resolution, both in terms of time and definition of location, the information value is decreased along with the inherent risk of exposing an individual's location.

Use Cases

Use Case 1: Use of social media geotagging to raise awareness of issues

Spatial data can be associated with other data in a process called geotagging. When social media posts are geotagged to report issues, the general public is transformed into journalists, bringing awareness to things like natural disasters, crimes, etc., often before conventional media report them.

Early awareness affords people the ability to prevent issues from worsening. It also enables us to improve our response by accelerating it, broadening its reach, or applying the innovation of the masses.

  • Raises awareness of issues, potentially faster than any other reporting alternative.
  • Enables integration of social media with issue management in new and unforeseen ways.

Use Case 2: Realtime Traffic Data

For the last decade or more, mobile maps applications and online map tools have displayed “real time” traffic information as different coloured roads indicating the relative speed of traffic. This speed information often comes from anonymous contributions of speed data gathered from the mobile phone sensors of individuals who have opted in to share these data with application developers. The widespread use of turn by turn navigation which takes into account traffic congestion can significantly reduce journey times and the resulting pollution.

  • Survey results suggest that the average commuter was able to save 6 hours of annual travel time on public transport, and 13 hours of annual travel time by private car.
  • The total value of time savings as a result of using traffic data on maps was estimated to have surpassed US$260 billion in 2016.

Misuse Cases

Misuse Case 1: Inadvertent location sharing

Smartphones and other personal devices that can both estimate their location and connect to mobile data networks are able to share the operator's location, explicitly and implicitly, with other users connected to the same network. Beyond the user's own awareness of location sharing activities, there may be unintended consequences of personal location data becoming public. For example, the EXIF data added automatically by a smartphone's camera app will pinpoint the phone at a particular location and time, meaning the user cannot be at any other location.

  • Sharing your location via photographs, for example, could indicate to criminals that you are not at home and increase the risk of burglary.
  • Use of "Find my device" applications on smartphones may allow stalking of the phone's owner by an abusive partner without the owner's explicit knowledge.

Existing Ethical Frameworks

Worldwide there is a large range of existing ethical frameworks related to data. They have been developed by national governments (like the UK Gov data ethics Framework and the USA GSA Data Ethics Framework); by governmental organizations (like the United Nations: Data Privacy, Ethics and Protection Guidance Note and the Eurocities principles on citizen data); by universities (like the Oxford-Munich Code of conduct for professional data scientists); by communities (like the Tech pledge); by companies (like the planet.com code of ethics); by NGOs (like Unicef’s Ethical Considerations When Using Geospatial Technologies for Evidence Generation); and so on.

The purpose, audience and scope of each framework differ a lot. From very abstract to very specific, from general principles to detailed instructions, from a wide audience to a single target group.

It’s relevant to notice that at the highest level, the core values of these frameworks generally correspond. They all recognize the importance of transparency, privacy and security, accountability, inclusiveness and more. However, they also share the same blind spot: almost none of the frameworks acknowledge that spatial data are special, even though spatial data are both increasingly topical and necessary, considering the speed at which their use is developing.

A noteworthy exception to this is the Unicef ethical considerations. To protect children in unsafe regions, the use of spatial data is both essential and risky. Therefore a critical and detailed assessment of data is crucial.

More Information

For more information on these ethical frameworks, you can follow the links below.

Principles of Ethical Data Sharing - The Developer Perspective

In this section a framework is presented which identifies some approaches that developers of applications and online services may take to minimize the risk associated with mobility data, that is, spatial data collected as a result of the movements of individuals in time and space.

Privacy preserving techniques should be used to minimize the risk of misuse or disclosure of information while at the same time obtaining the benefits associated with a knowledge of individuals locations to offer location based services and to provide crucial safety information during emergency situations. A developer should ask themselves these questions:

Efficacy of Mobility Data

With the widespread adoption of smartphones and other devices which have direct or indirect abilities to be geolocated, the interest in developing new applications and services which make use of this valuable information has rapidly increased. The Location Based Services market grew rapidly in the last decade, but not all applications of mobility data were successful; in many cases the expected benefits were not achieved.

For example an early use case in mobile marketing conceived of a potential customer walking past a shop and automatically receiving notifications from the shop owner offering special offers and other incentives. The reason this did not happen is that intrusive advertising would not work and would only annoy potential customers. In this instance, the use case would not produce the results desired, it is an issue of Efficacy – In simple terms would it even work?

Other issues may result from the difficulty of using Global Navigation Satellite Systems (GNSS) indoors or in dense urban environments, or from the power consumption of GNSS receivers and its impact on smartphone batteries, which prevents continuous tracking applications. During the COVID-19 pandemic, contact tracing applications were limited in efficacy because of concerns that citizens' locations may have been collected, even though these applications were not location based in most cases.

To prevent these types of issues, realistic operational testing and extensive user testing should be undertaken to prove the efficacy of the application before the application is introduced and spatial data are captured as a result.

Extensive User testing required to prove viability of the application.

There is, of course, context to efficacy: you might want to try an unproven technique if circumstances are severe, and a global pandemic might be such an example. If so, would it be acceptable to experiment first to gather data using a time-limited application?

Linked to this would be the efficacy of an application over time. While it might be acceptable to collect information during an emergency, for example monitoring the location of the population during a hurricane evacuation, the location data would and should have no value when the storm has passed.

This level of specificity of use is a general requirement of most data protection legislation, in that data should only be collected for a particular purpose – so you would be prevented from using the data later for any other purpose.

Data collection for specific purposes should be limited in coverage and resolution.

Related to the temporal aspect of such data collection is, of course, the spatial context: an application collecting information on the movements of a commuter in Paris should not collect data when the user is on holiday in London or even Nice. Likewise, an app developed to monitor adherence to a quarantine requirement should only operate for the duration of the specific event.
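One way to make such limits concrete is to check every position fix against a declared scope before it is recorded. The following is a minimal sketch, not a definitive implementation: the `CollectionScope` class, its field names and the rough bounding box for Paris are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CollectionScope:
    """Hypothetical declaration of where and when an app may collect location."""
    min_lat: float
    max_lat: float
    min_lon: float
    max_lon: float
    starts: datetime
    ends: datetime

    def allows(self, lat: float, lon: float, when: datetime) -> bool:
        # A fix is recorded only if it lies inside the declared area
        # and inside the declared time window.
        in_area = (self.min_lat <= lat <= self.max_lat
                   and self.min_lon <= lon <= self.max_lon)
        in_window = self.starts <= when < self.ends
        return in_area and in_window


# Rough, illustrative bounding box around Paris (an assumption, not survey data).
paris_commute = CollectionScope(
    min_lat=48.80, max_lat=48.92, min_lon=2.22, max_lon=2.48,
    starts=datetime(2024, 1, 1, tzinfo=timezone.utc),
    ends=datetime(2024, 7, 1, tzinfo=timezone.utc),
)
```

A fix taken in central Paris inside the window would be accepted, while a fix in London or Nice, or any fix after the window has closed, would be dropped before storage.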

Equitability

It’s easy to imagine the development of an application that uses a device's location to validate financial transactions to minimize fraud. Transactions will only be valid if the device is at the same location as the retail purchase. Such applications have been developed although they have not been widely deployed because of questions of efficacy as noted above.

But is it acceptable to expect everyone to have a smartphone with Ambient location technology to be able to make purchases?

Access to services should not rely on access to expensive sophisticated devices. An alternative needs to be available for those without or unwilling to use smartphones for example.

Solutions should be accessible to all members of society.

Access to technology may also vary because of other societal or cultural differences, and this may be unexpected or not easily understood. If there is an urgent need to build solutions based on ambient location, active measures at a governmental level may be needed, for example subsidizing the purchase of smartphones and/or data plans. Another approach may be making dedicated devices available, as in the case of Singapore’s TraceTogether tokens https://www.tracetogether.gov.sg

It may be the case that, for some applications and systems, restricting the collection of location data results in a reduced user experience; for example, the incognito mode in Google Maps cannot access your past search history or preferred mode of transport. However, such limitations should be explained and the decision left to the individual's choice.

Design Choices

For many developers of applications or services that need to implement spatial data collection and sharing, discussions around the ethical approach to take may appear rather esoteric or at least academic in nature. How one builds an app or service that uses ambient location in a responsible way is, of course, a question of engineering design choices.

Here a number of design principles are presented, that are hopefully clear and pragmatic. There will of course always be scope for compromises and seldom are choices clearly right or wrong, there must be the ability to use a nuanced approach.

These design principles are by no means comprehensive but a useful starting point.

Location Data collection and/or sharing should be voluntary.

It should be clear the collection and sharing of location data are different things. There are many use cases that might require a user's location to be obtained, but the data do not need to be stored on device or on a server after the location is used.

Clearly a ride sharing application needs to be able to access your phone's location to arrange the dispatch of the closest car. However, the collection of your location data while you are walking about for analytical purposes is not necessary for the operation of the service and you should be able to opt out of this form of collection if you wish.

There should be an explicit mechanism to obtain user consent to collect and then share Ambient Location Information.

Even if the collection and sharing of spatial data is not optional there should be an explicit notification and ongoing reaffirmation of the user's agreement. This is important particularly if location sharing is a background process. Under these circumstances, a user interface indication should be provided. Section [[[#design-language]]] describes symbols for this purpose.

Of course the user should be able to change their mind and temporarily or permanently stop sharing at a time of their choosing.

Again, reinforcing the first principle, collection and sharing are separate activities and should require separate user consent.
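The separation between collection and sharing can be reflected directly in how an application stores consent. The sketch below is only one possible shape: the `LocationConsent` class and its method names are hypothetical, and a real application would also need to persist the record and log when each choice was made.

```python
from dataclasses import dataclass


@dataclass
class LocationConsent:
    """Hypothetical per-user consent record; nothing is granted by default."""
    collect: bool = False
    share: bool = False

    def grant_collection(self) -> None:
        self.collect = True

    def grant_sharing(self) -> None:
        # Sharing is a separate decision and only makes sense once
        # collection has been agreed to.
        if not self.collect:
            raise ValueError("sharing requires prior consent to collection")
        self.share = True

    def revoke_sharing(self) -> None:
        # The user can stop sharing while still allowing collection.
        self.share = False

    def revoke_collection(self) -> None:
        # Without collection there is nothing left to share.
        self.collect = False
        self.share = False


def may_share(consent: LocationConsent) -> bool:
    """Sharing is allowed only when both consents are currently in place."""
    return consent.collect and consent.share
```

Keeping the two flags apart means that agreeing to collection never silently enables sharing, and that either choice can be withdrawn at any time.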

The purpose of data collection and/or sharing must be explained.

This is already a key foundation of most good data protection regulations, you need to explain clearly why you are collecting location information and how it will be used.

Many public transport providers collect journey information for resource planning purposes. Although it might be useful to know passenger movement prior to making their journey, if they don’t state they will use the data for that purpose, they must not use it!

Data Collection/Sharing should be limited in scope.

Again a key data protection principle is to only collect the minimum amount of data required, there is no allowable concept of “nice to have in case we need it”.

In spatial terms there is a particular issue with resolution both in terms of time and space, there are very few applications outside of turn by turn navigation that require precise real time location data.

For your hyperlocal weather forecasting app, Wi-Fi or cell based positioning to within a hundred metres is easily good enough!

Differential privacy provides a promising privacy preserving technique for spatial data. It is based on injecting random noise into spatial data so that both the level of privacy and the usefulness of the data can be quantified and controlled.
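Reducing resolution in space and time can be done before a fix ever leaves the device. The following minimal sketch snaps a position to a grid of roughly one hundred metres and floors the timestamp to the hour. The function name `coarsen_fix` and its default values are assumptions made for illustration, the metre-to-degree conversion is a simple spherical approximation, and the timestamp is assumed to be timezone-aware.

```python
import math
from datetime import datetime, timezone

METRES_PER_DEGREE_LAT = 111_320.0  # approximate length of one degree of latitude


def coarsen_fix(lat, lon, timestamp, cell_m=100.0, bucket_s=3600):
    """Snap (lat, lon) to a grid of about cell_m metres and floor the time to bucket_s seconds."""
    lat_step = cell_m / METRES_PER_DEGREE_LAT
    coarse_lat = round(lat / lat_step) * lat_step
    # Degrees of longitude shrink with the cosine of the latitude; the snapped
    # latitude is used so that all fixes in one grid row share the same columns.
    cos_lat = max(math.cos(math.radians(coarse_lat)), 1e-6)
    lon_step = cell_m / (METRES_PER_DEGREE_LAT * cos_lat)
    coarse_lon = round(lon / lon_step) * lon_step
    # Floor the timestamp to the start of its time bucket (UTC).
    epoch = int(timestamp.timestamp())
    coarse_time = datetime.fromtimestamp(epoch - epoch % bucket_s, tz=timezone.utc)
    return coarse_lat, coarse_lon, coarse_time
```

Coarsening of this kind lowers both the information value and the exposure risk of a single fix, but on its own it is not an anonymization technique: a long series of coarse fixes can still be distinctive for one person.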

Data must be kept securely and anonymously by default.

There needs to be a really, really good reason for Ambient Location information not to be anonymous. For most of the current popular applications where Ambient Location information is used to “sense” the world, anonymous data are all that is required.

It might be that some considerable effort, as in differential privacy, must be applied to data to maintain privacy, but there is great risk associated with linking individuals to their location.

The recent debate on different approaches to contact tracing, centralized vs. decentralized, is illustrative here. In both cases the data collected are anonymous; however, there is greater risk in the centralized model that there could be a security compromise and data “could” be identifiable, at least theoretically.

The risk comes from storing the data in one location as opposed to distributed on individual devices. Against this risk of course there may be counter arguments that from a perspective of epidemiology it is valuable to be able to view the graph of user interactions only possible with a central repository of data.

Regardless of where Ambient Location data are stored, they should be secure: encrypted both “At Rest”, e.g. on the device or server, and “In transit” while moving across the network between device and server.

Location data may be Personally Identifiable Information (PII)?

The data that can be considered to be personally identifiable extends beyond the obvious name, address and telephone number and there are grey areas specifically with types of spatial data.

Any data that, with the favorite legal term of “reasonable effort”, can be used to identify an individual data subject is PII. So the IP address of the client using your service is PII, as is any device ID specific to a mobile phone for example such as the IMEI or IMSI code.

These are obvious, but spatial imagery also brings unique challenges. While satellite imagery and aerial photography can be argued to be not PII as the resolution of imagery and the generally vertical orientation of imagery makes identifying individuals impossible, the same cannot be said for terrestrial imagery.

Because it would be possible to combine an image taken at ground level where an individual could be recognized, with metadata of when the image was acquired it is necessary for services such as Google Maps Street View and Apple's Look Around to blur faces and car registration plates.

Location data storage must be time and space limited.

Is the collection of Ambient Location Information temporary and limited to a defined period of storage, and if not, why not? Again, of course, there may be applications where the user might want data to be stored indefinitely. For example, storing location history in Google Maps is a convenient way to relive travels. However, without explicit consent from the user, such historic information should be removed after a sufficiently short and specified period of time.
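A retention rule of this kind can be enforced by a small routine that runs regularly. The sketch below assumes that each stored record is a dictionary with a timezone-aware `recorded_at` field; the function name `purge_expired`, the 30-day default and the `keep_history` flag (standing in for the user's explicit consent to indefinite storage) are hypothetical choices, not values prescribed by this note.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(records, now, retention=timedelta(days=30), keep_history=False):
    """Return only the records that may still be stored at time `now`."""
    if keep_history:
        # The user explicitly asked for their location history to be kept.
        return list(records)
    cutoff = now - retention
    # Drop every record older than the declared retention period.
    return [r for r in records if r["recorded_at"] >= cutoff]
```

The retention period itself is a design decision that should be stated to the user alongside the purpose of collection.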

Privacy-preserving Approaches to Data Collection

An important emerging technique to ensure the privacy of individuals' location data is differential privacy. Differential privacy helps developers of spatial services to derive insights from data while at the same time ensuring that those results do not allow any individual's data to be distinguished or re-identified. Differential privacy works by injecting calibrated random noise into the results of statistical or spatial functions, often in combination with reduced resolution of the data in time and space, while preserving much of the aggregate information content. Differential privacy techniques crucially also offer a quantifiable privacy guarantee, which allows the strength of the protection to be measured and tuned for particular use cases.
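As an illustration of noise injection, the sketch below applies the Laplace mechanism, a standard building block of differential privacy, to counts of people per grid cell. It rests on assumptions that are not part of this note: each individual contributes to at most one cell (so one person changes any count by at most 1), `epsilon` is the privacy parameter (smaller values mean more noise and stronger privacy), and the function names are hypothetical. It is a teaching sketch only; production systems should rely on a vetted differential privacy library, since naive floating-point sampling has known weaknesses.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centred on zero."""
    # The difference of two independent exponential draws with the same
    # rate is Laplace-distributed with scale 1 / rate.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_cell_counts(cell_counts, epsilon, rng=None):
    """Return per-cell counts with Laplace noise of scale 1 / epsilon added."""
    rng = rng or random.Random()
    # Assumed sensitivity of 1: one person alters a single count by at most 1.
    scale = 1.0 / epsilon
    return {cell: count + laplace_noise(scale, rng)
            for cell, count in cell_counts.items()}
```

With a large `epsilon` the published counts stay close to the true values; with a small `epsilon`, cells containing only a few people become indistinguishable from empty ones, which is the intended protective effect.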

Principles of Ethical Data Sharing - The User Perspective

The sustained popularity of smartphones has driven down the cost of electronic cameras and GNSS receivers to the point where virtually every mobile phone now has the capability to capture geotagged images, making it quick, easy and cheap for the general public to share their location data with the global online community.

While most users understand the basic precautions required for safe financial transactions online and take appropriate steps to protect their personal payment details, many are oblivious to the potential dangers of broadcasting their spatial data. Sharing photos of a garden barbecue, competing with friends on exercise leaderboards or posting a holiday snap on social media are perfectly safe activities provided the user has a clear understanding of the possible pitfalls and takes simple measures to mitigate the risks.

Transparency Concerns

Spatial data is a term that is not well understood by the public and can be a barrier that prevents constructive engagement. However, people are familiar with personal location and this offers a simple example of spatial data which can help engage users in meaningful discussion by use of more inclusive language.

Awareness

Spatial data can be legitimately used for many different activities and users should be aware of its nature and purpose to make informed decisions about whether or not their processed data can be safely shared online. However, it may be unclear what information could be revealed to other people by sharing location data or that spatial data are being included at all.

What location data are included?

Accuracy

GNSS receivers can provide accurate spatial data for outside locations, though the level of detail required varies for different applications. For example, a fitness app has a valid need to process the accurate relative position to best calculate the user’s speed and distance, though their absolute position in the world is not relevant to measuring their physical performance. Conversely, a weather app needs to know the user’s location to provide an accurate forecast for their area, but has no proper requirement for their precise location, as the weather will not differ on opposite sides of the same street.
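The fitness example can be sketched in code: distance only needs positions relative to the start of the activity, so absolute coordinates can be converted to local offsets and then discarded. The function names below are hypothetical, and the conversion uses a simple equirectangular approximation on a spherical Earth, which is an assumption that is reasonable for tracks of a few kilometres.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def to_relative_track(points):
    """Convert (lat, lon) pairs in degrees to (east_m, north_m) offsets from the first point."""
    if not points:
        return []
    lat0, lon0 = points[0]
    cos_lat0 = math.cos(math.radians(lat0))
    return [
        (math.radians(lon - lon0) * EARTH_RADIUS_M * cos_lat0,
         math.radians(lat - lat0) * EARTH_RADIUS_M)
        for lat, lon in points
    ]


def track_length_m(relative_track):
    """Sum the straight-line distances between consecutive offsets."""
    return sum(math.dist(a, b) for a, b in zip(relative_track, relative_track[1:]))
```

Note that the shape of a relative track can still be matched against a road or path network, so this reduces rather than removes the risk of revealing where the activity took place.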

Is use of location data clear and properly justified?

Privacy

Sharing personal spatial data creates a unique signature for an individual which can reveal their presence at or absence from an identifiable location at a particular moment in time. For example, sharing photos of a family barbecue online may inadvertently reveal the location of the garden from the image’s geotag which could be used to identify their home address. Subsequently posting a holiday snap on a social media platform could indicate that the family is currently absent from home for a period of time, especially if the picture is from overseas, making the property a more attractive target for criminals.

Do location data reveal personal information?

Security

A single location has many associated properties which can reveal valuable information, so sharing spatial coordinates carries the inherent risk that it can be difficult to foresee the unintended consequences of processing those related details. Even when data are completely anonymized to decouple them from an individual, unexpected information may still be revealed. For example, Strava published a heatmap based on spatial data aggregated from users of their fitness app which, in 2018, was found to clearly show the locations and details of foreign military bases in Afghanistan and Syria, due to forces personnel exercising at those bases.

Do location data reveal group information?

Timeliness

The time at which location data are disclosed can be important and may reveal that a known path is followed. For example, posting vehicle tracking data may reveal that a known route is followed on a regular basis, which could be used to predict it in future. Also, real time spatial data could be combined with previous trip timings to intercept an irregular journey.

Do location data timings predict future activities?

A Design Language for Location Sharing

Having identified the key factors to consider when sharing or publishing spatial data online, the next step is to communicate those ideas in a simple manner that is easy to understand and remember for all users, including the general public.

How the guidance is applied will depend on the particular use case, but it is important to explain the aims clearly so that content creators have the knowledge and confidence to share their own spatial data in a safe and responsible way.

Key Message Elements

The first message is to raise public awareness that spatial data are as important and sensitive as personal financial details. The public are familiar with basic financial security measures and understand that, though there are risks, online transactions can be performed easily and safely with a few simple precautions. The same is true for using spatial data responsibly, so people need to be aware of situations that may involve sharing their location (e.g. sharing geotagged photos), understand the risks, and take suitable precautions.

The most important questions to answer when dealing with location data are:

  1. What is included?
  2. How is it used?
  3. What can it tell others?

Location Statement

Users should expect responsible developers and service providers who collect or use spatial data to provide a comprehensible privacy policy and a clear written statement about location data use in their product which answers these three questions. Statements should include a comprehensive justification for spatial data use and, without sensationalizing the dangers, practical steps that users can take to mitigate risks and keep themselves safe. Failure to provide such information should serve as a warning to the user that the product should not be trusted, as the provider does not deem the safety of its users to be sufficiently important.

Effective Communication

The most memorable message is short and simple and encapsulates the idea. For example, before you cross the road: stop, look and listen. A message can be made more memorable by associating it with a pictogram which serves as a quick reminder.

Overall Concept

User awareness should be raised with a simple prompt that location data sharing requires proper consideration.

Sharing? Think Location

Thought Process

Users should be quickly reminded of the thought processes involved in safely sharing location data, as outlined above.

What Is Included?
How Is It Used?
What Can It Tell Others?

User Interface

Developers also have a responsibility to highlight when spatial data are being used. Most modern web browsers display safety-critical information next to the web address, using easily recognized icons such as a padlock for encryption, to show whether network access is encrypted or whether the browser is accessing the user’s camera or microphone. Spatial data should be similarly represented using a well-known icon such as the location pin so users are aware that their location is visible to others. For example, when a photo is taken using a smartphone, the user interface should clearly display whether or not geotag information is included.

Use Location
No Location
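
A minimal sketch of the logic behind such an indicator is given below in Python. It assumes photo metadata is available as a plain dictionary with hypothetical `latitude` and `longitude` keys; real formats such as EXIF store geotags differently, and the function names are invented for illustration.

```python
def location_label(metadata):
    """Return the label to show next to a photo before it is shared."""
    has_geotag = (
        metadata.get("latitude") is not None
        and metadata.get("longitude") is not None
    )
    return "Use Location" if has_geotag else "No Location"

def without_location(metadata):
    """Return a copy of the metadata with the (hypothetical) geotag keys removed."""
    return {k: v for k, v in metadata.items() if k not in ("latitude", "longitude")}

# Illustrative photo record; the keys are assumptions, not a real metadata schema.
photo = {"filename": "barbecue.jpg", "latitude": 51.5, "longitude": -0.12}
print(location_label(photo))
print(location_label(without_location(photo)))
```

The point of the sketch is that the label is derived from the data actually attached to the photo, so the user sees what will be shared rather than what a setting claims.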

Principles of Ethical Data Sharing - The Regulator Perspective

The Role of the Regulator

As discussed in section [[[#ethics-and-law]]], ethics and law guide each other. Particularly in times where public or personal safety is concerned, ethical lines can grey with regard to the sharing of location data, as was evidenced during the COVID-19 pandemic. Here, the role of the regulator is to outline what controls may be necessary to ensure that individuals are protected from harm and breach of privacy.

Other grey examples include tweets posted during crises, which can be used to detect the crisis itself but also show the user's location in real time. The US Geological Survey Tweet Earthquake Dispatch analyzes social media, and uses geolocation combined with machine learning to detect earthquakes. This analysis can detect earthquakes faster than conventional scientific instruments, which can take up to 20 minutes to perform the same task, albeit more accurately. However, the release of information that reveals where a particular target group conducts its activities could be used maliciously. For example, geolocation data from dating apps could be used by a predator to find common meetup spots where individuals will be alone and may not be able to identify the predator as a stranger.

Location is often used as the means of achieving data integration. This too has its risks. Data which cannot otherwise be linked together can be linked using the location dimension, in a way that helps to identify a subject who might otherwise not be identifiable.
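
The following Python sketch illustrates this linking risk under simplified assumptions. Both datasets, their field names, and the function names are invented for the example; records are matched by snapping coordinates to a coarse grid (three decimal places, roughly 100 m) within the same hour, which is only one of many possible matching strategies and will miss points that fall either side of a cell boundary.

```python
# Dataset A: "anonymous" trips with no names. Dataset B: public posts with names.
# All values are made up for illustration.
trips = [
    {"trip_id": "t1", "lat": 52.37012, "lon": 4.89521, "hour": "2024-03-04T08"},
    {"trip_id": "t2", "lat": 52.09071, "lon": 5.12142, "hour": "2024-03-04T09"},
]
posts = [
    {"name": "Alex", "lat": 52.37018, "lon": 4.89517, "hour": "2024-03-04T08"},
    {"name": "Sam", "lat": 51.92442, "lon": 4.47773, "hour": "2024-03-04T08"},
]

def cell(record, places=3):
    """Snap a record to a coarse grid cell plus the hour it was recorded."""
    return (round(record["lat"], places), round(record["lon"], places), record["hour"])

def link_by_location(a, b):
    """Pair records from a and b that share the same grid cell and hour."""
    index = {}
    for rb in b:
        index.setdefault(cell(rb), []).append(rb)
    return [(ra["trip_id"], rb["name"]) for ra in a for rb in index.get(cell(ra), [])]

print(link_by_location(trips, posts))
```

Neither dataset identifies the traveller on its own, yet the shared place and time are enough to attach a name to trip `t1`.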

The role of regulators is to provide the guard rails that give users confidence that their data will be protected, and give developers confidence that they know their obligations as data managers. To build trust, regulators must ensure that user rights and developer obligations are clearly explained, easily accessible, and broadly understood. The explanation must cover how each regulation protects citizens, and how citizens can pursue their rights should a developer breach them. Finally, regulators must also ensure that developers are held accountable for breaching their data management obligations.

Examples of Good Practice

Integrating Authorities (Australia)

Data integration is the process of combining data from multiple sources to create a unified view. The resulting data present new knowledge that can be applied toward achievement of social good, business objectives, and personal gain. There are countless examples of data integration being used for positive outcomes for society, including integration of health data to improve the quality of patient-centred care.

Australia has had the concept of Integrating Authorities for some time. Integrating Authorities undertaking high-risk data integration projects involving Commonwealth data for statistical and research purposes are regulated, and must be accredited to ensure that the processes they follow to integrate Commonwealth data create accurate matches whilst preserving the privacy of the subjects. The Commonwealth also developed seven High Level Principles for Data Integration. These principles outline, among other things, Custodian and Integrator responsibilities to preserve privacy and confidentiality during integration exercises, along with the need to highlight the impacts on privacy and confidentiality arising from each exercise.

Whilst restricted to integration of Commonwealth of Australia data, the accreditation provides reasonable guidelines for good practice for integrating data.

More Information

For more information on these topics, you can follow the links below.

Acknowledgements

TBD