1. Introduction
This document defines a simple API for browsers that enables the collection of aggregated, differentially-private metrics.
The primary goal of this API is to enable attribution for advertising.
1.1. Attribution
In advertising, attribution is the process of identifying actions that precede an outcome of interest, and allocating value to those actions.
Actions that are of interest to advertisers are primarily the showing of advertisements (also referred to as impressions). Other actions include ad clicks (or other interactions) and opportunities to show ads that were not taken.
Desired outcomes for advertising are more diverse, as they include any result that an advertiser seeks to improve through the showing of ads. A desirable outcome might also be referred to as a conversion, which refers to "converting" a potential customer into a customer. What counts as a conversion could include sales, subscriptions, page visits, and enquiries.
For this API, actions and outcomes are both events: things that happen once. What is unique about attribution for advertising is that these events might not occur on the same site. Advertisements are most often shown on sites other than the advertiser’s site.
The primary challenge with attribution is in maintaining privacy. Attribution involves connecting activity on different sites. The goal of attribution is to find an impression that was shown to the same person before the conversion occurred.
If attribution information were directly revealed, it would enable unwanted cross-context recognition, thereby enabling tracking.
This document avoids cross-context recognition by ensuring that attribution information is aggregated using an aggregation service. The aggregation service is trusted to compute an aggregate without revealing the values that each person contributes to that aggregate.
Strict limits are placed on the amount of information that each browser instance contributes to the aggregates for a given site. Differential privacy is used to provide additional privacy protection for each contribution.
Details of aggregation service operation are included in § 6 Aggregation. The differential privacy design used is outlined in § 7 Differential Privacy.
1.2. Background
From the early days of the Web, advertising has been widely used to financially support the creation of sites.
One characteristic that distinguished the Web from other venues for advertising was the ability to obtain information about the effectiveness of advertising campaigns.
Web advertisers were able to measure key metrics like reach (how many people saw an ad), frequency (how often each person saw an ad), and conversions (how many people saw the ad then later took the action that the ad was supposed to motivate). In comparison, these measurements were far more timely and accurate than for any other medium.
The cost of measurement performance was privacy. In order to produce accurate and comprehensive information, advertising businesses performed extensive tracking of the activity of all Web users. Each browser was given a tracking identifier, often using cookies that were lodged by cross-site content. Every action of interest was logged against this identifier, forming a comprehensive record of a person’s online activities.
Having a detailed record of a person’s actions allowed advertisers to infer characteristics about people. Those characteristics made it easier to choose the right audience for advertising, greatly improving its effectiveness. This created a strong incentive to gather more information.
Online advertising is intensely competitive. Sites that show advertising seek to obtain the most money for each ad placement. Advertisers seek to place advertising where it will have the most effect relative to its cost. Any competitive edge gained by these entities—and the intermediaries that operate on their behalf—depends on having more comprehensive information about a potential audience.
Over time, actions of interest expanded to include nearly every aspect of online activity. Methods were devised to correlate that information with activity outside of the Web. An energetic trade has formed, with multiple purveyors of personal information that is traded for various purposes.
1.3. Goals
The goal of this document is to define a means of performing attribution for advertising that does not enable tracking.
1.4. End-User Benefit
The measurement of advertising performance creates new cross-site flows of information. That information flow creates a privacy risk or cost—of cross-context recognition—that needs to be justified in terms of benefits to end users.
Any benefits realized by end users through the use of attribution are indirect.
End users that visit a website pay for "free" content or services primarily through their attention to any advertisements the site shows them. This "value" accrues to the advertiser, who in turn pays the site. The site is expected to use this money to support the provision of their content or services.
Participation in an attribution measurement system would comprise a secondary cost to Web users.
Support for attribution enables more effective advertising, largely by informing advertisers about what ads perform best, and in what circumstances. Those circumstances might include the time and place that the ad is shown, the person to whom the ad is presented, and the details of the ad itself.
Connecting that information to outcomes allows an advertiser to learn what circumstances most often lead to the outcomes they most value. That allows advertisers to spend more on effective advertising and less on ineffective advertising. This lowers the overall cost of advertising relative to the value obtained. [ONLINE-ADVERTISING]
Sites that provide advertising inventory, such as content publishers and service providers, indirectly benefit from more efficient advertising. Venues for advertising that are better able to show ads that result in the outcomes that advertisers seek can charge more for ad placements.
Sites that obtain support through the placement of advertisements are better able to provide quality content or services. Importantly, that support is derived unevenly from their audience. This can be more equitable than other forms of financial support. Those with a lower tendency or ability to spend on advertised goods obtain the same ad-supported content and services as those who can afford to pay. [EU-AD][COPPACALYPSE]
The ability to supply "free" services supported by advertising has measurable economic benefit that derives from the value of those services. [FREE-GDP]
1.5. Collective Privacy Effect
The use of aggregation—if properly implemented—ensures that information provided to sites is about groups and not individuals.
The introduction of this mechanism therefore represents collective decision-making, as described in Privacy Principles § collective-privacy.
Participation in attribution measurement carries a lower privacy cost when the group that participates is larger. This is due to the effect of aggregation on the ability of sites to extract information about individuals from aggregates. This is especially true for central differential privacy, which is the mathematical basis for the privacy design used in this specification.
Larger cohorts of participants also produce more representative—and therefore more useful—statistics about the advertising that is being measured.
If attribution is justified, both these factors motivate the enablement of attribution for all users.
Acting to enable attribution measurement by user agents will not be positively received by some people. Different people perceive the costs and benefits that come from engaging with advertising differently. The proposed design allows people the option of appearing to participate in attribution without revealing that choice to sites; see § 4.5.1 Optional Participation.
1.6. Attribution Using Histograms
Attribution attempts to measure correlation between one or more ad placements (impressions) and the outcomes that an advertiser desires.
When considered in the aggregate, information about individuals is not useful. Actions and outcomes need to be grouped.
The simplest form of attribution splits impressions into a number of groupings according to the attributes of the advertisement and counts the number of conversions. Groupings might be formed from attributes such as where the ad is shown, what was shown (the "creative"), when the ad was shown, or to whom.
These groupings and the tallies of conversions attributed to each form a histogram. Each bucket of the histogram counts the conversions for a group of ads.
Different groupings might be used for different purposes. For instance, grouping by creative (the content of an ad) might be used to learn which creative works best.
Adding a value greater than one at each conversion enables more than simple counts. Histograms can also aggregate values, which might be used to differentiate between different outcomes. The value that is allocated to impressions is called a conversion value. A higher conversion value might be used for larger purchases or any outcome that is more highly-valued. A conversion value might also be split between multiple impressions to split credit, though this capability is not presently supported in the API.
- Compatibility with privacy-preserving aggregation services
- Flexibility to assign buckets
- As histogram size increases, noise becomes a problem
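The idea of a histogram of conversion values can be illustrated with a small sketch. The helper name `buildHistogram` and the shape of its input records are assumptions made for this illustration only. In the API itself, each browser contributes a single encrypted histogram per conversion and the summation happens inside the aggregation service, so no party sees tallies like these in the clear.

```javascript
// Hypothetical helper (not part of the API): tally conversion values into
// histogram buckets, where each bucket corresponds to a grouping of ads.
function buildHistogram(histogramSize, attributedConversions) {
  const histogram = new Array(histogramSize).fill(0);
  for (const { histogramIndex, value } of attributedConversions) {
    // Contributions that point outside the histogram are ignored.
    if (histogramIndex < histogramSize) {
      histogram[histogramIndex] += value;
    }
  }
  return histogram;
}

// Three conversions: one attributed to bucket 0, two to bucket 2.
const exampleConversions = [
  { histogramIndex: 0, value: 1 },
  { histogramIndex: 2, value: 3 },
  { histogramIndex: 2, value: 1 },
];
const exampleHistogram = buildHistogram(4, exampleConversions);
console.log(exampleHistogram); // [1, 0, 4, 0]
```

Using a value of 1 for every conversion yields simple counts; larger values let a bucket accumulate a conversion value instead.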
2. Overview of Operation
The private attribution API provides aggregate information about the association between two classes of events: impressions and conversions.
An impression is any action that an advertiser takes on any website. The API does not constrain what can be recorded as an impression. Typical actions that an advertiser might seek to measure include:
- Displaying an advertisement.
- Having a user interact with an advertisement in some way.
- Not displaying an advertisement (especially for controlled experiments that seek to confirm whether an advertising campaign is effective).
For the API, a conversion is an outcome that is being measured. The API does not constrain what might be considered to be an outcome. Typical outcomes that advertisers might seek to measure include:
- Making a purchase.
- Signing up for an account.
- Visiting a webpage.
The remainder of this section describes how the Private Attribution API operates in conjunction with an aggregation service to produce an aggregate attribution measurement. That operation is illustrated in the following figure.
When an impression occurs, the saveImpression() method can be used to request that the browser save information. This includes an identifier for the impression and some additional information about the impression. For instance, advertisers might use additional information to record whether the impression was an ad view or an ad click.
At conversion time, a conversion report is created. A conversion report is an encrypted histogram contribution that includes information from any impressions that the browser previously stored.
The measureConversion() method accepts options that tell the browser how to construct a conversion report. These include a simple query that selects from the impressions that the browser has stored, a conversion value that is allocated to the selected impression(s), and other information needed to construct the conversion report.
The histogram created by the conversion report is constructed as follows:
- If the query found no impressions, or the privacy budget for the site is exhausted, a histogram consisting entirely of zeros (0) is constructed.
- If one or more matching impressions are found, the browser runs the attribution logic (default last-touch) to select the most recent impression. The provided conversion value is added to a histogram at the bucket that was specified at the time of the attributed impression. All other buckets are set to zero.
The browser updates the privacy budget store to reflect the reported conversion.
The resulting histogram is prepared for aggregation according to the requirements of the chosen aggregation service and returned to the site. This minimally involves encryption of the histogram.
A site that invokes this API will always receive a valid conversion report. As a result, sites learn nothing about what happened on other sites from this interaction.
The site can collect the encrypted histograms it receives from calls to this API and submit them to the aggregation service.
Upon receiving a set of encrypted histograms from a site, the aggregation service:
- confirms that it has not previously computed an aggregate from the provided inputs and that there are enough conversion reports,
- adds the histograms including sufficient noise to produce a differentially-private aggregate histogram, and
- returns the aggregate to the site.
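The three aggregation steps above can be sketched as follows. This is an illustration under stated assumptions, not a description of how any aggregation service is implemented: the function name `aggregateHistograms`, the use of a batch identifier to detect repeated inputs, the `minReports` threshold, and the injectable `sampleNoise` function are all hypothetical. A real service operates on encrypted reports and uses the noise distribution required by its differential privacy design.

```javascript
// Hypothetical sketch of the aggregation steps, operating on plaintext
// histograms for readability.
function aggregateHistograms(histograms, { batchId, seenBatchIds, minReports, sampleNoise }) {
  // Step 1: refuse to aggregate the same inputs twice, and require enough reports.
  if (seenBatchIds.has(batchId)) {
    throw new Error("an aggregate was already computed for this batch");
  }
  if (histograms.length < Math.max(1, minReports)) {
    throw new Error("not enough conversion reports");
  }
  seenBatchIds.add(batchId);

  // Step 2: sum the histograms bucket by bucket, then add noise to each bucket.
  const size = histograms[0].length;
  const total = new Array(size).fill(0);
  for (const histogram of histograms) {
    for (let i = 0; i < size; i++) {
      total[i] += histogram[i];
    }
  }

  // Step 3: return the noisy aggregate.
  return total.map((sum) => sum + sampleNoise());
}

// Usage with a placeholder noise source that adds nothing (assumption for the demo).
const seenBatches = new Set();
const aggregate = aggregateHistograms(
  [[0, 3, 0], [0, 0, 0], [1, 0, 0]],
  { batchId: "batch-1", seenBatchIds: seenBatches, minReports: 2, sampleNoise: () => 0 },
);
console.log(aggregate);
```

Note that the all-zero report in the middle is indistinguishable from the others once encrypted, which is why a site learns nothing from any single report.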
3. API Usage
A site using the Private Attribution API will typically register either impressions or conversions, but in some cases the same site may do both.
To register an impression, a site calls saveImpression(). No preparation is required to use this API beyond collecting parameter values, although it may be useful to examine the supported aggregationServices in deciding whether to use the Private Attribution API.
To request a conversion report, a site calls measureConversion().
Before calling this API, a site must select a supported aggregation service. The page may select any of the supported services found in aggregationServices. The name of the selected service must be supplied as the aggregationService member of the PrivateAttributionConversionOptions dictionary when calling the measureConversion() method.
3.1. Site Identities
This API relies on the HTML definition of site as the primary scope over which it operates. Three types of sites are recognized:
- An impression site is the site derived from the top-level origin of the relevant settings object at the time that saveImpression() is invoked to store an impression.
- A conversion site is the site derived from the top-level origin of the relevant settings object at the time that measureConversion() is invoked.
- An intermediary site is the site derived from the origin of the relevant settings object at the time that either saveImpression() or measureConversion() is invoked, unless this origin is same site with the impression site or conversion site, respectively.
This API uses site rather than origin because it depends on associating all activity that might have privacy consequences with a single entity. Features like cookies allow privacy-relevant information to be exchanged freely by same site origins, which could otherwise be used to exceed privacy budgets.
3.2. Navigator Interface
partial interface Navigator {
  [SecureContext, SameObject] readonly attribute PrivateAttribution privateAttribution;
};
3.3. Finding a Supported Aggregation Service
The aggregationServices attribute contains a set of aggregation services supported by the user agent. The page must select and specify one of these services when calling the measureConversion() method. It may also be useful to query the supported services before registering an impression, but that is not required, and impressions are not scoped to a single aggregation service.
A site might have a preference order for the aggregation services that it uses. The following code iterates over a preference list and finds one that the user agent supports.
const preferredServices = [
  "https://aggregator.example/tee",
  "https://aggregator.example/dap",
  "https://example.com/aggregator",
];
const supportedServices = navigator.privateAttribution?.aggregationServices;
const serviceUrl = preferredServices.find(url => supportedServices?.has(url));
If the user agent supports the API and its supported services include at least one of the preferred services, the first such service in preference order is saved in a variable named serviceUrl. Otherwise, serviceUrl will be undefined.
enum PrivateAttributionProtocol { "dap-12-histogram", "tee-00" };

dictionary PrivateAttributionAggregationService {
  required DOMString protocol;
};

[SecureContext, Exposed=Window]
interface PrivateAttributionAggregationServices {
  readonly maplike<USVString, PrivateAttributionAggregationService>;
};

[SecureContext, Exposed=Window]
interface PrivateAttribution {
  readonly attribute PrivateAttributionAggregationServices aggregationServices;
};
The aggregationServices attribute is a mapping from URLs that identify an aggregation service to metadata about that service:
protocol, of type DOMString
- The protocol that the aggregation service uses. Different versions of the same protocol use different values. Even if a single service provider supports multiple protocols, each needs to use a different URL. This ensures that each can be uniquely identified by URL without also specifying the choice of protocol.
The URL is passed as the aggregationService parameter to measureConversion() to select the identified aggregation service.
The PrivateAttributionProtocol describes the submission protocol used by different aggregation services. This document defines two protocols:
dap-12-histogram
- A DAP-based protocol [DAP] that uses MPC; see § 6.1 Multi-Party Computation Aggregation.
tee-00
- A protocol for submission to a TEE; see § 6.2 Trusted Execution Environments.
3.4. Saving Impressions
The saveImpression() method requests that the user agent record an impression in the impression store.
In the following example, the site saves the impression directly, identifying the advertiser (advertiser.example) and including information that is negotiated by the advertiser. This includes the filterData value (2) that the advertiser might later use to select this advertisement, the index of the histogram bucket (histogramIndex = 3) to which any attributed value is added, and a retention period (lifetimeDays = 7) that is at least as long as the advertiser requires.
navigator.privateAttribution.saveImpression({
  histogramIndex: 3,
  filterData: 2,
  conversionSite: "advertiser.example",
  lifetimeDays: 7,
});
Alternatively, an intermediary, such as a Supply-Side Platform (SSP) or Demand-Side Platform (DSP), might call the same API from an iframe. Making the same API call from a frame results in saving the intermediary site identity with the impression.
dictionary PrivateAttributionImpressionOptions {
  required unsigned long histogramIndex;
  unsigned long filterData = 0;
  required USVString conversionSite;
  unsigned long lifetimeDays = 30;
};

[SecureContext, Exposed=Window]
partial interface PrivateAttribution {
  undefined saveImpression(PrivateAttributionImpressionOptions options);
};
The arguments to saveImpression() are as follows:
histogramIndex, of type unsigned long
- If measureConversion() matches this impression with a subsequent conversion, the conversion value will be added to the histogram bucket identified by this index.
filterData, of type unsigned long, defaulting to 0
- An optional piece of metadata associated with the impression. The filterData can be used to identify which impressions may receive attribution from a conversion.
conversionSite, of type USVString
- The site where conversions for this impression may occur, identified by its domain name. The measureConversion() method will only attribute to this impression when called by the indicated site.
lifetimeDays, of type unsigned long, defaulting to 30
- A positive "time to live" (in days) after which the impression can no longer receive attribution. If not specified, the default is 30 days. The user agent should impose an upper limit on the lifetime, and silently reduce the value specified here if it exceeds that limit.
The saveImpression(options) method causes the user agent to invoke the save an impression algorithm with this’s relevant settings object and the provided options.
3.5. Requesting Attribution for a Conversion
The measureConversion() method requests that the user agent perform attribution for a conversion, and return a conversion report.
The measureConversion() method always returns a conversion report, regardless of whether matching impression(s) are found. If there is no match, or if differential privacy disallows reporting the attribution, the returned conversion report will not contribute to the histogram, i.e., will be uniformly zero.
To request the creation of an encrypted measurement, the site invokes the measureConversion() method.
This function takes four different types of input:
- The selected aggregation service, which is identified using a URL. The example process for selecting an aggregation service shows how to select a service that the browser supports.

    const serviceDetails = {
      aggregationService: serviceUrl,
    };

- Details of the aggregated measurement. These values will be consistent for all invocations of the API across multiple browsers. This includes the size of the histogram and the amount of privacy budget that might have been expended.

    const aggregatedMeasurementDetails = {
      histogramSize: 20,
      epsilon: 1,
    };

- A set of attributes, all optional, that select the impressions to consider. This includes how old impressions can be (lookbackDays), the impression sites that might have saved impressions (impressionSites), the intermediary sites that might have saved impressions (intermediarySites), and the choice of filterData.

    const selectionDetails = {
      lookbackDays: 14,
      impressionSites: ["publisher.example", "other.example"],
      intermediarySites: ["ad-tech.example"],
      filterData: 2,
    };

- The choice of attribution logic that the browser will apply, plus any parameters that the logic needs.

    const attributionDetails = {
      logic: "last-touch",
      value: 3,
      maxValue: 7,
    };
Once these values are decided, the site invokes the API to obtain an encrypted conversion report.
const measurement = await navigator.privateAttribution.measureConversion({
  ...serviceDetails,
  ...aggregatedMeasurementDetails,
  ...selectionDetails,
  ...attributionDetails,
});
sendReportToServer(measurement.report);
This report can be collected, along with other reports from this browser and other browsers. Collected reports can then all be submitted to an aggregation service to obtain an aggregate histogram.
dictionary PrivateAttributionConversionOptions {
  required USVString aggregationService;
  double epsilon = 1.0;
  required unsigned long histogramSize;
  unsigned long lookbackDays;
  unsigned long filterData;
  sequence<USVString> impressionSites = [];
  sequence<USVString> intermediarySites = [];
  PrivateAttributionLogic logic = "last-touch";
  unsigned long value = 1;
  unsigned long maxValue = 1;
};

dictionary PrivateAttributionConversionResult {
  required Uint8Array report;
};

[SecureContext, Exposed=Window]
partial interface PrivateAttribution {
  Promise<PrivateAttributionConversionResult> measureConversion(PrivateAttributionConversionOptions options);
};
The arguments to measureConversion() are as follows:
aggregationService, of type USVString
- A selection from the aggregation services that can be found in aggregationServices.
epsilon, of type double, defaulting to 1.0
- The amount of privacy budget to expend on this conversion report.
histogramSize, of type unsigned long
- The number of histogram buckets to use in the conversion report.
lookbackDays, of type unsigned long
- A positive integer number of days. Only impressions occurring within the past lookbackDays may match this conversion. If omitted, it is equivalent to an implementation-defined maximum.
filterData, of type unsigned long
- Only impressions having a filterData value matching this value will be eligible to match this conversion.
impressionSites, of type sequence<USVString>, defaulting to []
- A set of impression sites. Only impressions recorded where the impression site is in this set are eligible to match this conversion.
intermediarySites, of type sequence<USVString>, defaulting to []
- A set of sites which called the saveImpression() API. Only impressions recorded by scripts originating from one of the intermediary sites are eligible to match this conversion.
logic, of type PrivateAttributionLogic, defaulting to "last-touch"
- A selection from PrivateAttributionLogic indicating the attribution logic to use.
value, of type unsigned long, defaulting to 1
- The conversion value. If an attribution is made and privacy restrictions are satisfied, this value will be encoded into the conversion report.
maxValue, of type unsigned long, defaulting to 1
- The maximum conversion value across all contributions included in the aggregation. Together with epsilon, this is used to calibrate the distribution of random noise that will be added to the outcome. It is also used to determine the amount of privacy budget to expend on this conversion report.
The measureConversion(options) method causes the user agent to invoke the measure a conversion algorithm with this’s relevant settings object and the provided options.
3.6. Permissions Policy Integration
This specification defines two policy-controlled features:
- Invocation of the saveImpression() API, identified by the string "save-impression".
- Invocation of the measureConversion() API, identified by the string "measure-conversion".
The default allowlist for both of these features is *.
Having separate permissions for saveImpression() and measureConversion() allows pages that do both to limit subresources to the expected kind of activity.
Enabling permissions by default simplifies the task of integrating external services.
Permissions policy provides only all-or-nothing control; it does not enable delegation of a portion of privacy budget.
4. API Internals
4.1. Impression Store
The impression store is used by the measureConversion() method to find matching impressions.
4.1.1. Contents
The impression store is a set of impressions:
Item | Description |
---|---|
Filter Data | The filterData value passed to saveImpression(). |
Impression Site | The impression site where saveImpression() was called. |
Intermediary Site | The intermediary site that called saveImpression(), or undefined if the API was invoked by the impression site. |
Conversion Sites | The set of conversion sites that were passed to saveImpression(). |
Timestamp | The time at which saveImpression() was called. |
Lifetime | The number of days an impression remains eligible for attribution, either from the call to saveImpression(), or a user agent-defined limit. |
Histogram Index | The histogram index passed to saveImpression(). |
4.1.2. Maintenance
The user agent should periodically use the timestamp and lifetime values to identify and delete any impressions in the impression store that have expired.
It is not necessary to remove impressions immediately upon expiry, as long as measureConversion() excludes expired impressions from attribution. However, the user agent should not retain expired impressions indefinitely.
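A minimal sketch of this maintenance, assuming impressions are plain objects with a millisecond `timestamp` and a `lifetimeDays` field. The function names, the array-based store, and the exact expiry boundary are assumptions for illustration and are not defined by this document.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// An impression is treated as expired once its lifetime has fully elapsed
// (assumed boundary; the text above only requires that expired impressions
// are excluded from attribution).
function isExpired(impression, now) {
  return now > impression.timestamp + impression.lifetimeDays * MS_PER_DAY;
}

// Periodic cleanup: return a new store containing only unexpired impressions.
function pruneExpired(impressionStore, now) {
  return impressionStore.filter((impression) => !isExpired(impression, now));
}

// Usage: at day 10, a 7-day impression saved at day 0 is removed,
// while a 30-day impression saved at day 0 is kept.
const storeBefore = [
  { histogramIndex: 1, timestamp: 0, lifetimeDays: 7 },
  { histogramIndex: 2, timestamp: 0, lifetimeDays: 30 },
];
const storeAfter = pruneExpired(storeBefore, 10 * MS_PER_DAY);
console.log(storeAfter.length); // 1
```

The same `isExpired` check can be applied at attribution time, which is what makes lazy cleanup safe.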
4.1.3. Clearing
A mechanism must be provided to clear the impression store. For example, the impression store could be cleared upon activation of the control that disables the Private Attribution API. It is recommended that any mechanism a user agent provides to clear stored browsing data (history, cookies, etc.) be extended to cover the impression store.
4.2. Privacy Budget Store
The privacy budget store records the state of the per-site privacy budgets. It is updated by deduct privacy budget.
The safety limits need to be described in more detail. Some references to clearing the impression store may need to be updated to refer to the privacy budget store as well.
A privacy budget key is a tuple consisting of the following items:
- epoch: a privacy budget epoch
- site: a site
The privacy budget store is a map whose keys are privacy budget keys and whose values are floats.
To deduct privacy budget given a privacy budget key key, float epsilon, integer sensitivity, and integer globalSensitivity:
- If the privacy budget store does not contain key, set the value of key to be a user agent-defined value.
- Let currentValue be the result of getting the value of key in the privacy budget store.
- If currentValue is less than or equal to 0, return false.
- Let newValue be currentValue - epsilon * sensitivity / globalSensitivity.
- Set the value of key in the privacy budget store to newValue.
- Return whether newValue is greater than or equal to 0.
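The steps above can be sketched in JavaScript as follows. The initial budget of 5, the use of a `Map` keyed by a string built from the tuple, and the helper name `keyToString` are assumptions made for this illustration; the initial value in particular is user agent-defined.

```javascript
// Assumed initial budget; the real value is user agent-defined.
const INITIAL_BUDGET = 5;

// A privacy budget key is an (epoch, site) tuple; serialize it for use as a Map key.
function keyToString({ epoch, site }) {
  return `${epoch}|${site}`;
}

function deductPrivacyBudget(store, key, epsilon, sensitivity, globalSensitivity) {
  const k = keyToString(key);
  if (!store.has(k)) {
    store.set(k, INITIAL_BUDGET);
  }
  const currentValue = store.get(k);
  if (currentValue <= 0) {
    return false;
  }
  // The deduction scales epsilon by the fraction of the maximum value used.
  const newValue = currentValue - (epsilon * sensitivity) / globalSensitivity;
  store.set(k, newValue);
  return newValue >= 0;
}

// Usage: each call below spends 2 * 1 / 1 = 2 units of budget.
const budgetStore = new Map();
const budgetKey = { epoch: 3, site: "advertiser.example" };
const first = deductPrivacyBudget(budgetStore, budgetKey, 2, 1, 1);  // 5 -> 3
const second = deductPrivacyBudget(budgetStore, budgetKey, 2, 1, 1); // 3 -> 1
const third = deductPrivacyBudget(budgetStore, budgetKey, 2, 1, 1);  // 1 -> -1
console.log(first, second, third); // true true false
```

As written in the steps, a deduction that overdraws the budget is still recorded, so the stored value can become negative and every later request for that key fails at the first check.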
4.3. Save Impression Algorithm
To save an impression, given an environment settings object settings and given options:
- Collect the implicit API inputs from settings:
  - The timestamp is set to the current high resolution time.
  - The impression site is set to the result of obtaining a site from the top-level origin.
  - The intermediary site is set to
    - a value of undefined if the origin is same site with the top-level origin,
    - otherwise, the result of obtaining a site from the origin.
- Validate the page-supplied API inputs:
  - If options.lifetimeDays is 0, throw a RangeError.
  - Clamp options.lifetimeDays to the user agent’s upper limit.
- If the private attribution API is enabled, save the impression to the impression store.
saveImpression() does not return a status indicating whether the impression was recorded. This minimizes the ability to detect when the Private Attribution API is disabled.
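The algorithm above can be sketched as a function that builds the record to be stored. This is an illustration only: the function name `prepareImpression`, the `context` parameter, and the assumption that sites have already been derived from origins and are passed in as strings are all hypothetical. The field names follow the impression store contents, with a single conversion site as supplied in the options.

```javascript
// Hypothetical sketch: validate options and build an impression record.
// `context.topLevelSite` and `context.callerSite` are assumed to be sites
// already obtained from the top-level origin and the calling origin.
function prepareImpression(options, context) {
  const lifetimeDays = options.lifetimeDays ?? 30; // WebIDL default
  if (lifetimeDays === 0) {
    throw new RangeError("lifetimeDays must be positive");
  }
  return {
    filterData: options.filterData ?? 0, // WebIDL default
    impressionSite: context.topLevelSite,
    // undefined when the caller is same site with the top-level site
    intermediarySite:
      context.callerSite === context.topLevelSite ? undefined : context.callerSite,
    conversionSite: options.conversionSite,
    timestamp: context.now,
    // Silently clamp to the user agent's upper limit.
    lifetimeDays: Math.min(lifetimeDays, context.maxLifetimeDays),
    histogramIndex: options.histogramIndex,
  };
}

// Usage: an intermediary in a frame on publisher.example requests 90 days,
// and the (assumed) user agent limit of 30 days applies.
const savedImpression = prepareImpression(
  { histogramIndex: 3, filterData: 2, conversionSite: "advertiser.example", lifetimeDays: 90 },
  { now: 0, topLevelSite: "publisher.example", callerSite: "ad-tech.example", maxLifetimeDays: 30 },
);
console.log(savedImpression.lifetimeDays, savedImpression.intermediarySite);
```

Whether the record is then written to the impression store depends on whether the API is enabled, and the caller cannot observe which happened.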
4.4. Measure Conversion Algorithm
To measure a conversion, given an environment settings object settings and options:
- Collect the implicit API inputs from settings:
  - Let now be the current high resolution time.
  - Let topLevelSite (the conversion site) be the result of obtaining a site from the top-level origin.
  - The intermediary site is set to
    - a value of undefined if the origin is same site with the top-level origin,
    - otherwise, the result of obtaining a site from the origin.
- Validate the page-supplied API inputs:
  - If logic is specified, and the value is anything other than "last-touch", throw a TypeError.
  - If options.lookbackDays is 0, throw a RangeError.
- Let report be an all-zero histogram.
- If the private attribution API is enabled, set report to the result of do attribution and fill a histogram with options, topLevelSite, and now.
- Let encryptedReport be the result of encrypting report.
- Return encryptedReport.
4.4.1. Attribution Logic
A site that measures conversions can specify attribution logic, which determines how the conversion value is allocated to histogram buckets. The measureConversion() function accepts a logic parameter that specifies the attribution logic.
enum PrivateAttributionLogic {
  "last-touch",
};
Each attribution logic specifies a process for allocating values to histogram buckets, after the common matching logic is applied, and privacy budgeting occurs.
To do attribution and fill a histogram, given options, site topLevelSite, and moment now:
- Let matchedImpressions be an empty set.
- For each epoch starting from the oldest epoch supported by the user agent to the current privacy budget epoch:
  - Let impressions be the result of invoking common matching logic with options, epoch, and now.
  - If impressions is not empty:
    - Let key be a privacy budget key whose items are epoch and topLevelSite.
    - Let budgetOk be the result of deduct privacy budget with key, options.epsilon, options.value, and options.maxValue.
    - If budgetOk is true, extend matchedImpressions with impressions.
- If matchedImpressions is empty, return the result of invoking create an all-zero histogram with options.histogramSize.
- Switch on options.logic:
  - "last-touch": Return the result of fill a histogram with last-touch attribution with matchedImpressions, options.histogramSize, and options.value.
To fill a histogram with last-touch attribution, given a set of impressions matchedImpressions, an integer histogramSize, and an integer value:
-
Let impression be the value in matchedImpressions with the most recent timestamp.
-
Let histogram be the result of invoking create an all-zero histogram with histogramSize.
-
Let index be impression’s histogram index.
-
If index is less than histogram’s size, set histogram[index] to value.
-
Return histogram.
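The last-touch steps above can be sketched in Python as follows. This is a minimal illustration rather than a normative implementation: the Impression class and its field names are hypothetical, and create_all_zero_histogram assumes that an all-zero histogram is simply a list of zeros of the given size.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Impression:
    """Hypothetical stand-in for a stored impression."""
    timestamp: float       # moment at which the impression was saved
    histogram_index: int   # bucket that should receive the conversion value


def create_all_zero_histogram(size: int) -> list[int]:
    # Assumption: an all-zero histogram is a list of `size` zeros.
    return [0] * size


def fill_last_touch(matched: set[Impression], histogram_size: int, value: int) -> list[int]:
    # Select the impression with the most recent timestamp.
    impression = max(matched, key=lambda imp: imp.timestamp)
    histogram = create_all_zero_histogram(histogram_size)
    index = impression.histogram_index
    # An index outside the histogram is ignored, leaving the histogram all zero.
    if 0 <= index < histogram_size:
        histogram[index] = value
    return histogram
```

With impressions at timestamps 10, 15, and 20 whose histogram indices are 1, 0, and 3, calling fill_last_touch with a histogram size of 4 and a value of 7 places the whole value in bucket 3.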
To create an all-zero histogram, given an integer size:
4.4.2. Common Impression Matching Logic
To perform common matching logic, given options, epoch, and moment now:
-
Let matching be an empty set.
-
Let lookbackDays be options.lookbackDays if it exists, the implementation-defined maximum otherwise.
-
If the number of days since the end of epoch exceeds lookbackDays, return matching.
-
For each impression in the impression store for the epoch:
-
If now - lookbackDays is after impression’s timestamp, continue.
-
If options.filterData exists, and it is not equal to impression’s filter data, continue.
-
If options.impressionSites does not contain impression’s impression site, continue.
-
Append impression to matching.
-
Return matching.
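The matching steps above might look like the following Python sketch. The StoredImpression class, the function signature, and the representation of moments as seconds are all assumptions made for illustration; the caller is assumed to have already resolved lookback_days to either the requested value or the implementation-defined maximum.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Iterable, Optional

SECONDS_PER_DAY = 86400


@dataclass(frozen=True)
class StoredImpression:
    """Hypothetical record in the impression store."""
    timestamp: float        # seconds, same clock as `now`
    impression_site: str    # site on which the impression was saved
    filter_data: int


def common_matching_logic(
    epoch_impressions: Iterable[StoredImpression],
    epoch_end: float,
    now: float,
    impression_sites: set[str],
    lookback_days: int,
    filter_data: Optional[int] = None,
) -> list[StoredImpression]:
    matching: list[StoredImpression] = []
    # Skip the whole epoch if it ended before the lookback window began.
    if (now - epoch_end) / SECONDS_PER_DAY > lookback_days:
        return matching
    cutoff = now - lookback_days * SECONDS_PER_DAY
    for impression in epoch_impressions:
        if cutoff > impression.timestamp:
            continue  # older than the lookback window
        if filter_data is not None and filter_data != impression.filter_data:
            continue  # filter data was given and does not match
        if impression.impression_site not in impression_sites:
            continue  # impression was saved on a site that was not listed
        matching.append(impression)
    return matching
```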
4.5. User Control and Visibility
Consider merging this section with § 9.2 Disabling the Private Attribution API.
4.5.1. Optional Participation
-
Users should be able to opt out. Opt out should be undetectable.
This mechanism may be a dedicated control for the Private Attribution API, or it may be a consolidated privacy control that applies to multiple features, including private attribution. Further, user agent developers should consider interaction of other privacy modes with the Private Attribution API. For example, attribution might be disabled in a private browsing mode, or it might be disabled if the user has opted out of collection of diagnostic data.
4.5.2. Visibility
-
User ability to view the impression store and past report submissions.
5. Implementation Considerations
-
Management and distribution of values for the following:
-
Histogram size
-
Ad IDs
6. Aggregation
An aggregation service takes multiple pieces of attribution information and produces an aggregate metric.
User agent implementations will have different requirements for aggregation. However, the aggregation process has some common elements.
Firstly, user agents will need to be configured with, or otherwise obtain, information about the aggregation service. This includes the aggregation methods that are supported and any configuration that is required.
Each aggregation method needs to define how a histogram is:
-
prepared for aggregation,
-
encrypted,
-
annotated with any necessary metadata, and
-
submitted to the aggregation service for aggregation.
The aggregation method also needs to define how the aggregated result is obtained by a site.
6.1. Multi-Party Computation Aggregation
A Multi-Party Computation (MPC) system is one that involves multiple independent entities that cooperatively compute an agreed function.
This specification uses an MPC system based on Prio [PRIO] and the Distributed Aggregation Protocol (DAP) [DAP]. This is a two-party MPC system that is characterized by its reliance on client-provided proofs of correctness for inputs. This allows for very efficient MPC operation at a modest cost in the size of submissions to the system.
An aggregation service that uses Multi-Party Computation (MPC) comprises two or more independent services that cooperate to compute a predefined function.
The basic guarantee provided by MPC is that only the defined outputs of a function, plus well-defined leakage, are revealed to any entity.
The MPC guarantees hold only to the extent that a subset of the entities that participate are honest. For the two-party MPC used in Prio, privacy—that is, the confidentiality of inputs—is maintained as long as either MPC operator remains honest. This MPC configuration does not protect against the corruption of the outputs by either MPC operator.
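To give a feel for why neither operator learns an individual input, the following Python sketch shows plain two-party additive secret sharing of a histogram. It is a deliberately simplified illustration: the modulus is an arbitrary prime chosen for the example rather than the field used by Prio, and the client-provided proofs of correctness, report encryption, and noise addition are all omitted. The function names are hypothetical.

```python
import secrets

# Illustrative prime modulus (2**61 - 1); not the field used by Prio3.
MODULUS = 2**61 - 1


def split(histogram):
    """Split a histogram into two shares that sum to it modulo MODULUS."""
    share_a = [secrets.randbelow(MODULUS) for _ in histogram]
    share_b = [(v - a) % MODULUS for v, a in zip(histogram, share_a)]
    return share_a, share_b


def add_shares(total, share):
    """Each operator accumulates only the shares that it receives."""
    return [(t + s) % MODULUS for t, s in zip(total, share)]


def combine(sum_a, sum_b):
    """Adding the two operators' totals yields the aggregate histogram."""
    return [(a + b) % MODULUS for a, b in zip(sum_a, sum_b)]
```

Each share on its own is uniformly random, so an operator that sees only its own shares learns nothing about any single report; only the combined totals reveal the aggregate.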
6.1.1. Prio and DAP
The "dap-12-histogram" aggregation method uses Prio [PRIO] and the Distributed Aggregation Protocol (DAP) [DAP]. Specifically, this aggregation method uses the Prio3L1BoundSum instantiation [PRIO-L1] of the Prio3 Verifiable Distributed Aggregation Function (VDAF) [VDAF].
DAP and the Prio3L1BoundSum instantiation define how a report is prepared, encrypted, and submitted for aggregation. DAP also defines how an aggregate is obtained and what configuration a user agent needs to obtain about the aggregation service.
Several extensions to DAP [DAP-EXT] are necessary for this application:
-
Late task binding improves the ability of a site to collect reports and aggregate them as needed.
-
Website identity is critical to ensure that differential privacy protections are effective. This prevents a malicious actor that is able to correlate user identity across multiple sites from exceeding the sensitivity bounds for that user by aggregating reports from multiple sites together.
-
Privacy budget consumption ensures that the aggregation service does not aggregate reports that received less privacy budget than the aggregation task was configured with.
User agents need to include all of these extensions in reports that they generate.
6.2. Trusted Execution Environments
A Trusted Execution Environment (TEE) uses specialized hardware to ensure that computation is isolated from other programs that run on the same hardware.
TODO
6.3. Anti-Replay Requirements
Conversion reports generated by browsers are bound to the amount of privacy budget that was expended by the site that requested the report.
An aggregation service MUST guarantee that it does not accept the same report more than once.
7. Differential Privacy
This design uses the concept of differential privacy as the basis of its privacy design. [PPA-DP]
Differential privacy is a mathematical definition of privacy that can guarantee the amount of private information that is revealed by a system. [DP] Differential privacy is not the only means by which privacy is protected in this system, but it is the most rigorously defined and analyzed. As such, it provides the strongest privacy guarantees.
Differential privacy uses randomized noise to hide private data contributions to an aggregated dataset. The effect of noise is to hide individual contributions to the dataset, but to retain the usefulness of any aggregated analysis.
To apply differential privacy, it is necessary to define what information is protected. In this system, the protected information is the impressions of a single user profile, on a single user agent, over a single epoch, for a single website that registers conversions. § 7.1 Privacy Unit describes the implications of this design in more detail.
This attribution design uses a form of differential privacy called individual differential privacy. In this model, user agents are each separately responsible for ensuring that they limit the information that is contributed.
The individual differential privacy design of this API has three primary components:
-
User agents limit (using the privacy budget) the amount of information about impressions that leaves the device through conversion reports. § 7.2 Privacy Budgets explores this in greater depth.
-
Aggregation services ensure that any given conversion report is only used in accordance with the privacy budget that was accounted for it by the user agent. § 6.3 Anti-Replay Requirements describes requirements on aggregation services in more detail.
-
Noise is added by aggregation services. § 7.3 Differential Privacy Mechanisms details the mechanisms that might be used.
Together, these measures place limits on the information that is released for each privacy unit.
7.1. Privacy Unit
An implementation of differential privacy requires a clear definition for what is protected. This is known as the privacy unit, which represents the entity that receives privacy protection.
This system adopts a privacy unit that is the combination of three values:
-
A user agent profile. That is, an instance of a user agent, as used by a single person.
-
The site that requests information about impressions.
The sites that register impressions are not considered. Those sites do not receive information from this system directly.
-
The current epoch.
A change to any of these values produces a new privacy unit, which results in a separate privacy budget. Each site that a person visits receives a bounded amount of information for each epoch.
Ideally, the privacy unit is a single person. Though ideal, it is not possible to develop a useful system that guarantees perfect correspondence with a person, for a number of reasons:
-
People use multiple browsers and multiple devices, often without coordination.
-
A unit that covered all websites could be exhausted by one site, denying other sites any information.
-
Advertising is an ongoing activity. Without allocating privacy budget for new data, sites could exhaust their budget forever.
7.1.1. Browser Instances
Each browser instance manages a separate privacy budget.
Coordination between browser instances might be possible, but is not expected. That coordination might allow privacy to be improved by reducing the total amount of information that is released. It might also improve the utility of attribution by allowing impressions on one browser instance to be converted on another.
Coordination across different implementations is presently out of scope for this work. Implementations can perform some coordination between instances that are known to be for the same person, but this is not mandatory.
7.1.2. Per-Site Limits
Information is released to websites on a per-site basis. This aligns with the boundary used in other privacy-relevant functions.
A finer privacy unit, such as an origin, would make it trivial to obtain additional information. Information about the same person could be gathered from multiple origins. That information could then be combined by exploiting the free flow of information within the site, using cookies [COOKIES] or similar.
§ 7.2.2 Safety Limits discusses attacks that exploit this limit and some additional safety limits that might be implemented by user agents to protect against those attacks.
7.1.3. Privacy Budget Epochs
Sites receive a separate differential privacy budget that is used to query impressions recorded in each time interval. This period is called a privacy budget epoch (or simply epoch) and its duration is one week (7 days), where a day is 86400 seconds.
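As a rough illustration of the epoch arithmetic, the following Python sketch maps a moment to an epoch index. The epoch_start reference moment and the function name are assumptions; this section does not say where the first epoch begins.

```python
SECONDS_PER_DAY = 86400
EPOCH_SECONDS = 7 * SECONDS_PER_DAY  # one week


def epoch_index(timestamp: float, epoch_start: float = 0.0) -> int:
    """Return the index of the epoch that contains `timestamp`.

    `epoch_start` is a hypothetical reference moment at which epoch 0 begins.
    Both arguments are in seconds.
    """
    return int((timestamp - epoch_start) // EPOCH_SECONDS)
```

Two impressions fall in the same epoch, and so draw on the same budget for a given site, exactly when they map to the same index.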
This budget applies to the impressions that are registered with the user agent and later queried, not conversions.
From the perspective of the analysis [PPA-DP], each epoch of impressions forms a separate database. A finite privacy budget is enforced across all the queries made on each database.
Having a conversion report produced from impressions that span multiple epochs has privacy consequences. A single visit to a website can give that site information about activities across many epochs. This only requires that the conversion site is identified as the destination for impressions over that entire period. The number of epochs that can be queried is limited by user agents.
The goal is to set an epoch that is as large as feasible. A longer period of time allows for a better privacy/utility balance because sites can be allocated a larger overall budget at any point in time, while keeping the overall rate of privacy loss low. However, a longer interval means that it is easier to exhaust a privacy budget completely, yielding no information until the next refresh.
The decision to set the epoch duration to a week is largely arbitrary. One week is expected to be enough to allow sites some flexibility to make decisions about how to spend privacy budgets without careful planning that needs to account for changes that might occur days or weeks in the future.
§ 7.2 Privacy Budgets describes the process for budgeting in more detail.
7.2. Privacy Budgets
Browsers maintain privacy budgets, which are a means of limiting the amount of privacy loss.
This specification uses an individual form of (ε, δ)-differential privacy as its basis. In this model, privacy loss is measured using the value ε. The δ value is handled by the aggregation service when adding noise to aggregates.
Each user agent instance is responsible for managing privacy budgets.
Each conversion report that is requested specifies an ε value that represents the amount of privacy budget that the report consumes, along with a maximum for the value that can be returned in the conversion report.
7.2.1. Privacy Budget Deduction
When searching for impressions for the conversion report, the user agent deducts the specified ε value from the budget for the privacy budget epoch in which those impressions were saved. If the privacy budget for that epoch is not sufficient, the impressions from that epoch are not used.
The details of how to deduct privacy budget are given below ... WIP
A conversion report might be requested at the time marked with "now". That conversion report selects impressions marked with black circles, corresponding to impressions from Site B, C, and E.
As a result, privacy budget for the querying site is deducted from epochs 1, 3, 4, and 5. No impressions were recorded for epoch 2, so no budget is deducted from that epoch.
How a user agent manages exhaustion of a privacy budget depends on the attribution logic that was specified.
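Since the deduction algorithm is still a work in progress, the following Python sketch is only one plausible shape for it. It makes several assumptions that this document does not confirm: budgets are keyed by (epoch, site), every key starts with the same budget, the amount deducted is ε scaled by value / maxValue, and a failed deduction leaves the remaining budget untouched. The class and method names are hypothetical.

```python
from __future__ import annotations


class PrivacyBudgetStore:
    """Hypothetical per-(epoch, site) privacy budget store."""

    def __init__(self, per_epoch_budget: float):
        self._initial = per_epoch_budget
        self._remaining: dict[tuple[int, str], float] = {}

    def deduct(self, epoch: int, site: str, epsilon: float,
               value: int, max_value: int) -> bool:
        key = (epoch, site)
        remaining = self._remaining.get(key, self._initial)
        # Assumption: the cost is epsilon scaled by the relative value.
        deduction = epsilon * value / max_value
        if deduction > remaining:
            # Not enough budget: impressions from this epoch are not used.
            return False
        self._remaining[key] = remaining - deduction
        return True
```

The boolean result is for internal use only. A site must not be able to observe it, so a caller would continue to build and encrypt a report regardless of the outcome.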
7.2.2. Safety Limits
The basic privacy unit is vulnerable to attack by an adversary that is able to correlate activity for the same person across multiple sites.
Groups of sites can sometimes coordinate their activity, such as when they have shared ownership or strong agreements. A group of sites that can be sure that a particular visitor is the same person (using any means, including something like FedCM [FEDCM]) can combine information gained from this API.
This can be used to increase the rate at which a site gains information from attribution, proportional to the number of sites across which coordination occurs. The default privacy unit places no limit on the information released in this way.
To counteract this effect, user agents can implement safety limits, which are additional privacy budgets that do not consider site. Safety limits might be significantly higher than per-site budgets, so that they are not reached for most normal browsing activity. The goal would be to ensure that they are only effective for intensive activity or when being attacked.
Like the per-site privacy budget, it is critical that sites be unable to determine whether their request for a conversion report has caused a safety limit to be exceeded.
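One way a user agent might layer a safety limit on top of per-site budgets is sketched below in Python. This is an assumption-laden illustration: the document does not specify how safety limits are keyed or checked, so the sketch assumes a single additional budget per epoch that is shared by all sites, and it only deducts when both budgets can cover the request. All names are hypothetical.

```python
class BudgetWithSafetyLimit:
    """Hypothetical per-site budget combined with a cross-site safety limit."""

    def __init__(self, per_site_budget: float, safety_limit: float):
        self._per_site_budget = per_site_budget
        self._safety_limit = safety_limit
        self._site_spent = {}   # (epoch, site) -> epsilon spent by that site
        self._epoch_spent = {}  # epoch -> epsilon spent across all sites

    def deduct(self, epoch: int, site: str, epsilon: float) -> bool:
        site_spent = self._site_spent.get((epoch, site), 0.0)
        epoch_spent = self._epoch_spent.get(epoch, 0.0)
        # Refuse if either the per-site budget or the safety limit would be exceeded.
        if site_spent + epsilon > self._per_site_budget:
            return False
        if epoch_spent + epsilon > self._safety_limit:
            return False
        # Both checks passed, so record the spend against both budgets.
        self._site_spent[(epoch, site)] = site_spent + epsilon
        self._epoch_spent[epoch] = epoch_spent + epsilon
        return True
```

With a per-site budget of 1.0 and a safety limit of 2.5, two coordinating sites can each spend their full budget in an epoch, but a third site is then limited to the remaining 0.5 until the next epoch. As with the per-site budget, the outcome must not be observable by the requesting site.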
7.3. Differential Privacy Mechanisms
The specific mechanisms that are used depend on the type of aggregation service.
8. Security Considerations
8.1. Impression Store
The impression store used by the Private Attribution API holds information related to browsing activity and persists across browsing sessions. Although the flow of information through the impression store is strictly controlled, it carries some amount of information across origins.
The following measures limit the possibility of harmful information flow through the impression store:
-
Websites cannot read from the impression store. Information from the impression store is released only via encrypted conversion reports. Differential privacy, provided by a combination of functionality in the user agent and in the aggregation service, provides a rigorous bound on the probability that the aggregated information output by the aggregation service is distinguishable from the value it would have absent any user’s contribution.
-
Users can explicitly clear the impression store.
-
It is recommended that user agents limit how long data can persist in the impression store, even absent explicit user action, by imposing a maximum value of lifetimeDays.
8.2. API Implementation
The Private Attribution APIs must be implemented carefully to maintain the required security and privacy properties. A site calling the APIs must not be able to learn:
-
Whether the Private Attribution APIs are enabled.
-
Whether an attribution occurred.
-
Whether the privacy budget is exhausted.
-
Whether the conversion report reflects a non-zero conversion value.
-
Which histogramIndex is assigned the conversion value.
Note that explicit return values or thrown exceptions are not the only way that a site can learn from the Private Attribution APIs. It may be possible to infer sensitive information from side channels like:
-
Variation in the time it takes for the APIs to complete.
-
Consumption of memory or storage by the API, if that consumption is somehow observable by the site.
While complete elimination of all side channels is impractical, implementations must make reasonable efforts to prevent leakage of sensitive information from the attribution APIs. Strategies to prevent leakage include:
-
Fully validating all API inputs, even when the API is disabled.
-
Avoiding conditional logic. For example, measureConversion() should always go through the full process of constructing a conversion report, even when the conversion value to be reported is zero.
8.3. Aggregation Services
Although not part of the web platform, security of aggregation services is quite important to the overall security of the Private Attribution mechanism. Conversion reports produced by measureConversion() are encrypted to cryptographic key(s) of the aggregation service. Thus, much of the potential for disclosure of the information contained in these reports depends on the details of the aggregation service.
User agent developers should carefully consider the design of an aggregation service and the trustworthiness of the aggregation service operator before adding it as a supported service for the Private Attribution API. Additional discussion of these issues may be found in § 6 Aggregation and § 9 Privacy Considerations.
8.4. Combining Reports from Multiple Sites
The privacy mechanisms in the Private Attribution API operate primarily at the granularity of sites. A malicious operator may attempt to register impressions for multiple sites, thus exceeding the amount of information that would otherwise be released through private attribution. § 7.2.2 Safety Limits discusses establishing additional cross-site privacy budgets to mitigate this possibility.
Rate limits on calls to the Private Attribution APIs could also be an effective mechanism to prevent harvesting information through overuse of the APIs.
8.5. Ad Fraud
As with many technologies, advertising on the web has been the subject of various kinds of fraud.
Fraudulent registration of impressions is a particular concern with the Private Attribution API, because impressions are stored only on the device. It is not possible to apply server-side intelligence to identify fraudulent impressions and exclude them from attribution. Conversely, even though conversion reports are encrypted, because the reports are sent to a server, the server can make a determination that the conversion is likely fraudulent and exclude it from aggregation.
An important mitigation against malicious use of the Private Attribution APIs is the explicit specification of eligible conversion sites when registering an impression, and of eligible impression sites and ad IDs when registering a conversion. This prevents impressions on arbitrary malicious sites from interfering with attribution to the intended set of candidate impressions.
9. Privacy Considerations
9.1. Information Exposed by the Private Attribution API
The impression store and privacy budget store contain information about a cross-section of browsing activity. As use of the API increases, so does the scope of this information. However, most of the information written to these stores is never disclosed. Because attribution is performed on the device (on-device attribution), only information about attributed conversions is exposed by the Private Attribution API. This contrasts with other schemes in which information about both impressions and conversions is sent to the aggregation service for off-device attribution. In the latter class of schemes, the amount of information that could be revealed in a compromise of the aggregation service (or in a compromise of communication with the aggregation service) is significantly larger.
When the Private Attribution API makes an attribution, information about that attribution is released from the device only to the extent the differential privacy restrictions allow.
While the Private Attribution API is intended to measure the association of relatively infrequent conversion events with a limited set of related impression candidates, it is important to consider how the API might be misused for larger-scale data collection. The requirement that impressions enumerate the possible conversion sites (and vice-versa) has an important role in preventing misuse of the API for mass data collection, and in making attempts at such misuse more visible.
It is unclear whether the privacy budget store should be cleared whenever the impression store is cleared. On one hand, it contains information about browsing activity, so it is desirable to include it when clearing browsing activity. On the other hand, it is only possible to strictly adhere to the requirements of the differential privacy mechanism if information about a fully or partially depleted privacy budget is maintained until that budget is no longer relevant (i.e., the end of the privacy budget epoch).
9.2. Disabling the Private Attribution API
The Private Attribution API is designed to reveal only aggregate information. The use of differential privacy limits the chance of determining whether any particular user contributed to the aggregated output. However, some users may still prefer not to participate in attribution measurement. As discussed in § 4.5.1 Optional Participation, the user agent must provide a mechanism for the user to disable the Private Attribution API.
To minimize the risk of fingerprinting, and to prevent discrimination against users who choose to disable the Private Attribution API, sites must not be able to detect that the API is disabled. Specifically, all calls to the Private Attribution API that are otherwise valid must complete successfully, even when the API is disabled. The only difference in behavior is that conversion reports returned when the API is disabled will never report any conversion value. Because the reports are encrypted, this difference cannot be detected by the site receiving the conversion report.
9.3. Including Identifying Information with Saved Impressions
Sites are able to encode some amount of data in impressions, using filterData or other fields. The API does not prevent sites from encoding user identifiers in these fields. The attribution process can use this data when constructing a conversion report, which implies some risk of that identifying information becoming available to the site that receives that report.
The following measures mitigate this risk:
-
The impression store cannot be read directly. Thus, identifiers are only usable for tracking to the extent information about them is revealed in conversion reports.
-
The information in conversion reports is only revealed after aggregation and the addition of noise.
-
Users have the ability to clear the impression store.
-
No impressions are saved to the impression store when the Private Attribution API is disabled.
9.4. Use in Third-party Contexts
The Private Attribution API is available even in third-party contexts. In particular, a third-party iframe may call saveImpression(). Note, however, that the impression is recorded with the site of the top-level navigation context, not the origin of the iframe.
While the availability of the API in third-party contexts carries some increase in privacy risk, this support is deemed necessary because iframes are commonly used to display advertisements.
10. Acknowledgements
This specification is the result of a lot of work from many people. The broad shape of this level of the API is based on an idea from Luke Winstrom. The privacy architecture is courtesy of the authors of [PPA-DP].