Exposure of settings and characteristics of browsers can impact user privacy by allowing for browser fingerprinting. This document defines different types of fingerprinting, considers distinct levels of mitigation for the related privacy risks and provides guidance for Web specification authors on how to balance these concerns when designing new Web features.
This is a draft of a document intended to be published as an Interest Group Note. Constructive feedback of all kinds is welcomed; feel free to contact the editor directly or send comments to the public-privacy mailing list (public archives).

What is fingerprinting?

In short, browser fingerprinting is the capability of a site to identify or re-identify a visiting user, user agent or device via configuration settings or other observable characteristics.

A more detailed list of types of fingerprinting is included below. A similar definition is provided by [[RFC6973]].

Privacy impacts and threat models

Browser fingerprinting can be used as a security measure (e.g. as a means of authenticating the user). However, fingerprinting is also a potential threat to users' privacy on the Web. This document does not attempt to provide a single unifying definition of "privacy" or "personal data", but we highlight how browser fingerprinting might impact users' privacy. For example, browser fingerprinting can be used to:

identify a user;
correlate a user's browsing activity within and across sessions; and
draw inferences about a user.

The privacy implications associated with each use case are discussed below. Following from the practice of security threat model analysis, we note that there are distinct models of privacy threats for fingerprinting. Defenses against these threats differ, depending on the particular privacy implication and the threat model of the user.

Identify a user

There are many reasons why users might wish to remain anonymous or unidentified online, including concerns about surveillance, personal physical safety, and discrimination against them based on what they read or write when using the Web. When a browser fingerprint is correlated with identifying information (like a real name), an application or service provider may be able to identify an otherwise pseudonymous user.

Users concerned about physical safety from, for example, a governmental adversary might employ onion routing systems such as Tor to limit network-level linkability but still face the danger of browser fingerprinting to correlate their Web-based activity.

Unexpected correlation of browsing activity

Fingerprinting raises privacy concerns even when real-world identities are not implicated. Some users may be surprised or concerned that an online party can correlate multiple visits (on the same or different sites) to develop a profile or history of the user. This concern may be heightened because such correlation can occur without the user's knowledge or consent, and because tools such as clearing cookies do not prevent or "re-set" correlation done via browser fingerprinting.

Fingerprinting also allows for tracking across origins: different sites may be able to combine information about a single user even where cookie policy would block cross-origin access to cookies, because the fingerprint is relatively unique and identical across origins.

Inferences about a user

The observable characteristics used for browser fingerprinting can themselves reveal information from which inferences can be drawn about a user. For example, OS version and CPU information might be used to draw inferences about a user's purchasing power or proclivities. Users may consider this an unwelcome intrusion into their privacy even if they remain unidentified. Additionally, decisions might be made based on these inferences (e.g. which offers to display and at what price) that users perceive as discriminatory and an instance of being singled out and treated differently. This intrusion is compounded when browser fingerprints are correlated with user credentials, purchasing histories and other information about the user (e.g. cross-site browsing histories).

Is this in scope for browser fingerprinting? Or is this just a general privacy concern about leakage of information via observable characteristics?

Types of fingerprinting

Passive

Passive fingerprinting is browser fingerprinting based on characteristics observable in the contents of Web requests, without the use of any code executing on the client side.

Passive fingerprinting trivially includes cookies (often unique identifiers sent in HTTP requests), the set of HTTP request headers, and the IP address and other network-level information. The User-Agent string, for example, is an HTTP request header that typically identifies the browser, rendering engine, version and operating system. For some populations, the User-Agent string and IP address alone will commonly identify a particular user's browser uniquely [[NDSS-FINGERPRINTING]].
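As a rough sketch of how little is needed, the following server-side fragment (assuming a Node.js environment; the particular headers chosen are illustrative, not exhaustive) hashes a handful of passively received values into a single identifier:

    // A sketch of passive fingerprinting on the server side; assumes
    // Node.js. No code runs on the client: the request alone suffices.
    import { createServer, IncomingMessage } from "http";
    import { createHash } from "crypto";

    function passiveFingerprint(req: IncomingMessage): string {
      const parts = [
        req.headers["user-agent"] ?? "",      // browser, engine, version, OS
        req.headers["accept-language"] ?? "", // locale preferences
        req.headers["accept"] ?? "",          // content-type preferences
        req.socket.remoteAddress ?? "",       // network-level information
      ];
      // Hash the concatenation into a compact, opaque identifier.
      return createHash("sha256").update(parts.join("\n")).digest("hex");
    }

    createServer((req, res) => {
      res.end(passiveFingerprint(req));
    }).listen(8080);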

Active

For active fingerprinting, we also consider techniques where a site runs JavaScript or other code on the local client to observe additional characteristics about the browser. Techniques for active fingerprinting might include accessing the window size, enumerating fonts or plug-ins, evaluating performance characteristics, or rendering graphical patterns.
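For illustration, a minimal client-side sketch of this kind of script, using standard browser APIs (the canvas draw here stands in for the broader class of rendering-based techniques):

    // A sketch of active fingerprinting signals gathered by script.
    // All APIs used are standard browser APIs; error handling omitted.
    function activeSignals(): string[] {
      const signals = [
        `${screen.width}x${screen.height}x${screen.colorDepth}`, // display
        navigator.language,                                      // locale
        String(navigator.hardwareConcurrency),                   // CPU cores
        String(new Date().getTimezoneOffset()),                  // time zone
      ];

      // Render a graphical pattern: anti-aliasing and font rasterization
      // differ subtly across GPUs, drivers and OS font stacks.
      const canvas = document.createElement("canvas");
      const ctx = canvas.getContext("2d");
      if (ctx) {
        ctx.font = "16px Arial";
        ctx.fillText("fingerprint", 2, 18);
        signals.push(canvas.toDataURL()); // pixel-exact rendering output
      }
      return signals;
    }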

Users, user agents and devices may also be re-identified by a site that first sets and later retrieves state stored by a user agent or device. This cookie-like fingerprinting allows re-identification of a user or inferences about a user in the same way that HTTP cookies allow state management for the stateless HTTP protocol [[RFC6265]].

Cookie-like fingerprinting can also circumvent user attempts to limit or clear cookies stored by the user agent, as demonstrated by the "evercookie" implementation [[EVERCOOKIE]]. Where state is maintained across user agents (as in the case of common plugins with local storage), across devices (as in the case of certain browser syncing mechanisms) or across software upgrades, cookie-like fingerprinting can allow re-identification of users, user agents or devices where active and passive fingerprinting might not.
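A sketch of the redundancy behind such techniques: the identifier is written to several stores, and any surviving copy repopulates the rest (only two standard mechanisms are shown; [[EVERCOOKIE]] uses many more):

    // A sketch of cookie-like re-identification via redundant storage.
    function getOrCreateId(): string {
      const fromCookie = document.cookie.match(/(?:^|; )uid=([^;]*)/)?.[1];
      const fromStorage = localStorage.getItem("uid");

      // Any surviving copy wins, even if the user cleared the other store.
      // crypto.randomUUID() requires a secure context.
      const id = fromCookie ?? fromStorage ?? crypto.randomUUID();

      // Re-plant the identifier everywhere, resurrecting cleared copies.
      document.cookie = `uid=${id}; max-age=31536000; path=/`;
      localStorage.setItem("uid", id);
      return id;
    }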

Feasibility

Fingerprinting mitigation levels of success

There are different levels of success in addressing browser fingerprinting:

Decreased fingerprinting surface
Removing the source of entropy or accessible attributes that can be used for fingerprinting.
Increased anonymity set
By standardization, convention or common implementation, increasing the commonality of particular configurations to decrease the likelihood that any one configuration is uniquely identifying (the sketch after this list quantifies the effect).
Detectable fingerprinting
Making (in particular, client-side) fingerprinting observable to the user agent or some other party, so that the user agent might block it or a crawler can determine that it's happening.
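These levels can be reasoned about quantitatively: each observable attribute contributes identifying information measurable in bits of entropy, so decreasing surface removes an attribute's contribution, while increasing the anonymity set shrinks it. A minimal sketch (the population counts are invented for illustration):

    // A sketch of quantifying fingerprinting surface as entropy, in bits.
    function entropyBits(counts: number[]): number {
      const total = counts.reduce((a, b) => a + b, 0);
      return -counts.reduce((sum, c) => {
        const p = c / total;
        return sum + (p > 0 ? p * Math.log2(p) : 0);
      }, 0);
    }

    // Three User-Agent variants split 80/15/5 across a hypothetical
    // population: about 0.88 bits of identifying information.
    console.log(entropyBits([80, 15, 5]).toFixed(2));

    // Standardizing on a single value yields 0 bits: every user agent
    // falls into one anonymity set and the attribute identifies no one.
    console.log(entropyBits([100]).toFixed(2));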

A lost cause?

Given the advances in techniques for browser fingerprinting (see the Research section below), particularly in active fingerprinting, many have asked whether browser fingerprinting is a "lost cause" and mitigations therefore not worth pursuing during the design process. This document works under the expectation that mitigations with different levels of success are feasible under different circumstances, for different threat models and against different types of fingerprinting. In general, active fingerprinting may be made detectable; we can minimize increases to the surface of passive fingerprinting; and cookie-like fingerprinting can be documented to enable clearing local state.

However, the mitigations recommended here are simply mitigations, not solutions. Research in browser fingerprinting continues, and even with the mitigations described here, users should not rely on sites being completely unable to recognize or correlate traffic, especially when executing client-side code. The fingerprinting surface extends across all implemented Web features for a particular user agent, and even to other layers of the stack. To mitigate the risk as a whole, fingerprinting must be considered during the design and development of all specifications.

Some implementers and some users may be willing to accept reduced functionality or decreased performance in order to minimize browser fingerprinting. Documenting which features carry fingerprinting risk eases the work of implementers building modes for these at-risk users; and minimizing fingerprinting even where common implementations remain easy to fingerprint actively reduces the functionality trade-offs such users must accept.

Mitigations

Weighing increased fingerprinting surface

The fingerprinting surface of a user agent is the set of observable characteristics that can be used in concert to identify a user, user agent or device or correlate its activity. Web specification authors regularly attempt to strike a balance between new functionality and fingerprinting surface. For example, feature detection functionality allows for progressive enhancement, but detailed granularity in feature detection increases the fingerprinting surface of a user agent. (An attacker can test for many features on every visitor's browser and might uniquely identify a user by the exact set of features enabled.)
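A sketch of why granularity matters: each yes/no feature test contributes a bit, and a modest number of bits partitions a population finely (the particular tests below are arbitrary):

    // A sketch of granular feature detection composed into a fingerprint.
    // Each test yields one bit; dozens of such bits can partition a
    // user population very finely. The tests chosen are arbitrary.
    const featureTests: Array<() => boolean> = [
      () => "serviceWorker" in navigator,
      () => "geolocation" in navigator,
      () => typeof WebAssembly === "object",
      () => CSS.supports("display", "grid"),
      () => CSS.supports("backdrop-filter", "blur(2px)"),
    ];

    // e.g. "11101": five bits that vary with browser, version and settings.
    const featureBits = featureTests
      .map((test) => (test() ? "1" : "0"))
      .join("");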

Authors and Working Groups determine the appropriate balance between these properties on a case-by-case basis, given their understanding of the functionality, its likely implementations and the entropy of increased fingerprinting surface. However, given the distinct privacy impacts described above and in order to improve consistency across specifications, the following requirements provide guidance for this balance:

Avoid any increase to the surface for passive fingerprinting.

Unless a feature cannot reasonably be designed in any other way, increased passive fingerprintability should be prevented.

Prefer functionally-comparable designs that don't increase the surface for active fingerprinting.

If comparable functionality could be accomplished without increasing the surface for active fingerprinting, prefer the less fingerprintable alternative. Defining "equivalent" or "comparable" functionality can be difficult; use your best judgment and avoid unnecessary fingerprintability.
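One hypothetical contrast (neither interface is drawn from an existing specification): a site that only needs to choose a layout does not need exact dimensions, much as CSS media queries answer questions without exposing raw values:

    // A sketch of two hypothetical designs with comparable functionality.
    // Design A exposes the raw value; every pixel of variation is
    // fingerprinting surface.
    interface ViewportInfoA {
      readonly widthPx: number; // e.g. 1283: high entropy
    }

    // Design B answers the question sites actually ask, one bit at a
    // time, and is far less identifying for the same use case.
    interface ViewportInfoB {
      fitsWidth(minPx: number): boolean; // e.g. fitsWidth(1024) -> true
    }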

The difference between these practices recognizes that passive fingerprinting surface offers fewer options for mitigation (it lacks external detectability and client-side preventability) and is more feasible to reduce.

Mark features that contribute to fingerprintability.

Where a feature does contribute to the fingerprinting surface, indicate that impact, by explaining the effect (and any known implementer mitigations) and marking the relevant section with a fingerprinting icon, as this paragraph is.

This practice (and this image) is drawn from the HTML5 specification, which uses it throughout. Can we get feedback from the HTML WG or from readers of that specification as to whether the practice has been useful?

A standardized profile?

TODO: why it would be useful to have standardized profile values, where possible

TODO: why we don't typically recommend that we try to do this across user agent implementations ... that is, why we're not advocating for getting rid of the User Agent string

TODO: explain why randomization probably isn't helpful

Specify orderings and non-functional differences.

To reduce unnecessary entropy, specify aspects of API return values and behavior that don't contribute to functional differences. For example, if the ordering of return values in a list has no semantic value, specify a particular ordering (alphabetical order by a defined algorithm, for example) so that incidental differences don't expose fingerprinting surface.
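A sketch of what this looks like on the implementation side, assuming a hypothetical font-listing feature (listSystemFonts() is an invented internal call):

    // A sketch of normalizing a return value whose order carries no
    // semantic value. listSystemFonts() is hypothetical.
    declare function listSystemFonts(): string[];

    function getFonts(): string[] {
      // Platform enumeration order varies by OS, install history, even
      // disk layout: pure fingerprinting surface with no functional use.
      const fonts = listSystemFonts();
      // A specified ordering (here, a code-unit sort) means incidental
      // platform differences never show through in the API's output.
      return [...fonts].sort();
    }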

Detectability

Where a client-side API provides some fingerprinting surface, authors can still mitigate the privacy concerns via detectability. If client-side fingerprinting activity is to some extent distinguishable from functional use of APIs, user agent implementations may have an opportunity to prevent ongoing fingerprinting or make it observable to users and external researchers (including academics or relevant regulators) who may be able to detect and investigate the use of fingerprinting.
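As a sketch of the idea, here is a page-level version of the instrumentation a user agent or research crawler might apply internally (the heuristic in the comment is invented for illustration):

    // A sketch of making fingerprinting observable: instrument access to
    // one fingerprinting-surface property and count reads. User agents
    // and crawlers apply heavier versions of this idea internally.
    let surfaceReads = 0;
    const desc = Object.getOwnPropertyDescriptor(
      Navigator.prototype,
      "hardwareConcurrency"
    )!;
    Object.defineProperty(Navigator.prototype, "hardwareConcurrency", {
      get() {
        // Functional pages rarely touch many such properties; scripts
        // that read dozens in quick succession look like fingerprinting.
        surfaceReads++;
        return desc.get!.call(this);
      },
    });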

Design APIs to access only the entropy necessary.

Following the basic principle of data minimization, design your APIs such that a site can access (and does access by default) only the entropy necessary for particular functionality.

As one hypothetical illustration (the interfaces below are invented, loosely modeled on sensor APIs, and appear in no specification), compare a design that exposes a raw sensor reading with one that buckets it by default:
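    // A sketch of entropy minimization in API shape; both interfaces
    // are hypothetical.
    // Design A: full entropy by default.
    interface LightSensorA {
      readonly illuminance: number; // e.g. 342.7 lux: high entropy
    }

    // Design B: a coarse bucket by default; the precise reading only on
    // explicit request, which a user agent can observe, gate behind a
    // permission, or refuse.
    type LightLevel = "dim" | "normal" | "bright";
    interface LightSensorB {
      readonly level: LightLevel;            // at most ~1.6 bits
      requestIlluminance(): Promise<number>; // explicit and observable
    }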

Anticipate disabled functionality for the fingerprinting-conscious.

If your specification exposes some fingerprinting surface (whether active or passive), some implementations (e.g. the Tor Browser) will be compelled to disable those features for certain privacy-conscious users. Following the principle of progressive enhancement, and to avoid further divergence (which might itself expose variation in users), consider whether some functionality in your specification is still possible if fingerprinting surface features are disabled.
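A sketch of the consuming side, showing why graceful degradation matters (navigator.deviceMemory is a real but optional API that some browsers omit; the fallback policy is illustrative):

    // A sketch of site code degrading gracefully when a fingerprintable
    // feature is absent or disabled.
    function preferredQuality(): "high" | "low" {
      // Some browsers omit navigator.deviceMemory precisely because it
      // adds fingerprinting surface.
      const memory = (navigator as { deviceMemory?: number }).deviceMemory;
      if (memory === undefined) {
        // Feature unavailable: fall back to a safe default rather than
        // failing, so fingerprinting-conscious users keep functionality.
        return "low";
      }
      return memory >= 8 ? "high" : "low";
    }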

Clearing all local state

Features that enable storage of data on the client, along with functionality for client- or server-side querying of that data, can increase the ease of cookie-like fingerprinting. Storage can range from large amounts of data (for example, the Web Storage API) to a single binary flag (has or has not granted a certain permission; has or has not cached a single resource).

Avoid new cookie-like local state mechanisms.

If functionality does not require maintaining client-side state in a way that is subsequently queryable (or otherwise observable), avoid creating a new cookie-like feature. Can the functionality be accomplished with existing HTTP cookies or an existing JavaScript local storage API?

Where features do require setting and retrieving local state, there are ways to mitigate the privacy impacts related to unexpected cookie-like behavior; in particular, you can help implementers prevent "permanent", "zombie" or "evercookies".

Highlight any local state mechanisms to enable simultaneous clearing.

Clearly note where state is being maintained and could be queried, and provide guidance to implementers on enabling simultaneous deletion of local state for users. Such functionality can mitigate the threat of "evercookies" because the presence of state in one storage mechanism can't be used to persist and re-create an identifier. As a result, your design should not rely on data saved on the client surviving a user's clearing of all local state; that is, you should not expect any local state information to be permanent.

Though not strictly browser fingerprinting, there are other privacy concerns regarding user tracking for features that provide local storage of data. Mitigations suggested in the Web Storage API specification include: white-listing, black-listing, expiration and secure deletion [[WEBSTORAGE-user-tracking]].

Do Not Track: a cooperative approach

Expressions of, and compliance with, a Do Not Track signal may not prevent or inhibit browser fingerprinting, but may mitigate some user concerns about fingerprinting: specifically, tracking as defined in those specifications [[TRACKING-DNT]] [[TRACKING-COMPLIANCE]], when performed by services that voluntarily comply with those user preferences.

This mitigation is included here for completeness: DNT standardization and adoption are ongoing. The use of DNT in this way should not require any work for your specification, unless for some reason you need to specify a particular DNT behavior for your functionality.

Research

Some browser developers maintain pages on browser fingerprinting, including: potential mitigations or modifications necessary to decrease the surface of that browser engine; different vectors that can be used for fingerprinting; potential future work. These are not cheery, optimistic documents.

What are the key papers to read here, historically or to give the latest on fingerprinting techniques? What are some areas of open research that might be relevant?

Testing

A non-exhaustive list of sites that allow visitors to test their configurations for fingerprintability.

Acknowledgements

Many thanks to Robin Berjon for ReSpec and to Tobie Langel for Github advice; to the Privacy Interest Group for review; and to Christine Runnegar for contributions.