This specification defines a mechanism by which user agents may verify that a fetched resource has been delivered without unexpected manipulation.
A list of changes to this document may be found at https://github.com/w3c/webappsec.
Sites and applications on the web are rarely composed of resources from only a single origin. For example, authors pull scripts and styles from a wide variety of services and content delivery networks, and must trust that the delivered representation is, in fact, what they expected to load. If an attacker can trick a user into downloading content from a hostile server (via DNS poisoning, or other such means), the author has no recourse. Likewise, an attacker who can replace the file on the CDN server has the ability to inject arbitrary content.
Delivering resources over a secure channel mitigates some of this risk: with TLS, HSTS, and pinned public keys, a user agent can be fairly certain that it is indeed speaking with the server it believes it’s talking to. These mechanisms, however, authenticate only the server, not the content. An attacker (or administrator) with access to the server can manipulate content with impunity. Ideally, authors would not only be able to pin the keys of a server, but also pin the content, ensuring that an exact representation of a resource, and only that representation, loads and executes.
This document specifies such a validation scheme, extending two HTML elements and the fetch() API with an integrity attribute that contains a cryptographic hash of the representation of the resource the author expects to load. For instance, an author may wish to load some framework from a shared server rather than hosting it on their own origin. Specifying that the expected SHA-384 hash of https://example.com/example-framework.js is Li9vy3DqF8tnTXuiaAJuML3ky+er10rcgNR/VqsVpcw+ThHmYcwiB1pbOxEbzJr7 means that the user agent can verify that the data it loads from that URL matches that expected hash before executing the JavaScript it contains. This integrity verification significantly reduces the risk that an attacker can substitute malicious content.
This example can be communicated to a user agent by adding the hash to a script element, like so:
<script src="https://example.com/example-framework.js"
integrity="sha384-Li9vy3DqF8tnTXuiaAJuML3ky+er10rcgNR/VqsVpcw+ThHmYcwiB1pbOxEbzJr7"
crossorigin="anonymous"></script>
Scripts, of course, are not the only response type which would benefit from integrity validation. The scheme specified here also applies to link elements, and future versions of the specification are likely to expand this coverage.
Compromise of a third-party service should not automatically mean compromise of every site which includes its scripts. Content authors will have a mechanism by which they can specify expectations for content they load, meaning for example that they could load a specific script, and not any script that happens to have a particular URL.
The verification mechanism should have error-reporting functionality which would inform the author that an invalid response was received.
An author wishes to use a content delivery network to improve performance for globally-distributed users. It is important, however, to ensure that the CDN’s servers deliver only the code the author expects them to deliver. To mitigate the risk that a CDN compromise (or unexpectedly malicious behavior) would change that site in unfortunate ways, the following integrity metadata is added to the link element included on the page:
<link rel="stylesheet" href="https://site53.example.net/style.css"
integrity="sha384-+/M6kredJcxdsqkczBUjMLvqyHb1K/JThDXWsBVxMEeZHEaMKEOEct339VItX1zB"
crossorigin="anonymous">
An author wants to include JavaScript provided by a third-party analytics service. To ensure that only the code that has been carefully reviewed is executed, the author generates integrity metadata for the script, and adds it to the script element:
<script src="https://analytics-r-us.example.com/v1.0/include.js"
integrity="sha384-MBO5IDfYaE6c6Aao94oZrIOiC6CGiSN2n4QUbHNPhzk5Xhm0djZLQqTpL0HzTUxk"
crossorigin="anonymous"></script>
A user agent wishes to ensure that JavaScript code running in high-privilege HTML contexts (for example, a browser’s New Tab page) isn’t manipulated before display. Integrity metadata mitigates the risk that altered JavaScript will run in these pages’ high-privilege contexts.
Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.
This section defines several terms used throughout the document.
The term digest refers to the base64-encoded result of executing a cryptographic hash function on an arbitrary block of data.
The term origin is defined in the Origin specification. [[!RFC6454]]
The terms secure document and secure context are defined in section 2 of the Secure Contexts specification. An example of a secure document is a document loaded over HTTPS. A counterexample is a document loaded over HTTP.
A potentially secure origin is defined in section 2 of the Mixed Content specification. An example of a potentially secure origin is an origin whose scheme component is HTTPS.
The message body and the transfer encoding of a resource are defined by RFC7230, section 3. [[!RFC7230]]
The representation data and content encoding of a resource are defined by RFC7231, section 3. [[!RFC7231]]
A base64 encoding is defined in RFC 4648, section 4. [[!RFC4648]]
SHA-256, SHA-384, and SHA-512 are part of the SHA-2 set of cryptographic hash functions defined by NIST in “FIPS PUB 180-4: Secure Hash Standard (SHS)”.
The Augmented Backus-Naur Form (ABNF) notation used in this document is specified in RFC5234. [[!ABNF]]
The following core rules are included by reference, as defined in Appendix B.1 of [[!ABNF]]: WSP (white space) and VCHAR (printing characters).
The integrity verification mechanism specified here boils down to the process of generating a sufficiently strong cryptographic digest for a resource, and transmitting that digest to a user agent so that it may be used to verify the response.
To verify the integrity of a response, a user agent requires integrity metadata, which consists of the following pieces of information: a cryptographic hash function, a digest, and options.
The hash function and digest MUST be provided in order to validate a response’s integrity.
At the moment, no options are defined. However, future versions of the spec may define options, such as MIME types [[!MIMETYPE]].
This metadata MUST be encoded in the same format as the hash-source (without the single quotes) in section 4.2 of the Content Security Policy Level 2 specification.
For example, given a script resource containing only the string "alert('Hello, world.');", an author might choose SHA-384 as a hash function. H8BRh8j48O9oYatfu5AZzq6A9RINhZO5H16dQZngK7T62em8MUt1FLm52t+eX6xO is the base64-encoded digest that results. This can be encoded as follows:
sha384-H8BRh8j48O9oYatfu5AZzq6A9RINhZO5H16dQZngK7T62em8MUt1FLm52t+eX6xO
Digests may be generated using any number of utilities. OpenSSL, for example, is quite commonly available. The example in this section is the result of the following command line:
echo -n "alert('Hello, world.');" | openssl dgst -sha384 -binary | openssl enc -base64 -A
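The same computation can be sketched in Python using only the standard library. This is an illustrative sketch, not part of the specification; the function name integrity_metadata is hypothetical, and the input is assumed to be the exact bytes of the resource, with no trailing newline.

```python
import base64
import hashlib


def integrity_metadata(data: bytes, algorithm: str = "sha384") -> str:
    """Return '<algorithm>-<base64 digest>' for the given bytes.

    Hypothetical helper: only the three hash functions named in this
    document are accepted.
    """
    if algorithm not in ("sha256", "sha384", "sha512"):
        raise ValueError(f"unsupported hash function: {algorithm}")
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


if __name__ == "__main__":
    # Same input as the command line above: no trailing newline.
    print(integrity_metadata(b"alert('Hello, world.');"))
```

The digest is computed over raw bytes, so any change to the resource, including whitespace or a different character encoding, produces different metadata.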
Conformant user agents MUST support the SHA-256, SHA-384 and SHA-512 cryptographic hash functions for use as part of a request’s integrity metadata, and MAY support additional hash functions.
Multiple sets of integrity metadata may be associated with a single resource in order to provide agility in the face of future cryptographic discoveries. For example, the resource described in the previous section may be described by either of the following hash expressions:
sha384-dOTZf16X8p34q2/kYyEFm0jh89uTjikhnzjeLeF0FHsEaYKb1A1cv+Lyv4Hk8vHd
sha512-Q2bFTOhEALkN8hOms2FKTDLy7eugP2zFZ1T8LCvX42Fp3WoNr3bjZSAHeOsHrbV1Fu9/A0EzCinRE7Af1ofPrw==
Authors may choose to specify both, for example:
<script src="hello_world.js"
integrity="sha384-dOTZf16X8p34q2/kYyEFm0jh89uTjikhnzjeLeF0FHsEaYKb1A1cv+Lyv4Hk8vHd
sha512-Q2bFTOhEALkN8hOms2FKTDLy7eugP2zFZ1T8LCvX42Fp3WoNr3bjZSAHeOsHrbV1Fu9/A0EzCinRE7Af1ofPrw=="
crossorigin="anonymous"></script>
In this case, the user agent will choose the strongest hash function in the list, and use that metadata to validate the response (as described below in the “parse metadata” and “get the strongest metadata from set” algorithms).
When a hash function is determined to be insecure, user agents SHOULD deprecate and eventually remove support for integrity validation using that hash function. User agents MAY check the validity of responses using a digest based on a deprecated function.
To allow authors to switch to stronger hash functions without being held back by older user agents, validation using unsupported hash functions acts like no integrity value was provided (see the “Does response match metadataList” algorithm below). Authors are encouraged to use strong hash functions, and to begin migrating to stronger hash functions as they become available.
User agents must provide a mechanism that determines the relative priority of two hash functions and returns the empty string if the priority is equal. That is, if a user agent implemented a function like getPrioritizedHashFunction(a, b), it would return the hash function the user agent considers the most collision-resistant. For example, getPrioritizedHashFunction('sha256', 'sha512') would return 'sha512' and getPrioritizedHashFunction('sha256', 'sha256') would return the empty string.
The getPrioritizedHashFunction is an internal implementation detail. It is not an API that implementors provide to web applications. It is used in this document only to simplify the algorithm description.
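As a rough illustration of the behavior described above, such a function might look like the following Python sketch. The priority table is an assumption made for the example (a longer SHA-2 digest is ranked higher); a user agent is free to choose its own ordering.

```python
# Assumed ordering, weakest to strongest; a real user agent chooses its own.
_PRIORITY = {"sha256": 1, "sha384": 2, "sha512": 3}


def get_prioritized_hash_function(a: str, b: str) -> str:
    """Return the higher-priority hash function name, or '' on a tie."""
    priority_a, priority_b = _PRIORITY[a], _PRIORITY[b]
    if priority_a == priority_b:
        return ""
    return a if priority_a > priority_b else b
```

Under the assumed table, get_prioritized_hash_function("sha256", "sha512") returns "sha512", and comparing "sha256" with itself returns the empty string.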
In order to mitigate an attacker’s ability to read data cross-origin by brute-forcing values via integrity checks, responses are only eligible for such checks if they are same-origin or are the result of explicit access granted to the loading origin via CORS. [[!CORS]]
As noted in RFC6454, section 4, some user agents use globally unique identifiers for each file URI. This means that resources accessed over a file scheme URL are unlikely to be eligible for integrity checks.
One should note that being a secure document (e.g., a document delivered over HTTPS) is not necessary for the use of integrity validation. Because resource integrity is only an application level security tool, and it does not change the security state of the user agent, a secure document is unnecessary. However, if integrity is used in something other than a secure document (e.g., a document delivered over HTTP), authors should be aware that the integrity provides no security guarantees at all. For this reason, authors should only deliver integrity metadata on a potentially secure origin. See Non-secure contexts remain non-secure for more discussion.
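The following algorithm details these restrictions:
1. Let request be the request that fetched the response.
2. If the mode of request is CORS, return true.
3. If the origin of request is the response’s origin, return true.
4. Return false.
Step 2 returns true if the fetch was a CORS-enabled request. If the fetch failed the CORS checks, it won’t be available to us for integrity checking because it won’t have loaded successfully.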
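The restriction described above (a response is eligible only if it is same-origin or was fetched with CORS) can be sketched as a small predicate. This is a simplified illustration: the parameter names are hypothetical, origins are assumed to be already-serialized strings, and the request mode is assumed to be a lowercase string such as "cors" or "no-cors".

```python
def is_eligible_for_integrity_validation(
    request_mode: str, request_origin: str, response_origin: str
) -> bool:
    """Return True if the response may be checked against integrity metadata."""
    # A successful CORS fetch means the server explicitly granted access.
    if request_mode == "cors":
        return True
    # Same-origin responses are readable by the page anyway.
    if request_origin == response_origin:
        return True
    # Anything else could leak cross-origin data through the check.
    return False
```

For instance, a no-cors request from https://a.example for a resource on https://cdn.example would not be eligible, while the same request made with CORS would be.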
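This algorithm accepts a string, and returns either no metadata, or a set of valid hash expressions whose hash functions are understood by the user agent. Each valid hash expression in the string whose hash function the user agent recognizes is added to result. Return no metadata if result is empty, otherwise return result.
To get the strongest metadata from a set, compare each item against strongest, the strongest item seen so far, where currentAlgorithm is the hash function of strongest and newAlgorithm is the hash function of item. If the result of getPrioritizedHashFunction(currentAlgorithm, newAlgorithm) is the empty string, add item to result. If the result is newAlgorithm, set strongest to item, set result to the empty set, and add item to result.
To determine whether a response matches metadataList: if parsing metadataList yields no metadata, return true. If the response is not eligible for integrity validation, return false. If the digest of the response, computed with the hash function of any item in the strongest metadata, matches that item’s digest, return true. Otherwise, return false.
This algorithm allows the user agent to accept multiple, valid strong hash functions. For example, a developer might write a script element such as: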
<script src="https://example.com/example-framework.js"
integrity="sha384-Li9vy3DqF8tnTXuiaAJuML3ky+er10rcgNR/VqsVpcw+ThHmYcwiB1pbOxEbzJr7
sha384-+/M6kredJcxdsqkczBUjMLvqyHb1K/JThDXWsBVxMEeZHEaMKEOEct339VItX1zB"
crossorigin="anonymous"></script>
which would allow the user agent to accept two different content payloads, one matching the first SHA-384 hash value and the other matching the second SHA-384 hash value.
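The parsing, strongest-metadata selection, and matching behavior described in this section can be approximated with the following Python sketch. It is a simplification under stated assumptions, not the normative algorithm: the function names and the PRIORITY table are hypothetical, options after "?" are simply dropped, and eligibility is passed in as a flag rather than derived from a real fetch.

```python
import base64
import hashlib

# Assumed priority ordering; a real user agent chooses its own.
PRIORITY = {"sha256": 1, "sha384": 2, "sha512": 3}


def parse_metadata(metadata: str):
    """Return None for 'no metadata', else a list of (algorithm, digest)."""
    result = []
    for token in metadata.split():
        expression = token.split("?", 1)[0]  # ignore any options
        algorithm, sep, digest = expression.partition("-")
        if sep and digest and algorithm in PRIORITY:
            result.append((algorithm, digest))
    return result or None


def get_strongest_metadata(items):
    """Keep only the items that use the highest-priority hash function."""
    result, strongest = [], None
    for item in items:
        if strongest is None or PRIORITY[item[0]] > PRIORITY[strongest[0]]:
            strongest, result = item, [item]
        elif PRIORITY[item[0]] == PRIORITY[strongest[0]]:
            result.append(item)
    return result


def does_response_match(body: bytes, metadata: str, eligible: bool = True) -> bool:
    parsed = parse_metadata(metadata)
    if parsed is None:
        return True  # nothing usable was supplied: act as if no integrity value
    if not eligible:
        return False
    for algorithm, expected in get_strongest_metadata(parsed):
        digest = hashlib.new(algorithm, body).digest()
        if base64.b64encode(digest).decode("ascii") == expected:
            return True
    return False
```

Note that in this sketch a correct SHA-256 digest does not rescue a response whose SHA-384 digest is wrong, because only the strongest hash function present is consulted.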
User agents may allow users to modify the result of this algorithm via user preferences, bookmarklets, third-party additions to the user agent, and other such mechanisms. For example, redirects generated by an extension like HTTPSEverywhere could load and execute correctly, even if the HTTPS version of a resource differs from the HTTP version.
This algorithm returns false if the response is not eligible for integrity validation since Subresource Integrity requires CORS, and it is a logical error to attempt to use it without CORS. Additionally, user agents SHOULD report a warning message to the developer console to explain this failure.
The Fetch specification should contain the following modifications in order to enable the rest of this specification’s work [[!FETCH]]:
The following text should be added to section 2.1.4: “A request has an associated integrity metadata. Unless stated otherwise, a request’s integrity metadata is the empty string.”
Perform the following step between steps 10 and 11 in the “main fetch” algorithm:
Add the following to the Request class definition:
Add the following attribute to the Request class after the redirect attribute:
readonly attribute DOMString integrity;
Add the following member to the end of the RequestInit dictionary:
DOMString integrity = "";
In step 4 of the constructor, modify the end of the step to read, “and integrity is request’s integrity.”
Add the following to the list of descriptions after the constructor:
“The integrity attribute’s getter must return request’s integrity.”
A variety of HTML elements result in requests for resources that are to be embedded into the document, or executed in its context. To support integrity metadata for some of these elements, a new integrity attribute is added to the list of content attributes for the link and script elements.
A corresponding integrity IDL attribute which reflects the value of each element’s integrity content attribute is added to the HTMLLinkElement and HTMLScriptElement interfaces.
A future revision of this specification is likely to include integrity support for all possible subresources, i.e., a, audio, embed, iframe, img, link, object, script, source, track, and video elements.
The integrity attribute
The integrity attribute represents integrity metadata for an element.
The value of the attribute MUST be either the empty string, or at least one
valid metadata as described by the following ABNF grammar:
integrity-metadata = *WSP hash-with-options *( 1*WSP hash-with-options ) *WSP / *WSP
hash-with-options = hash-expression *("?" option-expression)
option-expression = *VCHAR
hash-algo = <hash-algo production from [Content Security Policy Level 2, section 4.2]>
base64-value = <base64-value production from [Content Security Policy Level 2, section 4.2]>
hash-expression = hash-algo "-" base64-value
The integrity IDL attribute must reflect the integrity content attribute.
option-expressions are associated on a per hash-expression basis and are applied only to the hash-expression that immediately precedes them.
In order for user agents to remain fully forwards compatible with future options, the user agent MUST ignore all unrecognized option-expressions.
Note that while the option-expression has been reserved in the syntax, no options have been defined. It is likely that a future version of the spec will define a more specific syntax for options, so it is defined here as broadly as possible.
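To make the grammar above more concrete, the following Python sketch checks an attribute value against it with a regular expression. The hash-algo and base64-value patterns are assumptions based on the hash-source production in Content Security Policy Level 2 (sha256, sha384, or sha512, followed by standard base64 characters and up to two "=" padding characters); the function names are hypothetical.

```python
import re

# hash-expression *("?" option-expression), with option-expression = *VCHAR.
# The hash-algo and base64-value parts are assumed from CSP Level 2.
HASH_WITH_OPTIONS = re.compile(
    r"(?:sha256|sha384|sha512)-[A-Za-z0-9+/]+={0,2}(?:\?[\x21-\x7e]*)*"
)


def is_valid_integrity_value(value: str) -> bool:
    """Check a value against the integrity-metadata production."""
    stripped = value.strip(" \t")  # WSP is space or horizontal tab
    if not stripped:
        return True  # the empty (or all-whitespace) value is allowed
    tokens = re.split(r"[ \t]+", stripped)
    return all(HASH_WITH_OPTIONS.fullmatch(token) for token in tokens)


def strip_options(token: str) -> str:
    """Drop every option-expression, keeping only the hash-expression."""
    # base64-value cannot contain "?", so the first "?" starts the options.
    return token.split("?", 1)[0]
```

Because no options are defined, a consumer following this sketch would validate the full token and then work only with the result of strip_options, which is one way to ignore unrecognized options.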
The user agent MUST refuse to render or execute responses that fail an integrity check and MUST return a network error, as described in Modifications to Fetch.
On a failed integrity check, an error event is fired. Developers wishing to provide a canonical fallback resource (e.g., a resource not served from a CDN, perhaps from a secondary, trusted, but slower source) can catch this error event and provide an appropriate handler to replace the failed resource with a different one.
The link element for stylesheets
Whenever a user agent attempts to obtain a resource pointed to by a link element that has a rel attribute with the keyword of stylesheet, modify step 4 to read:
Do a potentially CORS-enabled fetch of the resulting absolute URL, with the mode being the current state of the element’s crossorigin content attribute, the origin being the origin of the link element’s Document, the default origin behaviour set to taint, and the integrity metadata of the request set to the value of the element’s integrity attribute.
The script element
Replace step 14.1 of HTML5’s “prepare a script” algorithm with:
Let src be the value of the element’s src attribute and the request’s associated integrity metadata be the value of the element’s integrity attribute.
Optimizing proxies and other intermediate servers which modify the responses MUST ensure that the digest associated with those responses stays in sync with the new content. One option is to ensure that the integrity metadata associated with resources is updated. Another would be simply to deliver only the canonical version of resources for which a page author has requested integrity verification.
To help inform intermediate servers, those serving the resources SHOULD send along with the resource a Cache-Control header with a value of no-transform.
Integrity metadata delivered by a context that is not a secure context, such as an HTTP page, only protects an origin against a compromise of the server where an external resource is hosted. Network attackers can alter the digest in-flight (or remove it entirely, or do absolutely anything else to the document), just as they could alter the response the hash is meant to validate. Thus, authors SHOULD deliver integrity metadata only to a secure document. See also securing the web.
Similarly, since integrity checks do not provide any privacy guarantees, integrity metadata MUST NOT affect the return values of the Mixed Content algorithms as defined in section 5 of the Mixed Content specification.
Digests are only as strong as the hash function used to generate them. User agents SHOULD refuse to support known-weak hashing functions like MD5 or SHA-1, and SHOULD restrict supported hashing functions to those known to be collision-resistant. At the time of writing, SHA-384 is a good baseline. Moreover, user agents SHOULD re-evaluate their supported hash functions on a regular basis, and deprecate support for those functions shown to be insecure.
Attackers can determine whether some cross-origin resource has certain content by attempting to load it with a known digest, and watching for load failures. If the load fails, the attacker can surmise that the response didn’t match the hash, and thereby gain some insight into its contents. This might reveal, for example, whether or not a user is logged into a particular service.
Moreover, attackers can brute-force specific values in an otherwise static resource: consider a JSON response that looks like this:
{"status": "authenticated", "username": "admin"}
An attacker can precompute hashes for the response with a variety of common usernames, and specify those hashes while repeatedly attempting to load the document.
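The following Python sketch illustrates why this is practical, and therefore why eligibility is restricted to same-origin and CORS responses. It assumes the server serializes the response exactly as json.dumps does by default; the candidate list and the name digest_for are hypothetical.

```python
import base64
import hashlib
import json


def digest_for(username: str) -> str:
    """Integrity metadata for the response body a given username would produce."""
    # Assumption: the server emits exactly this serialization, byte for byte.
    body = json.dumps({"status": "authenticated", "username": username}).encode("utf-8")
    return "sha384-" + base64.b64encode(hashlib.sha384(body).digest()).decode("ascii")


# A small dictionary mapping each precomputed digest back to its username.
candidates = ["admin", "root", "guest"]
lookup = {digest_for(name): name for name in candidates}
```

Each entry costs a single hash computation, so without the eligibility restriction a load that succeeds with one of these digests would reveal the corresponding username.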
Much of the content here is inspired heavily by Gervase Markham’s Link Fingerprints concept, as well as WHATWG’s Link Hashes.
A special thanks to Mike West of Google, Inc. for his invaluable contributions to the initial version of this spec. Additionally, Brad Hill, Anne van Kesteren, Jonathan Kingston, Mark Nottingham, Dan Veditz, Eduardo Vela, Tanvi Vyas, and Michal Zalewski provided invaluable feedback.