This specification defines a mechanism by which user agents may verify that a fetched resource has been delivered without unexpected manipulation.

A list of changes to this document may be found at https://github.com/w3c/webappsec.

Introduction

Sites and applications on the web are rarely composed of resources from only a single origin. For example, authors pull scripts and styles from a wide variety of services and content delivery networks, and must trust that the delivered representation is, in fact, what they expected to load. If an attacker can trick a user into downloading content from a hostile server (via DNS poisoning, or other such means), the author has no recourse. Likewise, an attacker who can replace the file on the CDN server has the ability to inject arbitrary content.

Delivering resources over a secure channel mitigates some of this risk: with TLS, HSTS, and pinned public keys, a user agent can be fairly certain that it is indeed speaking with the server it believes it’s talking to. These mechanisms, however, authenticate only the server, not the content. An attacker (or admin!) with access to the server can manipulate content with impunity. Ideally, authors would not only be able to pin the keys of a server, but also pin the content, ensuring that an exact representation of a resource, and only that representation, loads and executes.

This document specifies such a validation scheme, extending several HTML elements with an integrity attribute that contains a cryptographic hash of the representation of the resource the author expects to load. For instance, an author may wish to load jQuery from a shared server rather than hosting it on their own origin. Specifying that the expected SHA-256 hash of https://code.jquery.com/jquery-1.10.2.min.js is C6CB9UYIS9UJeqinPHWTHVqh_E1uhG5Twh-Y5qFQmYg means that the user agent can verify that the data it loads from that URL matches that expected hash before executing the JavaScript it contains. This integrity verification significantly reduces the risk that an attacker can substitute malicious content.

This example can be communicated to a user agent by adding the hash to a script element, like so:

<script src="https://code.jquery.com/jquery-1.10.2.min.js"
        integrity="ni:///sha-256;C6CB9UYIS9UJeqinPHWTHVqh_E1uhG5Twh-Y5qFQmYg?ct=application/javascript"></script>

Scripts, of course, are not the only resource type which would benefit from integrity validation. The scheme specified here applies to all HTML elements which trigger fetches, as well as to fetches triggered from CSS and JavaScript.

Moreover, integrity metadata may also be useful for purposes other than validation. User agents may decide to use the integrity metadata as an identifier in a local cache, for instance, meaning that common resources (for example, JavaScript libraries) could be cached and retrieved once, regardless of the URL from which they are loaded.

Goals

  1. Compromise of the third-party service should not automatically mean compromise of every site which includes its scripts. Content authors will have a mechanism by which they can specify expectations for content they load, meaning for example that they could load a specific script, and not any script that happens to have a particular URL.

  2. The verification mechanism should have reporting functionality which would inform the author that an invalid resource was downloaded. Further, it should be possible for an author to choose to run only the reporting functionality, allowing potentially corrupt resources to run on her site, but flagging violations for manual review.

  3. The metadata provided for verification may enable improvements to user agents’ caching schemes: common resources such as JavaScript libraries can be downloaded once, and only once, even if multiple instances with distinct URLs are requested.

Use Cases/Examples

Resource Integrity

  • An author wishes to use a content delivery network to improve performance for her globally-distributed users. She wishes to ensure, however, that the CDN’s servers deliver only the code she expects them to deliver. She can mitigate the risk that CDN compromise (or unexpectedly malicious behavior) would change her code in unfortunate ways by adding integrity metadata to the script element included on her page:

    <script src="https://site53.cdn.net/include.js"
            integrity="ni:///sha-256;SDfwewFAE...wefjijfE?ct=application/javascript"></script>
    
  • An author wants to include JavaScript provided by a third-party analytics service on her site. She wants, however, to ensure that only the code she’s carefully reviewed is executed. She can do so by generating integrity metadata for the script she’s planning on including, and adding it to the script element she includes on her page:

    <script src="https://analytics-r-us.com/v1.0/include.js"
            integrity="ni:///sha-256;SDfwewFAE...wefjijfE?ct=application/javascript"></script>
    
  • A user agent wishes to ensure that pieces of its UI which are rendered via HTML (for example, Chrome’s New Tab Page) aren’t manipulated before display. Integrity metadata mitigates the risk that altered JavaScript will run in these pages’ high-privilege contexts.

  • The author of a mash-up wants to make sure her creation remains in a working state. Adding integrity metadata to external subresources defines an expected revision of the included files. The author can then use the reporting functionality to be notified of changes to the included resources.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

Key Concepts and Terminology

This section defines several terms used throughout the document.

The term digest refers to the base64url-encoded (with any trailing U+003D EQUALS SIGN (=) characters removed) result of executing a cryptographic hash function on an arbitrary block of data.

A secure channel is any communication mechanism that the user agent has defined as “secure” (typically limited to HTTP over Transport Layer Security (TLS) [[!RFC2818]]).

An insecure channel is any communication mechanism other than those the user agent has defined as “secure”.

Clarification is needed on whether we want to talk about (in)secure channels or (un)authenticated origins. This is GitHub issue 71 (freddyb).

The term origin is defined in the Origin specification. [[!RFC6454]]

The MIME type of a resource is a technical hint about the use and format of that resource. [[!MIMETYPE]]

The message body and the transfer encoding of a resource are defined by RFC7230, section 3. [[!RFC7230]]

The representation data and content encoding of a resource are defined by RFC7231, section 3. [[!RFC7231]]

A base64url encoding is defined in RFC 4648, section 5. In a nutshell, it replaces the U+002B PLUS SIGN (+) and U+002F SOLIDUS (/) characters of normal base64 encoding with the U+002D HYPHEN-MINUS (-) and U+005F LOW LINE (_) characters, respectively. [[!RFC4648]]

The Augmented Backus-Naur Form (ABNF) notation used in this document is specified in RFC 5234. [[!ABNF]]

SHA-256, SHA-384, and SHA-512 are part of the SHA-2 set of cryptographic hash functions defined by NIST in “Descriptions of SHA-256, SHA-384, and SHA-512”.

Framework

The integrity verification mechanism specified here boils down to the process of generating a sufficiently strong cryptographic digest for a resource, and transmitting that digest to a user agent so that it may be used when fetching the resource.

Integrity metadata

To verify the integrity of a resource, a user agent requires integrity metadata, which consists of the following pieces of information:

  • the cryptographic hash function
  • the digest
  • the resource’s MIME type

The hash function and digest MUST be provided in order to validate a resource’s integrity. The MIME type SHOULD be provided, as it mitigates the risk of certain attack vectors.

This metadata MUST be encoded as a “named information” (ni) URI, as defined in RFC6920. [[!RFC6920]]

For example, given a resource containing only the string “Hello, world.”, an author might choose SHA-256 as a hash function. -MO_YqmqPm_BYZwlDkir51GTc9Pt9BvmLrXcRRma8u8 is the base64url-encoded digest that results. This can be encoded as an ni URI as follows:

ni:///sha-256;-MO_YqmqPm_BYZwlDkir51GTc9Pt9BvmLrXcRRma8u8

Or, if the author further wishes to specify the Content Type (text/plain):

ni:///sha-256;-MO_YqmqPm_BYZwlDkir51GTc9Pt9BvmLrXcRRma8u8?ct=text/plain

Digests may be generated using any number of utilities. OpenSSL, for example, is quite commonly available. The example in this section is the result of the following command line:

echo -n "Hello, world." | openssl dgst -sha256 -binary | openssl enc -base64 -A | sed -e 's/+/-/g' -e 's/\//_/g' -e 's/=*$//g'
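The same digest can be produced without OpenSSL. The following is a minimal Python sketch, not a normative procedure: the function names digest and ni_uri are hypothetical, and the empty authority in the generated URI simply mirrors the examples in this section.

```python
import base64
import hashlib


def digest(data: bytes, algorithm: str = "sha256") -> str:
    """Return the base64url-encoded hash of data with trailing '=' removed."""
    raw = hashlib.new(algorithm, data).digest()
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def ni_uri(data: bytes, content_type: str = "") -> str:
    """Build an ni URI (empty authority, SHA-256) for data.

    Hypothetical helper: the optional ct parameter is appended only when
    a content type is supplied.
    """
    uri = "ni:///sha-256;" + digest(data)
    if content_type:
        uri += "?ct=" + content_type
    return uri


print(ni_uri(b"Hello, world.", "text/plain"))
```

urlsafe_b64encode already substitutes - and _ for + and /, so only the padding has to be stripped by hand.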

Cryptographic hash functions

Conformant user agents MUST support the SHA-256, SHA-384 and SHA-512 cryptographic hash functions for use as part of a resource’s integrity metadata, and MAY support additional hash functions.

Agility

Multiple sets of integrity metadata may be associated with a single resource in order to provide agility in the face of future discoveries. For example, the “Hello, world.” resource described above may be described by either of the following ni URIs:

ni:///sha-256;-MO_YqmqPm_BYZwlDkir51GTc9Pt9BvmLrXcRRma8u8?ct=application/javascript
ni:///sha-512;rQw3wx1psxXzqB8TyM3nAQlK2RcluhsNwxmcqXE2YbgoDW735o8TPmIR4uWpoxUERddvFwjgRSGw7gNPCwuvJg?ct=application/javascript

Authors may choose to specify both, for example:

<script src="hello_world.js"
   integrity="
      ni:///sha-256;-MO_YqmqPm_BYZwlDkir51GTc9Pt9BvmLrXcRRma8u8?ct=application/javascript
      ni:///sha-512;rQw3wx1psxXzqB8TyM3nAQlK2RcluhsNwxmcqXE2YbgoDW735o8TPmIR4uWpoxUERddvFwjgRSGw7gNPCwuvJg?ct=application/javascript
    "></script>

In this case, the user agent will choose the strongest hash function in the list, and use that metadata to validate the resource (as described below in the “parse metadata” and “get the strongest metadata from set” algorithms).
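An author-side build step could generate such a multi-hash attribute value automatically. The sketch below is one possible approach under stated assumptions: integrity_value and b64url_digest are hypothetical names, and the mapping from ni algorithm labels to Python hashlib names is a choice of this example.

```python
import base64
import hashlib


def b64url_digest(data: bytes, hashlib_name: str) -> str:
    """Hash data and return the base64url digest without '=' padding."""
    raw = hashlib.new(hashlib_name, data).digest()
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def integrity_value(data: bytes, content_type: str) -> str:
    """Return a space-separated list of ni URIs, one per hash function."""
    # ni algorithm label -> hashlib name (an assumption of this sketch).
    algorithms = {"sha-256": "sha256", "sha-512": "sha512"}
    uris = [
        f"ni:///{label};{b64url_digest(data, name)}?ct={content_type}"
        for label, name in algorithms.items()
    ]
    return " ".join(uris)


print(integrity_value(b"Hello, world.", "application/javascript"))
```

Emitting both digests lets user agents that support only the weaker function still validate the resource, while others pick the stronger one.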

When a hash function is determined to be insecure, user agents MUST deprecate and eventually remove support for integrity validation using that hash function.

Validation using unsupported hash functions always fails (see the “Does resource match metadataList” algorithm below). Authors are therefore encouraged to use strong hash functions, and to begin migrating to stronger hash functions as they become available.

Priority

User agents MUST provide a mechanism for determining the relative priority of two hash functions. That is, getPrioritizedHashFunction(a, b) MUST return whichever of the two hash functions the user agent considers more collision-resistant. For example, getPrioritizedHashFunction('SHA-256', 'SHA-512') would return SHA-512.

If both algorithms are equally strong, the user agent SHOULD ensure that there is a consistent ordering.
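One way a user agent might implement this is with a static ranking table. The sketch below is illustrative only: the numeric ranks, the treatment of unknown names, and the lexicographic tie-break are all assumptions of this example.

```python
# Assumed ranking for this sketch: a larger number means more collision-resistant.
PRIORITY = {"sha-256": 1, "sha-384": 2, "sha-512": 3}


def get_prioritized_hash_function(a: str, b: str) -> str:
    """Return whichever of a and b ranks higher; unknown names rank lowest."""
    rank_a = PRIORITY.get(a.lower(), 0)
    rank_b = PRIORITY.get(b.lower(), 0)
    if rank_a != rank_b:
        return a if rank_a > rank_b else b
    # Equal strength: fall back to a fixed lexicographic order so that the
    # result for two distinct names does not depend on argument order.
    return min(a, b, key=str.lower)


print(get_prioritized_hash_function("SHA-256", "SHA-512"))
```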

Resource verification algorithms

Apply algorithm to resource

  1. Let result be the result of applying algorithm to the representation data without any content-codings applied, except when the user agent intends to consume the content with content-codings applied (e.g., saving a gzip’d file to disk). In the latter case, let result be the result of applying algorithm to the representation data.
  2. Let encodedResult be the result of base64url-encoding result.
  3. Strip any trailing U+003D EQUALS SIGN (=) characters from encodedResult.
  4. Return encodedResult.
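The steps above can be sketched in Python as follows. This is a simplified illustration: only the gzip content-coding is handled, the algorithm argument is a Python hashlib name rather than an ni label, and the consume_encoded flag is a hypothetical stand-in for the user agent’s intent to use the encoded bytes as-is.

```python
import base64
import gzip
import hashlib


def apply_algorithm(algorithm: str, body: bytes, content_encoding: str = "",
                    consume_encoded: bool = False) -> str:
    """Return the unpadded base64url digest of a response body."""
    data = body
    # Step 1: hash the decoded representation, unless the encoded bytes
    # are what the user agent will actually consume.
    if content_encoding == "gzip" and not consume_encoded:
        data = gzip.decompress(body)
    result = hashlib.new(algorithm, data).digest()
    # Step 2: base64url-encode the raw hash.
    encoded = base64.urlsafe_b64encode(result).decode("ascii")
    # Steps 3 and 4: strip trailing '=' characters and return.
    return encoded.rstrip("=")


plain = b"Hello, world."
print(apply_algorithm("sha256", gzip.compress(plain), "gzip"))
```

Hashing the decoded bytes means the same integrity metadata validates a resource whether or not the server compresses it in transit.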

Is resource eligible for integrity validation

In order to mitigate an attacker’s ability to read data cross-origin by brute-forcing values via integrity checks, resources are only eligible for such checks if they are same-origin, publicly cacheable, or are the result of explicit access granted to the loading origin via CORS. [[!CORS]]

As noted in RFC6454, section 4, some user agents use globally unique identifiers for each file URI. This means that resources accessed over a file scheme URL are unlikely to be eligible for integrity checks.

Certain HTTP headers can also change the way the resource behaves in ways which integrity checking cannot account for. If the resource contains any of these headers, it is ineligible for integrity validation:

  • Authorization or WWW-Authenticate hide resources behind a login; such non-public resources are excluded from integrity checks.
  • Refresh can cause iframe contents to transparently redirect to an unintended target, bypassing the integrity check.

Consider the impact of other headers: Content-Length, Content-Range, etc. Is there danger there?

The following algorithm details these restrictions:

  1. Let request be the request that fetched resource.
  2. If resource contains any of the following HTTP headers, return false:
    • Authorization
    • WWW-Authenticate
    • Refresh
  3. If the mode of request is CORS, return true.
  4. If the origin of request is resource’s origin, return true.
  5. If resource is cacheable by a shared cache, as defined in [[!RFC7234]], return true.
  6. Return false.

Step 3 returns true if the request was CORS-enabled. If the resource failed the CORS checks, it won’t be available to us for integrity checking because it won’t have loaded successfully.
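The eligibility algorithm can be sketched as a simple predicate. The Request and Resource classes below are hypothetical stand-ins for the Fetch concepts, and the shared_cacheable flag replaces the RFC 7234 cacheability evaluation, which is out of scope for this illustration.

```python
from dataclasses import dataclass, field

# Headers that make a resource ineligible (compared case-insensitively).
EXCLUDED_HEADERS = {"authorization", "www-authenticate", "refresh"}


@dataclass
class Request:
    mode: str    # e.g. "cors" or "no-cors"
    origin: str  # serialized origin of the requester


@dataclass
class Resource:
    origin: str
    headers: dict = field(default_factory=dict)
    shared_cacheable: bool = False  # stand-in for the RFC 7234 check


def is_eligible(request: Request, resource: Resource) -> bool:
    """Return True if resource may be subjected to integrity validation."""
    if any(name.lower() in EXCLUDED_HEADERS for name in resource.headers):
        return False
    if request.mode == "cors":
        return True
    if request.origin == resource.origin:
        return True
    if resource.shared_cacheable:
        return True
    return False


print(is_eligible(Request("cors", "https://a.example"),
                  Resource("https://cdn.example")))
```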

Parse metadata.

This algorithm accepts a string, and returns either no metadata, or a set of valid “named information” (ni) URLs whose hash functions are understood by the user agent.

  1. If metadata is the empty string, return no metadata.
  2. Let result be the empty set.
  3. For each token returned by splitting metadata on spaces:
    1. If token is not a valid “named information” (ni) URI, skip the remaining steps, and proceed to the next token.
    2. Let algorithm be the alg component of token.
    3. Transform all ASCII characters to lowercase ASCII and remove the dash from the sha- prefix in algorithm if there is one.
    4. If algorithm is a hash function recognized by the user agent, add token to result.
  4. Return result.
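A minimal sketch of this parsing step follows. It makes several assumptions: the regular expression is a simplified stand-in for the NI-URL rule of RFC 6920 rather than a full URI parser, None represents “no metadata”, the result is a list rather than a set so that author order is preserved, and the SUPPORTED table is hypothetical.

```python
import re

# Simplified stand-in for the NI-URL rule of RFC 6920 (not a full URI parser).
NI_PATTERN = re.compile(
    r"^ni://(?P<authority>[^/?#]*)/"
    r"(?P<alg>[A-Za-z0-9._~-]+);(?P<val>[A-Za-z0-9._~-]+)"
    r"(?:\?(?P<query>[^#]*))?$"
)

# Hash functions this hypothetical user agent recognizes, after normalization.
SUPPORTED = {"sha256", "sha384", "sha512"}


def normalize(alg: str) -> str:
    """Lowercase alg and drop the dash from a leading 'sha-' prefix."""
    alg = alg.lower()
    if alg.startswith("sha-"):
        alg = "sha" + alg[4:]
    return alg


def parse_metadata(metadata: str):
    """Return None for no metadata, else the list of usable ni URIs."""
    if metadata == "":
        return None
    result = []
    for token in metadata.split():
        match = NI_PATTERN.match(token)
        if match is None:
            continue  # not a valid ni URI: skip to the next token
        if normalize(match.group("alg")) in SUPPORTED:
            result.append(token)
    return result


print(parse_metadata("ni:///sha-256;abc ni:///md5;abc garbage"))
```

Calling split() with no argument splits on any run of whitespace, which also covers attribute values that span several lines.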

Get the strongest metadata from set.

  1. Let strongest be the empty string.
  2. For each item in set:
    1. If strongest is the empty string, set strongest to item, and skip to the next item.
    2. Let currentAlgorithm be the alg component of strongest.
    3. Let newAlgorithm be the alg component of item.
    4. If the result of getPrioritizedHashFunction(currentAlgorithm, newAlgorithm) is newAlgorithm, set strongest to item.
  3. Return strongest.
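The selection loop might look like the following in Python. The PRIORITY table and the alg_of helper are assumptions of this sketch, and, as a design choice of the example, the first item is kept when two items name the same algorithm.

```python
# Assumed ranking for this sketch: a larger number means stronger.
PRIORITY = {"sha-256": 1, "sha-384": 2, "sha-512": 3}


def alg_of(ni_uri: str) -> str:
    """Extract the alg component, e.g. 'sha-256' from 'ni:///sha-256;...'."""
    return ni_uri.split(";", 1)[0].rsplit("/", 1)[1].lower()


def get_prioritized_hash_function(a: str, b: str) -> str:
    """Return b only if it ranks strictly higher than a."""
    return b if PRIORITY.get(b, 0) > PRIORITY.get(a, 0) else a


def get_strongest_metadata(items):
    """Return the item whose hash function has the highest priority."""
    strongest = ""
    for item in items:
        if strongest == "":
            strongest = item
            continue
        current = alg_of(strongest)
        new = alg_of(item)
        # Keep the earlier item when both name the same algorithm.
        if new != current and get_prioritized_hash_function(current, new) == new:
            strongest = item
    return strongest


print(get_strongest_metadata(["ni:///sha-256;a", "ni:///sha-512;b"]))
```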

Does resource match metadataList?

  1. If resource’s URL’s scheme is about, return true.
  2. If resource is not eligible for integrity validation, return false.
  3. Let parsedMetadata be the result of parsing metadataList.
  4. If parsedMetadata is no metadata, return true.
  5. Let metadata be the result of getting the strongest metadata from parsedMetadata.
  6. Let algorithm be the alg component of metadata.
  7. Let expectedValue be the val component of metadata with any trailing U+003D EQUALS SIGN (=) removed.
  8. Let expectedType be the value of metadata’s ct query string parameter.
  9. If expectedType is not the empty string, and is not a case-insensitive match for resource’s MIME type, return false.
  10. Let actualValue be the result of applying algorithm to resource.
  11. If actualValue is a case-sensitive match for expectedValue, return true. Otherwise, return false.

If expectedType is the empty string in step 9, it would be reasonable for the user agent to warn the page’s author about the dangers of MIME type confusion attacks via its developer console.
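The matching algorithm can be drawn together in one compact Python sketch. It is an illustration under several assumptions: the ni pattern is simplified and recognizes only a ct query parameter, eligibility is passed in as a boolean rather than computed, the PRIORITY table is hypothetical, and a non-empty list with no usable entry is treated as a failure.

```python
import base64
import hashlib
import re

# Simplified ni URI pattern: alg, val, and an optional ct parameter only.
NI_PATTERN = re.compile(
    r"^ni://[^/?#]*/([A-Za-z0-9._~-]+);([A-Za-z0-9._~=-]+)(?:\?ct=([^&#]*))?$"
)
# Normalized names double as hashlib names; larger rank means stronger.
PRIORITY = {"sha256": 1, "sha384": 2, "sha512": 3}


def matches(body: bytes, mime_type: str, metadata_list: str,
            scheme: str = "https", eligible: bool = True) -> bool:
    if scheme == "about":                 # step 1
        return True
    if not eligible:                      # step 2
        return False
    if metadata_list == "":               # steps 3 and 4: no metadata
        return True
    parsed = []
    for token in metadata_list.split():
        m = NI_PATTERN.match(token)
        if m:
            alg = m.group(1).lower().replace("sha-", "sha", 1)
            if alg in PRIORITY:
                parsed.append((alg, m.group(2).rstrip("="), m.group(3) or ""))
    if not parsed:
        return False                      # only unsupported or invalid entries
    # Steps 5 to 8: pick the strongest entry and read its components.
    alg, expected_value, expected_type = max(parsed, key=lambda p: PRIORITY[p[0]])
    if expected_type and expected_type.lower() != mime_type.lower():  # step 9
        return False
    raw = hashlib.new(alg, body).digest()                             # step 10
    actual = base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")
    return actual == expected_value                                   # step 11


good = "ni:///sha-256;-MO_YqmqPm_BYZwlDkir51GTc9Pt9BvmLrXcRRma8u8?ct=text/plain"
print(matches(b"Hello, world.", "text/plain", good))
```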

User agents may allow users to modify the result of this algorithm via user preferences, bookmarklets, third-party additions to the user agent, and other such mechanisms. For example, redirects generated by an extension like HTTPSEverywhere could load and execute correctly, even if the HTTPS version of a resource differs from the HTTP version.

Modifications to Fetch

The Fetch specification should contain the following modifications in order to enable the rest of this specification’s work [[!FETCH]]:

  1. The following text should be added to section 2.1.4: “A request has an associated integrity metadata. Unless stated otherwise, a request’s integrity metadata is the empty string.”

  2. The following text should be added to section 2.1.5: “A response has an associated integrity state, which is one of indeterminate, pending, corrupt, and intact. Unless stated otherwise, it is indeterminate.”

  3. Perform the following steps before executing both the “basic fetch” and “CORS fetch with preflight” algorithms:

    1. If request’s integrity metadata is the empty string, set response’s integrity state to indeterminate. Otherwise:

      1. Set response’s integrity state to pending.
      2. Include a Cache-Control header whose value is “no-transform”.
      3. If request’s integrity metadata contains a Content Type:
        1. Set request’s Accept header value to the value of request’s integrity metadata’s Content Type.
  4. Add the following step before step #1 of the handling of 401 status codes in the HTTP fetch algorithm:

    1. If response’s integrity state is pending, set response’s integrity state to corrupt and return response.
  5. Before firing the process request end-of-file event for any request:

    1. If the request’s integrity metadata is the empty string, set the response’s integrity state to indeterminate and skip directly to firing the event.

    2. If response matches the request’s integrity metadata, set the response’s integrity state to intact and skip directly to firing the event.

    3. Set the response’s integrity state to corrupt and skip directly to firing the event.
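The state transitions described in items 3 to 5 can be modelled as a small state machine. The sketch below is not Fetch itself: the function names are hypothetical, headers are a plain dictionary, and the outcome of the matching algorithm is passed in as a boolean.

```python
from enum import Enum


class IntegrityState(Enum):
    INDETERMINATE = "indeterminate"
    PENDING = "pending"
    CORRUPT = "corrupt"
    INTACT = "intact"


def prepare_request(integrity_metadata: str, headers: dict,
                    content_type: str = "") -> IntegrityState:
    """Item 3: run before the fetch; adjusts headers, returns the initial state."""
    if integrity_metadata == "":
        return IntegrityState.INDETERMINATE
    headers["Cache-Control"] = "no-transform"
    if content_type:
        headers["Accept"] = content_type
    return IntegrityState.PENDING


def on_401(state: IntegrityState) -> IntegrityState:
    """Item 4: a 401 response while validation is pending marks it corrupt."""
    return IntegrityState.CORRUPT if state is IntegrityState.PENDING else state


def finish_response(integrity_metadata: str, response_matches: bool) -> IntegrityState:
    """Item 5: run before the end-of-file event is fired."""
    if integrity_metadata == "":
        return IntegrityState.INDETERMINATE
    return IntegrityState.INTACT if response_matches else IntegrityState.CORRUPT


headers = {}
print(prepare_request("ni:///sha-256;abc", headers, "text/css"), headers)
```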

Verification of HTML document subresources

A variety of HTML elements result in requests for resources that are to be embedded into the document, or executed in its context. To support integrity metadata for each of these, and new elements that are added in the future, a new integrity attribute is added to the list of content attributes for the link and script elements.

A corresponding integrity IDL attribute which reflects the value of each element’s integrity content attribute is added to the HTMLLinkElement and HTMLScriptElement interfaces.

A future revision of this specification is likely to include SRI support for all possible subresources, i.e., a, audio, embed, iframe, img, link, object, script, source, track, and video elements.

The integrity attribute

The integrity attribute represents integrity metadata for an element. The value of the attribute MUST be either the empty string, or at least one valid “named information” (ni) URI [[!RFC6920]], as described by the following ABNF grammar:

integrity-metadata = "" / 1*( *WSP NI-URL ) *WSP

The NI-URL rule is defined in RFC6920, section 3, figure 4.

The integrity IDL attribute must reflect the integrity content attribute.

Element interface extensions

HTMLLinkElement
  attribute DOMString integrity
    The value of this element’s integrity attribute.
HTMLScriptElement
  attribute DOMString integrity
    The value of this element’s integrity attribute.

Handling integrity violations

Documents may specify the behavior of a failed integrity check by delivering a Content Security Policy which contains an integrity-policy directive, defined by the following ABNF grammar:

directive-name  = "integrity-policy"
directive-value = 1#failure-mode [ "require-for-all" ]
failure-mode    = ( "block" / "report" )

A document’s integrity policy is the value of the integrity-policy directive, if explicitly provided as part of the document’s Content Security Policy, or block otherwise.

If the document’s integrity policy contains block, the user agent MUST refuse to render or execute resources that fail an integrity check, and MUST report a violation.

If the document’s integrity policy contains report, the user agent MAY render or execute resources that fail an integrity check, but MUST report a violation.

If the document’s integrity policy contains require-for-all, the user agent MUST treat the lack of integrity metadata for a resource as automatic failure, refuse to fetch the resource, and report a violation.
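The three rules above can be combined into a single decision function. The following Python sketch is only one reading of the text: the tokenizer splits on commas and whitespace as a simplification of the ABNF list syntax, the function names are hypothetical, and in report mode the sketch chooses to use the resource, which the MAY permits but does not require.

```python
import re

KNOWN_TOKENS = {"block", "report", "require-for-all"}


def parse_integrity_policy(directive_value):
    """Return the set of policy tokens; an absent directive means block."""
    if directive_value is None:
        return {"block"}
    tokens = {t.lower() for t in re.split(r"[,\s]+", directive_value) if t}
    return tokens & KNOWN_TOKENS


def handle_resource(policy: set, has_metadata: bool, check_passed: bool):
    """Return (use_resource, report_violation) for one subresource."""
    if not has_metadata:
        if "require-for-all" in policy:
            return (False, True)   # missing metadata is an automatic failure
        return (True, False)       # nothing to validate
    if check_passed:
        return (True, False)
    if "block" in policy:
        return (False, True)       # refuse to render or execute
    return (True, True)            # report-only: use the resource, but report


print(handle_resource(parse_integrity_policy("report"), True, False))
```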

Elements

Whenever a user agent attempts to obtain a resource pointed to by a link element that has a rel attribute with the value of stylesheet and a type of text/css:

  1. Set the integrity metadata of the request to the value of the element’s integrity attribute.

Additionally, perform the following steps before firing a load event at the element:

  1. If the response’s integrity state is corrupt:
    1. If the document’s integrity policy is block:
      1. Abort the load event, and treat the resource as having failed to load.
      2. If resource is same origin with the origin of the link element’s Document, then queue a task to fire a simple event named error at the link element.
    2. Report a violation.

The script element

When executing step 5 of step 14 of HTML5’s “prepare a script” algorithm:

  1. Set the integrity metadata of the request to the value of the element’s integrity attribute.

Insert the following steps after step 5 of step 14 of HTML5’s “prepare a script” algorithm:

  1. Once the fetching algorithm has completed:
    1. If the response’s integrity state is corrupt:
      1. If the document’s integrity policy is block:
        1. If resource is same origin with the script element’s Document’s origin, then queue a task to fire a simple event named error at the element, and abort these steps.
      2. Report a violation.

Verification of CSS-loaded subresources

Tab and Anne are poking at adding fetch() to some spec somewhere which would allow CSS files to specify various arguments to the fetch algorithm while requesting resources. Detail on the proposal is at http://lists.w3.org/Archives/Public/public-webappsec/2014Jan/0129.html. Once that is specified, we can proceed with defining an integrity argument that would allow integrity checks in CSS.

Proxies

Optimizing proxies and other intermediate servers which modify the content of fetched resources MUST ensure that the digest associated with those resources stays in sync with the new content. One option is to ensure that the integrity metadata associated with resources is updated along with the resource itself. Another would be simply to deliver only the canonical version of resources for which a page author has requested integrity verification. To support this latter option, user agents MUST send a Cache-Control header with a value of no-transform when requesting a resource with associated integrity metadata (see item 3 in the “Modifications to Fetch” section).

Think about how integrity checks would affect Vary headers in general.

Security Considerations

Insecure channels remain insecure

Integrity metadata delivered over an insecure channel provides no security benefit. Attackers can alter the digest in-flight (or remove it entirely, or do absolutely anything else to the document), just as they could alter the resource the hash is meant to validate. Authors who desire any sort of security whatsoever SHOULD deliver resources containing digests over secure channels.

Hash collision attacks

Digests are only as strong as the hash function used to generate them. User agents SHOULD refuse to support known-weak hashing functions like MD5 or SHA-1, and SHOULD restrict supported hashing functions to those known to be collision-resistant. At the time of writing, SHA-256 is a good baseline. Moreover, user agents SHOULD reevaluate their supported hashing functions on a regular basis, and deprecate support for those functions shown to be insecure.

Cross-origin data leakage

Attackers can determine whether some cross-origin resource has certain content by attempting to load it with a known digest, and watching for load failure. If the load fails, the attacker can surmise that the resource didn’t match the hash, and thereby gain some insight into its contents. This might reveal, for example, whether or not a user is logged into a particular service.

Moreover, attackers can brute-force specific values in an otherwise static resource: consider a JSON response that looks like this:

{"status": "authenticated", "username": "Stephan Falken"}

An attacker can precompute hashes for the response with a variety of common usernames, and specify those hashes while repeatedly attempting to load the document. By examining the reported violations, the attacker can obtain a user’s username.
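To make the precomputation concrete, the sketch below builds a lookup table from digest to candidate username. It is purely illustrative: the candidate list is invented, and the byte-for-byte serialization produced by json.dumps is an assumption, since the attack only works if the attacker can reproduce the exact response bytes.

```python
import base64
import hashlib
import json


def digest(body: bytes) -> str:
    """Unpadded base64url SHA-256 digest of body."""
    raw = hashlib.sha256(body).digest()
    return base64.urlsafe_b64encode(raw).decode("ascii").rstrip("=")


def response_body(username: str) -> bytes:
    # Hypothetical serialization; a real service's exact bytes may differ.
    return json.dumps({"status": "authenticated", "username": username}).encode("utf-8")


# Invented candidate list for illustration.
candidates = ["David Lightman", "Stephan Falken", "Jennifer Mack"]
table = {digest(response_body(name)): name for name in candidates}

# The digest for which validation succeeds identifies the candidate.
observed = digest(response_body("Stephan Falken"))
print(table.get(observed))
```

This is why the eligibility rules restrict integrity checks to resources the loading origin is already allowed to read.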

User agents SHOULD mitigate the risk by refusing to fire error events on elements which loaded cross-origin resources, but some side-channels will likely be difficult to avoid (image’s naturalHeight and naturalWidth for instance).

Acknowledgements

None of this is new. Much of the content here is inspired heavily by Gervase Markham’s Link Fingerprints concept, as well as WHATWG’s Link Hashes.