Entry Point Regulation

Editor’s Draft,

This version:
https://w3c.github.io/webappsec/specs/epr/
Feedback:
public-webappsec@w3.org with subject line “[EPR] … message topic …” (archives)
Issue Tracking:
Inline In Spec
Editors:
(Google Inc.)
(Google Inc.)

Abstract

Entry Point Regulation aims to mitigate the risk of reflected cross-site scripting (XSS), cross-site script inclusion (XSSI), and cross-site request forgery (CSRF) attacks by demarcating the areas of an application which are intended to be externally referenceable. A specified policy is applied to external requests for all non-demarcated resources.

Status of this document

This is a public copy of the editors’ draft. It is provided for discussion only and may change at any moment. Its publication here does not imply endorsement of its contents by W3C. Don’t cite this document other than as work in progress.

Changes to this document may be tracked at https://github.com/w3c/webappsec.

The (archived) public mailing list public-webappsec@w3.org (see instructions) is preferred for discussion of this specification. When sending e-mail, please put the text “EPR” in the subject, preferably like this: “[EPR] …summary of comment…”.

This document was produced by the Web Application Security Working Group.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 August 2014 W3C Process Document.

1. Introduction

This section is not normative.

Entry Point Regulation intends to provide defense-in-depth against reflected cross-site scripting and other content injection (XSS), cross-site script inclusion (XSSI), and cross-site request forgery (CSRF) attacks.

These attacks all rely on the fundamentally porous nature of the web: any addressable portion of an application can be requested by any third-party, with arbitrary query parameters and fragment identifiers. The user agent will happily issue such requests with all the authority granted to the user, which can result in a number of problems.

If an author can limit incoming traffic to a strict set of well-audited entry points, web applications can reduce the risk these attacks present. Indeed, some authors have taken steps to do so via server-side logic and single page application (SPA) frameworks (and, soon, via Service Workers). Server-side techniques can be an effective solution, but they have a number of drawbacks. Complexity aside, they are prone to false-positive restrictions in cases where a user’s intent should override the author’s intent (bookmarked links, for instance).

This document defines a browser-enforced mechanism which can be layered on top of an existing application without server-side modifications, providing the attack mitigation authors desire, while allowing user intent to trump brittle filters when possible.

1.1. Goals

The threat model EPR operates under assumes that the user is authenticated to various web applications and origins within a single browser, and that the user browses to web content that may be malicious. Web content can freely make authenticated cross-origin requests, enabling XSS, CSRF, and XSSI attacks. While CSP has been shown to be an effective approach to addressing XSS, the protection CSP provides is only as good as the policies web applications are able to put into place. In many cases, enforcing optimally secure CSP policies has proven difficult, for example when web content needs to leverage JavaScript libraries requiring eval(). EPR provides a defense-in-depth option for authors to mitigate XSS while also providing a new opportunity to mitigate CSRF and XSSI attacks.

After an author implements EPR for an origin, the following statements ought to hold:

  1. Authors should be able to block or modify incoming requests based on the URL being requested. That is, a request for / might be allowed, while a request for /api/logout.cgi might be denied.

    Modifications might include stripping query and fragment data from the request’s URL, or stripping cookies and other authentication information.

  2. Authors should be able to block or modify incoming requests that contain data (e.g. the URL might have query or fragment data, or the request might contain a body) differently from requests that do not contain data.

  3. Authors should be able to block or modify incoming requests based on the request’s context; that is, navigations create different attack surfaces than subresource inclusions, and should be treated differently.

  4. Requests should be excluded from the above filters if the request originates from a same-origin source. That is, any page on https://example.com/ may request any resource on that origin, while requests from https://not-example.com/ would be restricted.

  5. A user agent may choose to exclude other requests from the above filters in order to prioritize a user’s intent. For instance, URLs typed directly into the address bar or bookmarked URLs might skip filters entirely.

1.2. Examples

The developer of a web-based "Internet of Things" administration console would like to have a high degree of assurance that XSS and CSRF attacks will not affect users. If such an attack were to occur, it could allow attackers to turn users' home appliances on and off at will, or perform other actions with serious consequences. Because of the pervasive nature of XSS and CSRF vulnerabilities, the developer had been considering creating only monolithic desktop and mobile applications as opposed to utilizing the web platform. While this would allow them to sidestep the security concerns inherent in the web platform, it is clearly not ideal for users. Implementation of CSP seemed to present a solution; however, a fully restrictive policy is not possible due to library compatibility requirements.

The developer decides to implement Entry Point Regulation. They create a manifest whose default policy blocks external requests, and list each entry point path in that manifest. Testing is first performed in report-only mode, and the blocking behavior is only enabled once the developer is comfortable with the behavior of EPR.

2. Key Concepts and Terminology

EPR categorizes each request as one of the following:

  1. A navigational request, if its context frame type is one of "top-level", "auxiliary", or "nested". Navigational requests load a resource into a context where markup will be rendered, meaning that they place an origin at risk of both XSS and CSRF attack.
  2. A subresource request, if it is not a connection request and its context frame type is "none". Subresource requests cannot execute code directly, so the XSS risk is minimal, but they do present the risk of CSRF and XSSI.
  3. A connection request, if its context is one of "beacon", "cspreport", "eventsource", "fetch", "ping", or "xmlhttprequest". These connection types are distinguished from subresource requests only because of their flexibility (POST vs. GET, etc.) and their typical usage (API endpoints vs. static resources). The risks are similar, but authors may wish to set different rules for these kinds of requests than they would for other subresource requests.
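
This example is not normative. The categorization above can be sketched in Python as follows. The Request class is a hypothetical stand-in for a Fetch request that carries only the two fields this section inspects, and the order in which the categories are checked is an assumption of the sketch.

```python
from dataclasses import dataclass

# Context values that this section lists for connection requests.
CONNECTION_CONTEXTS = {
    "beacon", "cspreport", "eventsource", "fetch", "ping", "xmlhttprequest",
}
# Context frame types that this section lists for navigational requests.
NAVIGATIONAL_FRAME_TYPES = {"top-level", "auxiliary", "nested"}


@dataclass
class Request:
    """Hypothetical stand-in for a Fetch request (illustration only)."""
    context: str
    context_frame_type: str


def classify(request: Request) -> str:
    """Return "navigational", "connection", or "subresource"."""
    if request.context_frame_type in NAVIGATIONAL_FRAME_TYPES:
        return "navigational"
    if request.context in CONNECTION_CONTEXTS:
        return "connection"
    # Everything else: frame type "none" and not a connection request.
    return "subresource"
```

A request with context "image" and context frame type "none" falls through both checks and is treated as a subresource request.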

In the interest of keeping manifest creation simple, we should consider merging subresource and connection requests into a single category. Navigations are susceptible to XSS, whereas this is not a concern for subresource and connection requests. If there isn’t a similar very specific distinction between attacks that would involve subresource and connection requests, then we should merge them.

It could make sense to split out IMAGE SRC, SCRIPT SRC, etc. requests. It should be very easy for a manifest author to tag individual rules in the manifest so that images would be available to IMG tags on a different origin, but not SCRIPT tags. If we can identify a very specific attack scenario where this is useful then it makes sense to do this.

An EPR store is an opaque storage mechanism which offers a user agent the ability to save, retrieve, and modify EPR manifests on a per-origin basis. The implementation is vendor-specific, and the interface provided is not exposed to the web.

The Augmented Backus-Naur Form (ABNF) notation is specified in [RFC5234].

3. Framework

In a nutshell:

  1. UA requests a resource from example.com for the very first time.
  2. example.com responds with a document that has an EPR header, which tells the UA that it should regulate entry points for the origin.

    As no EPR manifest, and therefore no policy, is available for this request, a default EPR policy will apply as described in §4.2 Default EPR policy.

    Somewhere in Fetch after we have the headers, we’ll call out to §4.3 Process response’s EPR header to take whatever actions we need to take here. This means we’ll grab an EPR manifest file, and store it persistently for use in regulating future requests.

  3. Subsequent navigations and resource requests to example.com will run through §4.1 Process request to determine whether they match the ruleset defined in the EPR manifest we processed above.

    If they don’t match the ruleset, the user agent will take some action, as described in §3.2.3 Behaviors.

  4. That’s it!

3.1. The EPR HTTP Response Header Field

Servers may request the protections outlined in this document by sending an EPR HTTP response header field along with a response. This request is represented by the following ABNF:

"EPR:" *WSP "1" *WSP

User agent conformance details upon receipt of such a header are explained in §4.3 Process response’s EPR header.
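
This example is not normative. As a minimal sketch, a user agent could test a header value against the ABNF above as shown here. The function name epr_requested is hypothetical, and the sketch assumes the header name and colon have already been removed by the HTTP layer, so that only the field value is passed in.

```python
import re
from typing import Optional

# Mirrors the value part of the ABNF: *WSP "1" *WSP, where WSP is SP or HTAB.
_EPR_VALUE = re.compile(r"[ \t]*1[ \t]*")


def epr_requested(header_value: Optional[str]) -> bool:
    """Return True if the EPR header value requests EPR protections."""
    if header_value is None:
        return False  # header absent from the response
    return _EPR_VALUE.fullmatch(header_value) is not None
```

Any value other than a single "1" with optional surrounding whitespace is treated here as not matching the grammar.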

3.2. Entry Point Manifests

An EPR manifest is a JSON file containing entry point regulation policy data for an origin.

Servers which opt in to EPR protections via the EPR header MUST make a manifest file available via [MANIFEST]. EPR rules are included in a manifest via the epr_manifest attribute.

"epr_manifest" attribute inconsistent (?) with "epr" member as described below.

{
  ...,
  "epr": {
    "reportURL": "https://example.com/reporting-endpoint",
    "redirectURL": "https://example.com/",
    "navigationBehavior": "allowStrippedGET",
    "subresourceBehavior": "allowStrippedGET",
    "rules": [
      {
        "path": "/",
        "types": [ "navigational" ],
        "allowData": false
      },
      {
        "regex": "^/\\d+$",
        "types": [ "navigational" ],
        "allowData": false
      },
      ...
      {
        "path": "/image",
        "types": [ "subresource" ],
        "allowData": true
      }
    ]
  }
}

It isn’t clear that the EPR manifest ought to be part of an application manifest as defined in [MANIFEST]. We’ve lumped it in there at the moment because it seems worth trying out, but it’s not clear that the concepts (though similar) mesh as well as they need to.

3.2.1. The epr manifest member

The policy data that makes up the EPR manifest is delivered via an epr member of an application manifest [MANIFEST]. This member’s value is a dictionary adhering to the following IDL:

enum EPRBehavior {
  "allow",
  "block",
  "redirect",
  "omitCredentials",
  "allowStrippedGET"
};

dictionary EPRPolicy {
  USVString? reportURL;
  USVString? redirectURL;
  EPRBehavior navigationBehavior = "allowStrippedGET";
  EPRBehavior subresourceBehavior = "allowStrippedGET";
  sequence<EPRRule> rules;
};
reportURL, of type USVString, nullable
A URL to which violation reports will be sent. See §4.5 Report request as an entrypoint violation for user agent conformance requirements.

Note: Authors may use the allow behavior to simulate a "report only" mode that does not actually modify incoming requests but does send reports back to the report URL.

redirectURL, of type USVString, nullable
A URL to redirect to when using the redirect behavior.
navigationBehavior, of type EPRBehavior, defaulting to "allowStrippedGET"
If a navigational request doesn’t match rules, this property defines the action the user agent will take. Detailed conformance requirements can be found in §4.1 Process request, and a high-level description of the behaviors can be found in §3.2.3 Behaviors.
subresourceBehavior, of type EPRBehavior, defaulting to "allowStrippedGET"
If a non-navigational request doesn’t match rules, this property defines the action the user agent will take. Detailed conformance requirements can be found in §4.1 Process request, and a high-level description of the behaviors can be found in §3.2.3 Behaviors.
rules, of type sequence<EPRRule>
The ruleset which should be applied. Details are in §3.2.2 Ruleset.
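
This example is not normative. The following sketch reads the epr member from manifest text and fills in the defaults given in the IDL above. The function name load_policy is hypothetical, and rejecting unknown behavior values with an exception is an assumption of the sketch, since this document does not yet say how invalid manifests are handled.

```python
import json
from typing import Any, Dict

BEHAVIORS = {"allow", "block", "redirect", "omitCredentials", "allowStrippedGET"}
DEFAULT_BEHAVIOR = "allowStrippedGET"  # default stated in the EPRPolicy IDL


def load_policy(manifest_text: str) -> Dict[str, Any]:
    """Parse a manifest and return its EPR policy with defaults applied."""
    manifest = json.loads(manifest_text)
    epr = manifest.get("epr", {})
    policy = {
        "reportURL": epr.get("reportURL"),      # nullable
        "redirectURL": epr.get("redirectURL"),  # nullable
        "navigationBehavior": epr.get("navigationBehavior", DEFAULT_BEHAVIOR),
        "subresourceBehavior": epr.get("subresourceBehavior", DEFAULT_BEHAVIOR),
        "rules": epr.get("rules", []),
    }
    for key in ("navigationBehavior", "subresourceBehavior"):
        if policy[key] not in BEHAVIORS:
            # Assumption: unknown enum values are treated as an error.
            raise ValueError(f"unknown {key}: {policy[key]!r}")
    return policy
```

A manifest that sets only navigationBehavior would, under this sketch, still receive allowStrippedGET as its subresourceBehavior.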

3.2.2. Ruleset

EPR manifests define a set of rules for a site, governing a user agent’s fetching behavior for requests made to that site’s origin. Each rule is scoped to a specific subset of an origin’s URLs via a path prefix or a regular expression. Incoming requests which do not match the ruleset (as defined in §4.4 Does request match rule?) will be dealt with as defined in navigationBehavior or subresourceBehavior, as appropriate.

The following IDL defines rules' syntax:

enum EPRRequestType {
  "connection", "navigational", "subresource"
};

dictionary EPRRule {
  USVString? path;
  USVString? regex;
  sequence<EPRRequestType> types;
  boolean allowData;
};
path, of type USVString, nullable
A path prefix defining a rule’s scope. See §4.4 Does request match rule? for user agent conformance requirements. Either path or regex may be specified for a given rule, but not both.
regex, of type USVString, nullable
A regular expression defining a rule’s scope. See §4.4 Does request match rule? for user agent conformance requirements. Either path or regex may be specified for a given rule, but not both.
types, of type sequence<EPRRequestType>
A set of request types to which this rule applies: the values MUST be one or more of "navigational" (which encompasses navigational requests), "subresource" (subresource requests), or "connection" (connection requests). See §4.4 Does request match rule? for user agent conformance requirements.
allowData, of type boolean
If true, then matching requests' URLs are allowed to contain non-empty query and fragment properties, and requests' body may be non-null.

See §4.1 Process request for user agent conformance requirements.
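
This example is not normative. A manifest author might check rules against the constraints described above before deploying them. The sketch below is one way to do so; the function name rule_problems and the wording of its messages are hypothetical, and it only checks the two constraints this section states explicitly.

```python
from typing import Any, Dict, List

REQUEST_TYPES = {"connection", "navigational", "subresource"}


def rule_problems(rule: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems found in one EPRRule."""
    problems = []
    # "Either path or regex may be specified for a given rule, but not both."
    if rule.get("path") is not None and rule.get("regex") is not None:
        problems.append("path and regex are both specified")
    # types must hold one or more of the three request types.
    types = rule.get("types") or []
    if not types:
        problems.append("types is empty")
    for value in types:
        if value not in REQUEST_TYPES:
            problems.append(f"unknown request type: {value}")
    return problems
```

An empty list means the rule passed both checks; it does not mean a user agent would accept the rule.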

3.2.3. Behaviors

If a request does not match the ruleset defined in an EPR manifest’s rules property, then the user agent looks to either navigationBehavior or subresourceBehavior to determine what action to take.

The following behaviors are defined (and, if none is explicitly specified, then allowStrippedGET is used as a default):

allow
Allow the request without modification. This behavior may be used to put the user agent in a "report only" mode, where violations are reported (as described in §4.5 Report request as an entrypoint violation), but requests proceed without modification.
block
Cancel the request entirely, returning a network error.
redirect
Redirect the request to a specified URL.
omitCredentials
Drop cookies and other authentication properties of the request by setting its credentials mode to "omit".
allowStrippedGET
Allow GET requests, after setting the request’s URL’s fragment and query properties to null.

POST and other request types will be canceled, returning a network error.

User agent conformance requirements are defined in §4.1 Process request.
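
This example is not normative. The five behaviors can be sketched as a function that returns either the request to send or None to represent a network error. The Request class and apply_behavior are hypothetical names. Modeling a redirect as a change of URL, and treating a redirect without a redirectURL as a network error, are assumptions of the sketch; this document does not define either.

```python
from dataclasses import dataclass, replace
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class Request:
    """Hypothetical stand-in for a Fetch request (illustration only)."""
    method: str
    url: str
    credentials_mode: str = "include"


def apply_behavior(request: Request, behavior: str,
                   redirect_url: Optional[str] = None) -> Optional[Request]:
    """Return the request to send, or None to signal a network error."""
    if behavior == "allow":
        return request
    if behavior == "block":
        return None
    if behavior == "redirect":
        # Assumption: no redirectURL means the request cannot proceed.
        return replace(request, url=redirect_url) if redirect_url else None
    if behavior == "omitCredentials":
        return replace(request, credentials_mode="omit")
    if behavior == "allowStrippedGET":
        if request.method != "GET":
            return None  # POST and other methods are canceled
        parts = urlsplit(request.url)
        # Rebuild the URL with empty query and fragment components.
        stripped = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
        return replace(request, url=stripped)
    raise ValueError(f"unknown behavior: {behavior!r}")
```

Under allowStrippedGET, a GET for https://example.com/search?q=x#frag would be sent as https://example.com/search.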

3.2.4. Caching

EPR manifest files are cached as per standard policy served in HTTP headers. Manifest files are removed if the user clears their browser cache, as is any persistent indication that EPR has been enabled by the site (as may have been indicated by an HTTP response header). When a manifest file expires from the cache, the user agent should attempt to download the manifest file again when possible. At minimum this should occur on the next request to the EPR-enabled site.

4. Processing Algorithms

4.1. Process request

  1. Let policy be the policy retrieved from a user agent’s EPR store for request’s URL’s origin.
  2. Let rules be the set of rules contained in policy’s rules property.

    Note: rules may be the empty set if no rules are specified. In this case, the behavior specified in the policy’s navigationBehavior or subresourceBehavior will be applied to all incoming requests.

  3. Let matched be false.
  4. For each rule in rules, if request matches rule:
    1. Set matched to true.
    2. Skip the remaining rules in rules.
  5. If matched is true, return without modifying request.
  6. Otherwise, let behavior be the value of policy’s navigationBehavior if request is a navigational request, and subresourceBehavior otherwise.

    Do we need a connectionBehavior property?

  7. Execute the steps associated with the value of behavior in the list below:
    allow
    1. Return without modifying the request.
    block
    1. Cancel the request, and return a network error.
    redirect
    1. Do not make the request to the original resource. Redirect the user agent to the redirectURL.
    omitCredentials
    1. Set request’s credentials mode property to omit.
    2. Set request’s URL’s username to the empty string, and password to null.
    allowStrippedGET
    1. If request’s method is not GET, cancel the request, and return a network error.
    2. Set request’s URL’s fragment and query properties to null.
  8. Follow steps in §4.5 Report request as an entrypoint violation.
  9. Because matched is false at this point, the user agent should initiate a new background manifest download. A policy author might make a mistake and deploy a policy that inappropriately blocks access to resources; initiating a new download whenever a policy action is applied prevents broken manifests from persisting in the cache. The user agent may implement heuristics to avoid excessive manifest download attempts, for example by never attempting to re-download a manifest more than once an hour.
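
This example is not normative. Steps 3 through 6 and the re-download heuristic in step 9 can be sketched as follows. The names select_behavior and ManifestRefresher are hypothetical, the matches argument stands in for the algorithm in §4.4, and the one-hour interval is the example value from step 9 rather than a requirement.

```python
import time
from typing import Any, Callable, Dict, Optional

REDOWNLOAD_INTERVAL_SECONDS = 3600.0  # "once an hour", the example in step 9


def select_behavior(policy: Dict[str, Any], request: Any, request_type: str,
                    matches: Callable[[Any, Dict[str, Any]], bool]) -> Optional[str]:
    """Return None if any rule matches, else the fallback behavior to apply."""
    for rule in policy.get("rules", []):
        if matches(request, rule):
            return None  # matched: the request is left unmodified
    if request_type == "navigational":
        return policy["navigationBehavior"]
    return policy["subresourceBehavior"]


class ManifestRefresher:
    """Limits background manifest downloads to one per interval per origin."""

    def __init__(self, now: Callable[[], float] = time.monotonic) -> None:
        self._now = now
        self._last_attempt: Dict[str, float] = {}

    def should_refresh(self, origin: str) -> bool:
        current = self._now()
        last = self._last_attempt.get(origin)
        if last is not None and current - last < REDOWNLOAD_INTERVAL_SECONDS:
            return False  # too soon since the previous attempt
        self._last_attempt[origin] = current
        return True
```

Passing the clock in as a parameter keeps the throttle testable without waiting for real time to pass.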

4.2. Default EPR policy

A default policy MUST be applied when all of the following criteria are met:

The default EPR policy specifies that the allowStrippedGET behavior be applied to requests, preventing requests from containing data that would enable reflected or DOM-based XSS.

Allow data on everything under a specific hardcoded path, in order to facilitate URLs sent in e-mail, etc.?

The intent of the default EPR policy is to mitigate XSS (not CSRF) when no EPR policy is available yet. Even when the user is not authenticated to a site, XSS is problematic because the attack may persist until the user has authenticated. This is not the case with CSRF, and CSRF is not effective until the user has authenticated to a site, at which point it is much more likely that a policy has been downloaded.

Note: The original proposal.

4.3. Process response’s EPR header

Given a response (response), this algorithm parses its header list to extract an EPR header field. If such a field is present as EPR: 1, the user agent MUST fetch and process an EPR manifest from response’s origin unless one or more of the following statements is true:

  1. response’s request’s context is manifest.
  2. A manifest for this origin is already cached at the user agent.
  3. There is already a pending manifest request for the origin.

Once EPR has been enabled for an origin due to the presence of EPR: 1 on a response, EPR is effectively enabled for all resources from this origin, persistently, even if these resources do not specify an EPR header. To disable EPR, an origin must send an EPR: 0 header. The EPR: 0 header is also persistent.

To process response response, execute the following steps:

  1. If response’s URL is a priori insecure, abort these steps.
  2. If response’s header list contains a header named EPR, then:
    1. Let manifest URL be the manifest URL provided by [MANIFEST].
    2. Let request be a request whose method is GET, URL is manifest URL, context frame type is none, context is manifest, and credentials mode is omit.
    3. Fetch request.
    4. To process response for the resulting response (manifest response):
      1. Store the manifest in the user agent’s EPR store, keyed to response’s URL’s origin.
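
This example is not normative. The decision of whether to fetch a manifest can be sketched as below. The function name and its parameters are hypothetical. Treating any non-https URL as a priori insecure is a simplification of [MIX], and the sketch assumes header names have been lowercased by the HTTP layer.

```python
from typing import Dict, Set
from urllib.parse import urlsplit


def should_fetch_manifest(response_url: str, headers: Dict[str, str],
                          request_context: str, cached_origins: Set[str],
                          pending_origins: Set[str]) -> bool:
    """Return True if the user agent should fetch an EPR manifest."""
    parts = urlsplit(response_url)
    if parts.scheme != "https":
        return False  # simplification of "a priori insecure"
    value = headers.get("epr")
    if value is None or value.strip(" \t") != "1":
        return False  # no "EPR: 1" on this response
    if request_context == "manifest":
        return False  # do not recurse on the manifest request itself
    origin = f"{parts.scheme}://{parts.netloc}"
    if origin in cached_origins or origin in pending_origins:
        return False  # a manifest is already cached or being fetched
    return True
```

The two sets stand in for whatever bookkeeping the EPR store keeps for cached manifests and in-flight manifest requests.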

4.4. Does request match rule?

A request (request) is said to match a rule (rule) if the following algorithm returns Matches:

  1. If request is a connection request, and "connection" is not contained in rule’s types list, return Does Not Match.
  2. If request is a navigational request, and "navigational" is not contained in rule’s types list, return Does Not Match.
  3. If request is a subresource request, and "subresource" is not contained in rule’s types list, return Does Not Match.
  4. If rule has a path property whose value is neither null nor undefined:
    1. Let rule path be rule’s path.
    2. Let url path list be a copy of request’s URL’s path.
    3. Let exact match be false if the final character of rule path is the U+002F SOLIDUS character (/), and true otherwise.
    4. Let rule path list be the result of splitting rule path on the U+002F SOLIDUS character (/).
    5. If rule path list’s length is greater than url path list’s length, return Does Not Match.
    6. For each entry in rule path list:
      1. Percent decode entry.
      2. Percent decode the first item in url path list.
      3. If entry is not an ASCII case-insensitive match for the first item in url path list, return Does Not Match.
      4. Pop the first item in url path list off the list.
    7. If exact match is true, and url path list is not empty, return Does Not Match.
  5. If rule has a regex property whose value is neither null nor undefined:
    1. Let rule regex be rule’s regex.
    2. Let url path be the empty string, and for each component in request’s URL’s path:
      1. Append the U+002F SOLIDUS character (/) to url path.
      2. Append component to url path.
    3. If url path does not regex match (TODO) rule regex, return Does Not Match.

      Need to have spec language for this. There’s surely a regex spec somewhere, right? DR: Can we just reference the ECMAScript spec?

  6. If rule’s allowData is false, then return Does Not Match if any of the following statements is true:
    1. request’s URL’s fragment property is not null.
    2. request’s URL’s query property is not null.
    3. request’s body property is not null.
  7. Return Matches.
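
This example is not normative. The matching algorithm can be approximated in Python as below. Several simplifications are assumptions of this sketch: requests and rules are plain dictionaries, empty path segments are ignored when splitting, Python's re module stands in for the regex dialect that this document has not yet chosen, and an empty query or fragment is treated the same as a null one because urlsplit cannot tell them apart.

```python
import re
from typing import Any, Dict, List
from urllib.parse import unquote, urlsplit


def _ascii_lower(text: str) -> str:
    # Lowercase only A-Z, for an ASCII case-insensitive comparison.
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def _segments(path: str) -> List[str]:
    # Split on "/", drop empty segments, percent decode, and case-fold.
    return [_ascii_lower(unquote(s)) for s in path.split("/") if s != ""]


def matches(request: Dict[str, Any], rule: Dict[str, Any]) -> bool:
    """Return True if request matches rule, under the stated simplifications."""
    if request["type"] not in rule.get("types", []):
        return False
    parts = urlsplit(request["url"])
    path = rule.get("path")
    if path is not None:
        exact = not path.endswith("/")
        rule_segments = _segments(path)
        url_segments = _segments(parts.path)
        if len(rule_segments) > len(url_segments):
            return False
        if url_segments[:len(rule_segments)] != rule_segments:
            return False
        if exact and len(url_segments) != len(rule_segments):
            return False
    regex = rule.get("regex")
    if regex is not None and re.search(regex, parts.path) is None:
        return False
    if not rule.get("allowData", False):
        if parts.query or parts.fragment or request.get("body") is not None:
            return False
    return True
```

With the example manifest in §3.2, a navigation to /123 without query data would match the regex rule, while the same navigation with a query string would not.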

4.5. Report request as an entrypoint violation

We need to define violation reports. Steal something from CSP.

Proposed format:

{
  "epr-report": {
    "policy-fetch-time": "Thu Apr 16 2015 14:23:46 GMT-0700 (PDT)",
    "affected-uri": "http://example.org/page.html",
    "referrer": "http://evil.example.com/",
    "type": "navigational",
    "applied-behavior": "allowStrippedGET",
    "redirectedTo": ""
  }
}
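
This example is not normative. Assuming the proposed format above is adopted, a report body could be serialized as in the sketch below. The function name build_report is hypothetical, and writing the fetch time in ISO 8601 form is an assumption of the sketch; the proposal above shows a different date style and does not yet fix one.

```python
import json
from datetime import datetime, timezone


def build_report(policy_fetch_time: datetime, affected_uri: str, referrer: str,
                 request_type: str, applied_behavior: str,
                 redirected_to: str = "") -> str:
    """Serialize a violation report using the member names proposed above."""
    report = {
        "epr-report": {
            # Assumption: ISO 8601 text; the draft has not fixed a date format.
            "policy-fetch-time": policy_fetch_time.isoformat(),
            "affected-uri": affected_uri,
            "referrer": referrer,
            "type": request_type,
            "applied-behavior": applied_behavior,
            "redirectedTo": redirected_to,
        }
    }
    return json.dumps(report)
```

The resulting string would be sent to the policy's reportURL once the reporting steps are defined.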

5. IANA Considerations

5.1. The EPR HTTP Response Header Field

The permanent message header field registry should be updated with the following registration [RFC3864]:
Header field name
EPR
Applicable protocol
http
Status
standard
Author/Change controller
W3C
Specification document
This specification (See §3.1 The EPR HTTP Response Header Field)

6. Acknowledgements

Entry point regulation is an implementation of concepts introduced by Charlie Reis et al. in section 5 of [ISOLATION].

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative parts of this document are to be interpreted as described in RFC 2119 [RFC2119]. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes.

Examples in this specification are introduced with the words "for example" or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word "Note" and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

Conformance Classes

A conformant user agent must implement all the requirements listed in this specification that are applicable to user agents.

A conformant server must implement all the requirements listed in this specification that are applicable to servers.

References

Normative References

[FETCH]
Anne van Kesteren. Fetch. Living Standard. URL: https://fetch.spec.whatwg.org/
[MANIFEST]
Marcos Caceres; et al. Manifest for a web application. WD. URL: https://w3c.github.io/manifest/
[MIX]
Mike West. Mixed Content. LCWD. URL: https://w3c.github.io/webappsec/specs/mixedcontent/
[RFC3864]
Graham Klyne; Mark Nottingham; Jeffrey C. Mogul. Registration Procedures for Message Header Fields. RFC. URL: http://www.ietf.org/rfc/rfc3864.txt
[RFC6454]
Adam Barth. The Web Origin Concept. RFC. URL: http://www.ietf.org/rfc/rfc6454.txt
[URL]
Anne van Kesteren; Sam Ruby. URL. WD. URL: http://www.w3.org/TR/url
[WEBIDL2]
Cameron McCormack; Boris Zbarsky. Web IDL (Second Edition). ED. URL: https://heycam.github.io/webidl/
[HTML5]
Robin Berjon; et al. HTML5. 28 October 2014. REC. URL: http://www.w3.org/TR/html5/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: http://www.ietf.org/rfc/rfc2119.txt
[RFC5234]
D. Crocker, Ed.; P. Overell. Augmented BNF for Syntax Specifications: ABNF. January 2008. Internet Standard. URL: http://www.ietf.org/rfc/rfc5234.txt

Informative References

[ISOLATION]
Eric Y. Chen; et al. App Isolation: Get the Security of Multiple Browsers with Just One. URL: http://www.collinjackson.com/research/papers/appisolation.pdf

IDL Index

enum EPRBehavior {
  "allow",
  "block",
  "redirect",
  "omitCredentials",
  "allowStrippedGET"
};

dictionary EPRPolicy {
  USVString? reportURL;
  USVString? redirectURL;
  EPRBehavior navigationBehavior = "allowStrippedGET";
  EPRBehavior subresourceBehavior = "allowStrippedGET";
  sequence<EPRRule> rules;
};

enum EPRRequestType {
  "connection", "navigational", "subresource"
};

dictionary EPRRule {
  USVString? path;
  USVString? regex;
  sequence<EPRRequestType> types;
  boolean allowData;
};

Issues Index

In the interest of keeping manifest creation simple, we should consider merging subresource and connection requests into a single category. Navigations are susceptible to XSS, whereas this is not a concern for subresource and connection requests. If there isn’t a similar very specific distinction between attacks that would involve subresource and connection requests, then we should merge them.
It could make sense to split out IMAGE SRC, SCRIPT SRC, etc. requests. It should be very easy for a manifest author to tag individual rules in the manifest so that images would be available to IMG tags on a different origin, but not SCRIPT tags. If we can identify a very specific attack scenario where this is useful then it makes sense to do this.
"epr_manifest" attribute inconsistent (?) with "epr" member as described below.
It isn’t clear that the EPR manifest ought to be part of an application manifest as defined in [MANIFEST]. We’ve lumped it in there at the moment because it seems worth trying out, but it’s not clear that the concepts (though similar) mesh as well as they need to.
Do we need a connectionBehavior property?
Allow data on everything under a specific hardcoded path, in order to facilitate URLs sent in e-mail, etc.?
Need to have spec language for this. There’s surely a regex spec somewhere, right? DR: Can we just reference the ECMAScript spec?
We need to define violation reports. Steal something from CSP.