Network Reporting API

1. Introduction

This document extends the concepts defined in [REPORTING] to enable a class of reports which are not tied to the lifetime of any particular document. This enables network errors to be reported on, even (or especially) in cases where a document could not be loaded.

Decoupling reports from documents implies two major differences from the document-centred reporting defined in [REPORTING]: First, configuration of reporting must be done at the origin level, rather than through document response headers. Second, the reports are queued and delivered by the user agent separately from document reports.

1.1. Guarantees

This specification aims to provide a best-effort report delivery system that executes out-of-band with website activity. The user agent will be able to do a better job prioritizing and scheduling delivery of reports, as it has an overview of cross-origin activity that individual websites do not, and can deliver reports based on error conditions that would prevent a website from loading in the first place.

The delivery is not, however, guaranteed in a strict sense. We spell out a reasonable set of retry rules in the algorithms below, but it’s quite possible for a report to be dropped on the floor if things go badly.

Reporting can generate a good deal of traffic, so we allow developers to set up groups of endpoints, using a failover and load-balancing mechanism inspired by the DNS SRV record. The user agent will do its best to deliver a particular report to at most one endpoint in a group. Endpoints can be assigned weights to distribute load, with each endpoint receiving a specified fraction of reporting traffic. Endpoints can be assigned priorities, allowing developers to set up fallback collectors that are only tried when uploads to primary collectors fail.

1.2. Examples

MegaCorp Inc. wants to collect Network Error Log reports for its site. It can do so by serving an origin policy manifest with the following key, to define a set of reporting endpoints named "endpoint-1":

{
"network_reporting_endpoints": [{
  "group": "endpoint-1",
  "max_age": 10886400,
  "endpoints": [
    { "url": "https://example.com/reports", "priority": 1 },
    { "url": "https://backup.com/reports", "priority": 2 }
  ] }]
}

And the following headers, which direct NEL reports to that group:

NEL: { ..., "report_to": "endpoint-1" }

2. Concepts

2.1. Endpoint groups

An endpoint group is a set of network reporting endpoints that will be used together for backup and failover purposes.

Each endpoint group has a name, which is an ASCII string.

Each endpoint group has an endpoints list, which is a list of network reporting endpoints.

Each endpoint group has a subdomains flag, which is either "include" or "exclude".

Each endpoint group has a ttl representing the number of seconds the group remains valid for an origin.

Each endpoint group has a creation which is the timestamp at which the group was added to an origin.

An endpoint group is expired if its creation plus its ttl represents a time in the past.
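The expiry rule can be sketched as a one-line check. This is only an illustration: the function name and the use of seconds-since-epoch floats for timestamps are assumptions, not part of this specification.

```python
def is_expired(creation: float, ttl: int, now: float) -> bool:
    """Return True if creation + ttl lies in the past relative to `now`.

    `creation` and `now` are timestamps in seconds (an assumed representation);
    `ttl` is the group's lifetime in seconds.
    """
    return creation + ttl < now
```

A group created at time 100 with a ttl of 10 is expired at time 111, but not at time 105.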

2.2. Network reporting endpoints

A network reporting endpoint is an endpoint, which is extended with these additional attributes:

Each network reporting endpoint has a priority, which is a non-negative integer.

Each network reporting endpoint has a weight, which is a non-negative integer.

Each network reporting endpoint has a retry_after, which is either null, or a timestamp after which delivery should be retried.

A network reporting endpoint is pending if its retry_after is not null, and represents a time in the future.
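As a rough sketch, these attributes could be modelled as follows. The class name, the float timestamp representation, and the method name are assumptions made for illustration; the defaults of 1 for priority and weight mirror those applied in § 3.2 Process origin policy configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NetworkReportingEndpoint:
    url: str
    priority: int = 1                    # failover class; lower values are tried first
    weight: int = 1                      # relative share within the failover class
    failures: int = 0                    # consecutive failed delivery attempts
    retry_after: Optional[float] = None  # timestamp in seconds, or None

    def is_pending(self, now: float) -> bool:
        # Pending: retry_after is set and still lies in the future.
        return self.retry_after is not None and self.retry_after > now
```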

2.3. Clients

A client represents a particular origin’s relationship to a set of endpoints.

Each client has an origin, which is an origin.

Each client has an endpoint-groups list, which is a list of endpoint groups, each of which MUST have a distinct name. (The algorithm in § 3.2 Process origin policy configuration guarantees this by keeping only the first entry in the configuration member with a particular name.)

2.4. Failover and load balancing

The network reporting endpoints in an endpoint group that all have the same priority form a failover class. Failover classes allow the developer to provide backup collectors (those with higher priority values) that will only receive reports if all of the primary collectors (those with lower priority values) fail.

Developers can assign each network reporting endpoint in a failover class a weight, which determines how report traffic is balanced across the failover class.

The algorithm that implements these rules is described in § 5.1 Choose an endpoint from a group.

Note: The priority and weight fields have the same semantics as the corresponding fields in a DNS SRV record.
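To make the relationship between priority and weight concrete, the following sketch groups endpoints into failover classes and computes the expected traffic share of each endpoint within a class. The dictionary layout and both function names are hypothetical, chosen only for this example.

```python
from collections import defaultdict


def failover_classes(endpoints):
    """Group endpoint dicts by priority, lowest priority value (primary) first."""
    classes = defaultdict(list)
    for endpoint in endpoints:
        classes[endpoint["priority"]].append(endpoint)
    return [classes[priority] for priority in sorted(classes)]


def traffic_shares(failover_class):
    """Expected fraction of the class's traffic per endpoint URL.

    Returns an empty dict when every weight is 0, since shares are not
    meaningful in that case.
    """
    total = sum(endpoint["weight"] for endpoint in failover_class)
    if total == 0:
        return {}
    return {e["url"]: e["weight"] / total for e in failover_class}
```

With two primary endpoints weighted 3 and 1, the first would be expected to receive about three quarters of the reports, and a priority-2 endpoint would receive none unless both primaries fail.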

Note: Failover and load balancing is a feature that would be generally useful outside of Reporting. Reporting delegates to the [FETCH] API to actually upload reports once an endpoint has been selected. If, in the future, the Fetch API adds native support for failover and load balancing of requests, a future version of this specification will be updated to use it instead of this bespoke mechanism.

2.5. Storage

A conformant user agent MUST provide a reporting cache, which is a storage mechanism that maintains a set of endpoint groups that websites have instructed the user agent to associate with their origins, and a set of reports which are queued for delivery.

This storage mechanism is opaque, vendor-specific, and not exposed to the web, but it MUST provide the following methods which will be used in the algorithms this document defines:

Insert, update, and remove clients.
Enqueue and dequeue reports for delivery.
Retrieve a list of client objects for an origin.
Retrieve a list of queued report objects.
Clear the cache.
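Since the storage mechanism is opaque, any concrete shape is an implementation choice. The sketch below is one possible in-memory arrangement; the class and method names are invented for illustration, and a real user agent would likely persist this data.

```python
class ReportingCache:
    """Minimal in-memory stand-in for the reporting cache (illustrative only)."""

    def __init__(self):
        self._clients = {}   # origin -> client
        self._reports = []   # reports queued for delivery

    def insert_client(self, origin, client):
        # Inserting for an existing origin doubles as an update.
        self._clients[origin] = client

    def remove_client(self, origin):
        self._clients.pop(origin, None)

    def clients_for(self, origin):
        client = self._clients.get(origin)
        return [] if client is None else [client]

    def enqueue(self, report):
        self._reports.append(report)

    def dequeue(self, report):
        if report in self._reports:
            self._reports.remove(report)

    def queued_reports(self):
        # Return a copy so callers can iterate while the queue changes.
        return list(self._reports)

    def clear(self):
        self._clients.clear()
        self._reports.clear()
```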

3. Endpoint Delivery

A server MAY define a set of endpoint groups for an origin it controls through an origin policy manifest [ORIGIN-POLICY].

Endpoint groups are specified with the "network_reporting_endpoints" member, which defines the endpoint groups to be associated with that origin.

This member is defined in § 3.1 The "network_reporting_endpoints" policy item, and its processing in § 3.2 Process origin policy configuration.

3.1. The "network_reporting_endpoints" policy item

The network_reporting_endpoints member defines the endpoint groups to be associated with the origin.

If present, the member must be an array of objects.

Each object in the array defines an endpoint group to which reports may be delivered, and will be parsed as defined in § 3.2 Process origin policy configuration.

The following subsections define the set of known members which may be specified for each object in the array. Future versions of this document may define additional such members, and user agents MUST ignore unknown members when parsing the configuration.

3.1.1. The `group` member

The OPTIONAL group member is a string that associates a name with the endpoint group.

If present, the member’s value MUST be a string. If not present, the endpoint group will be given the name "default".

3.1.2. The `include_subdomains` member

The OPTIONAL include_subdomains member is a boolean that enables this endpoint group for all subdomains of the current origin’s host.

3.1.3. The `max_age` member

The REQUIRED max_age member defines the endpoint group’s lifetime, as a non-negative integer number of seconds.

The member’s value MUST be a non-negative number.

A value of "0" will cause the endpoint group to be removed from the user agent’s reporting cache.

3.1.4. The `endpoints` member

The REQUIRED endpoints member defines the list of endpoints that belong to this endpoint group.

The member’s value MUST be an array of JSON objects.

The following subsections define the initial set of known members in each JSON object in the array. Future versions of this document may define additional such members, and user agents MUST ignore unknown members when parsing the elements of the array.

3.1.5. The `endpoints.url` member

The REQUIRED url member is a string that defines the location of the endpoint.

The member’s value MUST be a string. Moreover, the URL that the member’s value represents MUST be potentially trustworthy [SECURE-CONTEXTS]. Non-secure endpoints will be ignored.

3.1.6. The `endpoints.priority` member

The OPTIONAL priority member is a number that defines which failover class the endpoint belongs to.

The member’s value, if present, MUST be a non-negative integer.

3.1.7. The `endpoints.weight` member

The OPTIONAL weight member is a number that defines load balancing for the failover class that the endpoint belongs to.

The member’s value, if present, MUST be a non-negative integer.

3.2. Process origin policy configuration

Given a map (parsed), and an origin (origin), this algorithm extracts a list of network reporting endpoints and endpoint groups for origin, and updates the reporting cache accordingly.

Note: This algorithm is called from around step 9 of Origin Policy § parse-a-string-into-an-origin-policy, and only updates the reporting cache if the response has been delivered securely.

Origin Policy monkey patching. Talk to Domenic.

Let groups be an empty list.
If parsed["network_reporting_endpoints"] exists and is a list, then for each item in parsed["network_reporting_endpoints"]:
1. If item has no member named "max_age", or that member’s value is not a number, skip to the next item.
2. If item has no member named "endpoints", or that member’s value is not an array, skip to the next item.
3. Let name be item’s "group" member’s value if present, and "default" otherwise.
4. If there is already an endpoint group in groups whose name is name, skip to the next item.
5. Let endpoints be an empty list.
6. For each endpoint item in the value of item’s "endpoints" member:
  1. If endpoint item has no member named "url", or that member’s value is not a string, or if that value is not an absolute-URL string or a path-absolute-URL string, skip to the next endpoint item.
  2. Let endpoint url be the result of executing the URL parser on endpoint item’s "url" member’s value, with base URL set to response’s url. If endpoint url is failure, skip to the next endpoint item.
  3. If endpoint item has a member named "priority", whose value is not a non-negative integer, skip to the next endpoint item.
  4. If endpoint item has a member named "weight", whose value is not a non-negative integer, skip to the next endpoint item.
  5. Let endpoint be a new network reporting endpoint whose properties are set as follows:
    name: null
    url: endpoint url
    priority: The value of the endpoint item’s "priority" member, if present; 1 otherwise.
    weight: The value of the endpoint item’s "weight" member, if present; 1 otherwise.
    failures: 0
    retry_after: null
  6. Add endpoint to endpoints.
7. Let group be a new endpoint group whose properties are set as follows:
  name: name
  subdomains: "include" if item has a member named "include_subdomains" whose value is true, "exclude" otherwise.
  ttl: item’s "max_age" member’s value.
  creation: The current timestamp
  endpoints: endpoints
8. Add group to groups.
Let client be a new client whose properties are set as follows:
  origin: origin
  endpoint-groups: groups
If there is already an entry in the reporting cache for origin, remove it.
Insert client into the reporting cache for origin.
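The algorithm above can be sketched in Python as follows. This is a simplified illustration under several assumptions: the cache is a plain dictionary keyed by origin, groups and endpoints are dictionaries, the URL-string check is a rough approximation, and the "potentially trustworthy" requirement from § 3.1.5 is approximated by accepting only https URLs. The function name and the base_url parameter are hypothetical.

```python
import time
from urllib.parse import urljoin, urlsplit


def _non_negative_int(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def _acceptable_url_string(value):
    # Rough stand-in for "absolute-URL string or path-absolute-URL string".
    if value.startswith("/"):
        return not value.startswith("//")
    try:
        return bool(urlsplit(value).scheme)
    except ValueError:
        return False


def process_configuration(parsed, origin, base_url, cache, now=None):
    now = time.time() if now is None else now
    groups = []
    items = parsed.get("network_reporting_endpoints")
    for item in items if isinstance(items, list) else []:
        if not isinstance(item, dict):
            continue
        max_age = item.get("max_age")
        if isinstance(max_age, bool) or not isinstance(max_age, (int, float)):
            continue
        if not isinstance(item.get("endpoints"), list):
            continue
        name = item.get("group", "default")
        if any(group["name"] == name for group in groups):
            continue  # only the first group with a given name is kept
        endpoints = []
        for entry in item["endpoints"]:
            if not isinstance(entry, dict):
                continue
            url = entry.get("url")
            if not isinstance(url, str) or not _acceptable_url_string(url):
                continue
            try:
                endpoint_url = urljoin(base_url, url)
                scheme = urlsplit(endpoint_url).scheme
            except ValueError:
                continue
            if scheme != "https":
                continue  # assumption: approximates "potentially trustworthy"
            if "priority" in entry and not _non_negative_int(entry["priority"]):
                continue
            if "weight" in entry and not _non_negative_int(entry["weight"]):
                continue
            endpoints.append({
                "name": None,
                "url": endpoint_url,
                "priority": entry.get("priority", 1),
                "weight": entry.get("weight", 1),
                "failures": 0,
                "retry_after": None,
            })
        groups.append({
            "name": name,
            "subdomains": "include" if item.get("include_subdomains") is True else "exclude",
            "ttl": max_age,
            "creation": now,
            "endpoints": endpoints,
        })
    client = {"origin": origin, "endpoint_groups": groups}
    cache[origin] = client  # replaces any existing entry for this origin
    return client
```

Assigning into the dictionary covers both the "remove the existing entry" and "insert client" steps in a single operation.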

4. Report Generation

Network reports can be generated with or without an active document. If a document is present, and can be considered the source of the report, then the report generated may be visible to reporting observers in that document.

When a user agent is to generate a network report, given a string (type), another string (endpoint group), a serializable object (data), and an optional Document (document), it must run the following steps:

If document is given, then
1. Let settings be document’s environment settings object.
2. Let report be the result of running Reporting API § 2.3 Queue data as type for destination with data, type, endpoint group and settings.
Otherwise, let report be the result of running Reporting API § 2.3 Queue data as type for destination with data, type, and endpoint group.
Append report to the reporting cache.

5. Report Delivery

Over time, various features will queue up a list of reports in the user agent’s reporting cache. The user agent will periodically grab the list of currently pending reports, and deliver them to the associated endpoints. This document does not define a schedule for the user agent to follow, and assumes that the user agent will have enough contextual information to deliver reports in a timely manner, balanced against impacting a user’s experience.

That said, a user agent SHOULD make an effort to deliver reports as soon as possible after queuing, as a report’s data might be significantly more useful in the period directly after its generation than it would be a day or a week later.

5.1. Choose an `endpoint` from a `group`

Note: This algorithm is the same as the target selection algorithm used for DNS SRV records.

Given an endpoint group (group), this algorithm chooses an arbitrary eligible endpoint from the group, if there is one, taking into account the priority and weight of the endpoints.

Let endpoints be a copy of group’s endpoints list.
Remove every endpoint from endpoints that is pending.
If endpoints is empty, return null.
Let priority be the minimum priority value of each endpoint in endpoints.
Remove every endpoint from endpoints whose priority value is not equal to priority.
If endpoints is empty, return null.
Let total weight be the sum of the weight value of each endpoint in endpoints.
Let weight be a random number ≥ 0 and ≤ total weight.
For each endpoint in endpoints:
1. If weight is less than or equal to endpoint’s weight, return endpoint.
2. Subtract endpoint’s weight from weight.
It should not be possible to fall through to here, since the random number chosen earlier will be less than or equal to total weight.
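A minimal sketch of these steps, assuming endpoints are dictionaries with "priority", "weight", and "retry_after" keys and that timestamps are floats in seconds. The function name and the rng parameter are hypothetical; the final return is only a guard against floating-point rounding.

```python
import random


def choose_endpoint(endpoints, now, rng=random):
    # Drop pending endpoints (retry_after set and still in the future).
    eligible = [
        e for e in endpoints
        if not (e["retry_after"] is not None and e["retry_after"] > now)
    ]
    if not eligible:
        return None
    # Keep only the failover class with the lowest priority value.
    priority = min(e["priority"] for e in eligible)
    candidates = [e for e in eligible if e["priority"] == priority]
    # Weighted random choice within that class.
    total_weight = sum(e["weight"] for e in candidates)
    weight = rng.uniform(0, total_weight)
    for endpoint in candidates:
        if weight <= endpoint["weight"]:
            return endpoint
        weight -= endpoint["weight"]
    return candidates[-1]  # guard against floating-point rounding only
```

When the only primary endpoint is pending, the call falls back to the next failover class; when every endpoint is pending, it returns None.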

5.2. Send reports

A user agent sends reports by executing the following steps:

Let reports be a copy of the list of queued report objects in reporting cache.
Let endpoint map be an empty map of network reporting endpoint objects to lists of report objects.
For each report in reports:
1. Let origin be the origin of report’s url.
2. Let client be the entry in the reporting cache for origin.
3. If there exists an endpoint group (group) in client’s endpoint-groups list whose name is report’s destination:
  1. Let endpoint be the result of executing § 5.1 Choose an endpoint from a group on group.
  2. If endpoint is not null:
    1. Append report to endpoint map’s list of reports for endpoint.
    2. Skip to the next report.
4. If origin is a tuple origin whose host is a domain:
  1. For each parent domain that is a superdomain match for origin’s host [RFC6797], considering longer domains first:
    1. Let parent origin be a copy of origin, with its host replaced with parent domain.
    2. Let client be the entry in the reporting cache for parent origin.
    3. If there exists an endpoint group (group) in client’s endpoint-groups list whose name is report’s destination and whose subdomains flag is "include":
      1. Let endpoint be the result of executing § 5.1 Choose an endpoint from a group on group.
      2. If endpoint is not null:
        1. Append report to endpoint map’s list of reports for endpoint.
        2. Skip to the next report.
  Note: This algorithm ensures that more specific subdomains policies take precedence over less specific ones, and that subdomains policies are ignored for any non-domain origins (e.g., for a request to a raw IP address).
5. If we reach this step, the report did not match any network reporting endpoint and the user agent MAY remove report from the reporting cache directly. Depending on load, the user agent MAY instead wait for § 6.2 Garbage Collection at some point in the future.
For each (endpoint, reports) pair in endpoint map:
1. Let origin map be an empty map of origins to lists of report objects.
2. For each report in reports:
  1. Let origin be the origin of report’s url.
  2. Append report to origin map’s list of reports for origin.
3. For each (origin, per-origin reports) pair in origin map, execute the following steps asynchronously:
  1. Let result be the result of executing Reporting API § 3.5.2 Attempt to deliver reports to endpoint on endpoint, origin, and per-origin reports.
  2. If result is "Success":
    1. Set endpoint’s failures to 0, and its retry_after to null.
    2. Remove each report in per-origin reports from the reporting cache.
    Otherwise, if result is "Remove Endpoint":
    1. Remove endpoint from the reporting cache.
      
      Note: reports remain in the reporting cache for potential delivery to other endpoints.
    Otherwise (if result is "Failure"):
    1. Increment endpoint’s failures.
    2. Set endpoint’s retry_after to a point in the future which the user agent chooses.
      
      Note: We don’t specify a particular algorithm here, but user agents are encouraged to employ some sort of exponential backoff algorithm which increases the retry period with the number of failures, with the addition of some random jitter to ensure that temporary failures don’t lead to a crush of reports all being retried on the same schedule.
      
      Add in a reasonable reference describing a good algorithm. Wikipedia, if nothing else.
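One way to choose retry_after is capped exponential backoff with random jitter. The sketch below is an example only: the base delay, cap, jitter range, and function name are arbitrary assumptions, since this document leaves the schedule to the user agent.

```python
import random


def next_retry_after(now, failures, base=60.0, cap=86400.0, rng=random):
    """Return a retry timestamp given the endpoint's failure count (>= 1).

    The delay doubles with each failure, is capped at `cap` seconds, and is
    then scaled by a random factor in [0.5, 1.5] so retries do not align.
    """
    exponent = min(max(failures - 1, 0), 30)  # bound the exponent to keep the arithmetic small
    delay = min(cap, base * (2 ** exponent))
    return now + delay * rng.uniform(0.5, 1.5)
```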

Note: User agents MAY decide to attempt delivery for only a subset of the collected reports or endpoints (because, for example, sending all the reports at once would consume an unreasonable amount of bandwidth, etc). As reports are only removed from the cache when they’re successfully delivered, skipped reports will simply be delivered later.
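The group lookup in the first loop, an exact origin match followed by superdomain matches from longest to shortest, can be sketched as below. For brevity this omits endpoint selection, so unlike the full algorithm it does not continue to a parent domain when a matching group has no eligible endpoint. The data layout, which maps (scheme, host, port) tuples to lists of group dictionaries, and both function names are assumptions.

```python
import ipaddress


def parent_domains(host):
    """Superdomains of `host`, longest first; empty for IP address hosts."""
    try:
        ipaddress.ip_address(host.strip("[]"))
        return []  # not a domain, so subdomain policies do not apply
    except ValueError:
        pass
    labels = host.split(".")
    return [".".join(labels[i:]) for i in range(1, len(labels))]


def find_group(clients, scheme, host, port, destination):
    # Exact origin match: the subdomains flag is irrelevant here.
    for group in clients.get((scheme, host, port), []):
        if group["name"] == destination:
            return group
    # Superdomain matches: only groups that opted in with "include".
    for parent in parent_domains(host):
        for group in clients.get((scheme, parent, port), []):
            if group["name"] == destination and group["subdomains"] == "include":
                return group
    return None
```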

6. Implementation Considerations

6.1. Delivery

The user agent SHOULD attempt to deliver reports as soon as possible to provide feedback to developers as quickly as possible. However, when this desire is balanced against the impact on the user, the user wins. With that in mind, the user agent MAY delay delivery of reports based on its knowledge of the user’s activities and context.

For instance, the user agent SHOULD prioritize the transmission of reporting data lower than other network traffic. The user’s explicit activities on a website should preempt reporting traffic.

The user agent MAY choose to withhold report delivery entirely until the user is on a fast, cheap network in order to prevent unnecessary data cost.

The user agent MAY choose to prioritize reports from particular origins over others (perhaps those that the user visits most often?)

6.2. Garbage Collection

Periodically, the user agent SHOULD walk through the cached reports and endpoints, and discard those that are no longer relevant. These include:

endpoint groups which are expired.
endpoint groups which have not been used in some arbitrary period of time (perhaps a ~week?)
reports whose attempts exceed some user-agent-defined threshold (~5 seems reasonable.)
reports which have not been delivered in some arbitrary period of time (perhaps ~2 days?)
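A garbage collection pass along these lines might look like the following sketch. The thresholds simply reuse the approximate figures suggested above, and the "last_used", "attempts", and "queued_at" fields are hypothetical bookkeeping that an implementation would need to maintain.

```python
DAY = 86400  # seconds


def collect_garbage(groups, reports, now,
                    max_attempts=5, max_report_age=2 * DAY, max_idle=7 * DAY):
    """Return (groups, reports) with stale entries filtered out."""
    live_groups = [
        g for g in groups
        if g["creation"] + g["ttl"] >= now      # not expired
        and now - g["last_used"] <= max_idle    # used recently enough
    ]
    live_reports = [
        r for r in reports
        if r["attempts"] <= max_attempts            # has not exceeded retry budget
        and now - r["queued_at"] <= max_report_age  # not too old
    ]
    return live_groups, live_reports
```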

7. Sample Reports

POST / HTTP/1.1
Host: example.com
...
Content-Type: application/reports+json

[{
  "type": "csp",
  "age": 10,
  "url": "https://example.com/vulnerable-page/",
  "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
  "body": {
    "blocked": "https://evil.com/evil.js",
    "directive": "script-src",
    "policy": "script-src 'self'; object-src 'none'",
    "status": 200,
    "referrer": "https://evil.com/"
  }
}, {
  "type": "hpkp",
  "age": 32,
  "url": "https://www.example.com/",
  "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
  "body": {
    "date-time": "2014-04-06T13:00:50Z",
    "hostname": "www.example.com",
    "port": 443,
    "effective-expiration-date": "2014-05-01T12:40:50Z",
    "include-subdomains": false,
    "served-certificate-chain": [
      "-----BEGIN CERTIFICATE-----\n
      MIIEBDCCAuygAwIBAgIDAjppMA0GCSqGSIb3DQEBBQUAMEIxCzAJBgNVBAYTAlVT\n
      ...
      HFa9llF7b1cq26KqltyMdMKVvvBulRP/F/A8rLIQjcxz++iPAsbw+zOzlTvjwsto\n
      WHPbqCRiOwY1nQ2pM714A5AuTHhdUDqB1O6gyHA43LL5Z/qHQF1hwFGPa4NrzQU6\n
      yuGnBXj8ytqU0CwIPX4WecigUCAkVDNx\n
      -----END CERTIFICATE-----",
      ...
    ]
  }
}, {
  "type": "nel",
  "age": 29,
  "url": "https://example.com/thing.js",
  "user_agent": "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0",
  "body": {
    "referrer": "https://www.example.com/",
    "server-ip": "234.233.232.231",
    "protocol": "",
    "status-code": 0,
    "elapsed-time": 143,
    "age": 0,
    "type": "http.dns.name_not_resolved"
  }
}]

8. Security Considerations

8.1. Capability URLs

Some URLs are valuable in and of themselves. To mitigate the possibility that such URLs will be leaked via this reporting mechanism, we strip out credential information and fragment data from the URL we store as a report’s originator. It is still possible, however, for a feature to unintentionally leak such data via a report’s body. Implementers SHOULD ensure that URLs contained in a report’s body are similarly stripped.
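As an illustration of the stripping described above, the following sketch removes the username, password, and fragment from a URL using Python's standard library. The function name is hypothetical, and a user agent would use its own URL parser rather than this one.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_url_for_report(url):
    """Drop credentials and the fragment; keep scheme, host, port, path, query."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if ":" in host:
        host = f"[{host}]"  # restore brackets around IPv6 literals
    netloc = host if parts.port is None else f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, ""))
```

The same helper could be applied to any URLs a feature places in a report's body.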

9. Privacy Considerations

9.1. Network Leakage

Because this reporting mechanism is out-of-band, and doesn’t rely on a page being open, it’s entirely possible for a report generated while a user is on one network to be sent while the user is on another network, even if they don’t explicitly open the page from which the report was sent.

Consider mitigations. For example, we could drop reports if we change from one network to another. [w3c/BackgroundSync Issue #107]

9.2. Clock Skew

Each report is delivered along with an age property, rather than the timestamp at which it was generated. We do this because each user’s local clock will be skewed from the clock on the server by an arbitrary amount. The difference between the time the report was generated and the time it was sent will be stable, regardless of clock skew, and we can avoid the fingerprinting risk of exposing the clock skew via this API.
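The idea can be sketched with a monotonic clock, which measures elapsed time without reference to wall-clock time. The class name is hypothetical, and reporting the age in milliseconds is an assumption of this example, as the text above does not state a unit.

```python
import time


class QueuedReport:
    """Records when a report was queued and derives its age at send time."""

    def __init__(self, body, clock=time.monotonic):
        self._clock = clock
        self._queued_at = clock()  # monotonic reading, not a wall-clock timestamp
        self.body = body

    def age_ms(self):
        # Elapsed time since queuing; unaffected by local clock skew.
        return int((self._clock() - self._queued_at) * 1000)
```

Only the difference between two local readings leaves the device, so the server learns nothing about how far the user's clock is from its own.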

9.3. Cross-origin correlation

If multiple origins all use the same reporting endpoint, that endpoint may learn that a particular user has interacted with a certain set of websites, as it will receive origin-tagged reports from each. This doesn’t seem worse than the status quo ability to track the same information from cooperative origins, and doesn’t grant any new tracking ability above and beyond what’s possible with <img> today.

9.4. Subdomains

This specification allows any resource on a host to declare a set of reporting endpoints for that host and each of its subdomains. This doesn’t have privacy implications in and of itself (beyond those noted in § 9.5 Clearing the reporting cache), because the reporting endpoints themselves don’t take any real action: features need to opt into using them explicitly. Those features certainly will have privacy implications, and should carefully consider whether they should be enabled across origin boundaries.

9.5. Clearing the reporting cache

A user agent’s reporting cache contains data about a user’s activity on the web, and user agents ought to handle this data carefully. In particular, if a user agent gives users the ability to clear their site data, browsing history, browsing cache, or similar, the user agent MUST also clear the reporting cache. Note that this includes both the pending reports themselves, as well as the endpoints to which they would be sent. Both MUST be cleared.

9.6. Disabling Reporting

Reporting is, to some extent, a question of commons. In the aggregate, it seems useful for everyone for reports to be delivered. There is direct benefit to developers, as they can fix bugs, which means there’s indirect benefit to users, as the sites they enjoy will be more stable and enjoyable. As a concrete example, Content Security Policy grants something like herd immunity to cross-site scripting attacks by alerting developers about potential holes in their sites' defenses. Fixing those bugs helps every user, even those whose user agents don’t support Content Security Policy.

The calculus, of course, depends on the nature of data that’s being delivered, and the relative maliciousness of the reporting endpoints, but that’s the value proposition in broad strokes.

That said, this general benefit cannot be allowed to take priority over a user’s ability to individually opt out of such a system. Sending reports costs bandwidth, and potentially could reveal some small amount of additional information above and beyond what a website can obtain in-band ([NETWORK-ERROR-LOGGING], for instance). User agents MUST allow users to disable reporting with some reasonable amount of granularity in order to maintain the priority of constituencies espoused in [HTML-DESIGN-PRINCIPLES].
