User Interface Security and the Visibility API

Editor’s Draft,

This version:
https://w3c.github.io/webappsec/specs/uisecurity/
Feedback:
public-webappsec@w3.org with subject line “[UI Security] … message topic …” (archives)
Editor:
(Facebook)
Author:
Dan Kaminsky, Invited Expert
David Lin-Shung Huang, Carnegie Mellon University
Giorgio Maone, Invited Expert

Abstract

UI Security and the Visibility API defines both a declarative and an imperative means for resources displayed in an embedded context to protect themselves against having their content obscured, moved, or otherwise displayed in a misleading manner.

Status of this document

This is a public copy of the editors’ draft. It is provided for discussion only and may change at any moment. Its publication here does not imply endorsement of its contents by W3C. Don’t cite this document other than as work in progress.

Changes to this document may be tracked at https://github.com/w3c/webappsec.

The (archived) public mailing list public-webappsec@w3.org (see instructions) is preferred for discussion of this specification. When sending e-mail, please put the text “UI Security” in the subject, preferably like this: “[UI Security] …summary of comment…”

This document was produced by the Web Application Security Working Group.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 September 2015 W3C Process Document.

1. Introduction

This section is not normative.

Composite or "mash-up" web applications built using iframes are ubiquitous because they allow users to interact seamlessly and simultaneously with content from multiple origins while maintaining isolation boundaries that are essential to security and privacy for both users and applications.

However, those boundaries are not absolute. In particular, the visual and temporal integrity of embedded content is not protected from manipulation by the embedding resource. An embedding resource might constrain the viewport, draw over, transform, reposition, or resize the user’s view of a third-party resource.

Such manipulations are collectively known as User Interface Redressing. Their goal might be to entice the user to interact with embedded content without knowing its context (e.g. to send a payment or share content), commonly known as "clickjacking", or to convince paid content that it is being shown to the user when it is actually obscured, commonly known in the advertising business as "display fraud".

Existing anti-clickjacking measures such as frame-busting scripts and headers granting origin-based embedding permissions have shortcomings which prevent their application to important use-cases. Frame-busting scripts, for example, rely on browser behavior that has not been engineered to provide a security guarantee and as a consequence, such scripts may be unreliable if loaded inside a sandbox or otherwise disabled. The X-Frame-Options header and the frame-ancestors Content Security Policy directive offer an all-or-none approach to display of embedded content that is not appropriate for content which may be embedded in arbitrary locations, or known locations which might still be adversarial.

This document defines mechanisms to allow resources to protect themselves from embedding contexts which might otherwise interfere with their display and interaction characteristics. First, it defines an imperative API by which a resource can request that a conforming user agent guarantee unmodified display of its viewport, and report events on the success or failure of meeting such guarantees. This API should be suitable for e.g. paid content such as advertising to receive trustworthy signals about its viewability from a conforming user agent.

Secondly, this specification defines a declarative mechanism (via a Content Security Policy directive) to request visibility protection and receive notification, via event properties or out-of-band reporting, if certain events are delivered to a resource while it does not meet its requested visibility contract.

how to interact with frame-ancestors and XFO?

2. Special Conformance Notes

This section is not normative.

UI Redressing attacks rely on fooling the subjective perceptions of human actors to induce them to interact with a web application out of its intended context. Because of this, the specific mechanisms which may be used in attack and defense may vary greatly with the details of a user agent implementation. For example, attacks which rely on redressing the cursor may not apply in a touch environment, or entire classes of attack may be impossible on a text-only browser or screen reader.

Similarly, the implementation of the policies specified herein is highly dependent on internal architecture and implementation strategies of the user agent; such strategies may vary greatly between user agents or even across versions or platforms for a single user agent.

This specification provides a normative means by which a resource owner can communicate to a user agent its desire for additional protective measures, actions to take if violations are detected, and tuning hints which may be useful for certain means of implementation. A user agent is conformant if it understands these directives and makes a best effort to provide the desired security properties, which might require no additional implementation steps, e.g. in the case of a screen reader that does not support embedded resources in a manner that is subject to any of the attack classes of concern.

While the indeterminacy of the user agent implementation protects applications from needing to constantly update their policies as user agents make internal changes, application authors should understand that even a conformant user agent cannot make perfect security guarantees against UI Redressing. Some potential checks suggested here might not be made as a consequence of implementation strategies, or to avoid unacceptable performance costs, and the interpretation of policies might differ among user agents (e.g. a tolerance hint may apply strictly to rectangular clipping in one implementation but to pixel or sub-pixel comparisons in another).

These directives should be used as part of a comprehensive risk mitigation strategy with an appropriate understanding of their limitations.

3. JavaScript API

should this look more like MutationObserver?

should we have an argument that doesn’t apply GraphicsLayer raising?

Sterling-Cooper Online wants to be able to determine if ad impressions it is serving are actually visible to users. The following example code will send a beacon containing the resource’s location fragment, after a full second of uninterrupted visibility, in a viewport of at least 300x100, with content positioned at the origin.
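function viewabilityHandler() {

  var lastEvent = null, done = false;

  // True when the viewport is at least 300x100, positioned at the origin,
  // and entirely visible.
  function inTolerance(event) {
    return (
      (event.viewportWidth >= 300 &&
        event.viewportHeight >= 100) &&
      (event.viewportX == 0 && event.viewportY == 0 &&
        event.visibleX == 0 && event.visibleY == 0) &&
      (event.visibleWidth == event.viewportWidth &&
        event.visibleHeight == event.viewportHeight)
    );
  }

  function handleEvent(event) {
    // This draft does not yet name the event type; 'visibility' is assumed
    // here, matching the listener registered below.
    if (event.type.toLowerCase() == "visibility" && !done) {
      lastEvent = event;
      if (inTolerance(event)) {
        // Report only if no newer visibility event arrives within one second.
        window.setTimeout(
          function() { if (lastEvent === event && !done) { report(); } },
          1000
        );
      }
    }
  }

  function report() {
    done = true;
    // FormData.set() returns undefined, so build the payload first.
    var data = new FormData();
    data.set('id', window.location.hash);
    navigator.sendBeacon('/viewabilityEvent', data);
  }

  // Return the handler itself, not the result of calling it.
  return handleEvent;
}

var handler = viewabilityHandler();

window.addEventListener('visibility', handler);
navigator.requestVisibility();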

3.1. requestVisibility method

partial interface Navigator {
  void requestVisibility();
};
requestVisibility()
Request that this browsing context be made visible and that VisibilityEvents be delivered to the Window object.

3.2. VisibilityEvent

interface VisibilityEvent : Event {
  readonly attribute long viewportWidth;
  readonly attribute long viewportHeight;
  readonly attribute long viewportX;
  readonly attribute long viewportY;
  readonly attribute long visibleWidth;
  readonly attribute long visibleHeight;
  readonly attribute long visibleX;
  readonly attribute long visibleY;
};
viewportWidth, of type long, readonly
The current width of the viewport.
viewportHeight, of type long, readonly
The current height of the viewport.
viewportX, of type long, readonly
The X position of the viewport root, relative to the root window’s coordinate set.
viewportY, of type long, readonly
The Y position of the viewport root, relative to the root window’s coordinate set.
visibleWidth, of type long, readonly
The current width of the visible rectangle.
visibleHeight, of type long, readonly
The current height of the visible rectangle.
visibleX, of type long, readonly
The X position of the top-left corner of the visible rectangle, relative to the current document’s coordinate set.
visibleY, of type long, readonly
The Y position of the top-left corner of the visible rectangle, relative to the current document’s coordinate set.
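
As a minimal sketch of how a page might consume these attributes, the following helper computes the fraction of the viewport covered by the visible rectangle. The name visibleFraction is hypothetical and not part of this specification, and the sketch assumes that the visible rectangle lies within the viewport.

```javascript
// Hypothetical helper, not part of this specification: returns the share of
// the viewport area (0 to 1) covered by the visible rectangle of an event.
function visibleFraction(event) {
  var viewportArea = event.viewportWidth * event.viewportHeight;
  if (viewportArea <= 0) {
    return 0; // A zero-sized viewport shows nothing.
  }
  var visibleArea = event.visibleWidth * event.visibleHeight;
  // Clamp so that inconsistent inputs never yield a value outside 0..1.
  return Math.min(1, Math.max(0, visibleArea / viewportArea));
}
```

An advertiser might, for instance, treat an impression as viewable only while this value stays above a threshold of its own choosing.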

are viewportX and Y relative to current document coordinates or root window coords?

4. CSP Directive

This section describes the content security policy directives introduced in this specification.

4.1. input-protection

The input-protection directive, if present or implied, is declarative syntactic sugar for the same fundamental mechanism as the JavaScript API. If a user input event, such as click, keypress, touch, or drag, is generated when the document is not in a visibility state that matches the visibility policy, this directive is violated.

If the input-protection directive is set as part of a Content-Security-Policy, violating the policy should cancel delivery of the UI event to the target and fire a violation event. If set as part of a Content-Security-Policy-Report-Only, triggering of the heuristic should deliver the event with the unsafe attribute on the UIEvent set to true and fire a violation event. (TODO: Pointer Events?)

The optional directive value allows resource authors to provide options for heuristic tuning in the form of space-separated option-name=option-value pairs.

directive-name    = "input-protection"
directive-value   = ["display-time=" num-val]
                    ["height=" num-val]
                    ["width=" num-val]
                    ["protected-element=" id-selector]

If the policy does not contain a value for this directive or any of the hint name=value pairs are absent, the user agent SHOULD apply default values for hints as described in the following.

display-time
is a numeric value from 0 to 10000 that specifies how long, in milliseconds, the screen area containing the protected user interface must have unmodified viewability properties when the event is processed.

If not specified, it defaults to 800. If a value outside the range stated above is specified, it is clamped to the nearest bound of that range.

width
is a numeric value that specifies the minimum viewable area’s X dimension. If unspecified, it defaults to the width of the protected element’s boundingClientRect.
height
is a numeric value that specifies the minimum viewable area’s Y dimension. If unspecified, it defaults to the height of the protected element’s boundingClientRect.
protected-element
An id querySelector expression which uniquely identifies an element in the protected document. The protected element’s boundingClientRect will be used as the boundaries for visibility calculations. If width and height are specified, they are calculated relative to the origin of the protected element’s boundingClientRect. If unspecified, this value defaults to the Document element.
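
The option handling described above can be sketched as follows. This is an illustration only: the function name parseInputProtection, the shape of the returned object, and the example selector "#pay" are assumptions, and a real user agent would parse the directive as part of its CSP implementation.

```javascript
// Hypothetical parser for an input-protection directive value such as
// "display-time=1000 width=300 height=100 protected-element=#pay".
// Names and return shape are assumptions made for illustration.
function parseInputProtection(value) {
  var options = {
    displayTime: 800,        // default stated by this draft
    width: null,             // null: use the protected element's bounding rect
    height: null,
    protectedElement: null   // null: use the Document element
  };
  var pairs = (value || '').trim().split(/\s+/);
  for (var i = 0; i < pairs.length; i++) {
    var eq = pairs[i].indexOf('=');
    if (eq < 1) { continue; }  // skip empty or malformed tokens
    var name = pairs[i].slice(0, eq);
    var raw = pairs[i].slice(eq + 1);
    var num = Number(raw);
    var numeric = raw !== '' && isFinite(num);
    if (name === 'display-time' && numeric) {
      // Out-of-range values are clamped to the nearest bound of 0..10000.
      options.displayTime = Math.min(10000, Math.max(0, num));
    } else if ((name === 'width' || name === 'height') && numeric) {
      options[name] = num;
    } else if (name === 'protected-element' && raw !== '') {
      options.protectedElement = raw;
    }
  }
  return options;
}
```

Unknown option names are ignored in this sketch; the draft does not say how a user agent should treat them.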

should the event have a timestamp or just let the event handler assume it happened close enough to when it receives it?

innerHeight, innerWidth (ask Dan)

ancestorOrigins

5. Definitions

5.1. Visibility

An area is defined as visible if it has visual and spatio-temporal integrity and meets minimum required width and height requirements.

5.1.1. Visual Integrity

An area has visual integrity if it is not subject to being occluded, drawn over, blurred, zoomed, clipped, rotated, scaled, transformed or subject to any other effects which would change its appearance relative to being painted in the same size viewport in the root window. The property of visual integrity applies to rectangular areas, and an area only has the property if no such changes occur within its entire bounding rectangle.

5.1.2. Spatio-Temporal Integrity

An area’s spatio-temporal integrity refers to it being displayed without modifications to its display state, viewport size, position of content within the viewport, or position of the viewport relative to the root window’s coordinate system. Any modification to these properties generates a new visibility state.

5.1.3. Visibility State

A visibility state is the description of a document’s visual and spatio-temporal integrity and size and the elapsed time since those properties have last changed.

6. Algorithms

6.1. Requesting Visibility

When a document calls requestVisibility(), the user agent SHOULD attempt to display the document’s viewport in a manner that is not subject to perturbation by ancestor frames and MUST report visibilityEvents.

NOTE: The exact procedures of this algorithm are not normative. Any algorithm which produces equivalent results is conformant and implementations are encouraged to optimize wherever possible.

  1. Complete any pending layout and painting tasks, (find more precise language here) including layer hoisting as described in the non-normative implementation advice.

  2. Visibility events should not be fired until DOMContentLoaded has fired on the protected document and all ancestor documents.

  3. Calculate a visibilityEvent

    1. Obtain the height and width of the protected document’s viewport

    2. Obtain the x and y coordinates of the origin of the protected document’s viewport in the coordinate system of the root window.

    3. Obtain the bounding rectangle of the protected document that has visual integrity.

      1. If the entire viewport does not have visual integrity, the user agent MAY always report a rectangle of zero dimensions, e.g. if the viewport is subject to transformations with complex outlines, or modifications which section the viewport into multiple distinct visible regions.

      2. Otherwise, the user agent MAY report the largest bounding rectangle within the viewport with visual integrity, if e.g. only a small portion of the viewport is cropped.

    4. Dispatch a visibilityEvent to the window object of the protected document.

  4. Whenever the spatio-temporal or visual integrity of the protected document’s viewport changes, a new visibility state is created and step 3 of this algorithm MUST be repeated.

Internal changes to the contents of the protected document such as CSS effects, modifications to the DOM, or animations, MUST NOT create a new visibility state, but internally generated events which scroll or resize the viewport MUST.
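
The rule that only geometry changes create a new visibility state can be sketched as a small tracker. This is a non-normative illustration: createVisibilityTracker and its methods are hypothetical names, and the clock is passed in as a function (for example Date.now) so that the sketch does not depend on any particular timing source.

```javascript
// Hypothetical tracker: records a new visibility state only when the
// geometry reported for the protected document actually changes.
function createVisibilityTracker(now) {
  var fields = ['viewportWidth', 'viewportHeight', 'viewportX', 'viewportY',
                'visibleWidth', 'visibleHeight', 'visibleX', 'visibleY'];
  var state = null;

  function sameGeometry(a, b) {
    return fields.every(function (f) { return a[f] === b[f]; });
  }

  return {
    // Call with freshly computed geometry; returns true if a new state
    // (and therefore a new VisibilityEvent) should be produced.
    update: function (geometry) {
      if (state !== null && sameGeometry(state.geometry, geometry)) {
        return false;
      }
      state = { geometry: geometry, since: now() };
      return true;
    },
    // Milliseconds for which the current state has been in effect.
    elapsed: function () {
      return state === null ? 0 : now() - state.since;
    }
  };
}
```

Because DOM mutations, CSS effects and animations inside the protected document do not alter any of the eight tracked fields, they leave the state and its elapsed time untouched in this sketch.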

6.2. Input Protection Algorithm

Implementation of the input-protection CSP directive is internally defined in terms of the Requesting Visibility algorithm.

  1. If the directive is specified, begin as if requestVisibility() was invoked.

  2. At step 3.4 (dispatch a visibilityEvent), do not dispatch the event (unless visibility events have also been requested with the imperative API) but retain the calculated values of the visibility state, plus a timestamp, as internal state.

  3. At step 4 (handle new visibility states), replace the retained internal state with the new calculated values and a new timestamp.

  4. Parse the directive options and determine values (or use defaults) for display time, height, width and protected-element.

During hit testing for user initiated events (UIEvents, PointerEvents, drag-and-drop, copy/paste, etc.):

do we need keypress or a keyboard-driven click events?

  1. Examine the element to which the event is delivered to determine whether it matches the id selector that is the value of protected-element.

    1. If element matches the selector, continue to step 2.

    2. If the element does not match and the element has a parent

      1. Set element to element.parent and go to step 1.1

    3. If no element in the set of elements which might handle the event matches the selector, deliver the event and terminate this portion of the algorithm. Continue monitoring visibility states.

  2. If an event will be delivered to the protected-element or one of its children

    1. Get the current timestamp and the retained internal state of the most recent visibility state

    2. Get the bounding rectangle of the protected-element

    3. If height and width were not set explicitly as part of the directive, set them to the current height and width of the protected element’s bounding rectangle for the purposes of processing this event

    4. Get the viewport height and width of the displayport for the protected document.

      1. If the size of the viewport is less in either dimension than the height and width values, go to handle a violation.

    5. Get the bounding rectangle of the protected-element and determine the x and y coordinates of the origin of that rectangle in the document’s own coordinate system.

    6. Calculate the protected rectangle that must be visible by applying the height and width values to those origin coordinates. (If these values are not explicitly specified in the directive options, this is the bounding rectangle of the element.)

    7. If the protected rectangle is not entirely within the rectangle defined by the visibleX, visibleY, visibleWidth, and visibleHeight values of the visibility state, go to handle a violation.

    8. If the protected rectangle is entirely within the visible rectangle, subtract the timestamp of the visibility state from the current timestamp and compare it to the display-time value of the directive, or 800 if unset. If the calculated value is less than the specified value, go to handle a violation.

    9. Otherwise, deliver the event as normal and terminate this portion of the algorithm. Continue monitoring visibility states.
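
Steps 2.3 to 2.9 above can be sketched as a single pure function. This is an illustration under stated assumptions, not a definitive implementation: the function name, the argument shapes, and the 'deliver' and 'violation' return values are all hypothetical, and the ancestor walk of step 1 is assumed to have already matched the protected element.

```javascript
// Hypothetical check mirroring steps 2.3 to 2.9; all names are assumptions.
// state:       { visibleX, visibleY, visibleWidth, visibleHeight, timestamp }
// options:     { displayTime, width, height } (width and height may be null)
// elementRect: { x, y, width, height } in the document's coordinate system
// viewport:    { width, height }
// now:         current timestamp in milliseconds
function checkInputProtection(state, options, elementRect, viewport, now) {
  // Step 2.3: fall back to the element's own size when not set explicitly.
  var width = options.width != null ? options.width : elementRect.width;
  var height = options.height != null ? options.height : elementRect.height;

  // Step 2.4.1: the viewport must be at least as large as the required area.
  if (viewport.width < width || viewport.height < height) {
    return 'violation';
  }
  // Steps 2.5 to 2.7: the protected rectangle must lie inside the visible one.
  var inside =
    elementRect.x >= state.visibleX &&
    elementRect.y >= state.visibleY &&
    elementRect.x + width <= state.visibleX + state.visibleWidth &&
    elementRect.y + height <= state.visibleY + state.visibleHeight;
  if (!inside) {
    return 'violation';
  }
  // Step 2.8: the state must have been stable for at least display-time ms.
  var displayTime = options.displayTime != null ? options.displayTime : 800;
  if (now - state.timestamp < displayTime) {
    return 'violation';
  }
  return 'deliver'; // Step 2.9
}
```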

Cursor redressing.

6.2.1. Handling Violations

When a violation is raised:

  1. If the directive is being enforced, cancel delivery of the event and Fire a Violation Event as defined in [CSP2].

  2. If the directive is being monitored, add a property unsafe to the event, set to true, and Fire a Violation Event as defined in [CSP2].
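
The two dispositions can be sketched as follows. The function handleViolation is hypothetical, and fireViolationEvent is a caller-supplied stand-in for the violation reporting defined in [CSP2]; neither name comes from this specification.

```javascript
// Hypothetical sketch of the two dispositions. "fireViolationEvent" stands in
// for the CSP2 violation reporting machinery and is supplied by the caller.
// Returns true if the input event should still be delivered to its target.
function handleViolation(event, enforced, fireViolationEvent) {
  var delivered;
  if (enforced) {
    delivered = false;     // enforced: cancel delivery to the target
  } else {
    event.unsafe = true;   // report-only: deliver, but flag the event
    delivered = true;
  }
  fireViolationEvent(event);
  return delivered;
}
```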

6.3. Unsafe Attribute

partial interface Event {
  readonly attribute boolean unsafe;
};
unsafe, of type boolean, readonly
Will be set to true if the event fired when the document’s visibility state did not meet input-protection requirements.
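
From the page author's side, a resource served with a report-only policy might consult this attribute before acting on input. The following is a minimal sketch; makeGuardedHandler, performAction and askForConfirmation are hypothetical names standing in for application logic.

```javascript
// Hypothetical click handler factory for a page served with a report-only
// policy. "performAction" and "askForConfirmation" are placeholders for
// page logic supplied by the application.
function makeGuardedHandler(performAction, askForConfirmation) {
  return function (event) {
    if (event.unsafe === true) {
      // Visibility requirements were not met when the event fired.
      askForConfirmation(event);
    } else {
      performAction(event);
    }
  };
}
```

Such a handler would be registered in the usual way, e.g. with addEventListener('click', ...) on the protected element.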

7. Implementation Considerations

This section is non-normative.

The internal details of the full pipeline between the normatively specified portions of the Web Platform, such as the Document Object Model and CSS, and the pixels actually displayed to the user, are not standardized. There may exist considerable variability in the implementations and strategies employed between different user agents, or even within a single user agent on platforms with differing capabilities.

The implementation strategy detailed in this section is not normative. Any strategy which produces correct outcomes for the normative algorithms is conformant and implementers are encouraged to optimize whenever possible.

The possibility of wide variance among user agent implementations notwithstanding, the normative algorithms of this specification are designed such that a highly performant implementation should be possible on the most common internal architectures that are state-of-the-art as of the time of writing.

Roughly, at some point along the transformation from DOM to pixels, a user agent will arrive at an intermediate representation which represents a set of surfaces to be painted / clipped / scrolled. We will designate this a GraphicsLayer.

  1. On the way to preparing the set of GraphicsLayers, determine if an iframe which has invoked requestVisibility() will be put into its own GraphicsLayer. If it is not, apply whatever implementation-specific transformations are necessary to generate it as a distinct layer. (e.g. add translateZ(0) to the documentElement)

  2. Determine the GraphicsLayer associated with the root document, topmost in the window.

  3. Without reordering prior intermediate representations in a manner which would change event dispatching, hit testing or the DOM as exposed to JavaScript, reorder the GraphicsLayers such that the iframe which has requested visibility is on top of the root GraphicsLayer. (e.g. by making it a direct child of the root layer)

NOTE: By reordering the layers such that an iframe which has requested visibility is topmost, it should naturally avoid being subject to nearly all manipulations and transformations by parent layers which could otherwise change its display characteristics.

  1. Clipping transformations must be respected. An iframe requesting visibility is not allowed to take over more of the display than allowed by its ancestor documents.

    1. Determine the size of the protected GraphicsLayer to be raised.

    2. Determine the position of that rectangle on the top viewport.

  2. Re-implement necessary clipping that would be otherwise bypassed by the layer raising.

    1. Obtain the bounds of the iframe in the viewport space of its parent

    2. Intersect those boundaries with those of the to-be-raised GraphicsLayer

    3. Repeat by finding the bounds in the viewport space of the next ancestor document, until the topmost document in the window is reached.

  3. Determine and clip appropriately for the position of content in the viewport.

    1. Correct the boundaries, position and offset of the to-be-raised GraphicsLayer to account for scrolling of both the root document and any intermediate frames

    Dan, I’m not sure I got those last 2 or so step exactly correct....

  4. If multiple iframes in the rendering tree have requested visibility, they may overlap, and only one can be topmost. It is undefined which should actually be topmost, only that each knows what its true visible boundaries are. If there are other to-be-raised GraphicsLayers subsequent and superior to this one, calculate and then intersect those boundaries with these boundaries to arrive at the final visible area.

  5. Report back the viewport and visible portion.
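
The re-clipping of a raised layer against its ancestors amounts to repeated rectangle intersection, which can be sketched as below. The function names are hypothetical, and the sketch assumes that every rectangle has already been converted into the coordinate space of the topmost document, which sidesteps the scrolling corrections described above.

```javascript
// Intersect two rectangles {x, y, width, height}; an empty intersection
// is reported as a rectangle of zero dimensions.
function intersectRects(a, b) {
  var x1 = Math.max(a.x, b.x);
  var y1 = Math.max(a.y, b.y);
  var x2 = Math.min(a.x + a.width, b.x + b.width);
  var y2 = Math.min(a.y + a.height, b.y + b.height);
  if (x2 <= x1 || y2 <= y1) {
    return { x: x1, y: y1, width: 0, height: 0 };
  }
  return { x: x1, y: y1, width: x2 - x1, height: y2 - y1 };
}

// Hypothetical: "layerRect" is the raised layer and "ancestorClips" lists the
// clip rectangle imposed by each ancestor, innermost first, all already
// expressed in the coordinate space of the topmost document.
function clipThroughAncestors(layerRect, ancestorClips) {
  return ancestorClips.reduce(intersectRects, layerRect);
}
```

The same intersectRects helper could also be applied to other raised layers that sit above this one when several iframes have requested visibility.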

8. Privacy Considerations

This section is non-normative.

The timing of visibilityEvents may leak some information across Origin boundaries. An embedded document might have previously been unable to learn that it was obscured, or the timing and nature of repositioning of ancestor frames’ viewports. In some circumstances, this information leak might have privacy implications, but the granularity and nature of the information is such that it should not be of much value to attackers. Compared to anti-clickjacking strategies which rely on pixel comparisons, the side channels exposed by comparing rectangular masks are very low bandwidth. The privacy gains from preventing clickjacking, considered in a holistic system context, may be quite large.

9. Security Considerations

This section is non-normative.

UI Redressing and Clickjacking attacks rely on violating the contextual and temporal integrity of embedded content. Because these attacks target the subjective perception of the user and not well-defined security boundaries, the heuristic protections afforded by the input-protection directive can never be 100% effective for every interface. It provides no protection against certain classes of attacks, such as displaying content around an embedded resource that appears to extend a trusted dialog but provides misleading information.

10. Accessibility Considerations

Users of accessibility tools MUST NOT be prevented from accessing content because of input-protection or VisibilityEvents. Accessibility tools MAY undefine requestVisibility(), report a synthesized event with a visibility state at DOMContentLoaded that indicates the entire document is visible, and/or redefine violation handling for input-protection to a no-op.

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words "for example" or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word "Note" and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

References

Normative References

[CSP2]
Mike West; Adam Barth; Daniel Veditz. Content Security Policy Level 2. 21 July 2015. CR. URL: http://www.w3.org/TR/CSP2/
[HTML]
Ian Hickson. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[DOM-LS]
Document Object Model URL: https://dom.spec.whatwg.org/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119

IDL Index

partial interface Navigator {
  void requestVisibility();
};

interface VisibilityEvent : Event {
  readonly attribute long viewportWidth;
  readonly attribute long viewportHeight;
  readonly attribute long viewportX;
  readonly attribute long viewportY;
  readonly attribute long visibleWidth;
  readonly attribute long visibleHeight;
  readonly attribute long visibleX;
  readonly attribute long visibleY;
};

partial interface Event {
  readonly attribute boolean unsafe;
};

Issues Index

how to interact with frame-ancestors and XFO?
should this look more like MutationObserver?
should we have an argument that doesn’t apply GraphicsLayer raising?
are viewportX and Y relative to current document coordinates or root window coords?
should the event have a timestamp or just let the event handler assume it happened close enough to when it receives it?
innerHeight, innerWidth (ask Dan)
ancestorOrigins
do we need keypress or a keyboard-driven click events?
Cursor redressing.
Dan, I’m not sure I got those last 2 or so step exactly correct....