Captured Surface Control

W3C Editor's Draft

This version:
https://w3c.github.io/mediacapture-surface-control/
Latest published version:
none
Latest editor's draft:
https://w3c.github.io/mediacapture-surface-control/
History:
Commit history
Editor:
Elad Alon (Google)
Feedback:
GitHub w3c/mediacapture-surface-control (pull requests, new issue, open issues)

Abstract

Consider a Web application capturer which has used getDisplayMedia() to start capturing another display surface, capturee. This specification introduces a set of APIs that give capturer two new capabilities: forwarding wheel events to capturee, and reading and changing capturee's zoom level.

Status of This Document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C standards and drafts index at https://www.w3.org/TR/.

This document was published by the Web Real-Time Communications Working Group as an Editor's Draft.

Publication as an Editor's Draft does not imply endorsement by W3C and its Members.

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 03 November 2023 W3C Process Document.

Background

Nearly all video-conferencing Web applications offer their users the ability to share display surfaces - typically a browser tab (browser), a native app's window (window), or an entire screen (monitor).

Many of these applications also show the local user a "preview tile" with a video of the captured display surface.

All these applications suffer from one key drawback - if the user wishes to interact with a captured display surface, the user must first switch to that surface, taking them away from the video-conferencing application. This presents a few issues:

  1. Users can't simultaneously interact with the captured application and see the videos of remote users.
  2. Users are burdened by the need to repeatedly switch between the video-conferencing application and the captured surface.
  3. Users are limited in their ability to see and interact with controls exposed by the video-conferencing application while they are interacting with the captured surface. Examples of such controls include embedded chat applications, emoji reactions, "knock-ins" by users asking to join the call, and multimedia controls.

It bears mentioning that Document Picture-in-Picture goes a long way towards addressing some of these issues. However, it is not always a suitable solution, as not all use cases are adequately addressed by a floating window which will often be small, which obscures arbitrary other content on the screen, and whose size and positioning must be manually controlled by the user.

Permissions Policy Integration

This specification defines a policy-controlled feature identified by the string "captured-surface-control". Its default allowlist is "self".

Note

The API surfaces introduced by this specification can be categorized as either read-access or write-access. Note that only the write-access APIs (forwardWheel, increaseZoomLevel, decreaseZoomLevel and resetZoomLevel) are gated by the "captured-surface-control" permissions policy.

Zoom

Definition of Zoom

We define a concept of an integer "zoom level" that can be applied to display surfaces of any type, and which is independent of the user agent and the platform. It is expected that in the case of browser display surfaces, this concept will match the concept of zoom level that user agents typically expose to the user.

For a given display surface of type surfaceType, we define the user agent's set of supported zoom levels for surfaceType as a non-empty set of integers including at least the default zoom level (100), and not including any integers less than 1.
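
The constraints above can be expressed as a small validity check. The following is a minimal sketch, not part of this specification: the function name isValidSupportedZoomLevels and the constant DEFAULT_ZOOM_LEVEL are hypothetical, and an array is used to stand in for the set.

```typescript
// Hypothetical helper for illustration; the specification defines no such function.
const DEFAULT_ZOOM_LEVEL = 100;

// Returns true if `levels` satisfies the constraints on a set of
// supported zoom levels. The array models a set, so duplicates are rejected.
function isValidSupportedZoomLevels(levels: readonly number[]): boolean {
  // The set must be non-empty.
  if (levels.length === 0) return false;
  // The set must include the default zoom level (100).
  if (levels.indexOf(DEFAULT_ZOOM_LEVEL) === -1) return false;
  // A set has no repeated members.
  const hasDuplicates = levels.some((value, i) => levels.indexOf(value) !== i);
  if (hasDuplicates) return false;
  // Every member must be an integer that is not less than 1.
  return levels.every((value) => Number.isInteger(value) && value >= 1);
}
```

Under this sketch, [50, 100, 200] is accepted, while [50, 200] is rejected because it lacks the default zoom level and [0, 100] is rejected because it contains an integer less than 1.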

Permitted Event Types for zoom-setting

We define the permitted event types for zoom-setting as a set composed of the following event types:

Zoom-control APIs

WebIDL
partial interface CaptureController {
  sequence<long> getSupportedZoomLevels();
  readonly attribute long? zoomLevel;
  Promise<undefined> increaseZoomLevel();
  Promise<undefined> decreaseZoomLevel();
  Promise<undefined> resetZoomLevel();
  attribute EventHandler onzoomlevelchange;
};
getSupportedZoomLevels()

This method allows applications to discover the set of zoom levels supported by the user agent.

When invoked, the user agent MUST run the following steps:

  1. If this is not actively capturing, throw an "InvalidStateError" DOMException.
  2. Let surfaceType be this.[[DisplaySurfaceType]].
  3. If surfaceType is not a supported display surface type, throw a "NotSupportedError" DOMException.
  4. Return a monotonically increasing sequence containing all of the values in the supported zoom levels for surfaceType.
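
The steps above can be sketched over a simplified model. This is an illustration under stated assumptions, not the user agent's implementation: ZoomQueryModel, namedError and getSupportedZoomLevelsModel are hypothetical names, and a plain Error with its name set stands in for a DOMException.

```typescript
type SurfaceKind = "browser" | "window" | "monitor";

// Hypothetical stand-in for the state the algorithm consults.
interface ZoomQueryModel {
  activelyCapturing: boolean;
  surfaceType: SurfaceKind;
  supportedZoomLevels: readonly number[];
}

// A plain Error with a custom name stands in for a DOMException.
function namedError(name: string, message: string): Error {
  const error = new Error(message);
  error.name = name;
  return error;
}

function getSupportedZoomLevelsModel(model: ZoomQueryModel): number[] {
  // Step 1: the controller must be actively capturing.
  if (!model.activelyCapturing) {
    throw namedError("InvalidStateError", "not actively capturing");
  }
  // Steps 2-3: only supported surface types (currently browser) qualify.
  if (model.surfaceType !== "browser") {
    throw namedError("NotSupportedError", "unsupported display surface type");
  }
  // Step 4: copy, then sort numerically so the result is increasing.
  return model.supportedZoomLevels.slice().sort((a, b) => a - b);
}
```

The explicit numeric comparator matters in this sketch: the default array sort compares values as strings and would place 100 before 50.
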
zoomLevel

This attribute allows applications to discover the captured display surface's zoom level.

On getting, the user agent MUST return this.[[ZoomLevel]].

increaseZoomLevel()

This method allows applications to set the captured display surface's zoom level one step higher than its current value.

When this method is invoked, the user agent MUST run the set zoom level algorithm with this as the controller and "increase" as the zoomAction.

decreaseZoomLevel()

This method allows applications to set the captured display surface's zoom level one step lower than its current value.

When this method is invoked, the user agent MUST run the set zoom level algorithm with this as the controller and "decrease" as the zoomAction.

resetZoomLevel()

This method allows applications to set the captured display surface's zoom level to 100.

When this method is invoked, the user agent MUST run the set zoom level algorithm with this as the controller and "reset" as the zoomAction.

onzoomlevelchange

An event handler IDL attribute whose event handler event type is zoomlevelchange.

Whenever this.[[Source]]'s zoom level changes to newZoomLevel, the user agent MUST queue a global task on the user interaction task source given the current realm's global object, which will run the following steps:

  1. If this is not actively capturing, abort these steps.
  2. Set this.[[ZoomLevel]] to newZoomLevel.
  3. Fire an event named zoomlevelchange at this.
Note

Examples of causes include:

  • The user interacted with the user agent to change the zoom level of a captured tab.
  • The capturing application called increaseZoomLevel().
  • The user changed the shared display surface, choosing one which has a different zoom level.
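
The task queued for a zoom level change can be sketched as follows. This is a simplified model under stated assumptions: ZoomEventModel and runZoomLevelChangeTask are hypothetical names, task queuing is omitted, and a single callback stands in for event dispatch.

```typescript
// Hypothetical stand-in for a CaptureController's relevant state.
interface ZoomEventModel {
  activelyCapturing: boolean;
  zoomLevel: number | null; // models the [[ZoomLevel]] slot
  onzoomlevelchange: (() => void) | null;
}

// Models the three steps run when the source's zoom level changes.
function runZoomLevelChangeTask(
  controller: ZoomEventModel,
  newZoomLevel: number
): void {
  // Step 1: do nothing if the controller is no longer actively capturing.
  if (!controller.activelyCapturing) return;
  // Step 2: update the slot before notifying the application.
  controller.zoomLevel = newZoomLevel;
  // Step 3: fire the event (modeled here as a direct callback).
  if (controller.onzoomlevelchange !== null) controller.onzoomlevelchange();
}
```

Because the slot is updated before the event fires, a handler that reads zoomLevel observes the new value rather than the previous one.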

Scroll

Scrolling APIs

WebIDL
partial interface CaptureController {
  constructor();
  Promise<undefined> forwardWheel(HTMLElement? element);
};
constructor

CaptureController's constructor is extended to also define and initialize the following internal slots:

Internal Slot Initial value
[[ZoomLevel]] null
[[ForwardWheelElement]] null
[[ForwardWheelEventListener]] null
forwardWheel()

This method allows applications to automatically forward wheel events from an HTMLElement to the viewport of a captured display surface.

When invoked, the user agent MUST run the following steps:

  1. If this is not actively capturing, return a promise rejected with a DOMException object whose name attribute has the value InvalidStateError.
  2. If this is self-capturing, return a promise rejected with a DOMException object whose name attribute has the value InvalidStateError.
  3. Let surfaceType be this.[[DisplaySurfaceType]].
  4. If surfaceType is not a supported display surface type, return a promise rejected with a DOMException object whose name attribute has the value NotSupportedError.
  5. Let element be the method's first argument.
  6. Let P be a new Promise.
  7. Run the following steps in parallel:
    1. Get the current permission state of "captured-surface-control". If the result is NOT "granted", and the relevant global object does NOT have transient activation, then:
      1. Queue a global task on the user interaction task source given the current realm's global object as global to reject P with a DOMException object whose name attribute has the value InvalidStateError.
      2. Abort these steps.
      Note

      This step ensures that on the one hand, permission prompts are not shown without transient activation, while on the other hand, if the permission is already "granted", forwardWheel() may be called immediately after getDisplayMedia() resolves, even if the transient activation that permitted the call to forwardWheel() has since expired.

    2. Request permission to use a PermissionDescriptor with its name member set to "captured-surface-control". If the result of the request is "denied", then:
      1. Queue a global task on the user interaction task source given the current realm's global object as global to reject P with a new DOMException object whose name is NotAllowedError.
      2. Abort these steps.
    3. If this.[[ForwardWheelElement]] is not null, remove an event listener with this.[[ForwardWheelElement]] as eventTarget and this.[[ForwardWheelEventListener]] as listener.
    4. Set this.[[ForwardWheelEventListener]] to null.
    5. Set this.[[ForwardWheelElement]] to element.
    6. If this.[[ForwardWheelElement]] is not null:
      1. Set this.[[ForwardWheelEventListener]] to an event listener defined as follows:
        type
        wheel
        callback
        The result of creating a new Web IDL EventListener instance representing a reference to a function of one argument of type Event event. This function executes the forward wheel event algorithm given this and event.
      2. Add an event listener with this.[[ForwardWheelElement]] as eventTarget and this.[[ForwardWheelEventListener]] as listener.
    7. Queue a global task on the user interaction task source given the current realm's global object as global to resolve P.
  8. Return P.
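
The listener bookkeeping in steps 7.3 to 7.6 can be sketched in isolation. This is a minimal model under stated assumptions: ForwardWheelSlots, WheelTarget and WheelListener are hypothetical names, and the permission checks and promise handling of the full algorithm are left out.

```typescript
type WheelListener = (event: { deltaX: number; deltaY: number }) => void;

// Hypothetical minimal shape of an element that can receive wheel listeners.
interface WheelTarget {
  addEventListener(type: "wheel", listener: WheelListener): void;
  removeEventListener(type: "wheel", listener: WheelListener): void;
}

// Models the [[ForwardWheelElement]] and [[ForwardWheelEventListener]] slots.
class ForwardWheelSlots {
  element: WheelTarget | null = null;
  listener: WheelListener | null = null;

  // Detaches from the previous element, then attaches to the new one
  // unless `element` is null, in which case forwarding simply stops.
  retarget(element: WheelTarget | null, forward: WheelListener): void {
    if (this.element !== null && this.listener !== null) {
      this.element.removeEventListener("wheel", this.listener);
    }
    this.listener = null;
    this.element = element;
    if (this.element !== null) {
      this.listener = forward;
      this.element.addEventListener("wheel", this.listener);
    }
  }
}
```

One consequence visible in this sketch is that at most one element forwards wheel events per controller at any time, and that passing null stops forwarding.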

Extensions to the getDisplayMedia algorithm

Extend the getDisplayMedia algorithm as follows:

Recall that p is the promise which the algorithm returns. Immediately before the step which resolves it, add the following steps:

  1. If controller is not null and controller.[[DisplaySurfaceType]] is a supported display surface type, then set controller.[[ZoomLevel]] to controller.[[Source]]'s zoom level.

Subroutines

Subroutine: Actively capturing

To determine if a CaptureController controller is actively capturing, run the following steps:

  1. Let source be controller.[[Source]].
  2. If source is null, return false.
  3. If source has been stopped, return false.
  4. Return true.

Subroutine: Is self-capturing

To determine if a CaptureController controller is self-capturing, run the following steps:

  1. If controller is not actively capturing, return false.
  2. If controller.[[Source]] is a display surface of type browser, and represents the relevant global object's associated Document, return true.
  3. Return false.

Subroutine: Supported display surface type

To determine if a display surface surfaceType is a supported display surface type, run the following steps:

  1. If surfaceType is browser, return true.
  2. Return false.
Note

Whether window should be supported is under discussion.
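
The three predicates above can be sketched over a simplified model. The types SourceModel and ControllerModel are hypothetical, and the documentId field is an assumption used here to model "represents the relevant global object's associated Document".

```typescript
type DisplaySurfaceType = "browser" | "window" | "monitor";

// Hypothetical stand-in for the source behind a CaptureController.
interface SourceModel {
  stopped: boolean;
  surfaceType: DisplaySurfaceType;
  // Assumed identifier of the document shown by a captured tab, if any.
  documentId: string | null;
}

interface ControllerModel {
  source: SourceModel | null; // models the [[Source]] slot
}

function isActivelyCapturing(controller: ControllerModel): boolean {
  const source = controller.source;
  if (source === null) return false;
  if (source.stopped) return false;
  return true;
}

function isSelfCapturing(
  controller: ControllerModel,
  ownDocumentId: string
): boolean {
  if (!isActivelyCapturing(controller)) return false;
  const source = controller.source;
  return (
    source !== null &&
    source.surfaceType === "browser" &&
    source.documentId === ownDocumentId
  );
}

function isSupportedDisplaySurfaceType(surfaceType: DisplaySurfaceType): boolean {
  // Only browser surfaces are supported at present.
  return surfaceType === "browser";
}
```

In this model, a controller whose source has been stopped is neither actively capturing nor self-capturing, regardless of which document the source once showed.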

Subroutine: Setting the zoom level

The set zoom level algorithm, given a controller of type CaptureController and a zoomAction of type DOMString as arguments, consists of running the following steps:

  1. If controller is not actively capturing, return a promise rejected with a DOMException object whose name attribute has the value InvalidStateError.
  2. If controller is self-capturing, return a promise rejected with a DOMException object whose name attribute has the value InvalidStateError.
  3. Let surfaceType be controller.[[DisplaySurfaceType]].
  4. If surfaceType is not a supported display surface type, return a promise rejected with a DOMException object whose name attribute has the value NotSupportedError.
  5. Ensure that the code is running from within the context of an event handler which was triggered by the browser agent firing a trusted event, triggered by the user interacting with the user agent. To do so, run the following steps:

    1. Let currentEvent be Window.event.
    2. If currentEvent is undefined, return a promise rejected with a DOMException object whose name attribute has the value InvalidStateError.
    3. If currentEvent.isTrusted is false, return a promise rejected with a DOMException object whose name attribute has the value InvalidStateError.
    4. If currentEvent.type is not in permitted event types for zoom-setting, return a promise rejected with a DOMException object whose name attribute has the value InvalidStateError.
      Note

      It follows from these steps that increaseZoomLevel(), decreaseZoomLevel() and resetZoomLevel() are only callable with transient activation, because permitted event types for zoom-setting only contains event types that confer this activation.

      In fact, our API shape implies a stronger guarantee - whereas transient activation persists for several seconds after the user action, the API shape here limits zoom-setting to immediately after the user's action.

  6. Let currentZoomLevel be controller.[[Source]]'s zoom level.
  7. Let targetZoomLevel be a long. Set its value as follows:
    1. If zoomAction is "decrease" then:

      1. If currentZoomLevel is the minimum value in supported zoom levels, return a promise rejected with a DOMException object whose name attribute has the value InvalidStateError.
      2. Otherwise, set targetZoomLevel to the value in supported zoom levels that appears immediately before currentZoomLevel.
    2. Else, if zoomAction is "increase" then:

      1. If currentZoomLevel is the maximum value in supported zoom levels, return a promise rejected with a DOMException object whose name attribute has the value InvalidStateError.
      2. Otherwise, set targetZoomLevel to the value in supported zoom levels that appears immediately after currentZoomLevel.
    3. Else:

      1. Assert that zoomAction is "reset".
      2. Set targetZoomLevel to 100.
  8. Let P be a new Promise.
  9. Run the following steps in parallel:

    1. Request permission to use a PermissionDescriptor with its name member set to "captured-surface-control". If the result of the request is "denied", then:
      1. Queue a global task on the user interaction task source given the current realm's global object as global to reject P with a new DOMException object whose name is NotAllowedError.
      2. Abort these steps.
    2. Set controller.[[Source]]'s zoom level to targetZoomLevel.
    3. Queue a global task on the user interaction task source given the current realm's global object as global to resolve P.
  10. Return P.
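
The selection of targetZoomLevel in step 7 can be sketched as a pure function. This is an illustration under stated assumptions: computeTargetZoomLevel is a hypothetical name, a thrown Error with its name set stands in for a rejected promise, and "decrease" is taken to select the next lower supported level, in line with the description of decreaseZoomLevel(). The case where the current level is not in the supported set is not addressed by the algorithm and is treated as an error here.

```typescript
type ZoomAction = "increase" | "decrease" | "reset";

function computeTargetZoomLevel(
  supported: readonly number[],
  current: number,
  action: ZoomAction
): number {
  // "reset" always targets the default zoom level.
  if (action === "reset") return 100;

  // Work on an increasing copy so neighbours are adjacent.
  const sorted = supported.slice().sort((a, b) => a - b);
  const index = sorted.indexOf(current);
  if (index === -1) {
    // Assumption of this sketch: unsupported current levels are an error.
    throw new Error("current zoom level is not a supported zoom level");
  }

  const targetIndex = action === "increase" ? index + 1 : index - 1;
  if (targetIndex < 0 || targetIndex >= sorted.length) {
    // Already at the minimum (decrease) or maximum (increase).
    const error = new Error("zoom level is already at its limit");
    error.name = "InvalidStateError";
    throw error;
  }
  return sorted[targetIndex];
}
```

With supported levels 50, 100 and 150 and a current level of 100, "increase" yields 150 and "decrease" yields 50; a further "increase" from 150 fails with InvalidStateError.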

Subroutine: Forward wheel event

The forward wheel event algorithm takes a CaptureController controller and a WheelEvent event, and runs the following steps:

  1. If controller is not actively capturing, abort these steps.
  2. If controller is self-capturing, abort these steps.
  3. Let surfaceType be controller.[[DisplaySurfaceType]].
  4. If surfaceType is not a supported display surface type, abort these steps.
  5. Run the following steps in parallel:
    1. Get the current permission state of "captured-surface-control". If the result is NOT "granted", abort these steps.
    2. If event.isTrusted is false, abort these steps.
    3. Let [scaledX, scaledY] be the result of the scale element coordinates algorithm on [event.offsetX, event.offsetY] and controller.
    4. Queue a global task on the user interaction task source of controller.[[Source]]'s current realm, given that realm's global object, to fire an event named "wheel" using WheelEvent with the x attribute initialized to scaledX, the y attribute initialized to scaledY, the deltaX attribute initialized to event.deltaX and the deltaY attribute initialized to event.deltaY, at the topmost event target.

Subroutine: Scale element coordinates

The scale element coordinates algorithm takes double coordinates [x, y] and a CaptureController controller, and runs the following steps:

  1. Let scaleFactorX be (x / controller.[[ForwardWheelElement]].getBoundingClientRect().width).
  2. Let scaleFactorY be (y / controller.[[ForwardWheelElement]].getBoundingClientRect().height).
  3. Let surfaceWidth be controller.[[Source]]'s viewport's width.
  4. Let surfaceHeight be controller.[[Source]]'s viewport's height.
  5. Let scaledX be (scaleFactorX * surfaceWidth).
  6. Let scaledY be (scaleFactorY * surfaceHeight).
  7. Return [scaledX, scaledY].
Note

This subroutine assumes that controller is actively capturing.
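
The coordinate scaling can be sketched as a pure function. This is a minimal illustration, not the user agent's implementation: scaleElementCoordinates and the Size type are hypothetical, the element's bounding rectangle and the captured viewport are passed in as plain sizes, and the vertical factor is computed from y and the element's height.

```typescript
// Hypothetical plain-size type used for both the element and the viewport.
interface Size {
  width: number;
  height: number;
}

// Maps a point inside the forwarding element to the corresponding
// point in the captured surface's viewport.
function scaleElementCoordinates(
  x: number,
  y: number,
  elementRect: Size,
  viewport: Size
): [number, number] {
  // Position of the point as a fraction of the element's size.
  const scaleFactorX = x / elementRect.width;
  const scaleFactorY = y / elementRect.height;
  // The same fractions applied to the captured viewport.
  return [scaleFactorX * viewport.width, scaleFactorY * viewport.height];
}
```

For example, the point (50, 25) in a 200 by 100 element maps to (300, 200) in a 1200 by 800 viewport, because the point lies a quarter of the way along each axis.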

Privacy and Security Considerations

The API surfaces introduced in this specification allow a capturing application limited control over a captured application. These APIs allow the capturing application to gain access to additional pixels in the captured application. This specification employs multiple means to ensure that new capabilities are used in accordance with the user's intentions. Among these means:

Zoom-setting: Limitation to specific interactions

increaseZoomLevel(), decreaseZoomLevel() and resetZoomLevel() are only callable from event handlers of specific event types - the permitted event types for zoom-setting. These are events dispatched directly by the user agent, triggered by user interaction. This specification intentionally excludes from this set such events as "mousemove", which users are liable to trigger inadvertently.
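
The gating described above can be sketched as a predicate over the event being handled. This is an illustration under stated assumptions: mayChangeZoom and TriggeringEvent are hypothetical names, and the permitted event types are passed in as a parameter rather than hard-coded, since this sketch does not restate the set.

```typescript
// Hypothetical minimal view of the event currently being dispatched.
interface TriggeringEvent {
  type: string;
  isTrusted: boolean;
}

// Returns true only when zoom-setting would pass the event checks:
// an event is being handled, it is trusted, and its type is permitted.
function mayChangeZoom(
  currentEvent: TriggeringEvent | undefined,
  permittedTypes: readonly string[]
): boolean {
  // No event is being dispatched, so the call is not tied to a user action.
  if (currentEvent === undefined) return false;
  // Events synthesized by script are not trusted.
  if (!currentEvent.isTrusted) return false;
  // The event type must be in the permitted set.
  return permittedTypes.indexOf(currentEvent.type) !== -1;
}
```

Given any permitted set that excludes "mousemove", a trusted "mousemove" event fails this check, which matches the intent of excluding events that users are liable to trigger inadvertently.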

Scrolling: Limitation to specific interactions

The shape of forwardWheel() is intentionally chosen to limit the capturing application's control. The application designates a specific element; when the user scrolls over that element, the corresponding wheel events are forwarded to the captured application.

Limiting element types

This specification does not limit the type of Element for which either increaseZoomLevel(), decreaseZoomLevel(), resetZoomLevel() or forwardWheel() work. Such a limitation would accomplish nothing, because malicious applications could always overlay transparent permitted Element types on top of visible non-permitted Elements, thereby bypassing this restriction.

The limitation of interaction types is sufficient. This is accomplished by forwardWheel() through its shape, and by increaseZoomLevel(), decreaseZoomLevel() and resetZoomLevel() through their gating on event types.

Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key word MUST in this document is to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, it appears in all capitals, as shown here.