Consider a Web application |capturer| which has used {{MediaDevices/getDisplayMedia()}} to start capturing another [=display surface=], |capturee|. This specification introduces a set of APIs that allow |capturer| the following new capabilities:

Background

Nearly all video-conferencing Web applications offer their users the ability to share [=display surfaces=] - typically a browser tab ([=display surface/browser=]), a native app's window ([=display surface/window=]), or an entire screen ([=display surface/monitor=]).

Many of these applications also show the local user a "preview tile" with a video of the captured [=display surface=].

All these applications suffer from one key drawback - if the user wishes to interact with a captured [=display surface=], the user must first switch to that surface, taking them away from the video-conferencing application. This presents a few issues:

  1. Users can't simultaneously interact with the captured application and see the videos of remote users.
  2. Users are burdened by the need to repeatedly switch between the video-conferencing application and the captured surface.
  3. Users are limited in their ability to see and interact with controls exposed by the video-conferencing application while they are interacting with the captured surface. A non-comprehensive list of such controls includes embedded chat applications, emoji reactions, "knock-ins" by users asking to join the call, and multimedia controls.

It bears mentioning that Document Picture-in-Picture goes a long way towards addressing some of these issues. However, it is not always a suitable solution, as not all use cases are adequately addressed by a floating window which will often be small, which obscures arbitrary other content on the screen, and whose size and positioning must be manually controlled by the user.

Permissions Policy Integration

This specification defines a [=policy-controlled feature=] identified by the string "captured-surface-control". Its [=policy-controlled feature/default allowlist=] is "self".

The API surfaces introduced by this specification can be categorized as either read-access or write-access. Note that only the write-access APIs ({{CaptureController/setZoomLevel}} and {{CaptureController/forwardWheel}}) are gated by the "captured-surface-control" permissions policy.

Zoom

Definition of Zoom

We define a concept of an integer "zoom level" that can be applied to [=display surfaces=] of any type, and which is independent of the user agent and the platform. It is expected that in the case of [=display surface/browser=] [=display surfaces=], this concept will match the concept of zoom level that user agents typically expose to the user.

For a given [=display surface=] of type |surfaceType|, we define the user agent's set of supported zoom levels for |surfaceType| as a non-empty set of integers including at least the [=default zoom level=] (100), and not including any integers less than 1.
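
The three constraints on a set of supported zoom levels can be sketched as a small check. This is only an illustration: the helper name `isValidSupportedZoomLevels` and the array representation of the set are assumptions made for the example, not part of the API.

```typescript
// The default zoom level named in the definition above.
const DEFAULT_ZOOM_LEVEL = 100;

// Hypothetical helper (not part of the API): checks that a candidate set of
// supported zoom levels is non-empty, contains the default level, and holds
// only integers that are at least 1.
function isValidSupportedZoomLevels(levels: readonly number[]): boolean {
  if (levels.length === 0) return false;                       // non-empty
  if (levels.indexOf(DEFAULT_ZOOM_LEVEL) === -1) return false; // includes 100
  // `level % 1 === 0` is true only for finite integers.
  return levels.every((level) => level % 1 === 0 && level >= 1);
}
```

Under these assumptions, `[50, 100, 200]` passes the check, while `[]`, `[50, 200]` and `[0, 100]` do not.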

Permitted Event Types for setZoomLevel()

We define the permitted event types for setZoomLevel as a set composed of the following event types:

Zoom-control APIs

          partial interface CaptureController {
            sequence<long> getSupportedZoomLevels();
            long getZoomLevel();
            Promise<undefined> setZoomLevel(long zoomLevel);
            attribute EventHandler oncapturedzoomlevelchange;
          };
        
getSupportedZoomLevels()

This method allows applications to discover the set of [=zoom levels=] supported by the user agent.

When invoked, the user agent MUST run the following steps:

  1. If [=this=] is not [=actively capturing=], [=exception/throw=] an "{{InvalidStateError}}" {{DOMException}}.
  2. Let |surfaceType| be [=this=].[[\DisplaySurfaceType]].
  3. If |surfaceType| is not a [=supported display surface type=], [=exception/throw=] a "{{NotSupportedError}}" {{DOMException}}.
  4. Return a monotonically increasing sequence containing all of the values in the [=supported zoom levels=] for |surfaceType|.
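
Because the returned sequence is increasing, an application can step through it to build zoom-in and zoom-out controls. The following is a minimal sketch under that assumption; `adjacentZoomLevel` and `ZoomDirection` are hypothetical names introduced for the example.

```typescript
type ZoomDirection = "in" | "out";

// Hypothetical helper: given the increasing sequence returned by
// getSupportedZoomLevels() and the current zoom level, returns the nearest
// supported level in the requested direction, or the current level when
// there is none.
function adjacentZoomLevel(
  supported: readonly number[],
  current: number,
  direction: ZoomDirection
): number {
  if (direction === "in") {
    for (const level of supported) {
      if (level > current) return level; // first level above the current one
    }
  } else {
    for (let i = supported.length - 1; i >= 0; i--) {
      if (supported[i] < current) return supported[i]; // last level below it
    }
  }
  return current; // already at the boundary
}
```

An application would typically pass the result to setZoomLevel() from within a handler for a user-initiated event.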

getZoomLevel()

This method allows applications to discover the captured [=display surface=]'s [=zoom level=].

When invoked, the user agent MUST run the following steps:

  1. If [=this=] is not [=actively capturing=], [=exception/throw=] an "{{InvalidStateError}}" {{DOMException}}.
  2. If [=this=].[[\DisplaySurfaceType]] is not a [=supported display surface type=], [=exception/throw=] a "{{NotSupportedError}}" {{DOMException}}.
  3. Return [=this=].[[\Source]]'s [=zoom level=].

setZoomLevel()

This method allows applications to set the captured [=display surface=]'s [=zoom level=].

When invoked, the user agent MUST run the following steps:

  1. If [=this=] is not [=actively capturing=], return a promise [=reject|rejected=] with a {{DOMException}} object whose {{DOMException/name}} attribute has the value {{InvalidStateError}}.
  2. If [=this=] [=is self-capturing=], return a promise [=reject|rejected=] with a {{DOMException}} object whose {{DOMException/name}} attribute has the value {{InvalidStateError}}.
  3. Let |surfaceType| be [=this=].[[\DisplaySurfaceType]].
  4. If |surfaceType| is not a [=supported display surface type=], return a promise [=reject|rejected=] with a {{DOMException}} object whose {{DOMException/name}} attribute has the value {{NotSupportedError}}.
  5. Ensure that the code is running from within the context of an event handler that was triggered by the user agent firing a trusted event in response to the user interacting with the user agent. To do so, run the following steps:

    1. Let |currentEvent:Event| be {{Window}}.{{Window/event}}.
    2. If |currentEvent| is {{undefined}}, return a promise [=reject|rejected=] with a {{DOMException}} object whose {{DOMException/name}} attribute has the value {{InvalidStateError}}.
    3. If |currentEvent|.{{Event/isTrusted}} is false, return a promise [=reject|rejected=] with a {{DOMException}} object whose {{DOMException/name}} attribute has the value {{InvalidStateError}}.
    4. If |currentEvent|.{{Event/type}} is not in permitted event types for setZoomLevel, return a promise [=reject|rejected=] with a {{DOMException}} object whose {{DOMException/name}} attribute has the value {{InvalidStateError}}.

      It follows from these steps that {{CaptureController/setZoomLevel()}} is only callable with [=transient activation=], because permitted event types for setZoomLevel only contains event types that confer this activation.

      In fact, our API shape implies a stronger guarantee - whereas [=transient activation=] persists for several seconds after the user action, the API shape here limits {{CaptureController/setZoomLevel()}} to being called immediately following the user's action.

  6. Let |targetZoomLevel| be the method's first argument.
  7. If |targetZoomLevel| is not included in the user agent's set of [=supported zoom levels=] for |surfaceType|, return a promise [=reject|rejected=] with a {{DOMException}} object whose {{DOMException/name}} attribute has the value {{InvalidStateError}}.
  8. Let |P| be a new {{Promise}}.
  9. Run the following steps [=in parallel=]:

    1. [=Request permission to use=] a {{PermissionDescriptor}} with its {{PermissionDescriptor/name}} member set to "captured-surface-control". If the result of the request is {{PermissionState/"denied"}}, [=reject=] |P| with a new {{DOMException}} object whose {{DOMException/name}} is {{NotAllowedError}} and abort these steps.
    2. Set [=this=].[[\Source]]'s [=zoom level=] to |targetZoomLevel|.
    3. [=Resolve=] |P|.
  10. Return |P|.
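
The synchronous checks in steps 1 to 7 can be modelled as a pure function that reports which error, if any, the returned promise would be rejected with. This is a sketch rather than an implementation: the `SetZoomLevelInputs` shape is hypothetical, and the permitted event types are supplied by the caller rather than listed in the code.

```typescript
// Error names used by the synchronous checks of setZoomLevel().
type ZoomErrorName = "InvalidStateError" | "NotSupportedError";

// Hypothetical snapshot of the state that the algorithm inspects.
interface SetZoomLevelInputs {
  activelyCapturing: boolean;
  selfCapturing: boolean;
  surfaceType: string; // "browser", "window" or "monitor"
  currentEvent?: { isTrusted: boolean; type: string };
  permittedEventTypes: readonly string[]; // supplied by the caller
  supportedZoomLevels: readonly number[];
  targetZoomLevel: number;
}

// Returns the name of the rejection error, or null when the algorithm would
// go on to the permission request in step 9.
function checkSetZoomLevelPreconditions(s: SetZoomLevelInputs): ZoomErrorName | null {
  if (!s.activelyCapturing) return "InvalidStateError";        // step 1
  if (s.selfCapturing) return "InvalidStateError";             // step 2
  if (s.surfaceType !== "browser") return "NotSupportedError"; // steps 3-4
  const ev = s.currentEvent;
  if (ev === undefined) return "InvalidStateError";            // step 5.2
  if (!ev.isTrusted) return "InvalidStateError";               // step 5.3
  if (s.permittedEventTypes.indexOf(ev.type) === -1) {
    return "InvalidStateError";                                // step 5.4
  }
  if (s.supportedZoomLevels.indexOf(s.targetZoomLevel) === -1) {
    return "InvalidStateError";                                // step 7
  }
  return null;
}
```

The order of the checks mirrors the order of the steps, so a call that fails several conditions reports the first one encountered.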

oncapturedzoomlevelchange

The user agent MUST fire a blank event on this {{EventHandler}} whenever [=this=].[[\Source]]'s [=zoom level=] changes.

Examples of causes include:

  • The user interacted with the user agent to change the zoom level of a captured tab.
  • The capturing application called {{CaptureController/setZoomLevel()}}.
  • The user changed the shared [=display surface=], choosing one which has a different [=zoom level=].

Scroll

Scrolling APIs

        partial interface CaptureController {
          constructor();
          Promise<undefined> forwardWheel(HTMLElement? element);
        };
      
constructor

{{CaptureController}}'s constructor is extended to also define and initialize the following internal slots:

Internal Slot                    Initial value
[[\forwardWheelElement]]         null
[[\forwardWheelEventListener]]   null

forwardWheel()

This method allows applications to automatically forward wheel events from an {{HTMLElement}} to the viewport of a captured [=display surface=].

When invoked, the user agent MUST run the following steps:

  1. If [=this=] is not [=actively capturing=], return a promise [=reject|rejected=] with a {{DOMException}} object whose {{DOMException/name}} attribute has the value {{InvalidStateError}}.
  2. If [=this=] [=is self-capturing=], return a promise [=reject|rejected=] with a {{DOMException}} object whose {{DOMException/name}} attribute has the value {{InvalidStateError}}.
  3. Let |surfaceType| be [=this=].[[\DisplaySurfaceType]].
  4. If |surfaceType| is not a [=supported display surface type=], return a promise [=reject|rejected=] with a {{DOMException}} object whose {{DOMException/name}} attribute has the value {{NotSupportedError}}.
  5. Let |element| be the method's first argument.
  6. Let |P| be a new {{Promise}}.
  7. Run the following steps [=in parallel=]:
    1. [=Get the current permission state=] of "captured-surface-control". If the result is NOT {{PermissionState/"granted"}}, and the [=relevant global object=] does NOT have [=transient activation=], [=reject=] |P| with a new {{DOMException}} object whose {{DOMException/name}} is {{InvalidStateError}} and abort these steps.

      This step ensures that on the one hand, permission prompts are not shown without [=transient activation=], while on the other hand, if the permission is already {{PermissionState/"granted"}}, {{CaptureController/forwardWheel()}} may be called immediately after {{MediaDevices/getDisplayMedia()}} resolves, even if the [=transient activation=] that permitted the call to {{CaptureController/forwardWheel()}} has since expired.

    2. [=Request permission to use=] a {{PermissionDescriptor}} with its {{PermissionDescriptor/name}} member set to "captured-surface-control". If the result of the request is {{PermissionState/"denied"}}, [=reject=] |P| with a new {{DOMException}} object whose {{DOMException/name}} is {{NotAllowedError}} and abort these steps.
    3. If [=this=].{{CaptureController/[[forwardWheelElement]]}} is not null, [=remove an event listener=] with [=this=].{{CaptureController/[[forwardWheelElement]]}} as |eventTarget| and [=this=].{{CaptureController/[[forwardWheelEventListener]]}} as |listener|.
    4. Set [=this=].{{CaptureController/[[forwardWheelEventListener]]}} to null.
    5. Set [=this=].{{CaptureController/[[forwardWheelElement]]}} to |element|.
    6. If [=this=].{{CaptureController/[[forwardWheelElement]]}} is not null:
      1. Set [=this=].{{CaptureController/[[forwardWheelEventListener]]}} to an [=event listener=] defined as follows:
        type
        wheel
        [=event listener/callback=]
        The result of creating a new Web IDL {{EventListener}} instance representing a reference to a function of one argument of type {{Event}} |event|. This function executes the [=forward wheel event algorithm=] given [=this=] and |event|.
      2. [=Add an event listener=] with [=this=].{{CaptureController/[[forwardWheelElement]]}} as |eventTarget| and [=this=].{{CaptureController/[[forwardWheelEventListener]]}} as |listener|.
    7. [=Resolve=] |P|.
  8. Return |P|.
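
The activation gate in step 7.1 and the listener swap in steps 7.3 to 7.6 can be sketched as follows. The names `WheelTarget`, `ForwardWheelSlots` and `mayProceedToPermissionRequest` are hypothetical, the event-target interface is a minimal stand-in rather than the DOM one, and a null element is modelled as "stop forwarding", in line with the null check in step 7.6.

```typescript
type WheelListener = (event: unknown) => void;

// Minimal stand-in for the parts of an event target this sketch needs.
interface WheelTarget {
  addEventListener(type: "wheel", listener: WheelListener): void;
  removeEventListener(type: "wheel", listener: WheelListener): void;
}

// Step 7.1: proceed only if the permission is already granted or the page
// currently has transient activation.
function mayProceedToPermissionRequest(
  permissionState: "granted" | "denied" | "prompt",
  hasTransientActivation: boolean
): boolean {
  return permissionState === "granted" || hasTransientActivation;
}

// Hypothetical model of the two internal slots.
class ForwardWheelSlots {
  private element: WheelTarget | null = null;    // [[forwardWheelElement]]
  private listener: WheelListener | null = null; // [[forwardWheelEventListener]]
  private readonly forward: WheelListener;

  constructor(forward: WheelListener) {
    this.forward = forward;
  }

  setElement(element: WheelTarget | null): void {
    // Step 7.3: detach the listener from the previously designated element.
    if (this.element !== null && this.listener !== null) {
      this.element.removeEventListener("wheel", this.listener);
    }
    this.listener = null;   // step 7.4
    this.element = element; // step 7.5
    if (element !== null) { // step 7.6
      const listener: WheelListener = (event) => this.forward(event);
      this.listener = listener;
      element.addEventListener("wheel", listener);
    }
  }
}
```

In this model at most one element forwards wheel events at any time, because each call detaches the previous listener before attaching a new one.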

Subroutines

Subroutine: Actively capturing

To determine if a {{CaptureController}} |controller| is actively capturing, run the following steps:

  1. Let |source| be |controller|.[[\Source]].
  2. If |source| is null, return false.
  3. If |source| has been stopped, return false.
  4. Return true.

Subroutine: Is self-capturing

To determine if a {{CaptureController}} |controller| is self-capturing, run the following steps:

  1. If |controller| is not [=actively capturing=], return false.
  2. If |controller|.[[\Source]] is a [=display surface=] of type [=display surface/browser=], and represents the [=relevant global object=]'s [=associated `Document`=], return true.
  3. Return false.
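
The two subroutines above can be sketched as predicates over a simplified model of the controller. The `ControllerModel` and `CaptureSourceModel` shapes are hypothetical, and comparing document identifiers is only a stand-in for "represents the relevant global object's associated Document".

```typescript
// Hypothetical model of the state read by the two subroutines.
interface CaptureSourceModel {
  stopped: boolean;
  surfaceType: string; // "browser", "window" or "monitor"
  documentId: string;  // stand-in identity of the captured document
}

interface ControllerModel {
  source: CaptureSourceModel | null; // models [[Source]]
}

function isActivelyCapturing(controller: ControllerModel): boolean {
  const source = controller.source;
  if (source === null) return false; // step 2
  if (source.stopped) return false;  // step 3
  return true;                       // step 4
}

function isSelfCapturing(controller: ControllerModel, ownDocumentId: string): boolean {
  if (!isActivelyCapturing(controller)) return false; // step 1
  const source = controller.source;
  // Step 2: a captured tab that shows the capturing document itself.
  return (
    source !== null &&
    source.surfaceType === "browser" &&
    source.documentId === ownDocumentId
  );
}
```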

Subroutine: Supported display surface type

To determine if a [=display surface=] |surfaceType| is a supported display surface type, run the following steps:

  1. If |surfaceType| is [=display surface/browser=], return true.
  2. Return false.

Whether [=display surface/window=] should be supported is under discussion.

Subroutine: Forward wheel event

The forward wheel event algorithm takes a {{CaptureController}} |controller| and a {{WheelEvent}} |event|, and runs the following steps:

  1. If |controller| is not [=actively capturing=], abort these steps.
  2. If |controller| [=is self-capturing=], abort these steps.
  3. Let |surfaceType| be |controller|.[[\DisplaySurfaceType]].
  4. If |surfaceType| is not a [=supported display surface type=], abort these steps.
  5. Run the following steps [=in parallel=]:
    1. [=Get the current permission state=] of "captured-surface-control". If the result is NOT {{PermissionState/"granted"}}, abort these steps.
    2. If |event|.{{Event/isTrusted}} is false, abort these steps.
    3. Let [|scaledX|, |scaledY|] be the result of the [=scale element coordinates algorithm=] on [|event|.{{MouseEvent/offsetX}}, |event|.{{MouseEvent/offsetY}}] and |controller|.
    4. [=Fire an event=] named `"wheel"` using {{WheelEvent}} with the {{MouseEvent/x}} attribute initialized to |scaledX|, the {{MouseEvent/y}} attribute initialized to |scaledY|, the {{WheelEvent/deltaX}} attribute initialized to |event|.{{WheelEvent/deltaX}} and the {{WheelEvent/deltaY}} attribute initialized to |event|.{{WheelEvent/deltaY}}, at |controller|.[[\Source]]'s viewport.

Subroutine: Scale element coordinates

The scale element coordinates algorithm takes {{double}} coordinates [|x|, |y|] and a {{CaptureController}} |controller|, and runs the following steps:

  1. Let |scaleFactorX| be (|x| / |controller|.{{CaptureController/[[forwardWheelElement]]}}.{{Element/getBoundingClientRect()}}.{{DOMRect/width}}).
  2. Let |scaleFactorY| be (|y| / |controller|.{{CaptureController/[[forwardWheelElement]]}}.{{Element/getBoundingClientRect()}}.{{DOMRect/height}}).
  3. Let |surfaceWidth| be |controller|.[[\Source]]'s viewport's width.
  4. Let |surfaceHeight| be |controller|.[[\Source]]'s viewport's height.
  5. Let |scaledX| be (|scaleFactorX| * |surfaceWidth|).
  6. Let |scaledY| be (|scaleFactorY| * |surfaceHeight|).
  7. Return [|scaledX|, |scaledY|].

This subroutine assumes that |controller| is [=actively capturing=].
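
The scaling can be sketched as a pure function: each coordinate is divided by the matching dimension of the element and multiplied by the matching dimension of the captured surface's viewport. The `Dimensions` type and the parameter names are assumptions made for the example. The element size stands in for the result of getBoundingClientRect().

```typescript
interface Dimensions {
  width: number;
  height: number;
}

// Maps a point given in element coordinates to the corresponding point in
// the captured surface's viewport. A zero-sized element would produce
// non-finite values. This sketch does not guard against that.
function scaleElementCoordinates(
  x: number,
  y: number,
  elementSize: Dimensions,    // stand-in for the element's bounding rect
  surfaceViewport: Dimensions // the captured surface's viewport size
): [number, number] {
  const scaleFactorX = x / elementSize.width;
  const scaleFactorY = y / elementSize.height;
  const scaledX = scaleFactorX * surfaceViewport.width;
  const scaledY = scaleFactorY * surfaceViewport.height;
  return [scaledX, scaledY];
}
```

For instance, a wheel event at (100, 50) over a 400 by 200 preview element maps to (400, 250) in a 1600 by 1000 viewport.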

Privacy and Security Considerations

The API surfaces introduced in this specification allow a capturing application limited control over a captured application. These APIs allow the capturing application to gain access to additional pixels in the captured application. This specification employs multiple means to ensure that new capabilities are used in accordance with the user's intentions. Among these means:

Zoom-setting: Limitation to specific interactions

{{CaptureController/setZoomLevel()}} is only callable from event handlers of specific event types - the permitted event types for setZoomLevel. These are events dispatched directly by the user agent, triggered by user interaction. This specification intentionally excludes from this set such events as "mousemove", which users are liable to trigger inadvertently.

Scrolling: Limitation to specific interactions

The shape of {{CaptureController/forwardWheel()}} is intentionally chosen to limit the capturing application's control. The application designates a specific element; when the user scrolls over that element, the corresponding wheel events are forwarded to the captured application.

Limiting element types

This specification does not limit the type of {{Element}} for which either {{CaptureController/setZoomLevel()}} or {{CaptureController/forwardWheel()}} work. Such a limitation would accomplish nothing, because malicious applications could always overlay transparent permitted {{Element}} types on top of visible non-permitted {{Element}}s, thereby bypassing this restriction.

The limitation of interaction types is sufficient. This is accomplished by {{CaptureController/forwardWheel()}} through its shape, and by {{CaptureController/setZoomLevel()}} through its gating on event types.