The Capture-Handle Actions Mechanism

Abstract

This document proposes a mechanism by which an application APP can opt in to exposing certain information to another application CAPTR, if CAPTR is screen-capturing the tab in which APP is running. The mechanism applies to tab capture only.

Problem Description

Consider a web application, running in one tab, which we’ll name "main_app." Assume main_app calls getDisplayMedia and the user chooses to share another tab, in which an application we’ll call "captured_app" is running.

Note that:

main_app does not know what it is capturing.
captured_app does not know that it is being captured, let alone by whom.

Both of these traits are desirable in the general case, but there exist specific use cases in which the user of main_app would benefit if main_app could send a limited set of standard instructions that captured_app has opted in to receiving.

We wish to enable the specific use cases while keeping the general case as it was before.

Use-case #1: Driving Presentations from Video Conferencing Apps

Consider presentation software and video-conferencing (VC) software that collaborate. Assume the user is in a VC session and starts sharing a presentation. Both applications are interested in letting the VC app discover that it is capturing a slides session, so that the VC app can expose controls to the user for flipping through slides. When the user clicks those controls, the VC app can send messages to the presentation app, requesting that it move to another slide.

The Capture-Handle Actions Mechanism

The capture-handle actions mechanism consists of two parts: one on the captured side and one on the capturing side.

Captured applications opt in by calling {{MediaDevices/setSupportedCaptureActions}} to register the standard actions they handle.
Capturing applications may trigger these actions using {{MediaStreamTrack/sendCaptureAction}}.

There is disagreement on whether actions should be specified here or in a separate document.

Captured Side for Actions

Applications in top-level documents can declare the [=capture actions=] they support, if any. They would typically do so before even knowing if they are being captured. The intended use is for an application to expect to receive these actions from capturer applications wishing to control the progression of the captured session, in response to interaction with the user. Supported actions are declared by calling {{MediaDevices/setSupportedCaptureActions}} with an array of the names of actions the application is prepared to respond to.
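The opt-in described above can be sketched as follows. This is a minimal illustration, not a normative example: `createDeck`, `registerSlideActions`, and `stubMediaDevices` are hypothetical names, and the stub stands in for `navigator.mediaDevices` so that the sketch can run where the proposed method is not implemented.

```javascript
// Hypothetical slide-deck state for a presentation app (not part of this proposal).
function createDeck(slideCount) {
  return { index: 0, slideCount };
}

// Declares the actions the app can handle and installs a handler for them.
// `mediaDevices` is expected to expose setSupportedCaptureActions() and
// oncaptureaction as proposed in this document.
function registerSlideActions(mediaDevices, deck) {
  mediaDevices.setSupportedCaptureActions(["next", "previous", "first", "last"]);
  mediaDevices.oncaptureaction = (event) => {
    switch (event.action) {
      case "next":
        deck.index = Math.min(deck.index + 1, deck.slideCount - 1);
        break;
      case "previous":
        deck.index = Math.max(deck.index - 1, 0);
        break;
      case "first":
        deck.index = 0;
        break;
      case "last":
        deck.index = deck.slideCount - 1;
        break;
    }
  };
}

// Stub standing in for navigator.mediaDevices, for illustration only.
const stubMediaDevices = {
  registered: [],
  oncaptureaction: null,
  setSupportedCaptureActions(actions) {
    this.registered = [...actions];
  },
};

const deck = createDeck(5);
registerSlideActions(stubMediaDevices, deck);
stubMediaDevices.oncaptureaction({ action: "next" }); // deck moves to slide index 1
stubMediaDevices.oncaptureaction({ action: "last" }); // deck moves to slide index 4
```

In a page, the same function would presumably be called as `registerSlideActions(navigator.mediaDevices, deck)`, typically at load time and before the page knows whether it is being captured.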

Registering and responding to capture actions

{{MediaDevices}} is extended with a method - {{MediaDevices/setSupportedCaptureActions}} - which accepts an array of {{DOMString}}s. By calling this method, an application registers with the user agent a set of zero or more [=capture actions=] it wishes to respond to.

Capture actions are values defined in {{CaptureAction}}. They are meant to be interpreted as instructions from the capturing application to control the advancement of the presentation of the captured session, however the captured application wishes to define this. The intent is to support capturer applications implementing interactive controls for these actions; sending an action requires [=transient activation=] and causes the user agent to [=consume user activation=].

            partial interface MediaDevices {
              undefined setSupportedCaptureActions(sequence<DOMString> actions);
              attribute EventHandler oncaptureaction;
            };

            enum CaptureAction {
              "next",
              "previous",
              "first",
              "last"
            };

setSupportedCaptureActions

When this method is invoked, the user agent MUST run the following steps:

If the [=relevant global object=]'s [=associated `Document`=] is not [=Document/fully active=], or its [=browsing context=] is not a [=top-level browsing context=], then throw {{InvalidAccessError}}.
Let |actions| be the method's first argument.
If |actions| is non-empty, and this method was previously called with a non-empty array on [=this=] {{MediaDevices}} object, then throw {{InvalidStateError}}.
Remove from |actions| any value not found in {{CaptureAction}}.
Remove from |actions| any duplicates.
Set [=this=]'s {{MediaDevices/[[RegisteredCaptureActions]]}} to |actions|.
Return `undefined` and run the remaining step [=in parallel=].
If this document is currently being captured as part of a browser display surface, then for each capturer of that surface, queue a task on that capturer's task-list to set all associated video {{MediaStreamTrack}}s' {{MediaDevices/[[AvailableCaptureActions]]}} to |actions|.
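The filtering and error behavior of the steps above can be modeled in plain JavaScript. This is an illustrative model only, not a user-agent implementation: `SupportedActionsModel` and its fields are hypothetical names, and the document and browsing-context check and the notification of capturers are left out.

```javascript
// The values of the proposed CaptureAction enum.
const CAPTURE_ACTIONS = ["next", "previous", "first", "last"];

class SupportedActionsModel {
  constructor() {
    this.registeredCaptureActions = []; // models [[RegisteredCaptureActions]]
    this.calledWithNonEmpty = false;    // tracks an earlier non-empty call
  }

  setSupportedCaptureActions(actions) {
    // A second non-empty registration on the same object is an error.
    if (actions.length > 0 && this.calledWithNonEmpty) {
      const error = new Error("capture actions were already registered");
      error.name = "InvalidStateError";
      throw error;
    }
    if (actions.length > 0) this.calledWithNonEmpty = true;

    // Drop values not in CaptureAction, then drop duplicates.
    // A Set preserves the order of first occurrence.
    const known = actions.filter((action) => CAPTURE_ACTIONS.includes(action));
    this.registeredCaptureActions = [...new Set(known)];
  }
}

const model = new SupportedActionsModel();
model.setSupportedCaptureActions(["next", "next", "zoom", "previous"]);
// model.registeredCaptureActions is now ["next", "previous"]
```

Note that, in this reading of the steps, an empty array can always be passed to clear the registration, but a later non-empty call on the same object still throws.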

oncaptureaction of type {{EventHandler}}: The event type of this event handler is `"captureaction"`.

When {{MediaDevices}} is created, give it a [[\RegisteredCaptureActions]] internal slot, initialized to an empty list.

Capture Action Event

CaptureActionEvent

This event is fired on the captured application's {{MediaDevices}} object whenever an action it registered with {{MediaDevices/setSupportedCaptureActions}} has been triggered. This lets the application respond by executing its implementation of this action.

              [Exposed=Window]
              interface CaptureActionEvent : Event {
                constructor(optional CaptureActionEventInit init = {});
                readonly attribute CaptureAction action;
              };

action: The {{CaptureAction}} that was triggered.

CaptureActionEventInit

              dictionary CaptureActionEventInit : EventInit {
                DOMString action;
              };

action: The {{CaptureAction}} to initialize the event with.
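The shape of the event can be approximated with the standard `Event` and `EventTarget` classes. This is a rough sketch: `CaptureActionEventSketch` is a hypothetical stand-in for the proposed interface, and fixing the event type to `"captureaction"` in the constructor is an assumption made because the IDL above shows a constructor that takes only an init dictionary.

```javascript
// Hypothetical stand-in for the proposed CaptureActionEvent interface.
class CaptureActionEventSketch extends Event {
  #action;

  constructor(init = {}) {
    // Assumption: the event type is always "captureaction".
    super("captureaction", init);
    this.#action = init.action;
  }

  // Read-only, like the `action` attribute in the IDL.
  get action() {
    return this.#action;
  }
}

// A plain EventTarget stands in for the captured application's MediaDevices object.
const eventTarget = new EventTarget();
const seenActions = [];
eventTarget.addEventListener("captureaction", (event) => {
  seenActions.push(event.action);
});

eventTarget.dispatchEvent(new CaptureActionEventSketch({ action: "next" }));
```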

Capturing Side for Actions

Capturing applications can enumerate available [=capture actions=] that are supported on the video track they have obtained, by using {{MediaStreamTrack/getSupportedCaptureActions}}, and can trigger those actions by using {{MediaStreamTrack/sendCaptureAction}}.

Enumerating supported actions and triggering them

When a {{MediaStreamTrack}} is a video track derived from screen-capture of a browser display surface, {{MediaStreamTrack/getSupportedCaptureActions}} returns the set of available [=capture actions=], if any, supported by the captured application associated with this video track.

            partial interface MediaStreamTrack {
              sequence<DOMString> getSupportedCaptureActions();
              Promise<undefined> sendCaptureAction(CaptureAction action);
            };
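On the capturing side, an application might derive its controls from the track along these lines. This is a sketch under stated assumptions: `describeControls`, `ACTION_LABELS`, and `stubTrack` are hypothetical names, the labels are illustrative, and the stub stands in for a video track obtained from getDisplayMedia.

```javascript
// Human-readable labels for the proposed CaptureAction values (illustrative).
const ACTION_LABELS = new Map([
  ["first", "First slide"],
  ["previous", "Previous slide"],
  ["next", "Next slide"],
  ["last", "Last slide"],
]);

// Returns one descriptor per action the captured application supports.
function describeControls(videoTrack) {
  // Feature-detect: the method is only proposed and may be absent.
  if (typeof videoTrack.getSupportedCaptureActions !== "function") return [];
  return videoTrack
    .getSupportedCaptureActions()
    .filter((action) => ACTION_LABELS.has(action))
    .map((action) => ({
      action,
      label: ACTION_LABELS.get(action),
      // Intended to be called from a user gesture such as a click handler,
      // because sending an action requires transient activation.
      trigger: () => videoTrack.sendCaptureAction(action),
    }));
}

// Stub track for illustration only.
const sentActions = [];
const stubTrack = {
  getSupportedCaptureActions: () => ["next", "previous"],
  sendCaptureAction: (action) => {
    sentActions.push(action);
    return Promise.resolve();
  },
};

const controls = describeControls(stubTrack);
controls[0].trigger(); // sends "next" through the stub
```

Because the set of available actions can change while capture is ongoing, a real capturer would presumably need to re-query the track before rendering or enabling its controls.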

getSupportedCaptureActions

When this method is invoked, the user agent MUST return [=this=]'s {{MediaDevices/[[AvailableCaptureActions]]}} if defined, or `[]` if not defined.

sendCaptureAction

When this method is invoked, the user agent MUST run the following steps:

If the [=relevant global object=] of [=this=] does not have [=transient activation=], return a promise [=rejected=] with {{InvalidStateError}}.
[=Consume user activation=].
Let |action| be the method's first argument.
If |action| is not in [=this=]'s {{MediaDevices/[[AvailableCaptureActions]]}}, return a promise [=rejected=] with {{NotFoundError}}.
Let |p| be a new promise.
Run the following steps [=in parallel=]:
1. Queue a task on the task-list of the captured browser display surface's [=top-level browsing context=]'s [=active document=] to run the following steps:
  1. Let |target| be the [=relevant global object=]'s [=associated `Document`=]'s associated navigator's {{MediaDevices}} object.
  2. If |action| is not in |target|'s {{MediaDevices/[[RegisteredCaptureActions]]}}, abort these steps.
  3. [=Fire an event=] named `"captureaction"`, using a {{CaptureActionEvent}} with {{CaptureActionEventInit/action}} set to |action|, at |target|.
2. Wait for the event to have been fired.
3. Resolve |p|.
Return |p|.
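The control flow of the steps above can be modeled as follows. This is an illustrative model, not a user-agent implementation: all names are hypothetical, an in-memory task array stands in for the captured document's task queue, and consuming user activation is not modeled. The model also resolves the promise when the action is no longer registered on the captured side, which is an assumption, since the steps above do not say what happens to |p| in that case.

```javascript
// Returns the name of the error the call would reject with, or null.
function failureFor(track, action, hasTransientActivation) {
  if (!hasTransientActivation) return "InvalidStateError";
  if (!track.availableCaptureActions.includes(action)) return "NotFoundError";
  return null;
}

function sendCaptureActionModel(track, captured, action, hasTransientActivation) {
  const failure = failureFor(track, action, hasTransientActivation);
  if (failure !== null) {
    const error = new Error(`cannot send "${action}"`);
    error.name = failure;
    return Promise.reject(error);
  }
  return new Promise((resolve) => {
    // Pushing onto captured.tasks stands in for queuing a task on the captured document.
    captured.tasks.push(() => {
      if (captured.registeredCaptureActions.includes(action)) {
        const event = new Event("captureaction");
        event.action = action; // a real CaptureActionEvent would carry this natively
        captured.mediaDevices.dispatchEvent(event);
      }
      // Assumption: resolve even if the action was unregistered in the meantime.
      resolve();
    });
  });
}

// Example wiring with in-memory stand-ins.
const capturedPage = {
  registeredCaptureActions: ["next", "previous"],
  mediaDevices: new EventTarget(),
  tasks: [],
};
const receivedActions = [];
capturedPage.mediaDevices.addEventListener("captureaction", (event) => {
  receivedActions.push(event.action);
});

const modelTrack = { availableCaptureActions: ["next", "previous"] };
const pending = sendCaptureActionModel(modelTrack, capturedPage, "next", true);
capturedPage.tasks.splice(0).forEach((task) => task()); // run the queued task
```

The activation check comes first, so a call without transient activation is rejected with `InvalidStateError` even when the action is also unavailable.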

When a video {{MediaStreamTrack}} is created as part of the getDisplayMedia algorithm, whose source is a browser display surface, give it an [[\AvailableCaptureActions]] internal slot, initialized to the captured browser display surface's [=top-level browsing context=]'s [=Browsing context/active window=]'s associated navigator's {{MediaDevices}} object's {{MediaDevices/[[RegisteredCaptureActions]]}}.

While capture of a browser display surface is occurring, whenever that surface's [=top-level browsing context=] is navigated, then for each capturer of that surface, queue a task on that capturer's task-list to set all associated video {{MediaStreamTrack}}s' {{MediaDevices/[[AvailableCaptureActions]]}} to `[]`.
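The lifecycle of the track's slot (initialized at capture start, updated on registration, cleared on navigation) can be summarized with a small model. This is a sketch under stated assumptions: `CapturedTabModel` is a hypothetical name, updates are applied synchronously rather than through queued tasks, and the model assumes that the document loaded after a navigation starts with an empty registration, following the initialization of [[\RegisteredCaptureActions]] described earlier.

```javascript
class CapturedTabModel {
  constructor() {
    this.registeredCaptureActions = []; // slot of the tab's current document
    this.tracks = [];                   // video tracks of all capturers of this tab
  }

  // Capture start: the new track's slot is initialized from the current registration.
  startCapture() {
    const track = { availableCaptureActions: [...this.registeredCaptureActions] };
    this.tracks.push(track);
    return track;
  }

  // Registration while captured: every capturer's tracks are updated.
  register(actions) {
    this.registeredCaptureActions = [...actions];
    for (const track of this.tracks) {
      track.availableCaptureActions = [...actions];
    }
  }

  // Navigation: every capturer's tracks are reset to an empty list.
  navigate() {
    this.registeredCaptureActions = [];
    for (const track of this.tracks) {
      track.availableCaptureActions = [];
    }
  }
}

const tab = new CapturedTabModel();
tab.register(["next", "previous"]);
const capturedTrack = tab.startCapture(); // starts with ["next", "previous"]
tab.navigate();                           // capturedTrack is reset to []
```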