This document introduces an API for cropping a video track derived from display-capture of the current tab.
This document uses the definition of the following concepts from [[SCREEN-CAPTURE]]: display-surface and browser [=display-surface=].
This specification defines self-capture as the capture of a [=browser=] [=display-surface=] that is the rendered form of the [=top-level browsing context=] of the [=associated Document=] of the {{MediaDevices}} object from which the application initiated the capture session. A self-capture video track is a {{MediaStreamTrack}} sourced by self-capture.
Complex applications often comprise multiple [=documents=] in distinct [=iframes=], all displayed within the same [=browsing context=]. Consider such an application. Assume one of these documents, CAPTURING-DOC, uses {{MediaDevices/getDisplayMedia()}} or getViewportMedia to capture the entire current [=browsing context=]. If this document then wishes to crop the video track to the coordinates of some sub-section CAPTURE-TARGET of a collaborating document CAPTURED-DOC, how can CAPTURING-DOC do so performantly and reliably? Recall especially that changes in layout due to scrolling, zooming or window resizing present additional challenges.
Consider a combo-application consisting of two major parts hosted in different iframes within the same tab: a video-conferencing application and a productivity-suite application. Assume the video-conferencing application uses existing/upcoming APIs such as {{MediaDevices/getDisplayMedia()}} and/or getViewportMedia and captures the entire tab. Now it needs to crop away everything other than a particular section of the productivity-suite. It needs to crop away its own video-conferencing content, any speaker notes and other private and/or irrelevant content in the productivity-suite, before transmitting the resulting cropped video remotely.
Moreover, consider that it is likely that the two collaborating applications are cross-origin from each other. They can post messages, but all communication is asynchronous, and it's easier and more performant if information is transmitted sparingly between them. That precludes solutions involving posting of entire frames, as well as solutions which are too slow to react to changes in layout (e.g. scrolling, zooming and window-size changes).
It is worthwhile to note that most applications would likely prefer to use getViewportMedia in such scenarios. However, as of this writing, getViewportMedia is still unspecified and unimplemented. It will have non-trivial requirements whose adoption will take some time and effort. As such, many applications will likely use a combination of {{MediaDevices/getDisplayMedia()}} and Region Capture for some time to come.
The combination of {{MediaDevices/getDisplayMedia()}} and Region Capture is also useful for applications that allow the users to choose whichever [=display-surface=] they wish, but offer distinct functionality depending on whether users choose to [=self-capture=] or, conversely, choose to capture a window or monitor. Such applications would only succeed in using Region Capture if the user chose to [=self-capture=]; otherwise, the attempt to apply cropping would be a no-op.
As presently defined, {{BrowserCaptureMediaStreamTrack/cropTo}}(cropTarget) returns a [=rejected=] {{Promise}} if the cropTarget is not associated with an {{Element}} within either the current [=top-level browsing context=] or any of its descendant browsing contexts. That means that all of the mechanisms introduced by this document are only relevant for [=self-capture=]. An immediate corollary is that navigation of the (shared) [=top-level browsing context=] breaks off the capture, and therefore also the cropping session.
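The following non-normative sketch shows one way an application might offer distinct functionality depending on whether cropping succeeded. The helper name `tryCropTo` is hypothetical; it only relies on `cropTo` returning a {{Promise}} that is rejected when the target cannot be used.

```javascript
// Sketch only: `track` is the video track obtained from getDisplayMedia().
// Resolves to true if cropping was applied, false if cropTo is not exposed
// on the track or if the returned promise was rejected.
async function tryCropTo(track, cropTarget) {
  // cropTo is only exposed on tracks capturing a browser display-surface.
  if (typeof track.cropTo !== 'function') {
    return false;
  }
  try {
    await track.cropTo(cropTarget);
    return true;
  } catch (error) {
    // For example, the user chose to capture some other tab.
    return false;
  }
}
```

An application could fall back to transmitting the uncropped track, or to stopping the capture, when this helper resolves to `false`.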
The region-capture mechanism comprises two parts:
We define two crop-states for video tracks - cropped and uncropped. Tracks start out [=uncropped=], and may turn to [=cropped=] when {{BrowserCaptureMediaStreamTrack/cropTo}} is successfully called on them.
The [=cropping mechanism=] presented in this document ({{BrowserCaptureMediaStreamTrack/cropTo}}) relies on {{CropTarget}} rather than on direct node references. This serves a dual purpose.
It allows cropping by one document to coordinates specified in another document.
[=Tagging=] an {{Element}} as a potential crop-target allows the user agent to avoid unnecessary work on all other elements, like the calculation of bounding boxes and sending such coordinates cross-process.
CropTarget is an intentionally empty, opaque identifier that exposes nothing. Its sole purpose is to be handed to {{BrowserCaptureMediaStreamTrack/cropTo}} as input.
[Exposed=(Window,Worker), Serializable]
interface CropTarget {
  // Intentionally empty; just an opaque identifier.
};
There is no consensus yet on the name for {{CropTarget}}. This is under discussion in issue #18.
To create a CropTarget with element as input, run the following steps:
Let cropTarget be a new object of type {{CropTarget}}.
Let weakRef be a weak reference to element.
[=Create a CropTarget|Create=] cropTarget.[[\Element]] initialized to weakRef.
cropTarget keeps a weak reference to the element it represents. In other words, cropTarget will not prevent garbage collection of its element.
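As a non-normative illustration of the weak-reference behavior, the following sketch models the [[\Element]] internal slot with a JavaScript `WeakRef`. The class `CropTargetModel` and its `resolveElement()` method are hypothetical and exist only for this illustration; real {{CropTarget}} objects are created by the user agent and expose nothing.

```javascript
// Illustrative model only, not how a user agent implements CropTarget.
class CropTargetModel {
  #elementRef; // stands in for the [[Element]] internal slot

  constructor(element) {
    // A WeakRef does not keep `element` alive on its own.
    this.#elementRef = new WeakRef(element);
  }

  // Returns the element, or undefined once it has been garbage collected.
  resolveElement() {
    return this.#elementRef.deref();
  }
}
```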
{{CropTarget}} objects are serializable. The [=serialization steps=], given value, serialized, and a boolean forStorage, are:
If forStorage is true, throw a new {{DOMException}} object whose {{DOMException/name}} attribute has the value {{"DataCloneError"}}.
Set serialized.[[\CropTargetElement]] to value.[[\Element]].
The [=deserialization steps=], given serialized and value are:
Set value.[[\Element]] to serialized.[[\CropTargetElement]].
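The steps above can be modeled as follows. This is a non-normative sketch: plain objects with a hypothetical `element` property stand in for {{CropTarget}} objects and their [[\Element]] slot, and `cropTargetElement` stands in for the [[\CropTargetElement]] field of the serialized record.

```javascript
// Model of the serialization steps; not a user agent implementation.
function serializeCropTarget(value, serialized, forStorage) {
  if (forStorage) {
    // Persisting a CropTarget (for example in a database) is not allowed.
    throw new DOMException('CropTarget cannot be stored.', 'DataCloneError');
  }
  serialized.cropTargetElement = value.element;
}

// Model of the deserialization steps.
function deserializeCropTarget(serialized, value) {
  value.element = serialized.cropTargetElement;
}
```

One practical consequence is that a {{CropTarget}} can be passed to another document with `postMessage()`, whereas an attempt to serialize it for storage fails.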
partial interface MediaDevices {
  Promise<CropTarget> produceCropTarget(Element element);
};
Calling {{MediaDevices/produceCropTarget}} on an {{Element}} of a supported type associates that {{Element}} with a {{CropTarget}}. This {{CropTarget}} may be used as input to {{BrowserCaptureMediaStreamTrack/cropTo}}. We define a valid CropTarget as one returned by a previous call to {{MediaDevices/produceCropTarget()}} in the current [=top-level browsing context=] or any of its descendant browsing contexts.
When {{MediaDevices/produceCropTarget}} is called on a given element, the user agent [=create a CropTarget|creates a CropTarget=] with element as input. The user agent MUST return a {{Promise}} p. The user agent MUST resolve p only after it has finished all the necessary internal propagation of state associated with the new {{CropTarget}}, at which point the user agent MUST be ready to receive the new {{CropTarget}} as a valid parameter to {{BrowserCaptureMediaStreamTrack/cropTo}}.
When cloning an {{Element}} on which {{MediaDevices/produceCropTarget}} was previously called, the clone is not associated with any {{CropTarget}}. If {{MediaDevices/produceCropTarget}} is later called on the clone, a new {{CropTarget}} will be assigned to it.
There is no consensus yet on the following issues:
Recall that, as per [[SCREEN-CAPTURE]], when {{MediaDevices/getDisplayMedia()}} is called, it returns a {{Promise}}<{{MediaStream}}>, and that this {{MediaStream}} contains exactly one video track, whose type is {{MediaStreamTrack}}.
We specify that if the user chooses to capture a [=browser=] [=display-surface=], the user agent MUST instantiate the video track either as {{MediaStreamTrack}} or as some subclass of {{MediaStreamTrack}}, and that {{BrowserCaptureMediaStreamTrack/cropTo}} MUST be exposed on this track. For simplicity's sake, this document assumes that a subclass called {{BrowserCaptureMediaStreamTrack}} is used by the user agent.
The track MUST be initially [=uncropped=].
[Exposed=Window]
interface BrowserCaptureMediaStreamTrack : MediaStreamTrack {
  Promise<undefined> cropTo(CropTarget? cropTarget);
  BrowserCaptureMediaStreamTrack clone();
};
Calls to this method instruct the user agent to start/stop cropping a [=self-capture video track=] to the bounding client rectangle of cropTarget.[[\Element]]. Since the track is restricted to the visible viewport of the [=display-surface=], the captured area will be the intersection of the visible viewport and the element bounding client rectangle. Whenever {{BrowserCaptureMediaStreamTrack/cropTo}} is invoked, the user agent MUST execute the following algorithm:
The user agent MUST validate cropTarget according to [=this=] track's current [=crop-state=].
If the user agent does not accept cropTarget, return a {{Promise}} [=rejected=] with an {{UnknownError}}.
Run the following steps in parallel:
If cropTarget is either {{undefined}} or a [=valid CropTarget=], the user agent MUST update [=this=] video track's [=crop-state=] according to cropTarget:
Call the track's state before this method invocation PRE-STATE, and after this method invocation POST-STATE. The user agent MUST resolve p when it is guaranteed that no more frames cropped (or uncropped) according to PRE-STATE have been delivered to the application, and that any additional frames delivered to the application will therefore be cropped (or uncropped) according to either POST-STATE or a later state.
The timing of the cropTo promise resolution and the timing of the actual cropping of video frames are observable to JavaScript through MediaStreamTrack transforms. It is expected that the first newly cropped video frame will be enqueued on the MediaStreamTrack ReadableStream just after the cropTo promise is resolved.
When a {{BrowserCaptureMediaStreamTrack}} is cloned, the user agent MUST produce a track which is initially [=uncropped=], regardless of the [=crop-state=] of the original track.
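Because a clone starts out uncropped, an application that wants a cropped clone has to crop it again. The following non-normative sketch shows one way to do so; the helper name `cloneWithCrop` is hypothetical, and the caller is assumed to have kept the {{CropTarget}} it last applied to the original track.

```javascript
// Sketch: clone a track and re-apply the crop on the clone.
async function cloneWithCrop(track, cropTarget) {
  const clone = track.clone(); // initially uncropped
  try {
    await clone.cropTo(cropTarget);
  } catch (error) {
    // Do not leave an uncropped clone running if cropping failed.
    clone.stop();
    throw error;
  }
  return clone;
}
```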
We define an {{Element}} for which a {{CropTarget}} was produced (through a call to {{MediaDevices/produceCropTarget}}) as a potential crop-target.
We define a [=potential crop-target=] which is targeted by a successful call to {{BrowserCaptureMediaStreamTrack/cropTo}} as the crop-session target.
Consider a frame produced on a [=cropped=] video track. The user agent calculates the intersection of (i) the [=top-level browsing context=]'s viewport and (ii) the bounding box of all pixels belonging to the [=crop-session target=]. This intersection is defined as the crop-session target's coordinates for that frame.
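The intersection described above can be sketched as follows. This is a non-normative illustration that assumes both rectangles are axis-aligned and expressed as hypothetical `{ x, y, width, height }` objects in the same viewport coordinate space; the function name `intersectRects` is not part of any API.

```javascript
// Returns the overlap of the viewport and the target's bounding box,
// or null when the two share no pixels.
function intersectRects(viewport, targetBox) {
  const left = Math.max(viewport.x, targetBox.x);
  const top = Math.max(viewport.y, targetBox.y);
  const right = Math.min(viewport.x + viewport.width, targetBox.x + targetBox.width);
  const bottom = Math.min(viewport.y + viewport.height, targetBox.y + targetBox.height);
  if (right <= left || bottom <= top) {
    return null; // nothing of the target is inside the viewport
  }
  return { x: left, y: top, width: right - left, height: bottom - top };
}

// A target partially scrolled out of a 1280x720 viewport:
intersectRects(
  { x: 0, y: 0, width: 1280, height: 720 },
  { x: 1000, y: 600, width: 400, height: 300 }
); // → { x: 1000, y: 600, width: 280, height: 120 }
```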
Consider a video track VT [=cropped=] to a given [=crop-session target=] TARGET. We define the behavior of the crop-session of the VT in the face of changes undergone by TARGET.
We define as an empty crop-session target the case where a [=crop-session target=] is attached to the DOM, yet consists of zero pixels which are drawn inside of the [=top-level browsing context|top-level browsing context's=] viewport.
Some examples of when this could happen include:
The user agent MUST NOT produce new frames on tracks with an [=empty crop-session target=]. For such a track, the user agent MUST resume the production of frames if the track either becomes [=uncropped=] or its [=crop-session target=] stops being [=empty crop-session target|empty=].
We define as disconnected crop-session target a [=crop-session target=] that has been detached from the DOM.
The difference between an [=empty crop-session target=] and a [=disconnected crop-session target=] is that a [=disconnected crop-session target|disconnected=] one may become unreachable, in which case it would not produce any new frames. Nevertheless, the user agent MUST treat a [=disconnected crop-session target=] the same way it treats an [=empty crop-session target=]. The application may call {{BrowserCaptureMediaStreamTrack/cropTo}} on the track with either {{undefined}} or a new {{CropTarget}}, thereby allowing the production of frames on the track to be resumed.
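A minimal, non-normative sketch of this recovery path follows. The helper name `resumeFrames` is hypothetical, and how the application notices that frames have stopped is left out of the sketch.

```javascript
// Sketch: resume frame production on a track whose crop-session target is
// empty or disconnected. `replacementTarget` is optional.
async function resumeFrames(track, replacementTarget) {
  // When no replacement is given, undefined is passed and the track
  // reverts to uncropped.
  await track.cropTo(replacementTarget);
  return replacementTarget === undefined ? 'uncropped' : 'cropped';
}
```

Reverting to uncropped makes the entire captured tab visible in the track again, so an application that crops in order to hide private content may prefer to supply a replacement {{CropTarget}}.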
Code in the capture-target:
const mainContentArea = document.getElementById('mainContentArea');
const cropTarget = await navigator.mediaDevices.produceCropTarget(mainContentArea);
sendCropTarget(cropTarget);

function sendCropTarget(cropTarget) {
  // Can send the crop-target to another document in this tab
  // using postMessage() or using any other means.
  // Possibly there is no other document, and this is just consumed locally.
}
Code in the capturing-document:
async function startCroppedCapture(cropTarget) {
  const stream = await navigator.mediaDevices.getDisplayMedia();
  const [track] = stream.getVideoTracks();
  if (!track.cropTo) {
    // cropTo is not exposed on this track (unsupported by the user agent,
    // or the user did not choose a browser display-surface).
    handleError(stream);
    return;
  }
  await track.cropTo(cropTarget);
  transmitVideoRemotely(track);
}
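When the two documents are cross-origin, the {{CropTarget}} can be handed over with `postMessage()`. The following non-normative sketch shows one possible wiring. The message shape, the constant `CROP_TARGET_MESSAGE` and the helper names `postCropTarget` and `readCropTarget` are all hypothetical; this document does not define a message format.

```javascript
// Hypothetical message type agreed upon by the two documents.
const CROP_TARGET_MESSAGE = 'crop-target';

// In the captured document: post the CropTarget to the capturing document.
// `capturingWindow` is a window reference such as window.parent.
function postCropTarget(capturingWindow, cropTarget, capturingOrigin) {
  capturingWindow.postMessage(
    { type: CROP_TARGET_MESSAGE, cropTarget },
    capturingOrigin
  );
}

// In the capturing document: extract the CropTarget from a message event.
// Returns null for messages from other origins or of other kinds.
function readCropTarget(event, expectedOrigin) {
  if (event.origin !== expectedOrigin) {
    return null;
  }
  const data = event.data;
  if (!data || data.type !== CROP_TARGET_MESSAGE) {
    return null;
  }
  return data.cropTarget ?? null;
}

// Possible use in the capturing document (the origin is a placeholder):
//   window.addEventListener('message', (event) => {
//     const cropTarget = readCropTarget(event, 'https://slides.example');
//     if (cropTarget) startCroppedCapture(cropTarget);
//   });
```

Only the small, opaque {{CropTarget}} crosses the document boundary, and it is sent once rather than on every layout change.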