Media Capture and Streams Extensions

In-browser camera and microphone picker

The existing {{MediaDevices/enumerateDevices()}} function exposes camera and microphone {{MediaDeviceInfo/label}}s to let applications build in-content user interfaces for camera and microphone selection. Applications have had to do this because {{MediaDevices/getUserMedia()}} did not offer a web compatible in-agent device picker. This specification aims to rectify that.

Due to the significant fingerprinting vector caused by device {{MediaDeviceInfo/label}}s, and the well-established nature of the existing APIs, the scope of this particular effort is limited to removing {{MediaDeviceInfo/label}}, leaving the overall constraints-based model intact. Keeping the model intact makes migration more viable than it would be to a new, less-powerful API.

For the same reason, this specification augments the existing {{MediaDevices/getUserMedia()}} function instead of introducing a new, less-powerful API to compete with it.

getUserMedia "user-chooses" semantics

This specification introduces slightly altered semantics to the {{MediaDevices/getUserMedia()}} function called "user-chooses" that guarantee a picker will be shown to the user in cases where the user agent would otherwise choose for the user (that is: when application constraints do not narrow down the choices to a single device). This is orthogonal to permission, and offers a better and more consistent user experience across applications and user agents.

Unfortunately, since the "user-chooses" semantics may produce user agent prompts at different times and in different situations compared to the old semantics, they are somewhat incompatible with the expectations of some existing web applications, which tend to call {{MediaDevices/getUserMedia()}} repeatedly and lazily instead of reusing a stream, e.g. with stream.clone().

Web compatibility and migration

User agents are encouraged to provide the new semantics as opt-in initially for web compatibility. User agents MUST deprecate (remove) {{MediaDeviceInfo/label}} from {{MediaDeviceInfo}} over time, though specific migration strategies are left to user agents. User agents SHOULD migrate to offering the new semantics by default (opt-out) over time.

Since the constraints-model remains intact, web compatibility problems are expected to be limited to:

Sites that never migrated show e.g. "Camera 1", "Camera 2" etc. instead of descriptive device labels
Sites with no device management strategy provoke a picker in the user agent on every visit for users with more than one camera or microphone to choose from (a feature of sorts)

MediaDevices Interface Extensions

partial interface MediaDevices {
  readonly attribute GetUserMediaSemantics defaultSemantics;
};

Attributes

defaultSemantics of type GetUserMediaSemantics, readonly

The default semantics of {{MediaDevices/getUserMedia()}} in this user agent.

User agents SHOULD default to "browser-chooses" for backwards compatibility, until a transition plan has been enacted where a majority of user agents collectively switch their defaults to "user-chooses" for improved user privacy, and usage metrics suggest this transition is feasible without major breakage.
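
As a rough sketch of how an application might adapt to this attribute, the helper below requests "user-chooses" explicitly only when the user agent exposes defaultSemantics and its default is something else. The name `withUserChooses` is hypothetical and not part of this specification. The MediaDevices object is passed in as a parameter so the helper can be exercised with a stand-in object.

```javascript
// Hypothetical helper: returns a copy of `constraints` that opts in to
// "user-chooses" when the user agent exposes defaultSemantics and its
// default is something else. Otherwise the copy is returned unchanged.
function withUserChooses(mediaDevices, constraints) {
  const supported = mediaDevices != null && "defaultSemantics" in mediaDevices;
  if (!supported || mediaDevices.defaultSemantics === "user-chooses") {
    return { ...constraints };
  }
  return { ...constraints, semantics: "user-chooses" };
}
```

In a page this might be called as `navigator.mediaDevices.getUserMedia(withUserChooses(navigator.mediaDevices, {video: true}))`.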

MediaStreamConstraints dictionary extensions

partial dictionary MediaStreamConstraints {
  GetUserMediaSemantics semantics;
};

Dictionary {{MediaStreamConstraints}} Members

semantics of type {{GetUserMediaSemantics}}: In cases where the specified constraints do not narrow multiple choices between devices down to one per kind, specifies how the final determination of which devices to pick from the remaining choices MUST be made. If not specified, then the defaultSemantics are used.

GetUserMediaSemantics enum

enum GetUserMediaSemantics {
  "browser-chooses",
  "user-chooses"
};

GetUserMediaSemantics Enumeration description
`browser-chooses`	When application-specified constraints do not narrow multiple choices between devices down to one per kind, the user agent is allowed to make the final determination between the remaining choices.
`user-chooses`	When application-specified constraints do not narrow multiple choices between devices down to one per kind, the user agent MUST prompt the user to choose between the remaining choices, even if the application already has permission to some or all of them.

Algorithms

When the {{MediaDevices/getUserMedia()}} method is invoked, run the following steps before invoking the {{MediaDevices/getUserMedia()}} algorithm:

Let mediaDevices be the object on which this method was invoked.
Let constraints be the method's first argument.
Let semanticsPresent be true if constraints.semantics [= map/exists =], otherwise false.
Let semantics be constraints.semantics if it [= map/exists =], or the value of mediaDevices.defaultSemantics otherwise.
Replace step 6.5.1. of the {{MediaDevices/getUserMedia()}} algorithm in its entirety with the following two steps:
1. Let descriptor be a {{PermissionDescriptor}} with its {{PermissionDescriptor/name}} member set to the permission name associated with kind (e.g. [="camera"=] for "video", [="microphone"=] for "audio").
2. If the number of unique devices sourcing tracks of media type kind in candidateSet is greater than 1 and semantics is "user-chooses", then prompt the user to choose a device with descriptor, resulting in provided media. Otherwise, request permission to use a device with descriptor, while considering all devices being attached to a live and same-permission MediaStreamTrack in the current [=browsing context=] to mean having permission status {{PermissionState/"granted"}}, resulting in provided media.
  
  Same-permission in this context means a {{MediaStreamTrack}} that required the same level of permission to obtain as what is being requested.
  
  When asking the user’s permission, the user agent MUST disclose whether permission will be granted only to the device chosen, or to all devices of that kind.
  
  Let track be the provided media, which MUST be precisely one track of type kind from finalSet. If semantics is "browser-chooses" then the decision of which track to choose from finalSet is up to the User Agent, which MAY use the value of the computed "fitness distance" from the SelectSettings algorithm, the value of semanticsPresent, or any other internally-available information about the devices, as inputs to its decision. If semantics is "user-chooses", and the application has not narrowed down the choices to one, then the user agent MUST ask the user to make the final selection.
  
  Once selected, the source of the {{MediaStreamTrack}} MUST NOT change.
  
  User Agents are encouraged to default to or present a default choice based primarily on fitness distance, and secondarily on the user's primary or system default device for kind (when possible). User Agents MAY allow users to use any media source, including pre-recorded media files.
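
The decision logic in the steps above can be modeled in a few lines. This is only an illustrative sketch, not the normative algorithm: `resolveSemantics`, `mustPromptUserToChoose` and the `candidateDeviceIds` array are hypothetical stand-ins for the internal variables named in the steps.

```javascript
// Model of the pre-steps: an explicit constraints.semantics wins,
// otherwise the user agent's defaultSemantics applies.
function resolveSemantics(constraints, defaultSemantics) {
  const semanticsPresent = constraints.semantics !== undefined;
  const semantics = semanticsPresent ? constraints.semantics : defaultSemantics;
  return { semanticsPresent, semantics };
}

// Model of the replacement step 2: a picker is required only when more
// than one unique device remains and the semantics are "user-chooses".
// `candidateDeviceIds` stands in for the devices sourcing tracks of the
// requested kind in candidateSet.
function mustPromptUserToChoose(candidateDeviceIds, semantics) {
  const uniqueDevices = new Set(candidateDeviceIds);
  return uniqueDevices.size > 1 && semantics === "user-chooses";
}
```

Note that several candidate tracks from the same device count once, which is why the sketch deduplicates the identifiers before comparing.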

Examples

This example shows a setup with a start button and a camera selector using the new semantics (microphone is not shown for brevity but is equivalent).

<button id="start">Start</button>
<button id="chosenCamera" disabled>Camera: none</button>
<script>

let cameraTrack = null;

start.onclick = async () => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      video: {deviceId: localStorage.cameraId}
    });
    setCameraTrack(stream.getVideoTracks()[0]);
  } catch (err) {
    console.error(err);
  }
}

chosenCamera.onclick = async () => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      video: true,
      semantics: "user-chooses"
    });
    setCameraTrack(stream.getVideoTracks()[0]);
  } catch (err) {
    console.error(err);
  }
}

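function setCameraTrack(track) {
  cameraTrack = track;
  // deviceId is a setting, but the label is an attribute of the track itself.
  const {deviceId} = track.getSettings();
  // Remember the choice so the start button can request the same camera.
  localStorage.cameraId = deviceId;
  chosenCamera.innerText = `Camera: ${track.label}`;
  chosenCamera.disabled = false;
}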
</script>

MediaStreamTrack extensions

Transferable MediaStreamTrack

A {{MediaStreamTrack}} is a transferable object. This allows manipulating real-time media outside the context it was requested or created in, for instance in workers or third-party iframes.

To preserve the existing privacy and security infrastructure, in particular for capture tracks, the track source lifetime management remains tied to the context that created it. The transfer algorithm MUST ensure the following behaviors:

The context named originalContext that created a track named originalTrack remains in control of the originalTrack source, named trackSource, even when originalTrack is transferred into transferredTrack.
In particular, originalContext remains the proxy to privacy indicators of trackSource. transferredTrack or any of its clones are considered as tracks using trackSource as if they were tracks created in and controlled by originalContext.
When originalContext goes away, trackSource gets ended, thus transferredTrack gets ended.
When originalContext would have muted/unmuted originalTrack, transferredTrack gets muted/unmuted.
If transferredTrack is cloned in transferredTrackClone, transferredTrackClone is tied to trackSource. It is not tied to originalTrack in any way.
If transferredTrack is transferred into transferredAgainTrack, transferredAgainTrack is tied to trackSource. It is not tied to transferredTrack or originalTrack in any way.

The WebIDL changes to make the track transferable are the following:

[Exposed=(Window,Worker), Transferable]
partial interface MediaStreamTrack {
};

At creation of a {{MediaStreamTrack}} object, called track, run the following steps:

Initialize track.`[[IsDetached]]` to false.

The {{MediaStreamTrack}} transfer steps, given value and dataHolder, are:

If value.`[[IsDetached]]` is true, throw a "DataCloneError" DOMException.
Set dataHolder.`[[id]]` to value.{{MediaStreamTrack/id}}.
Set dataHolder.`[[kind]]` to value.{{MediaStreamTrack/kind}}.
Set dataHolder.`[[label]]` to value.{{MediaStreamTrack/label}}.
Set dataHolder.`[[readyState]]` to value.{{MediaStreamTrack/readyState}}.
Set dataHolder.`[[enabled]]` to value.{{MediaStreamTrack/enabled}}.
Set dataHolder.`[[muted]]` to value.{{MediaStreamTrack/muted}}.
Set dataHolder.`[[source]]` to value underlying source.
Set dataHolder.`[[constraints]]` to value active constraints.
Set dataHolder.`[[contentHint]]` to value application-set content hint.
Set value.`[[IsDetached]]` to true.
Set value.[[\ReadyState]] to {{MediaStreamTrackState/"ended"}} (without stopping the underlying source or firing an `ended` event).

{{MediaStreamTrack}} transfer-receiving steps, given dataHolder and track, are:

Initialize track.{{MediaStreamTrack/id}} to dataHolder.`[[id]]`.
Initialize track.{{MediaStreamTrack/kind}} to dataHolder.`[[kind]]`.
Initialize track.{{MediaStreamTrack/label}} to dataHolder.`[[label]]`.
Initialize track.{{MediaStreamTrack/readyState}} to dataHolder.`[[readyState]]`.
Initialize track.{{MediaStreamTrack/enabled}} to dataHolder.`[[enabled]]`.
Initialize track.{{MediaStreamTrack/muted}} to dataHolder.`[[muted]]`.
Set track application-set content hint to dataHolder.`[[contentHint]]`.
[=Initialize the underlying source=] of track to dataHolder.`[[source]]`.
Set track's constraints to dataHolder.`[[constraints]]`.

The underlying source is supposed to be kept alive between the transfer and transfer-receiving steps, or as long as the data holder is alive. In a sense, between these steps, the data holder is attached to the underlying source as if it was a track.
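
To make the state changes in the transfer and transfer-receiving steps easier to follow, here is a simplified model using plain objects. It is a sketch only: real tracks are platform objects, and `createModelTrack`, `transferSteps` and `transferReceivingSteps` are hypothetical names, with an ordinary Error standing in for the DOMException.

```javascript
// Simplified model of a track; real tracks are platform objects.
function createModelTrack(fields) {
  return { ...fields, isDetached: false };
}

// Transfer steps: snapshot the state into a data holder, then detach.
function transferSteps(value) {
  if (value.isDetached) {
    // Stand-in for a "DataCloneError" DOMException.
    const err = new Error("track is already detached");
    err.name = "DataCloneError";
    throw err;
  }
  const dataHolder = {
    id: value.id, kind: value.kind, label: value.label,
    readyState: value.readyState, enabled: value.enabled, muted: value.muted,
    source: value.source, constraints: value.constraints,
    contentHint: value.contentHint,
  };
  value.isDetached = true;
  // The original now reads "ended", but the source keeps running.
  value.readyState = "ended";
  return dataHolder;
}

// Transfer-receiving steps: build the new track from the data holder.
function transferReceivingSteps(dataHolder) {
  return createModelTrack({ ...dataHolder });
}
```

In a page, the transfer itself would typically be triggered by listing the track in the transfer list, for example `worker.postMessage({track}, [track])`.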

MediaStreamTrack Statistics

On microphone audio tracks, frame counters allow the application to tell the ratio of audio that is delivered, which is one quality indicator, and the latency metrics measure the input delay from capture to application.

On camera and screenshare video tracks, frame counters allow the application to tell what the frame rate is, which may be lower than the target {{MediaTrackSettings/frameRate}}. For example, if the track is sourced from a camera then the production of frames could be slowed down if it's dark or frames could be dropped if the system is CPU starved. This could impact the total number of frames produced by the source and impact how many frames are delivered, discarded or dropped for other reasons.

partial interface MediaStreamTrack {
  [SameObject] readonly attribute
      (MediaStreamTrackAudioStats or MediaStreamTrackVideoStats)? stats;
};

Let the {{MediaStreamTrack}} have a [[\Stats]] internal slot, initialized to null unless otherwise specified below.

If the track is of {{MediaStreamTrack/kind}} "audio", run the following steps:

If the {{MediaStreamTrack}} is sourced from getUserMedia(), initialize {{MediaStreamTrack/[[Stats]]}} to a new instance of {{MediaStreamTrackAudioStats}} set up to expose audio stats for this {{MediaStreamTrack}}.

If the track is of {{MediaStreamTrack/kind}} "video", run the following steps:

If the {{MediaStreamTrack}} is sourced from getUserMedia() or getDisplayMedia(), initialize {{MediaStreamTrack/[[Stats]]}} to a new instance of {{MediaStreamTrackVideoStats}} set up to expose video stats for this {{MediaStreamTrack}}.

Attributes

stats of type ({{MediaStreamTrackAudioStats}} or {{MediaStreamTrackVideoStats}}), readonly

When this getter is called, the user agent MUST run the following steps:

Let track be the {{MediaStreamTrack}} that this getter is called on.
Return track.{{MediaStreamTrack/[[Stats]]}}.

The MediaStreamTrackAudioStats interface

[Exposed=Window]
interface MediaStreamTrackAudioStats {
  readonly attribute unsigned long long deliveredFrames;
  readonly attribute DOMHighResTimeStamp deliveredFramesDuration;
  readonly attribute unsigned long long totalFrames;
  readonly attribute DOMHighResTimeStamp totalFramesDuration;
  readonly attribute DOMHighResTimeStamp latency;
  readonly attribute DOMHighResTimeStamp averageLatency;
  readonly attribute DOMHighResTimeStamp minimumLatency;
  readonly attribute DOMHighResTimeStamp maximumLatency;
  undefined resetLatency();
  [Default] object toJSON();
};

The following metrics lack Working Group consensus: {{MediaStreamTrackAudioStats/deliveredFrames}}, {{MediaStreamTrackAudioStats/deliveredFramesDuration}}, {{MediaStreamTrackAudioStats/totalFrames}} and {{MediaStreamTrackAudioStats/totalFramesDuration}}. See Issue #129.

The {{MediaStreamTrackAudioStats}} expose frame counters for the {{MediaStreamTrack}} that created it. For this track, the user agent is required to count each audio frame from its source as follows:

A frame is considered a delivered audio frame if it either was delivered to a sink or would have been delivered to a sink, if one was connected.
The delivered audio frames duration is the total duration of all [= delivered audio frames =]. This measurement is incremented at the same time as [= delivered audio frames =] and is measured in milliseconds.
An audio frame that is discarded because it cannot be delivered on time, or it cannot be delivered for any other reason, is considered dropped.
The dropped audio frames duration is the total duration of all [= dropped audio frames =]. This measurement is incremented at the same time as [= dropped audio frames =] and is measured in milliseconds.

If the track is unmuted and enabled, the counters increase as audio is produced by the capture device. If no audio is flowing, such as if the track is muted or disabled, then the counters do not increase.
Input latency is the time, in milliseconds, between the point in time an audio input device has acquired a signal and the time it is available for consumption, which may include buffering by the user agent.

The latest input latency is the latest available [= input latency =] as estimated between the track's input device and delivery to any of its sinks.

The user agent updates its estimates at sufficient frequency to allow monitoring. The latency is representative of the experienced delay, but is not necessarily an exact measurement of the last individual audio frame that was delivered.

A sink that consumes audio may add additional processing latency not included in this measurement, such as playout delay or encode time.

Every time the [= latest input latency =] measurement is updated, the user agent also updates its average input latency, minimum input latency and maximum input latency which are the average, minimum and maximum observed measurements since the last latency reset time.

Let the {{MediaStreamTrackAudioStats}} have internal slots [[\DeliveredFrames]], [[\DeliveredFramesDuration]], [[\DroppedFrames]], [[\DroppedFramesDuration]], [[\Latency]], [[\AverageLatency]], [[\MinimumLatency]] and [[\MaximumLatency]], initialized to 0.

Let the {{MediaStreamTrackAudioStats}} also have internal slots [[\LastTask]] and [[\LastExposureTime]], initialized to undefined.

The expose audio frame counters steps are the following:

Let task be the current [=task=].
If {{MediaStreamTrackAudioStats/[[LastTask]]}} is equal to task, abort these steps.
Set {{MediaStreamTrackAudioStats/[[LastTask]]}} to task.
Set {{MediaStreamTrackAudioStats/[[DeliveredFrames]]}} to [= delivered audio frames =], set {{MediaStreamTrackAudioStats/[[DeliveredFramesDuration]]}} to [= delivered audio frames duration =], set {{MediaStreamTrackAudioStats/[[DroppedFrames]]}} to [= dropped audio frames =], set {{MediaStreamTrackAudioStats/[[DroppedFramesDuration]]}} to [= dropped audio frames duration =], set {{MediaStreamTrackAudioStats/[[Latency]]}} to the [= latest input latency =], set {{MediaStreamTrackAudioStats/[[AverageLatency]]}} to the [= average input latency =], set {{MediaStreamTrackAudioStats/[[MinimumLatency]]}} to the [= minimum input latency =] and set {{MediaStreamTrackAudioStats/[[MaximumLatency]]}} to the [= maximum input latency =].

Set {{MediaStreamTrackAudioStats/[[LastExposureTime]]}} to reflect the time that these metrics were exposed.

Only updating these counters once per [=task=] preserves the run-to-completion semantics defined in [[API-DESIGN-PRINCIPLES]].
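
The once-per-task behavior can be illustrated with a small model. Script cannot observe the current task directly, so this sketch passes an explicit `task` token; that token, `createSnapshotCounters` and `readLiveCounters` are hypothetical stand-ins rather than anything defined by this specification.

```javascript
// Model of a stats object whose getters snapshot live counters at most
// once per task. Reads within the same task see the same values.
function createSnapshotCounters(readLiveCounters) {
  let lastTask; // models [[LastTask]]
  let snapshot = { deliveredFrames: 0, droppedFrames: 0 };
  return {
    expose(task) {
      if (lastTask === task) return snapshot; // same task: keep old values
      lastTask = task;
      snapshot = { ...readLiveCounters() };
      return snapshot;
    },
  };
}
```

The practical consequence is that an application reading two counters back to back in the same task gets values that are consistent with each other, even if the capture pipeline advanced in between.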

Attributes

deliveredFrames of type unsigned long long, readonly: Upon getting, run the [= expose audio frame counters steps =] and return {{MediaStreamTrackAudioStats/[[DeliveredFrames]]}}.
deliveredFramesDuration of type DOMHighResTimeStamp, readonly: Upon getting, run the [= expose audio frame counters steps =] and return {{MediaStreamTrackAudioStats/[[DeliveredFramesDuration]]}}.
totalFrames of type unsigned long long, readonly: Upon getting, run the [= expose audio frame counters steps =] and return the sum of {{MediaStreamTrackAudioStats/[[DeliveredFrames]]}} and {{MediaStreamTrackAudioStats/[[DroppedFrames]]}}.
totalFramesDuration of type DOMHighResTimeStamp, readonly: Upon getting, run the [= expose audio frame counters steps =] and return the sum of {{MediaStreamTrackAudioStats/[[DeliveredFramesDuration]]}} and {{MediaStreamTrackAudioStats/[[DroppedFramesDuration]]}}.

Because audio capture devices produce audio in real-time, audio frames may be dropped if not processed in a timely manner.

The ratio of audio duration that was delivered, i.e. not dropped, can be calculated as {{MediaStreamTrackAudioStats/deliveredFramesDuration}} / {{MediaStreamTrackAudioStats/totalFramesDuration}}.
latency of type DOMHighResTimeStamp, readonly: Upon getting, run the [= expose audio frame counters steps =] and return {{MediaStreamTrackAudioStats/[[Latency]]}}.
averageLatency of type DOMHighResTimeStamp, readonly: Upon getting, run the [= expose audio frame counters steps =] and return {{MediaStreamTrackAudioStats/[[AverageLatency]]}}.
minimumLatency of type DOMHighResTimeStamp, readonly: Upon getting, run the [= expose audio frame counters steps =] and return {{MediaStreamTrackAudioStats/[[MinimumLatency]]}}.
maximumLatency of type DOMHighResTimeStamp, readonly: Upon getting, run the [= expose audio frame counters steps =] and return {{MediaStreamTrackAudioStats/[[MaximumLatency]]}}.
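
The delivered ratio described above can be computed as follows. This is a minimal sketch: `deliveredAudioRatio` is a hypothetical helper name, and returning null before any audio has been observed is a choice made here to avoid dividing by zero.

```javascript
// Ratio of audio duration that was delivered (not dropped), in [0, 1].
// `stats` is a MediaStreamTrackAudioStats or a plain copy of its values.
function deliveredAudioRatio(stats) {
  const total = stats.totalFramesDuration;
  // No audio observed yet: the ratio is undefined, so report null.
  if (!(total > 0)) return null;
  return stats.deliveredFramesDuration / total;
}
```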

Methods

resetLatency

When called, run the following steps:

Run the [= expose audio frame counters steps =].
Set {{MediaStreamTrackAudioStats/[[AverageLatency]]}}, {{MediaStreamTrackAudioStats/[[MinimumLatency]]}} and {{MediaStreamTrackAudioStats/[[MaximumLatency]]}} to {{MediaStreamTrackAudioStats/[[Latency]]}}.
Set the [= latency reset time =] to {{MediaStreamTrackAudioStats/[[LastExposureTime]]}}.
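
One way an application might use resetLatency is to report latency in consecutive measurement windows. The sketch below assumes a hypothetical helper named `takeLatencyWindow` that is called periodically by the application; only the attributes and the method it touches come from this specification.

```javascript
// Hypothetical helper: reads the latency aggregates accumulated since the
// last reset, then starts a new measurement window.
function takeLatencyWindow(stats) {
  const sample = {
    average: stats.averageLatency,
    minimum: stats.minimumLatency,
    maximum: stats.maximumLatency,
  };
  stats.resetLatency();
  return sample;
}
```

Calling it on a timer, for instance once per second, yields per-window aggregates instead of aggregates over the whole lifetime of the track.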

toJSON

When called, run [[!WEBIDL]]'s [= default toJSON steps =].

The MediaStreamTrackVideoStats interface

[Exposed=Window]
interface MediaStreamTrackVideoStats {
  readonly attribute unsigned long long deliveredFrames;
  readonly attribute unsigned long long discardedFrames;
  readonly attribute unsigned long long totalFrames;
  [Default] object toJSON();
};

The {{MediaStreamTrackVideoStats}} expose frame counters for the {{MediaStreamTrack}} that created it. For this track, the user agent is required to count each video frame from its source as follows:

A frame is considered a delivered video frame if it either was delivered to a sink or would have been delivered to a sink, if one was connected. This is a subset of [= total video frames =] and it is incremented at the same time as [= total video frames =].
A video frame is considered discarded if it was discarded in order to achieve the target {{MediaTrackSettings/frameRate}}. This is a subset of [= total video frames =] and it is incremented at the same time as [= total video frames =].
The total video frames is the total number of frames that have been processed by this source, meaning it is known whether each frame was delivered, discarded or dropped for any other reason. The number of frames dropped for other reasons can be calculated by subtracting [= delivered video frames =] and [= discarded video frames =] from [= total video frames =].

If the track is unmuted and enabled and the source is backed by a camera, total frames is incremented by frames produced by the camera. If no frames are flowing, such as if the track is muted or disabled, then total frames does not increment.
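
The relationship between the three counters can be sketched as follows. Both helper names are hypothetical, and the frame rate calculation assumes the application takes two snapshots of the counters (for example with toJSON) and measures the elapsed time itself.

```javascript
// Frames dropped for reasons other than frame rate decimation.
function droppedVideoFrames(stats) {
  return stats.totalFrames - stats.deliveredFrames - stats.discardedFrames;
}

// Delivered frame rate between two snapshots taken `elapsedMs` apart.
// `previous` and `current` are plain copies of the counters.
function deliveredFrameRate(previous, current, elapsedMs) {
  if (!(elapsedMs > 0)) return null;
  const frames = current.deliveredFrames - previous.deliveredFrames;
  return (frames * 1000) / elapsedMs;
}
```

Comparing the result of `deliveredFrameRate` with the {{MediaTrackSettings/frameRate}} setting gives a rough indication of whether the source is keeping up with its target.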

Let the {{MediaStreamTrackVideoStats}} have internal slots [[\DeliveredFrames]], [[\DiscardedFrames]] and [[\TotalFrames]], initialized to 0.

Let the {{MediaStreamTrackVideoStats}} also have an internal slot [[\LastTask]] initialized to null.

The expose video frame counters steps are the following:

Let task be the current [=task=].
If {{MediaStreamTrackVideoStats/[[LastTask]]}} is equal to task, abort these steps.
Set {{MediaStreamTrackVideoStats/[[LastTask]]}} to task.
Set {{MediaStreamTrackVideoStats/[[DeliveredFrames]]}} to [= delivered video frames =], set {{MediaStreamTrackVideoStats/[[DiscardedFrames]]}} to [= discarded video frames =] and set {{MediaStreamTrackVideoStats/[[TotalFrames]]}} to [= total video frames =].

Only updating these counters once per [=task=] preserves the run-to-completion semantics defined in [[API-DESIGN-PRINCIPLES]].

Attributes

deliveredFrames of type unsigned long long, readonly: Upon getting, run the [= expose video frame counters steps =] and return {{MediaStreamTrackVideoStats/[[DeliveredFrames]]}}.
discardedFrames of type unsigned long long, readonly: Upon getting, run the [= expose video frame counters steps =] and return {{MediaStreamTrackVideoStats/[[DiscardedFrames]]}}.
totalFrames of type unsigned long long, readonly: Upon getting, run the [= expose video frame counters steps =] and return {{MediaStreamTrackVideoStats/[[TotalFrames]]}}.

Methods

toJSON: When called, run [[!WEBIDL]]'s [= default toJSON steps =].

The powerEfficient constraint

MediaTrackSupportedConstraints dictionary extensions

partial dictionary MediaTrackSupportedConstraints {
  boolean powerEfficient = true;
};

Dictionary {{MediaTrackSupportedConstraints}} Members

powerEfficient of type {{boolean}}, defaulting to true: See powerEfficient for details.

MediaTrackCapabilities dictionary extensions

partial dictionary MediaTrackCapabilities {
  sequence<boolean> powerEfficient;
};

Dictionary {{MediaTrackCapabilities}} Members

powerEfficient of type sequence<{{boolean}}>: The source may operate in different configurations. If all configurations have the same power efficiency impact, a single false is reported. Otherwise, the source reports a list with both true and false as possible values. See powerEfficient for additional details.

MediaTrackSettings dictionary extensions

partial dictionary MediaTrackSettings {
  boolean powerEfficient;
};

Dictionary {{MediaTrackSettings}} Members

powerEfficient of type {{boolean}}: See powerEfficient for details.

Constrainable Properties

The constrainable properties in this document are defined below.

This constraint is not usable with getUserMedia as a mandatory constraint, since it's not in allowed required constraints for device selection. User Agents MAY use it for device selection though the current distance algorithm is not directly applicable as devices may have different power consumptions while having the same `powerEfficient` capabilities.

Property Name	Values	Notes
powerEfficient	{{ConstrainBoolean}}	Cameras can often operate in different configurations. Configurations are typically selected based on constraints that are related to observable parameters like width or height. Configurations may also have less directly observable characteristics, such as power consumption, low light sensitivity or fast autofocus. The powerEfficient constraint allows web applications to favor selection of configurations that consume less power. This may be useful for web applications that may use the camera for an extended amount of time, like video conference web applications. On the other hand, applications that may use the camera for a small amount of time may prefer to not use the powerEfficient constraint. This constraint is only applicable to camera sources. As a constraint, setting it to true instructs the user agent to prefer a configuration that it considers power efficient.
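
As an illustration of how an application might apply this constraint, the sketch below adds it only for long-running capture and only when the user agent reports support. The helper name `buildCameraConstraints`, the `longRunning` option and the resolution values are assumptions made for this example.

```javascript
// Hypothetical helper: builds video constraints that prefer a power
// efficient configuration for long-running capture. The bare value acts as
// an "ideal" preference; `exact` is avoided because the constraint is not
// an allowed required constraint for device selection.
function buildCameraConstraints(supportedConstraints, { longRunning }) {
  const video = { width: 1280, height: 720 }; // illustrative values
  if (longRunning && supportedConstraints.powerEfficient) {
    video.powerEfficient = true;
  }
  return { video };
}
```

In a page, `supportedConstraints` would come from `navigator.mediaDevices.getSupportedConstraints()`.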

The powerEfficientPixelFormat constraint

This constraint is somewhat redundant with the powerEfficient constraint. It may be removed in a future version of this specification.

MediaTrackSupportedConstraints dictionary extensions

partial dictionary MediaTrackSupportedConstraints {
  boolean powerEfficientPixelFormat = true;
};

Dictionary {{MediaTrackSupportedConstraints}} Members

powerEfficientPixelFormat of type {{boolean}}, defaulting to true: See powerEfficientPixelFormat for details.

MediaTrackCapabilities dictionary extensions

partial dictionary MediaTrackCapabilities {
  sequence<boolean> powerEfficientPixelFormat;
};

Dictionary {{MediaTrackCapabilities}} Members

powerEfficientPixelFormat of type sequence<{{boolean}}>: If the source only has power efficient pixel formats, a single true is reported. If the source only has power inefficient pixel formats, a single false is reported. If the script can control the feature, the source reports a list with both true and false as possible values. See powerEfficientPixelFormat for additional details.
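
The three cases above can be distinguished with a small check on the reported capability. This is a sketch under the assumption that the application only needs a coarse classification; `describePixelFormatEfficiency` and its return strings are hypothetical.

```javascript
// Returns "controllable" when script can choose either value,
// otherwise "efficient-only", "inefficient-only" or "unknown".
function describePixelFormatEfficiency(capabilities) {
  const values = capabilities.powerEfficientPixelFormat;
  if (!Array.isArray(values) || values.length === 0) return "unknown";
  const hasTrue = values.includes(true);
  const hasFalse = values.includes(false);
  if (hasTrue && hasFalse) return "controllable";
  return hasTrue ? "efficient-only" : "inefficient-only";
}
```

In a page, `capabilities` would come from `track.getCapabilities()` on a camera track.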

MediaTrackSettings dictionary extensions

partial dictionary MediaTrackSettings {
  boolean powerEfficientPixelFormat;
};

Dictionary {{MediaTrackSettings}} Members

powerEfficientPixelFormat of type {{boolean}}: See powerEfficientPixelFormat for details.

Constrainable Properties

The constrainable properties in this document are defined below.

Property Name	Values	Notes
powerEfficientPixelFormat	{{ConstrainBoolean}}	Compressed pixel formats often need to be decoded, for instance for display purposes or when being encoded during a video call. The user agent SHOULD label compressed pixel formats that incur significant power penalty when decoded as power inefficient. The labeling is up to the user agent, but decoding MJPEG in software is an example of an expensive mode. Pixel formats that have not been labeled power inefficient by the user agent are for the purpose of this API considered power efficient. As a constraint, setting it to true allows filtering out inefficient pixel formats and setting it to false allows filtering out efficient pixel formats. As a setting, this reflects whether or not the current pixel format is considered power efficient by the user agent.

Exposing change of MediaStreamTrack configuration

The configuration (capabilities and settings) of a {{MediaStreamTrack}} may be changed dynamically outside the control of web applications. One example is when a user decides to switch on background blur through the operating system. Web applications might want to know that the configuration of a particular {{MediaStreamTrack}} has changed. For that purpose, a new event is defined below.

MediaStreamTrack Interface Extensions

partial interface MediaStreamTrack {
  attribute EventHandler onconfigurationchange;
};

The onconfigurationchange attribute is an [=event handler IDL attribute=] for the `onconfigurationchange` [=event handler=], whose [=event handler event type=] is configurationchange.

When the [=User Agent=] detects a change of configuration in a track's underlying source, the [=User Agent=] MUST run the following steps:

If track.{{MediaStreamTrack/muted}} is true, wait for track.{{MediaStreamTrack/muted}} to become false or track.{{MediaStreamTrack/readyState}} to be "ended".
[=Queue a task=] on [=current settings object=]'s [=environment settings object/responsible event loop=] to perform the following steps:

This task will run before any other task that may set track.{{MediaStreamTrack/muted}} to true.
1. If track.{{MediaStreamTrack/readyState}} is "ended", abort these steps.
2. If track's capabilities and settings already match the source configuration, abort these steps.
3. Update track's capabilities and settings according to track's underlying source.
4. [=Fire an event=] named {{configurationchange}} on track.

These events are potentially triggered simultaneously on documents of different origins. [=User Agents=] MAY add fuzzing on the timing of events to avoid cross-origin activity correlation.

Example

This example shows how to monitor external background blur changes.

      const stream = await navigator.mediaDevices.getUserMedia({video: true});
      const [track] = stream.getVideoTracks();
      let {backgroundBlur} = track.getSettings();
      applyBlurInSoftwareInstead(!backgroundBlur);

      track.addEventListener("configurationchange", () => {
        if (backgroundBlur != track.getSettings().backgroundBlur) {
          backgroundBlur = track.getSettings().backgroundBlur;
          applyBlurInSoftwareInstead(!backgroundBlur);
        }
      });
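
The {{configurationchange}} event does not say which part of the configuration changed, so an application that tracks more than one setting has to compare snapshots itself. The following is a minimal sketch of that comparison. The helper names `changedSettings` and `watchConfiguration` are hypothetical and are not defined by this specification, and the comparison assumes that the settings of interest are primitive values.

```javascript
// Hypothetical helper: returns the sorted names of settings whose values
// differ between two MediaTrackSettings-like snapshots (plain objects).
// Assumes primitive setting values, so strict comparison is sufficient.
function changedSettings(previous, current) {
  const names = new Set([...Object.keys(previous), ...Object.keys(current)]);
  return [...names].filter((name) => previous[name] !== current[name]).sort();
}

// Hypothetical wiring: keeps the last snapshot of track.getSettings() and
// reports the changed setting names each time "configurationchange" fires.
function watchConfiguration(track, onChange) {
  let snapshot = track.getSettings();
  track.addEventListener("configurationchange", () => {
    const next = track.getSettings();
    const changed = changedSettings(snapshot, next);
    snapshot = next;
    if (changed.length > 0) {
      onChange(changed, next);
    }
  });
}
```

An application could call `watchConfiguration(track, (changed) => { ... })` once after obtaining the track and react only to the names it cares about, such as `backgroundBlur`.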

Human face segmentation

Human face metadata describes the geometrical information of human faces in video frames. Web applications can set it using the standard means when creating {{VideoFrameMetadata}} for {{VideoFrame}}s. A user agent can also set it when the media track constraint, defined below, is used to enable face detection for the {{MediaStreamTrack}} which provides the {{VideoFrame}}s.

The facial metadata can be used by video encoders to enhance the quality of the faces in encoded video streams or for other suitable purposes.

partial dictionary VideoFrameMetadata {
  sequence<Segment> segments;
};

Members

segments of type sequence<{{Segment}}>: The set of known geometrical segments in the video frame.

dictionary Segment {
  required SegmentType type;
  required long        id;
  long                 partOf;
  required float       probability;
  Point2D              centerPoint;
  DOMRectInit          boundingBox;
};

enum SegmentType {
  "human-face",
  "left-eye",
  "right-eye",
  "eye",
  "mouth",
};

Dictionary {{Segment}} Members

type of type {{SegmentType}}

The type of segment which the segment refers to.

It must be one of the following values:

human-face: The segment describes a human face.
left-eye: The segment describes oculus sinister.
right-eye: The segment describes oculus dexter.
eye: The segment describes an eye, either left or right.
mouth: The segment describes a mouth.

id of type {{long}}

An identifier of the object described by the segment, unique within a sequence. If the same object can be tracked over multiple frames originating from the same {{MediaStreamTrack}} source or it can be matched to correspond to the same object in {{MediaStreamTrack}}s which are cloned from the same original {{MediaStreamTrack}}, the user agent SHOULD use the same {{Segment/id}} for the segments which describe the object. {{Segment/id}} is also used in conjunction with the member {{Segment/partOf}}.

The user agent MUST NOT select the value to assign to {{Segment/id}} in such a way that the detected objects could be correlated to match between different {{MediaStreamTrack}} sources unless the {{MediaStreamTrack}}s are cloned from the same original {{MediaStreamTrack}}.

partOf of type {{long}}

If defined, references another segment which has the member {{Segment/id}} set to the same value. The referenced segment corresponds to an object of which the object described by this segment is a part.

If undefined, the object described by this segment is not known to be part of any other object described by any segment associated with the {{MediaStreamTrack}}.
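
As an illustration of how {{Segment/id}} and {{Segment/partOf}} relate, the sketch below groups landmark segments under the face segment they reference. The function name `groupLandmarksByFace` and the shape of its return value are assumptions made for this example only.

```javascript
// Hypothetical helper: groups landmark segments (eyes, mouth) under the
// "human-face" segment whose id matches the landmark's partOf member.
// Landmarks with no partOf, or whose partOf matches no face in the same
// sequence, are returned separately as "unattached".
function groupLandmarksByFace(segments) {
  const faces = new Map();
  for (const segment of segments) {
    if (segment.type === "human-face") {
      faces.set(segment.id, { face: segment, landmarks: [] });
    }
  }
  const unattached = [];
  for (const segment of segments) {
    if (segment.type === "human-face") continue;
    const owner =
      segment.partOf === undefined ? undefined : faces.get(segment.partOf);
    if (owner) {
      owner.landmarks.push(segment);
    } else {
      unattached.push(segment);
    }
  }
  return { faces: [...faces.values()], unattached };
}
```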

probability of type {{float}}

If nonzero, this is the estimate of the conditional probability that the segmented object actually is of the type indicated by the {{Segment/type}} member, given that the detection has been made. The value of this member must always be in the range from zero to one, inclusive. The special value of zero indicates that the probability estimate is not available.
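
Because zero means "no estimate available" rather than "certainly not this type", a consumer that filters by probability needs to treat it separately. A minimal sketch follows, assuming a hypothetical helper `filterByProbability` and an application-chosen threshold.

```javascript
// Hypothetical helper: keeps segments whose probability is at or above the
// threshold. A probability of exactly zero means the estimate is not
// available, so those segments are kept or dropped according to keepUnknown
// instead of being compared against the threshold.
function filterByProbability(segments, threshold, keepUnknown = true) {
  return segments.filter(({ probability }) =>
    probability === 0 ? keepUnknown : probability >= threshold
  );
}
```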

centerPoint of type {{Point2D}}

The coordinates of the approximate center of the object described by this {{Segment}}. The object location in the frame can be specified even if it is obscured by other objects in front of it or it lies partially or fully outside of the frame.

The {{Point2D/x}} and {{Point2D/y}} values of the point are interpreted as coordinates in a normalized square space. The origin {x,y} = {0.0, 0.0} represents the upper left corner, whereas {x,y} = {1.0, 1.0} represents the lower right corner, relative to the rendered frame.

boundingBox of type {{DOMRectInit}}

A bounding box surrounding the object described by this segment.

The object bounding box in the frame can be specified even if it is obscured by other objects in front of it or it lies partially or fully outside of the frame.

See the member {{Segment/centerPoint}} for the definition of the coordinate system.
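
To draw a bounding box over a rendered frame, the normalized values have to be scaled to pixels, and boxes that extend outside the frame may need clipping. The sketch below shows one way to do this. The function name `toPixelRect` is hypothetical, and the choice of frame width and height (for example a {{VideoFrame}}'s display size) is an assumption left to the caller.

```javascript
// Hypothetical helper: converts a normalized DOMRectInit-like bounding box
// ({0,0} = upper left, {1,1} = lower right) into pixel coordinates for a
// frame of the given size, clipping the box to the frame because a segment
// may lie partially or fully outside of it.
function toPixelRect(boundingBox, frameWidth, frameHeight) {
  // DOMRectInit members default to 0 when absent.
  const { x = 0, y = 0, width = 0, height = 0 } = boundingBox;
  const clamp = (value, max) => Math.max(0, Math.min(max, value));
  const left = clamp(x * frameWidth, frameWidth);
  const top = clamp(y * frameHeight, frameHeight);
  const right = clamp((x + width) * frameWidth, frameWidth);
  const bottom = clamp((y + height) * frameHeight, frameHeight);
  return { x: left, y: top, width: right - left, height: bottom - top };
}
```

A box that lies fully outside the frame comes back with a width or height of zero, which a caller can use to skip drawing.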

{{MediaTrackSupportedConstraints}} dictionary extensions

partial dictionary MediaTrackSupportedConstraints {
  boolean humanFaceDetectionMode = true;
};

Dictionary {{MediaTrackSupportedConstraints}} Members

humanFaceDetectionMode of type {{boolean}}, defaulting to true: See humanFaceDetectionMode for details.

{{MediaTrackCapabilities}} dictionary extensions

partial dictionary MediaTrackCapabilities {
  sequence<DOMString> humanFaceDetectionMode;
};

Dictionary {{MediaTrackCapabilities}} Members

humanFaceDetectionMode of type sequence<{{DOMString}}>: The sequence of supported face detection modes. Each string MUST be one of the members of {{HumanFaceDetectionModeEnum}}. See humanFaceDetectionMode for additional details.

{{MediaTrackConstraintSet}} dictionary extensions

partial dictionary MediaTrackConstraintSet {
  ConstrainDOMString humanFaceDetectionMode;
};

Dictionary {{MediaTrackConstraintSet}} Members

humanFaceDetectionMode of type {{ConstrainDOMString}}: See humanFaceDetectionMode for details.

partial dictionary MediaTrackSettings {
  DOMString humanFaceDetectionMode;
};

Dictionary {{MediaTrackSettings}} Members

humanFaceDetectionMode of type {{DOMString}}: See humanFaceDetectionMode for details.

enum HumanFaceDetectionModeEnum {
  "none",
  "bounding-box",
  "bounding-box-with-landmark-center-point",
};

{{HumanFaceDetectionModeEnum}} Enumeration Description

none: This {{MediaStreamTrack}} source does not set metadata in {{VideoFrameMetadata}} of {{VideoFrame}}s related to human faces or human face landmarks, that is, to any {{Segment}} which has the {{Segment/type}} set to any of the alternatives listed in enumeration {{SegmentType}}. As an input, this is interpreted as a command to turn off the setting of human face and landmark detection.
bounding-box: This source sets metadata related to human faces (segment type of {{SegmentType/"human-face"}}) including bounding box information in the member {{Segment/boundingBox}} of each {{Segment}} related to a detected face. The source does not set the human face landmark information. As an input, this is interpreted as a command to enable the setting of human face detection and to find the bounding box of each detected face.
bounding-box-with-landmark-center-point: With this setting, the source sets a superset of the metadata compared to the {{HumanFaceDetectionModeEnum/"bounding-box"}} setting. The source sets the same metadata and additionally metadata related to human face landmarks (all other {{SegmentType}}s except {{SegmentType/"human-face"}}) including center point information in the member {{Segment/centerPoint}} of each {{Segment}} related to a detected landmark. As an input, this is interpreted as a command to enable the setting of human face and face landmark detection and to set bounding box related information to face segment metadata and to set the center point information of each detected face landmark.
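
Since {{HumanFaceDetectionModeEnum/"bounding-box-with-landmark-center-point"}} is a superset of {{HumanFaceDetectionModeEnum/"bounding-box"}}, an application may want to fall back to a less detailed mode when the preferred one is not listed in the track's capabilities. The following sketch illustrates such a fallback. The helper `pickFaceDetectionMode` and its fallback order are assumptions of this example, not behavior required by this specification.

```javascript
// Modes ordered from most to least detailed, per the enumeration description.
const MODES_BY_DETAIL = [
  "bounding-box-with-landmark-center-point",
  "bounding-box",
  "none",
];

// Hypothetical helper: returns the wanted mode if the capabilities list it,
// otherwise the next less detailed supported mode, or undefined if the
// capabilities do not list any usable mode.
function pickFaceDetectionMode(capabilities, wanted) {
  const supported = capabilities.humanFaceDetectionMode ?? [];
  const start = MODES_BY_DETAIL.indexOf(wanted);
  if (start === -1) {
    throw new TypeError(`Unknown face detection mode: ${wanted}`);
  }
  for (const mode of MODES_BY_DETAIL.slice(start)) {
    if (supported.includes(mode)) return mode;
  }
  return undefined;
}
```

The result could then be passed to `track.applyConstraints({humanFaceDetectionMode: mode})`, with `track.getCapabilities()` supplying the `capabilities` argument.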

Constrainable Properties

The constrainable properties in this section are defined below.

Property Name	Values	Notes
humanFaceDetectionMode	{{ConstrainDOMString}}	This string (or each string, when a list) should be one of the members of {{HumanFaceDetectionModeEnum}}. As a {{MediaStreamTrack}} constraint, its value allows choosing whether face and face landmark detection is preferred. As a setting, this reflects which face geometrical properties the user agent detects and sets in the metadata of the {{VideoFrame}}s obtained from the track.

Examples

// main.js:
// Open camera with face detection enabled
const stream = await navigator.mediaDevices.getUserMedia({
  video: { humanFaceDetectionMode: 'bounding-box' }
});
const [videoTrack] = stream.getVideoTracks();
if (videoTrack.getSettings().humanFaceDetectionMode != 'bounding-box') {
  throw new Error('Face bounding box detection is not supported');
}

// Use a video worker and show to user.
const videoElement = document.querySelector('video');
const videoWorker = new Worker('video-worker.js');
videoWorker.postMessage({track: videoTrack}, [videoTrack]);
// Wait for the worker to post back the generated track.
const {data} = await new Promise(r => videoWorker.onmessage = r);
videoElement.srcObject = new MediaStream([data.videoTrack]);

// video-worker.js:
self.onmessage = async ({data: {track}}) => {
  const generator = new VideoTrackGenerator();
  // Transfer the generated track back to the page that created this worker.
  self.postMessage({videoTrack: generator.track}, [generator.track]);
  const {readable} = new MediaStreamTrackProcessor({track});
  const transformer = new TransformStream({
    async transform(frame, controller) {
      for (const segment of frame.metadata().segments || []) {
        if (segment.type === 'human-face') {
          // the metadata is coming directly from the video track with
          // bounding-box face detection enabled
          console.log(
            `Face @ (${segment.boundingBox.x}, ${segment.boundingBox.y}), size ` +
            `${segment.boundingBox.width}x${segment.boundingBox.height}`);
        }
      }
      controller.enqueue(frame);
    }
  });
  await readable.pipeThrough(transformer).pipeTo(generator.writable);
};
