This document defines a set of JavaScript APIs that allow local media, including audio and video, to be requested from a platform.

This document is not complete. It is subject to major changes and, while early experimentation is encouraged, it is not intended for implementation. The API is based on preliminary work done in the WHATWG.

Introduction

This document defines APIs for requesting access to local multimedia devices, such as microphones or video cameras.

This document also defines the MediaStream API, which provides the means to control where multimedia stream data is consumed, and provides some control over the devices that produce the media. It also exposes information about devices able to capture and render media.

This specification defines conformance criteria that apply to a single product: the User Agent that implements the interfaces that it contains.

Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)

Implementations that use ECMAScript [[ECMA-262]] to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [[!WEBIDL]], as this specification uses that specification and terminology.

Terminology

HTML Terms:

The EventHandler interface represents a callback used for event handlers as defined in [[!HTML5]].

The concepts queue a task and fire a simple event are defined in [[!HTML5]].

The terms event handlers and event handler event types are defined in [[!HTML5]].

source

A source is the "thing" providing the source of a media stream track. The source is the broadcaster of the media itself. A source can be a physical webcam, microphone, local video or audio file from the user's hard drive, network resource, or static image. Note that this document describes the use of microphone and camera type sources only; the use of other source types is described in other documents.

An application that has no prior authorization regarding sources is only given the number of available sources, their type and any relationship to other devices. Additional information about sources can become available when applications are authorized to use a source (see ).

Sources do not have constraints — tracks have constraints. When a source is connected to a track, it must produce media that conforms to the constraints present on that track. Multiple tracks can be attached to the same source. User Agent processing, such as downsampling, MAY be used to ensure that all tracks have appropriate media.

Sources are detached from a track when the track is ended for any reason.

Sources have constrainable properties which have capabilities and settings. The constrainable properties are "owned" by the source and are common to any (multiple) tracks that happen to be using the same source (e.g., if two different track objects bound to the same source ask for the same capability or setting information, they will get back the same answer).

Setting (Source Setting)

A setting refers to the immediate, current value of the source's constrainable properties. Settings are always read-only.

A source's settings can change dynamically over time due to environmental conditions, sink configurations, or constraint changes. A source's settings must always conform to the current set of mandatory constraints on all attached tracks. A source that cannot conform to mandatory constraints causes affected tracks to become overconstrained and therefore muted. A user agent attempts to ensure that sources adhere to optional constraints as closely as possible, see .

Although settings are a property of the source, they are only exposed to the application through the tracks attached to the source. This is exposed via the ConstrainablePattern interface.

Capabilities

For each constrainable property, there is a capability that describes whether it is supported by the source and if so, the range of supported values. As with settings, capabilities are exposed to the application via the ConstrainablePattern interface.

The values of the supported capabilities must be normalized to the ranges and enumerated types defined in this specification.

A getCapabilities() call on a track returns the same underlying per-source capabilities for all tracks connected to the source.
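
The relationship between a source and its tracks can be illustrated with a small model. This is a sketch only: createModelSource and createModelTrack are hypothetical helpers built from plain objects, not part of the API, and the numeric values are invented for illustration.

```javascript
// Hypothetical model, not the real API: a source owns its capabilities and
// settings, and every track attached to it reads through to the same data.
function createModelSource(capabilities, settings) {
  return { capabilities, settings };
}

function createModelTrack(source) {
  return {
    // Each call yields a fresh shallow snapshot of the source's data.
    getCapabilities: () => ({ ...source.capabilities }),
    getSettings: () => ({ ...source.settings }),
  };
}

const camera = createModelSource(
  { width: { min: 640, max: 1920 }, frameRate: { min: 15, max: 60 } },
  { width: 1280, frameRate: 30 }
);
const trackA = createModelTrack(camera);
const trackB = createModelTrack(camera);

// Both tracks report the same per-source answer.
const sameCapabilities =
  JSON.stringify(trackA.getCapabilities()) ===
  JSON.stringify(trackB.getCapabilities());

// A settings change on the source is visible through every attached track.
camera.settings.width = 640;
const widthsAfterChange = [
  trackA.getSettings().width,
  trackB.getSettings().width,
];
```

In this model the tracks hold no capability or setting data of their own, which mirrors the statement that these properties are owned by the source.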

Source capabilities are effectively constant. Applications should be able to depend on a specific source having the same capabilities for any session.

Open Issue: Is "session" the correct term? Should it be "top-level browsing context" as defined in HTML spec?

This API is intentionally simplified. Capabilities are not capable of describing interactions between different values. For instance, it is not possible to accurately describe the capabilities of a camera that can produce a high resolution video stream at a low frame rate and lower resolutions at a higher frame rate. Capabilities describe the complete range of each value. Interactions between constraints are exposed by attempting to apply constraints.
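
The limitation described above can be made concrete with a hypothetical camera that has exactly two operating modes. The mode list, the rangeOf and inRange helpers, and all numbers below are assumptions made for illustration; they are not defined by this specification.

```javascript
// Hypothetical camera with two operating modes (illustrative values only).
const modes = [
  { width: 1920, height: 1080, frameRate: 15 },
  { width: 640, height: 480, frameRate: 60 },
];

// Capabilities collapse the modes into one independent range per property.
function rangeOf(property) {
  const values = modes.map((mode) => mode[property]);
  return { min: Math.min(...values), max: Math.max(...values) };
}
const capabilities = {
  width: rangeOf("width"),
  frameRate: rangeOf("frameRate"),
};

function inRange(range, value) {
  return value >= range.min && value <= range.max;
}

// Each requested value is individually inside its advertised range...
const wanted = { width: 1920, frameRate: 60 };
const looksSupported =
  inRange(capabilities.width, wanted.width) &&
  inRange(capabilities.frameRate, wanted.frameRate);

// ...but no single mode delivers both, which only an attempt to apply the
// constraints would reveal.
const actuallySupported = modes.some(
  (mode) => mode.width >= wanted.width && mode.frameRate >= wanted.frameRate
);
```

Here looksSupported is true while actuallySupported is false, which is the gap between per-property ranges and real operating modes.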

Constraints

Constraints provide a general control surface that allows applications to both select an appropriate source for a track and, once selected, to influence how a source operates.

Constraints limit the range of operating modes that a source can use when providing media for a track. Without provided track constraints, implementations are free to select a source's settings from the full ranges of its supported capabilities. Implementations may also adjust source settings at any time within the bounds imposed by all applied constraints.

getUserMedia() uses constraints to help select an appropriate source for a track and configure it. Additionally, the ConstrainablePattern interface on tracks includes an API for dynamically changing the track's constraints at any later time.

A track will not be connected to a source using getUserMedia() if its initial constraints cannot be satisfied. However, the ability to meet the constraints on a track can change over time, and constraints can be changed. If circumstances change such that constraints cannot be met, the ConstrainablePattern interface defines an appropriate error to inform the application. explains how constraints interact in more detail.

In general, user agents will have more flexibility to optimize the media streaming experience the fewer constraints are applied, so application authors are strongly encouraged to use mandatory constraints sparingly.
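
As an illustration of using hard requirements sparingly, the following sketch expresses most wishes as preferences and only one as a requirement. It assumes a promise-returning getUserMedia() on an object such as navigator.mediaDevices, and it assumes that ConstrainLong and ConstrainDouble accept ideal and min members; neither is defined in this section. The function names are hypothetical.

```javascript
// Sketch only: `mediaDevices` stands in for navigator.mediaDevices, and the
// `ideal` / `min` members are assumed forms of ConstrainLong / ConstrainDouble.
function buildCameraConstraints() {
  return {
    audio: true,
    video: {
      width: { ideal: 1280 }, // preference, not a requirement
      height: { ideal: 720 }, // preference, not a requirement
      frameRate: { min: 15 }, // the only hard requirement
    },
  };
}

function requestCamera(mediaDevices) {
  // Assumed to return a promise that resolves with a MediaStream.
  return mediaDevices.getUserMedia(buildCameraConstraints());
}
```

In a browser this might be called as requestCamera(navigator.mediaDevices). Passing the object in as a parameter keeps the sketch independent of any particular environment.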

For each constrainable property, a constraint exists whose name corresponds with the relevant source setting name and capability name.

RTCPeerConnection

RTCPeerConnection is defined in [[WEBRTC10]].

MediaStream API

Introduction

The two main components in the MediaStream API are the MediaStreamTrack and MediaStream interfaces. The MediaStreamTrack object represents media of a single type that originates from one media source in the User Agent, e.g. video produced by a web camera. A MediaStream is used to group several MediaStreamTrack objects into one unit that can be recorded or rendered in a media element.

Each MediaStream can contain zero or more MediaStreamTrack objects. All tracks in a MediaStream are intended to be synchronized when rendered. This is not a hard requirement, since it might not be possible to synchronize tracks from sources that have different clocks. Different MediaStream objects do not need to be synchronized.

While the intent is to synchronize tracks, it could be better in some circumstances to permit tracks to lose synchronization. In particular, when tracks are remotely sourced and real-time [[WEBRTC10]], it can be better to allow loss of synchronization than to accumulate delays or risk glitches and other artifacts. Implementations are expected to understand the implications of choices regarding synchronization of playback and the effect that these have on user perception.

A single MediaStreamTrack can represent multi-channel content, such as stereo or 5.1 audio or stereoscopic video, where the channels have a well defined relationship to each other. Information about channels might be exposed through other APIs, such as [[WEBAUDIO]], but this specification provides no direct access to channels.

A MediaStream object has an input and an output that represent the combined input and output of all the object's tracks. The output of the MediaStream controls how the object is rendered, e.g., what is saved if the object is recorded to a file or what is displayed if the object is used in a video element. A single MediaStream object can be attached to multiple different outputs at the same time.

A new MediaStream object can be created from existing media streams or tracks using the MediaStream() constructor. The constructor argument can either be an existing MediaStream object, in which case all the tracks of the given stream are added to the new MediaStream object, or an array of MediaStreamTrack objects. The latter form makes it possible to compose a stream from different source streams.

Both MediaStream and MediaStreamTrack objects can be cloned. A cloned MediaStream contains clones of all member tracks from the original stream. A cloned MediaStreamTrack has a set of constraints that is independent of the instance it is cloned from, which allows media from the same source to have different constraints applied for different consumers. The MediaStream object is also used in contexts outside getUserMedia, such as [[WEBRTC10]].

MediaStream

The MediaStream() constructor composes a new stream out of existing tracks. It takes an optional argument of type MediaStream or an array of MediaStreamTrack objects. When the constructor is invoked, the User Agent must run the following steps:

  1. Let stream be a newly constructed MediaStream object.

  2. Initialize stream's id attribute to a newly generated value.

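  3. If the constructor's argument is present, construct a set of tracks, tracks, based on the type of the argument: for a MediaStream object, tracks contains all the MediaStreamTrack objects in that stream's track set; for a sequence of MediaStreamTrack objects, tracks contains all the MediaStreamTrack objects in the sequence.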

  4. Run the steps for addTrack on stream for each MediaStreamTrack in tracks.

  5. If stream's track set is empty or only contains ended tracks, set stream's active attribute to false, otherwise set it to true.

  6. Return stream.
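
The constructor steps above can be sketched with a simplified model. The function constructModelStream, its id scheme, and the plain-object streams with a tracks Set are all hypothetical; a User Agent creates real MediaStream objects natively.

```javascript
// Hypothetical model of the constructor steps; model streams are plain
// objects whose `tracks` member is a Set.
let nextModelId = 0;

function constructModelStream(argument) {
  // Steps 1-2: a new object with a freshly generated id.
  const stream = {
    id: `model-stream-${nextModelId++}`,
    tracks: new Set(),
    active: false,
  };

  // Step 3: derive the tracks from the argument type, if one was given.
  let tracks = [];
  if (Array.isArray(argument)) {
    tracks = argument; // sequence of tracks
  } else if (argument && argument.tracks instanceof Set) {
    tracks = [...argument.tracks]; // existing (model) stream
  }

  // Step 4: add each track; a Set ignores tracks that are already present.
  for (const track of tracks) stream.tracks.add(track);

  // Step 5: active only if at least one track has not ended.
  stream.active = [...stream.tracks].some((t) => t.readyState !== "ended");

  // Step 6.
  return stream;
}
```

Using a Set for the track set reflects both the absence of a defined order and the rule that adding a track twice has no further effect.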

The tracks of a MediaStream are stored in a track set. The track set MUST contain the MediaStreamTrack objects that correspond to the tracks of the stream. The relative order of the tracks in the set is User Agent defined and the API will never put any requirements on the order. The proper way to find a specific MediaStreamTrack object in the set is to look it up by its id.

An object that reads data from the output of a MediaStream is referred to as a MediaStream consumer. The list of MediaStream consumers currently includes media elements (such as <video> and <audio>) [[HTML5]], Web Real-Time Communications (WebRTC; RTCPeerConnection) [[WEBRTC10]], media recording (MediaRecorder) [[mediastream-recording]], image capture (ImageCapture) [[image-capture]], and web audio (MediaStreamAudioSourceNode) [[WEBAUDIO]].

MediaStream consumers must be able to handle tracks being added and removed. This behavior is specified per consumer.

A MediaStream object is said to be active when it has at least one MediaStreamTrack that has not ended. A MediaStream that does not have any tracks or only has tracks that are ended is inactive.

When a MediaStream goes from being active to inactive, the User Agent MUST queue a task that sets the object's active attribute to false and fire a simple event named inactive at the object. When a MediaStream goes from being inactive to active, the User Agent MUST queue a task that sets the object's active attribute to true and fire a simple event named active at the object.

If the stream's activity status changed due to a user request, the task source [[!HTML5]] for this task is the user interaction task source [[!HTML5]]. Otherwise the task source for this task is the networking task source [[!HTML5]].
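
An application could follow these transitions roughly as follows. The sketch assumes only that stream exposes the active attribute and the onactive and oninactive event handlers described in this document; watchActivity and report are hypothetical names.

```javascript
// Sketch: `report` is a hypothetical application callback that receives the
// string "active" or "inactive".
function watchActivity(stream, report) {
  // Report the state at the time of the call, then follow later changes.
  report(stream.active ? "active" : "inactive");
  stream.onactive = () => report("active");
  stream.oninactive = () => report("inactive");
}
```

Because the attribute change and the event are delivered from a queued task, the handler runs after the script that triggered the change has returned.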

Constructor()

See the MediaStream constructor algorithm

Constructor(MediaStream stream)

See the MediaStream constructor algorithm

Constructor(sequence<MediaStreamTrack> tracks)

See the MediaStream constructor algorithm

readonly attribute DOMString id

When a MediaStream object is created, the User Agent MUST generate an identifier string, and MUST initialize the object's id attribute to that string. A good practice is to use a UUID [[rfc4122]], which is 36 characters long in its canonical form.

The id attribute MUST return the value to which it was initialized when the object was created.

sequence<MediaStreamTrack> getAudioTracks()

Returns a sequence of MediaStreamTrack objects representing the audio tracks in this stream.

The getAudioTracks() method MUST return a sequence that represents a snapshot of all the MediaStreamTrack objects in this stream's track set whose kind is equal to "audio". The conversion from the track set to the sequence is user agent defined and the order does not have to be stable between calls.

sequence<MediaStreamTrack> getVideoTracks()

Returns a sequence of MediaStreamTrack objects representing the video tracks in this stream.

The getVideoTracks() method MUST return a sequence that represents a snapshot of all the MediaStreamTrack objects in this stream's track set whose kind is equal to "video". The conversion from the track set to the sequence is user agent defined and the order does not have to be stable between calls.

sequence<MediaStreamTrack> getTracks()

Returns a sequence of MediaStreamTrack objects representing all the tracks in this stream.

The getTracks() method MUST return a sequence that represents a snapshot of all the MediaStreamTrack objects in this stream's track set, regardless of kind. The conversion from the track set to the sequence is User Agent defined and the order does not have to be stable between calls.
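
Since the order of the returned sequence is not guaranteed to be stable, an application that needs a repeatable order has to impose one itself. The following sketch sorts by id; tracksSortedById is a hypothetical helper and relies only on getTracks() and the id attribute.

```javascript
// Sketch: works with any object exposing getTracks() as described above.
function tracksSortedById(stream) {
  // getTracks() returns a snapshot; copy it before sorting so the returned
  // array is independent of whatever the caller received earlier.
  const snapshot = stream.getTracks().slice();
  snapshot.sort((a, b) => {
    if (a.id < b.id) return -1;
    if (a.id > b.id) return 1;
    return 0;
  });
  return snapshot;
}
```

Sorting the snapshot does not change the stream's track set, which remains unordered from the application's point of view.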

MediaStreamTrack? getTrackById(DOMString trackId)

The getTrackById() method MUST return either a MediaStreamTrack object from this stream's track set whose id is equal to trackId, or null, if no such track exists.

void addTrack(MediaStreamTrack track)

Adds the given MediaStreamTrack to this MediaStream.

When the addTrack() method is invoked, the User Agent MUST run the following steps:

  1. Let track be the MediaStreamTrack argument and stream this MediaStream object.

  2. If track is already in stream's track set, then abort these steps.

  3. Add track to stream's track set.

void removeTrack(MediaStreamTrack track)

Removes the given MediaStreamTrack object from this MediaStream.

When the removeTrack() method is invoked, the User Agent MUST remove the MediaStreamTrack object, indicated by the method's argument, from the stream's track set, if present.
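
The addTrack() and removeTrack() methods can be combined to swap one track for another. The sketch below uses only methods defined in this section; replaceVideoTrack is a hypothetical helper name.

```javascript
// Sketch: swaps the video track(s) of `stream` for `newTrack`.
function replaceVideoTrack(stream, newTrack) {
  for (const oldTrack of stream.getVideoTracks()) {
    if (oldTrack !== newTrack) stream.removeTrack(oldTrack);
  }
  // addTrack() has no effect if the track is already in the track set.
  stream.addTrack(newTrack);
}
```

Calling the helper twice with the same track leaves the stream unchanged the second time, because of the early abort in the addTrack() steps.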

MediaStream clone()

Clones the given MediaStream and all its tracks.

When the MediaStream.clone() method is invoked, the User Agent MUST run the following steps:

  1. Let streamClone be a newly constructed MediaStream object.

  2. Initialize streamClone's id attribute to a newly generated value.

  3. Let clonedTracks be a list that contains the result of running MediaStreamTrack.clone() on all the tracks in the stream on which this method was called.

  4. Let clonedTracks be streamClone's track set.

  5. Return streamClone.

readonly attribute boolean active

This attribute is true if the MediaStream is active and false otherwise.

attribute EventHandler onactive

This event handler, of type active, is executed when the MediaStream becomes active.

attribute EventHandler oninactive

This event handler, of type inactive, is executed when the MediaStream becomes inactive.

attribute EventHandler onaddtrack

This event handler, of type addtrack, is executed when a MediaStreamTrack is added to the MediaStream.

attribute EventHandler onremovetrack

This event handler, of type removetrack, is executed when a MediaStreamTrack is removed from the MediaStream.

MediaStreamTrack

A MediaStreamTrack object represents a media source in the User Agent. Several MediaStreamTrack objects can represent the same media source, e.g., when the user chooses the same camera in the UI shown by two consecutive calls to getUserMedia().

The data from a MediaStreamTrack object does not necessarily have a canonical binary form; for example, it could just be "the video currently coming from the user's video camera". This allows User Agents to manipulate media in whatever fashion is most suitable on the user's platform.

A script can indicate that a track no longer needs its source with the MediaStreamTrack.stop() method. When all tracks using a source have been stopped, the given permission for that source is revoked and the source is stopped. If the data is being generated from a live source (e.g., a microphone or camera), then the User Agent SHOULD remove any active "on-air" indicator for that source. An implementation may use a per-source reference count to keep track of source usage, but the specifics are out of scope for this specification.

If there is no stored permission to use that source, the User Agent SHOULD also remove the "permission granted" indicator for the source.

Life-cycle and Media Flow

Life-cycle

A MediaStreamTrack has two states in its life-cycle: live and ended. A newly created MediaStreamTrack can be in either state depending on how it was created. For example, cloning an ended track results in a new ended track. The current state is reflected by the object's readyState attribute.

In the live state, the track is active and media is available for use by consumers (but may be replaced by zero-information-content if the MediaStreamTrack is muted or disabled, see below).

A muted or disabled MediaStreamTrack renders either silence (audio), black frames (video), or a zero-information-content equivalent. For example, a video element sourced by a muted or disabled MediaStreamTrack (contained within a MediaStream) is playing, but the rendered content is the muted output. When all tracks connected to a source are muted or disabled, the "on-air" or "recording" indicator for that source can be turned off; when the track is no longer muted or disabled, it MUST be turned back on.

The muted/unmuted state of a track reflects whether the source provides any media at this moment. The enabled/disabled state is under application control and determines whether the track outputs media (to its consumers). Hence, media from the source only flows when a MediaStreamTrack object is both unmuted and enabled.

A MediaStreamTrack is muted when the source is temporarily unable to provide the track with data. A track can be muted by a user. Often this action is outside the control of the application. This could be as a result of the user hitting a hardware switch or toggling a control in the operating system / browser chrome. A track can also be muted by the User Agent.

Applications are able to enable or disable a MediaStreamTrack to prevent it from rendering media from the source. A muted track, however, renders silence and blackness regardless of its enabled state. A disabled track is logically equivalent to a muted track from a consumer's point of view.

For a newly created MediaStreamTrack object, the following applies. The track is always enabled unless stated otherwise (for example when cloned) and the muted state reflects the state of the source at the time the track is created.

A MediaStreamTrack object is said to end when the source of the track is disconnected or exhausted.

A MediaStreamTrack can be detached from its source. It means that the track is no longer dependent on the source for media data. If no other MediaStreamTrack is using the same source, the source will be stopped. MediaStreamTrack attributes such as kind and label MUST NOT change values when the source is detached.

When a MediaStreamTrack object ends for any reason (e.g., because the user rescinds the permission for the page to use the local camera, or because the application invoked the stop() method on the MediaStreamTrack object, or because the User Agent has instructed the track to end for any reason) it is said to be ended.

When a MediaStreamTrack track ends for any reason other than the stop() method being invoked, the User Agent MUST queue a task that runs the following steps:

  1. If the track's readyState attribute has the value ended already, then abort these steps.

  2. Set track's readyState attribute to ended.

  3. Detach track's source.

  4. Fire a simple event named ended at the object.

If the end of the stream was reached due to a user request, the event source for this event is the user interaction event source.
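
The steps for ending a track can be sketched as a small model. Here track is a plain object and fireEvent stands in for the User Agent's event dispatch; both, along with the name endModelTrack and the source member, are hypothetical.

```javascript
// Hypothetical model of the steps above, not an implementation of the API.
function endModelTrack(track, fireEvent) {
  // Step 1: ending an already-ended track does nothing.
  if (track.readyState === "ended") return false;
  // Step 2.
  track.readyState = "ended";
  // Step 3: detach the source; kind and label keep their values.
  track.source = null;
  // Step 4.
  fireEvent("ended", track);
  return true;
}
```

The early return in step 1 is what keeps the ended event from being fired more than once for the same track.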

Media Flow

There are two concepts related to the media flow for a live MediaStreamTrack: muted / not muted, and enabled / disabled.

Muted refers to the input to the MediaStreamTrack. If live samples are not made available to the MediaStreamTrack it is muted.

The muted state is outside the application's control, but the application can observe it by reading the muted attribute and listening for the associated mute and unmute events. There can be several reasons for a MediaStreamTrack to be muted: the user pushing a physical mute button on the microphone, the user toggling a control in the operating system, the user clicking a mute button in the browser chrome, the User Agent muting the track on behalf of the user, etc.

The enabled/disabled state, on the other hand, is available to the application to control (and observe) via the enabled attribute.

The result for the consumer is the same in the sense that whenever a MediaStreamTrack is muted or disabled (or both), the consumer gets zero-information-content, which means silence for audio and black frames for video. In other words, media from the source only flows when a MediaStreamTrack object is both unmuted and enabled. For example, a video element sourced by a muted or disabled MediaStreamTrack (contained in a MediaStream) is playing but rendering blackness.
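
The rule can be written as a single predicate. The sketch relies only on the readyState, muted, and enabled attributes; the name isMediaFlowing is hypothetical.

```javascript
// Sketch: true only when a consumer would receive real media from the track.
function isMediaFlowing(track) {
  return track.readyState === "live" && !track.muted && track.enabled;
}
```

The readyState check is included because the muted and enabled concepts are defined here for live tracks; an ended track never delivers media.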

For a newly created MediaStreamTrack object, the following applies: the track is always enabled unless stated otherwise (for example when cloned) and the muted state reflects the state of the source at the time the track is created.

Tracks and Constraints

Constraints are set on tracks and may affect sources.

Whether Constraints were provided at track initialization time or need to be established later at runtime, the APIs defined in the ConstrainablePattern Interface allow the retrieval and manipulation of the constraints currently established on a track.

Each track maintains an internal version of the Constraints structure, namely a mandatory set of constraints (no duplicates) and an optional ordered list of individual constraint objects (may contain duplicates). The internal stored constraint structure is exposed to the application by the constraints attribute, and may be modified by the applyConstraints() method.

When applyConstraints() is called, a User Agent MUST queue a task to evaluate those changes when the task queue is next serviced.

If the MediaStreamError event named overconstrained is fired, the track MUST be muted until either new satisfiable constraints are applied or the existing constraints become satisfiable.
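
One way an application might react to an overconstrained track is to apply a looser set of constraints. The sketch assumes the onoverconstrained handler and the promise-returning applyConstraints() declared in this document; installOverconstrainedFallback, fallbackConstraints, and onFailure are hypothetical names chosen for illustration.

```javascript
// Sketch: `fallbackConstraints` is an application-chosen, looser constraint
// structure; `onFailure` is called if even the fallback is rejected.
function installOverconstrainedFallback(track, fallbackConstraints, onFailure) {
  track.onoverconstrained = () => {
    // The track stays muted until satisfiable constraints are in place, so
    // try the looser set and report if that cannot be met either.
    Promise.resolve(track.applyConstraints(fallbackConstraints)).catch(onFailure);
  };
}
```

Wrapping the call in Promise.resolve() keeps the sketch tolerant of an implementation that does not return a promise.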

Interface Definition

readonly attribute DOMString kind

The MediaStreamTrack.kind attribute MUST return the string "audio" if the object represents an audio track or "video" if the object represents a video track.

readonly attribute DOMString id

Unless a MediaStreamTrack object is created as a part of a special purpose algorithm that specifies how the track id must be initialized, the User Agent MUST generate an identifier string and initialize the object's id attribute to that string. See MediaStream.id for guidelines on how to generate such an identifier.

An example of an algorithm that specifies how the track id must be initialized is the algorithm to represent an incoming network component with a MediaStreamTrack object. [[WEBRTC10]]

The MediaStreamTrack.id attribute MUST return the value to which it was initialized when the object was created.

readonly attribute DOMString label

User Agents MAY label audio and video sources (e.g., "Internal microphone" or "External USB Webcam"). The MediaStreamTrack.label attribute MUST return the label of the object's corresponding source, if any. If the corresponding source has or had no label, the attribute MUST instead return the empty string.

attribute boolean enabled

The MediaStreamTrack.enabled attribute controls the enabled state for the object.

On getting, the attribute MUST return the value to which it was last set. On setting, it MUST be set to the new value, regardless of whether the MediaStreamTrack object has been detached from its source or not.

Thus, after a MediaStreamTrack is detached from its source, its enabled attribute still changes value when set; it just doesn't do anything with that new value.
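
A typical use of the enabled attribute is an application-level mute control. The sketch below uses only getAudioTracks() and enabled as defined in this document; setAudioEnabled is a hypothetical helper name.

```javascript
// Sketch: switches every audio track of `stream` on or off.
function setAudioEnabled(stream, enabled) {
  for (const track of stream.getAudioTracks()) {
    // Setting works even on a track detached from its source; it simply has
    // no observable effect on the output in that case.
    track.enabled = enabled;
  }
}
```

Disabling a track this way does not change its muted attribute, which continues to reflect the state of the source.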

readonly attribute boolean muted

The MediaStreamTrack.muted attribute MUST return true if the track is muted, and false otherwise.

attribute EventHandler onmute

This event handler, of type mute, is executed when the MediaStreamTrack source is temporarily unable to provide data.

attribute EventHandler onunmute

This event handler, of type unmute, is executed when the MediaStreamTrack source is live again after having been temporarily unable to provide data.

readonly attribute boolean _readonly

If the track (audio or video) source is a local microphone or camera that is shared so that constraints applied to the track cannot modify the source's settings, the readonly attribute MUST return the value true. Otherwise, it must return the value false.

readonly attribute boolean remote

If the track is sourced by a non-local source, the remote attribute MUST return the value true. Otherwise, it must return the value false.

readonly attribute MediaStreamTrackState readyState

The readyState attribute represents the state of the track. It MUST return the value as most recently set by the User Agent.

attribute EventHandler onended

This event handler, of type ended, is executed when the MediaStreamTrack source will no longer provide any data, either due to a user action (revoked permission, removal of capture device) or due to an error.

MediaStreamTrack clone()

Clones the given MediaStreamTrack.

When the MediaStreamTrack.clone() method is invoked, the User Agent MUST run the following steps:

  1. Let trackClone be a newly constructed MediaStreamTrack object.

  2. Initialize trackClone's id attribute to a newly generated value.

  3. Let trackClone inherit this track's underlying source, kind, label, readyState, and enabled attributes, as well as its currently active constraints.

  4. Return trackClone.
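
The clone steps can be modelled with plain objects. In this sketch cloneModelTrack, the id scheme, and the source and constraints members are hypothetical stand-ins for internal state that a User Agent keeps natively.

```javascript
// Hypothetical model of the clone steps; tracks here are plain objects.
let modelTrackCounter = 0;

function cloneModelTrack(track) {
  return {
    // Steps 1-2: a new object with a newly generated id.
    id: `model-track-clone-${modelTrackCounter++}`,
    // Step 3: inherit the source and attributes...
    source: track.source,
    kind: track.kind,
    label: track.label,
    readyState: track.readyState,
    enabled: track.enabled,
    // ...and copy the constraints so later changes stay independent.
    constraints: { ...track.constraints },
  };
}
```

Because readyState is inherited, cloning an ended track yields another ended track, and because the constraints are copied, each clone can later be constrained separately while sharing the same source.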

void stop()

When a MediaStreamTrack object's stop() method is invoked, the User Agent MUST run the following steps:

  1. Let track be the current MediaStreamTrack object.

  2. If track is sourced by a non-local source, then abort these steps.

  3. Set track's readyState attribute to ended.

  4. Detach track's source.

The task source for the tasks queued for the stop() method is the DOM manipulation task source.
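
An application that is finished with a stream commonly stops every track so that the underlying sources can be released. The sketch uses only getTracks() and stop(); stopAllTracks is a hypothetical helper name.

```javascript
// Sketch: stops each track of `stream`. A source is only stopped once all
// tracks using it, including clones held elsewhere, have been stopped.
function stopAllTracks(stream) {
  for (const track of stream.getTracks()) {
    track.stop();
  }
}
```

Note that, per the steps above, calling stop() on a track with a non-local source has no effect.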

MediaTrackCapabilities getCapabilities()

See ConstrainablePattern Interface for the definition of this method.

MediaTrackConstraints getConstraints()

See ConstrainablePattern Interface for the definition of this method.

MediaTrackSettings getSettings()

See ConstrainablePattern Interface for the definition of this method.

Promise<void> applyConstraints(MediaTrackConstraints constraints)

The constraints argument is a new constraint structure to apply to this object.

See ConstrainablePattern Interface for the definition of this method.

attribute EventHandler onoverconstrained

See ConstrainablePattern Interface for the definition of this event handler.

live

The track is active (the track's underlying media source is making a best-effort attempt to provide data in real time).

The output of a track in the live state can be switched on and off with the enabled attribute.

ended

The track has ended (the track's underlying media source is no longer providing data, and will never provide more data for this track). Once a track enters this state, it never exits it.

For example, a video track in a MediaStream ends when the user unplugs the USB web camera that acts as the track's media source.

Track Source Types

camera

A valid source type only for video MediaStreamTracks. The source is a local video-producing camera source.

microphone

A valid source type only for audio MediaStreamTracks. The source is a local audio-producing microphone source.

MediaTrackSupportedConstraints

MediaTrackSupportedConstraints represents the list of constraints recognized by a User Agent for controlling the Capabilities of a MediaStreamTrack object.

Future specifications can extend the MediaTrackSupportedConstraints dictionary by defining a partial dictionary with dictionary members of type boolean and an identifier that is a Property Name registered in the [[!RTCWEB-CONSTRAINTS]] registry.

boolean width
boolean height
boolean aspectRatio
boolean frameRate
boolean facingMode
boolean volume
boolean sampleRate
boolean sampleSize
boolean echoCancellation
boolean deviceId
boolean groupId
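
A dictionary of this shape lets an application find out which constraint names the User Agent recognizes before using them. The following sketch assumes the dictionary has already been obtained by some means not covered in this section; splitBySupport and the wanted object are hypothetical.

```javascript
// Sketch: `supported` is a MediaTrackSupportedConstraints-like dictionary of
// booleans; `wanted` maps constraint names to application-chosen values.
function splitBySupport(supported, wanted) {
  const usable = {};
  const unsupported = [];
  for (const [name, value] of Object.entries(wanted)) {
    if (supported[name] === true) {
      usable[name] = value;
    } else {
      // Absent members are treated the same as members set to false.
      unsupported.push(name);
    }
  }
  return { usable, unsupported };
}
```

The application can then decide whether the unsupported names are essential or can simply be left out of the request.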

MediaTrackCapabilities

MediaTrackCapabilities represents the Capabilities of a MediaStreamTrack object.

Future specifications can extend the MediaTrackCapabilities dictionary by defining a partial dictionary with dictionary members of appropriate type and an identifier that is a Property Name registered in the [[!RTCWEB-CONSTRAINTS]] registry.

(long or LongRange) width
(long or LongRange) height
(double or DoubleRange) aspectRatio
(double or DoubleRange) frameRate
DOMString facingMode
(double or DoubleRange) volume
(long or LongRange) sampleRate
(long or LongRange) sampleSize
sequence<boolean> echoCancellation
DOMString deviceId
DOMString groupId
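
Because a capability can be a single value, a range, or a sequence, code that inspects capabilities has to handle each shape. The sketch below assumes that LongRange and DoubleRange carry numeric min and max members, which this section does not define; capabilityAllows is a hypothetical helper.

```javascript
// Sketch: reports whether `value` lies within one capability entry.
function capabilityAllows(capability, value) {
  if (capability == null) return false; // property not supported
  if (Array.isArray(capability)) {
    return capability.includes(value); // e.g. echoCancellation
  }
  if (typeof capability === "object") {
    // Assumed range shape: { min, max }.
    return value >= capability.min && value <= capability.max;
  }
  return capability === value; // single fixed value
}
```

A positive answer for each property separately does not guarantee that the combination can be satisfied; that is only discovered by applying the constraints.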

MediaTrackConstraints

sequence<MediaTrackConstraintSet> advanced

See Constraints and ConstraintSet for the definition of this element.

Future specifications can extend the MediaTrackConstraintSet dictionary by defining a partial dictionary with dictionary members of appropriate type and an identifier that is a Property Name registered in the [[!RTCWEB-CONSTRAINTS]] registry.

ConstrainLong width
ConstrainLong height
ConstrainDouble aspectRatio
ConstrainDouble frameRate
ConstrainDOMString facingMode
ConstrainDouble volume
ConstrainLong sampleRate
ConstrainLong sampleSize
ConstrainBoolean echoCancellation
ConstrainDOMString deviceId
ConstrainDOMString groupId

MediaTrackSettings

MediaTrackSettings represents the Settings of a MediaStreamTrack object.

Future specifications can extend the MediaTrackSettings dictionary by defining a partial dictionary with dictionary members of appropriate type and an identifier that is a Property Name registered in the [[!RTCWEB-CONSTRAINTS]] registry.

long width
long height
double aspectRatio
double frameRate
DOMString facingMode
double volume
long sampleRate
long sampleSize
boolean echoCancellation
DOMString deviceId
DOMString groupId

MediaStreamTrackEvent

The addtrack and removetrack events use the MediaStreamTrackEvent interface.

Firing a track event named e with a MediaStreamTrack track means that an event with the name e, which does not bubble (except where otherwise stated) and is not cancelable (except where otherwise stated), and which uses the MediaStreamTrackEvent interface with the track attribute set to track, MUST be created and dispatched at the given target.

Constructor(DOMString type, MediaStreamTrackEventInit eventInitDict)

Constructs a new MediaStreamTrackEvent.

readonly attribute MediaStreamTrack track

The track attribute represents the MediaStreamTrack object associated with the event.

MediaStreamTrack track = null
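
The "fire a track event" wording above can be sketched with the generic Event and EventTarget classes. This is a stand-in written for illustration, not the User Agent's MediaStreamTrackEvent; the class name TrackEventSketch, the helper fireTrackEvent, and the plain objects used as stream and track are all assumptions.

```javascript
// Illustrative stand-in for MediaStreamTrackEvent.
class TrackEventSketch extends Event {
  constructor(type, eventInitDict = {}) {
    // Does not bubble and is not cancelable, as described above.
    super(type, { bubbles: false, cancelable: false });
    // Defaults to null, mirroring the init dictionary default.
    this.track = eventInitDict.track ?? null;
  }
}

// Fires a track event named `name` with `track` at `target`.
function fireTrackEvent(target, name, track) {
  return target.dispatchEvent(new TrackEventSketch(name, { track }));
}

const stream = new EventTarget(); // stands in for a MediaStream
const seen = [];
stream.addEventListener("addtrack", (e) => seen.push(e.track.kind));
fireTrackEvent(stream, "addtrack", { kind: "video" });
```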

The model: sources, sinks, constraints, and settings

Browsers provide a media pipeline from sources to sinks. In a browser, sinks are the <img>, <video>, and <audio> tags. Traditional sources include streamed content, files, and web resources. The media produced by these sources typically does not change over time - these sources can be considered to be static.

The sinks that display these sources to the user (the actual tags themselves) have a variety of controls for manipulating the source content. For example, an <img> tag scales down a huge source image of 1600x1200 pixels to fit in a rectangle defined with width="400" and height="300".

The getUserMedia API adds dynamic sources such as microphones and cameras - the characteristics of these sources can change in response to application needs. These sources can be considered to be dynamic in nature. A <video> element that displays media from a dynamic source can either perform scaling or it can feed back information along the media pipeline and have the source produce content more suitable for display.

Note: This sort of feedback loop merely enables an "optimization", but the gain is non-trivial: it can save battery, reduce network congestion, and so on.

Note that MediaStream sinks (such as <video>, <audio>, and even RTCPeerConnection) will continue to have mechanisms to further transform the source stream beyond that which the Settings, Capabilities, and Constraints described in this specification offer. (The sink transformation options, including those of RTCPeerConnection, are outside the scope of this specification.)

The act of changing or applying a track constraint may affect the settings of all tracks sharing that source and consequently all down-level sinks that are using that source. Many sinks may be able to take these changes in stride, such as the <video> element or RTCPeerConnection. Others like the Recorder API may fail as a result of a source setting change.

The RTCPeerConnection is an interesting object because it acts simultaneously as both a sink and a source for over-the-network streams. As a sink, it has source transformational capabilities (e.g., lowering bit-rates, scaling-up / down resolutions, and adjusting frame-rates), and as a source it could have its own settings changed by a track source (though in this specification sources with the remote attribute set to true do not consider the current constraints applied to a track).

To illustrate how changes to a given source impact various sinks, consider the following example. This example only uses width and height, but the same principles apply to all of the Settings exposed in this specification. In the first figure a home client has obtained a video source from its local video camera. The source's width and height settings are 800 pixels and 600 pixels, respectively. Three MediaStream objects on the home client contain tracks that use this same deviceId. The three media streams are connected to three different sinks: a <video> element (A), another <video> element (B), and a peer connection (C). The peer connection is streaming the source video to a remote client. On the remote client there are two media streams with tracks that use the peer connection as a source. These two media streams are connected to two <video> element sinks (Y and Z).

Changing media stream source effects: before the requested change

Note that at this moment, all of the sinks on the home client must apply a transformation to the original source's provided dimension settings. B is scaling the video down, A is scaling the video up (resulting in loss of quality), and C is also scaling the video up slightly for sending over the network. On the remote client, sink Y is scaling the video way down, while sink Z is not applying any scaling.

Using the ConstrainablePattern interface, one of the tracks requests a higher resolution (1920 by 1200 pixels) from the home client's video source.

Changing media stream source effects: after the requested change

Note that the source change immediately affects all of the tracks and sinks on the home client, but does not impact any of the sinks (or sources) on the remote client. With the increase in the home client source video's dimensions, sink A no longer has to perform any scaling, while sink B must scale down even further than before. Sink C (the peer connection) must now scale down the video in order to keep the transmission constant to the remote client.

While not shown, an equally valid settings change request could be made on the remote client's side. In addition to impacting sink Y and Z in the same manner as A, B and C were impacted earlier, it could lead to re-negotiation with the peer connection on the home client in order to alter the transformation that it is applying to the home client's video source. Such a change is NOT REQUIRED to change anything related to sink A or B or the home client's video source.

Note that this specification does not define a mechanism by which a change to the remote client's video source could automatically trigger a change to the home client's video source. Implementations may choose to make such source-to-sink optimizations as long as they only do so within the constraints established by the application, as the next example demonstrates.

It is fairly obvious that changes to a given source will impact sink consumers. However, in some situations changes to a given sink may also cause implementations to adjust a source's settings. This is illustrated in the following figures. In the first figure below, the home client's video source is sending a video stream sized at 1920 by 1200 pixels. The video source is also unconstrained, such that the exact source dimensions are flexible as far as the application is concerned. Two MediaStream objects contain tracks with the same deviceId, and those MediaStreams are connected to two different <video> element sinks A and B. Sink A has been sized to width="1920" and height="1200" and is displaying the source's video content without any transformations. Sink B has been sized smaller and, as a result, is scaling the video down to fit its rectangle of 320 pixels across by 200 pixels down.

Changing media stream sinks may affect sources: before the requested change

When the application changes sink A to a smaller dimension (from 1920 to 1024 pixels wide and from 1200 to 768 pixels tall), the browser's media pipeline may recognize that none of its sinks require the higher source resolution, and needless work is being done both on the part of the source and sink A. In such a case and without any other constraints forcing the source to continue producing the higher resolution video, the media pipeline MAY change the source resolution:

Changing media stream sinks may affect sources: after the requested change

In the above figure, the home client's video source resolution was changed to the greater of that from sink A and B in order to optimize playback. While not shown above, the same behavior could apply to peer connections and other sinks.
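
One possible heuristic for this optional optimization is sketched below: pick the largest width and height that any attached sink needs. The specification does not mandate this approach; pickSourceResolution and the sink objects are hypothetical, and the sketch assumes the source is otherwise unconstrained.

```javascript
// Hypothetical helper: smallest source size that still covers every sink.
function pickSourceResolution(sinks) {
  return {
    width: Math.max(...sinks.map((s) => s.width)),
    height: Math.max(...sinks.map((s) => s.height))
  };
}

// Before the change: sink A is displayed at full size.
const before = pickSourceResolution([
  { name: "A", width: 1920, height: 1200 },
  { name: "B", width: 320, height: 200 }
]);

// After the application shrinks sink A.
const after = pickSourceResolution([
  { name: "A", width: 1024, height: 768 },
  { name: "B", width: 320, height: 200 }
]);
```

With the figures used in this example, before is 1920 by 1200 and after is 1024 by 768, which is the size of the larger remaining sink.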

It is possible that constraints can be applied to a track which a source is unable to satisfy, either because the source itself cannot satisfy the constraint or because the source is already satisfying a conflicting constraint. When this happens, the promise returned from applyConstraints() will be rejected, without applying any of the new constraints. Since no change in constraints occurs in this case, there is also no required change to the source itself as a result of this condition. Here is an example of this behavior.

In this example, two media streams each have a video track, and both tracks share the same source. The first track initially has no constraints applied. It is connected to sink N. Sink N has a resolution of 800 by 600 pixels and is scaling down the source's resolution of 1024 by 768 to fit. The other track has a mandatory constraint forcing off the source's fill light; it is connected to sink P. Sink P has a width and height equal to that of the source.

Overconstrained application

Now, the first track adds a mandatory constraint that the fill light should be forced on. At this point, both mandatory constraints cannot be satisfied by the source (the fill light cannot be on and off at the same time). Since this state was caused by the first track's attempt to apply a conflicting constraint, the constraint application fails and there is no change in the source's settings nor to the constraints on either track.

Let's look at a slightly different situation starting from the same point. In this case, instead of the first track attempting to apply a conflicting constraint, the user physically locks the camera into a mode where the fill light is on. At this point the source can no longer satisfy the second track's mandatory constraint that the fill light be off. The second track is transitioned into the muted state and receives an overconstrained event. At the same time, the source notes that its remaining active sink only requires a resolution of 800 by 600 and so it adjusts its resolution down to match (this is an optional optimization that the User Agent is allowed to make given the situation).

Overconstrained occurrence

At this point, it is the responsibility of the application to address the problem that led to the overconstrained situation, perhaps by removing the fill light mandatory constraint on the second track or by closing the second track altogether and informing the user.
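
The two overconstrained scenarios above can be modeled with a small toy. This is a sketch under heavy simplification: only a boolean fill-light constraint is modeled, the class SharedSourceSketch and its methods are hypothetical, and a real applyConstraints() call would report failure through a rejected promise rather than a boolean.

```javascript
// Toy model of a source shared by several tracks.
class SharedSourceSketch {
  constructor() {
    this.required = new Map(); // track id -> required fill light value
  }

  // Records the constraint and returns true if it is compatible with every
  // other track; otherwise changes nothing and returns false.
  tryApply(trackId, fillLight) {
    for (const [id, value] of this.required) {
      if (id !== trackId && value !== fillLight) return false;
    }
    this.required.set(trackId, fillLight);
    return true;
  }

  // Models the environment changing (for example, the user locking the
  // fill light). Returns the ids of tracks whose constraint can no longer
  // be met; those tracks would be muted and receive an overconstrained event.
  hardwareChanged(fillLight) {
    return [...this.required]
      .filter(([, value]) => value !== fillLight)
      .map(([id]) => id);
  }
}

const source = new SharedSourceSketch();
const secondOk = source.tryApply("track2", false); // fill light forced off
const firstOk = source.tryApply("track1", true);   // conflicts, nothing changes
const muted = source.hardwareChanged(true);        // ["track2"]
```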

MediaStreams as Media Elements

A MediaStream may be assigned to media elements as defined in HTML5 [[HTML5]]. A MediaStream is not preloadable or seekable and represents a simple, potentially infinite, linear media timeline. The timeline starts at 0 and increments linearly in real time as long as the MediaStream is playing. The timeline does not increment when the MediaStream is paused.

Direct Assignment to Media Elements

User Agents that support this specification MUST support the following partial interface, which allows a MediaStream to be assigned directly to a media element.

attribute MediaStream? srcObject

Holds the MediaStream that provides media for this element. This attribute overrides both the src attribute and any <source> elements. Specifically, if srcObject is specified, the User Agent MUST use it as the source of media, even if the src attribute is also set or <source> children are present. If the value of srcObject is replaced or set to null, the User Agent MUST re-run the media element load algorithm.

We may want to allow direct assignment of other types as well
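
The precedence rule for srcObject can be sketched as follows. The plain object fakeVideo stands in for a <video> element so the sketch runs outside a browser, and attachStream and effectiveSource are hypothetical helpers; <source> children are not modeled.

```javascript
// Assigns a stream to a media element, or detaches it when stream is null.
function attachStream(mediaElement, stream) {
  mediaElement.srcObject = stream;
  return mediaElement;
}

// Mirrors the precedence rule: srcObject wins over src whenever it is set.
function effectiveSource(mediaElement) {
  return mediaElement.srcObject != null
    ? { kind: "object", value: mediaElement.srcObject }
    : { kind: "url", value: mediaElement.src };
}

// Plain object standing in for a <video> element.
const fakeVideo = { src: "fallback.webm", srcObject: null };
const fakeStream = { id: "stream-1" };
attachStream(fakeVideo, fakeStream);
```

In a browser the same assignment would be written directly as video.srcObject = stream.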

Loading and Playing a MediaStream in a Media Element

The User Agent runs the media element load algorithm to obtain media for the media element to display. As defined in the [[HTML5]] specification, this algorithm has two basic phases: first, the resource selection algorithm chooses the resource to play and resolves its URI; then the resource fetch phase loads the resource. Both phases are potentially simplified when using a MediaStream. First of all, srcObject takes priority over other means of specifying the resource, and it provides the object itself rather than a URI. Therefore, there is no need to run the resource selection algorithm. Secondly, when the User Agent reaches the resource fetch algorithm with a MediaStream, the MediaStream is a local object so there's nothing to fetch. Therefore, the following modifications/restrictions to the media element load algorithm apply:

Media Element Attributes when Playing a MediaStream

The nature of the MediaStream places certain restrictions on the behavior and attribute values of the associated media element and on the operations that can be performed on it, as shown below:

Legal values for the properties of a media element bound to a MediaStream
Attribute Name | Attribute Type | Valid Values When Using a MediaStream | Additional considerations
currentSrc | DOMString | the empty string | When srcObject is specified the User Agent MUST set this to the empty string.
preload | DOMString | none | A MediaStream cannot be preloaded.
buffered | TimeRanges | buffered.length MUST return 0. | A MediaStream cannot be preloaded. Therefore, the amount buffered is always an empty TimeRange.
networkState | unsigned short | NETWORK_IDLE | The media element does not fetch the MediaStream so there is no network traffic.
readyState | unsigned short | HAVE_NOTHING, HAVE_ENOUGH_DATA | A MediaStream may be created before there is any data available, for example when a stream is received from a remote peer. The value of the readyState of the media element MUST be HAVE_NOTHING before the first media arrives and HAVE_ENOUGH_DATA once the first media has arrived.
currentTime | double | Any non-negative integer. The initial value is 0 and the value increments linearly in real time whenever the stream is playing. | The value is the current stream position, in seconds. On any attempt to set this attribute, the User Agent must throw an InvalidStateError exception.
duration | unrestricted double | Infinity | A MediaStream does not have a pre-defined duration.
seeking | boolean | false | A MediaStream is not seekable. Therefore, this attribute MUST always have the value false.
defaultPlaybackRate | double | 1.0 | A MediaStream is not seekable. Therefore, this attribute MUST always have the value 1.0 and any attempt to alter it MUST fail.
playbackRate | double | 1.0 | A MediaStream is not seekable. Therefore, this attribute MUST always have the value 1.0 and any attempt to alter it MUST fail.
played | TimeRanges | played.length MUST return 1. played.start(0) MUST return 0. played.end(0) MUST return the last known currentTime. | A MediaStream's timeline always consists of a single range, starting at 0 and extending up to the currentTime.
seekable | TimeRanges | seekable.length MUST return 0. | A MediaStream is not seekable.
loop | boolean | true, false | Setting the loop attribute has no effect since a MediaStream has no defined end and therefore cannot be looped.
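
A few of the fixed values in the table above can be checked mechanically. The following is a minimal sketch, assuming an object that exposes the listed media element attributes; checkStreamBoundElement is a hypothetical helper, and it deliberately skips the attributes whose values vary over time (readyState, currentTime, played).

```javascript
// Returns the names of attributes whose values differ from the table.
function checkStreamBoundElement(el) {
  const problems = [];
  if (el.currentSrc !== "") problems.push("currentSrc");
  if (el.preload !== "none") problems.push("preload");
  if (el.duration !== Infinity) problems.push("duration");
  if (el.seeking !== false) problems.push("seeking");
  if (el.playbackRate !== 1.0) problems.push("playbackRate");
  if (el.defaultPlaybackRate !== 1.0) problems.push("defaultPlaybackRate");
  if (el.buffered.length !== 0) problems.push("buffered");
  if (el.seekable.length !== 0) problems.push("seekable");
  return problems;
}
```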

Error Handling

All promises in this specification, when they are rejected, are rejected with an object that implements the MediaStreamError interface.

MediaStreamError

All errors defined in this specification implement the following interface:

readonly attribute DOMString name

The name of the error.

readonly attribute DOMString? message

A User Agent-dependent string offering extra human-readable information about the error.

readonly attribute DOMString? constraintName

This attribute is only used for some types of errors. For MediaStreamError with a name of ConstraintNotSatisfiedError or of OverconstrainedError, this attribute MUST be set to the name of the constraint that caused the error.

We use MediaStreamError rather than deriving from Error until the situation with adding additional information into an error has been clarified.
Do we want to allow the constraintName attribute to contain multiple constraint names? In many cases the error is raised as soon as a single unsatisfied mandatory constraint is found, but in others it may be possible to determine that multiple constraints are not satisfied.

The following interface is defined for cases when a MediaStreamError is raised as an event:

Constructor(DOMString type, MediaStreamErrorEventInit eventInitDict)

Constructs a new MediaStreamErrorEvent.

readonly attribute MediaStreamError? error

The MediaStreamError describing the error that triggered the event (if any).

MediaStreamError? error = null

The MediaStreamError describing the error associated with the event (if any).

Error names

The table below lists the error names defined in this specification.

MediaStreamError names
Name | Description | Note
NotSupportedError | The operation is not supported. | Same as defined in [[DOM4]]
PermissionDeniedError | The user did not grant permission for the operation. |
ConstraintNotSatisfiedError | One of the mandatory Constraints could not be satisfied. | The constraintName attribute gets set to the name of the constraint that caused the error
OverconstrainedError | Due to changes in the environment, one or more mandatory constraints can no longer be satisfied. | The constraintName attribute gets set to the name of the constraint that caused the error
NotFoundError | The object can not be found here. | Same as defined in [[DOM4]]
AbortError | The operation was aborted. | Same as defined in [[DOM4]]
SourceUnavailableError | The source of the MediaStream could not be accessed due to a hardware error (e.g. lock from another process). |
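
An application might map these error names to user-facing text along the following lines. This is only a sketch: describeMediaStreamError is a hypothetical helper, the message strings are invented for the example, and the error argument is assumed to expose the name, message, and constraintName attributes defined above.

```javascript
// Turns a MediaStreamError-like object into a short description.
function describeMediaStreamError(error) {
  switch (error.name) {
    case "PermissionDeniedError":
      return "The user did not grant permission.";
    case "ConstraintNotSatisfiedError":
    case "OverconstrainedError":
      // constraintName is only set for these two error names.
      return `Constraint "${error.constraintName}" could not be satisfied.`;
    case "NotFoundError":
      return "No matching source was found.";
    case "SourceUnavailableError":
      return "The source could not be accessed.";
    default:
      // Fall back to the User Agent-dependent message, if any.
      return error.message || error.name;
  }
}
```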

Event summary

The following events fire on MediaStream objects:

Event name | Interface | Fired when...
active | Event | The MediaStream became active (see inactive).
inactive | Event | The MediaStream became inactive.
addtrack | MediaStreamTrackEvent | A new MediaStreamTrack has been added to this stream. Note that this event is not fired when the script directly modifies the tracks of a MediaStream.
removetrack | MediaStreamTrackEvent | A MediaStreamTrack has been removed from this stream. Note that this event is not fired when the script directly modifies the tracks of a MediaStream.

The following events fire on MediaStreamTrack objects:

Event name | Interface | Fired when...
mute | Event | The MediaStreamTrack object's source is temporarily unable to provide data.
unmute | Event | The MediaStreamTrack object's source is live again after having been temporarily unable to provide data.
overconstrained | MediaStreamErrorEvent | This error event fires for each affected track (when multiple tracks share the same source) after the user agent has evaluated the current constraints against a given deviceId and is not able to configure the source within the limitations established by the union of imposed constraints. Due to being over-constrained, the User Agent must mute each affected track. The affected track(s) will remain muted until the application adjusts the constraints to accommodate the source's current effective capabilities.
ended | MediaStreamErrorEvent | The MediaStreamTrack object's source will no longer provide any data, either because the user revoked the permissions, or because the source device has been ejected, or because the remote peer permanently stopped sending data. When the end of MediaStreamTrack is the result of an error, the error attribute of the event object is set to describe the said error.

The following event fires on MediaDevices objects:

Event name | Interface | Fired when...
devicechange | Event | The set of media devices available to the User Agent has changed. The current list of devices can be retrieved with the enumerateDevices() method.

Enumerating Local Media Devices

This section describes an API that the script can use to query the User Agent about connected media input and output devices (for example a web camera or a headset).

NavigatorUserMedia

readonly attribute MediaDevices mediaDevices

Returns the MediaDevices object associated with this Navigator object.

MediaDevices

The MediaDevices object is the entry point to the API used to examine and get access to media devices available to the User Agent.

When a new media input or output device is made available, the user agent MUST queue a task that fires a simple event named devicechange at the MediaDevices object.

attribute EventHandler ondevicechange

This event handler, of type devicechange, is executed when the set of media devices available to the user agent has changed.

Promise<sequence<MediaDeviceInfo>> enumerateDevices ()

Collects information about the User Agent's available media input and output devices.

Returns a promise. The promise will be fulfilled with a sequence of MediaDeviceInfo dictionaries representing the User Agent's available media input and output devices if enumeration is successful.

Camera and microphone sources should be enumerable. Specifications that add additional types of source will provide recommendations about whether the source type should be enumerable.

When the enumerateDevices() method is called, the User Agent must run the following steps:

  1. Let p be a new promise.

  2. Run the following steps asynchronously:

    1. Let resultList be an empty list.

    2. If this method has been called previously within this application session, let oldList be the list of MediaDeviceInfo objects that was produced at that call (resultList); otherwise, let oldList be an empty list.

    3. Probe the User Agent for available media devices, and run the following sub steps for each discovered device, device:

      1. If device is represented by a MediaDeviceInfo object in oldList, append that object to resultList, abort these steps and continue with the next device (if any).

      2. Let deviceInfo be a new MediaDeviceInfo object to represent device.

      3. If device belongs to the same physical device as a device already represented in oldList or resultList, initialize deviceInfo's groupId member to the groupId value of the existing MediaDeviceInfo object. Otherwise, let deviceInfo's groupId member be a newly generated unique identifier.

      4. Append deviceInfo to resultList.

    4. If none of the local devices are attached to an active MediaStreamTrack in the current browsing context, and if no persistent permission to access these local devices has been granted to the page's origin, let filteredList be a copy of resultList, and all its elements, where the label member is the empty string.

    5. If filteredList is a non-empty list, then resolve p with filteredList. Otherwise, resolve p with resultList.

  3. Return p.
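
The core of the steps above can be sketched synchronously as follows. This is an illustration under stated assumptions, not the User Agent's implementation: the shape of the probe entries (hardwareId, physicalId, kind, label), the underscore-prefixed bookkeeping fields, and the hasPermission flag (which collapses the two conditions of step 4 into one boolean) are all hypothetical. A real deviceId would also have to be un-guessable, which the placeholder used here is not.

```javascript
let nextGroup = 0;

// probe: devices found by the platform; oldList: result of a previous call.
function enumerateSketch(probe, oldList, hasPermission) {
  const resultList = [];
  for (const device of probe) {
    // Sub-step 1: reuse the object produced by an earlier call, if any.
    const known = oldList.find((d) => d._hardwareId === device.hardwareId);
    if (known) {
      resultList.push(known);
      continue;
    }
    // Sub-step 3: share the groupId of a device on the same physical device.
    const sibling = [...oldList, ...resultList].find(
      (d) => d._physicalId === device.physicalId
    );
    resultList.push({
      deviceId: `id-${device.hardwareId}`, // placeholder, not un-guessable
      kind: device.kind,
      label: device.label,
      groupId: sibling ? sibling.groupId : `group-${nextGroup++}`,
      _hardwareId: device.hardwareId, // bookkeeping for this sketch only
      _physicalId: device.physicalId
    });
  }
  // Step 4: without permission, hand out copies whose label is empty.
  return hasPermission
    ? resultList
    : resultList.map((d) => ({ ...d, label: "" }));
}
```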

Access control model

The algorithm described above means that the access to media device information depends on whether or not permission has been granted to the page's origin to use media devices.

If no such access has been granted, the MediaDeviceInfo dictionary will contain the deviceId, kind, and groupId.

If access has been granted for a media device, the MediaDeviceInfo dictionary will contain the deviceId, kind, label, and groupId.

Device Info

readonly attribute DOMString deviceId

A unique identifier for the represented device.

All enumerable devices have an identifier that MUST be unique to the application and persistent across sessions. Unique and stable identifiers let the application save, identify the availability of, and directly request specific sources.

This identifier MUST be un-guessable by other applications to prevent the identifier being used to correlate the same user across different applications.

Since deviceId persists across sessions and to reduce its potential as a fingerprinting mechanism, deviceId is to be treated as other persistent storage mechanisms such as cookies [[COOKIES]]. User agents should reset per-application device identifiers when other persistent storages are cleared.

readonly attribute MediaDeviceKind kind

Describes the kind of the represented device.

readonly attribute DOMString label

A label describing this device (for example "External USB Webcam"). If the device has no associated label, then this attribute MUST return the empty string.

readonly attribute DOMString groupId

Returns the group identifier of the represented device. Two devices have the same group identifier if they belong to the same physical device; for example a monitor with a built-in camera and microphone.

audioinput

Represents an audio input device; for example a microphone.

audiooutput

Represents an audio output device; for example a pair of headphones.

videoinput

Represents a video input device; for example a webcam.
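
As one possible use of groupId and kind, an application could bucket the enumerated devices by physical device. The helper groupByPhysicalDevice below is hypothetical and assumes entries shaped like MediaDeviceInfo.

```javascript
// Groups MediaDeviceInfo-like objects by their groupId.
function groupByPhysicalDevice(devices) {
  const groups = new Map();
  for (const d of devices) {
    if (!groups.has(d.groupId)) groups.set(d.groupId, []);
    groups.get(d.groupId).push(d);
  }
  return groups;
}
```

For example, a camera and a microphone that report the same groupId end up in the same bucket, which lets an application offer them to the user as a matching pair.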

Obtaining local multimedia content

This section extends NavigatorUserMedia and MediaDevices with APIs to request permission to access media input devices available to the User Agent.

When on an insecure origin [[mixed-content]], user agents are encouraged to warn about usage of MediaDevices.getUserMedia, navigator.getUserMedia, and any prefixed variants in their developer tools, error logs, etc. It is explicitly permitted for user agents to remove these APIs entirely when on an insecure origin, as long as they remove all of them at once (e.g., they should not leave just the prefixed version available on insecure origins.)

NavigatorUserMedia Interface Extensions

The definition of getUserMedia() in this section reflects two major changes from the method definition that has existed here for many months.

First, the official definition for the getUserMedia() method, and the one which developers are encouraged to use, is now at MediaDevices.getUserMedia(). This decision reflected consensus as long as the original API remained available here under the Navigator object for backwards compatibility reasons, since the working group acknowledges that early users of these APIs have been encouraged to define getUserMedia as "var getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia || navigator.mozGetUserMedia;" in order for their code to be functional both before and after official implementations of getUserMedia() in popular browsers. To ensure functional equivalence, the getUserMedia() method here is defined in terms of the method under MediaDevices.

Second, the decision to change all other callback-based methods in the specification to be based on Promises instead required that the Navigator.getUserMedia() definition reflect this in its use of the MediaDevices.getUserMedia() method. Because Navigator.getUserMedia() is now the only callback-based method remaining in the specification, there is ongoing discussion as to a) whether it still belongs in the specification, and b) if it does, whether its syntax should remain callback-based or change in some way to use Promises. Input on these questions is encouraged, particularly from developers actively using today's implementations of this functionality.

Note that the other methods that changed from a callback-based syntax to a Promises-based syntax were not considered to have been implemented widely enough in any form to have to consider legacy usage.

void getUserMedia(MediaStreamConstraints constraints, NavigatorUserMediaSuccessCallback successCallback, NavigatorUserMediaErrorCallback errorCallback)

Prompts the user for permission to use their Web cam or other video or audio input.

The constraints argument is a dictionary of type MediaStreamConstraints.

The successCallback will be invoked with a suitable MediaStream object as its argument if the user accepts valid tracks as described in MediaDevices.getUserMedia().

The errorCallback will be invoked if there is a failure in finding valid tracks or if the user denies permission, as described in MediaDevices.getUserMedia().

When the getUserMedia() method is called, the User Agent MUST run the following steps:

  1. Let constraints be the method's first argument.

  2. Let successCallback be the callback indicated by the method's second argument.

  3. Let errorCallback be the callback indicated by the method's third argument.

  4. Invoke MediaDevices.getUserMedia() with constraints as the argument, and let p be the resulting promise.

  5. Upon fulfillment of p with value stream, run the following step:

    1. Invoke successCallback with stream as the argument.

  6. Upon rejection of p with reason r, run the following step:

    1. Invoke errorCallback with r as the argument.
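
The steps above amount to a thin callback wrapper around the promise-based method. A minimal sketch follows; legacyGetUserMedia is a hypothetical name, and the mediaDevices object is passed in explicitly (rather than read from navigator) so that a stand-in can be used outside a browser.

```javascript
// Callback-style wrapper defined in terms of the promise-based method.
function legacyGetUserMedia(mediaDevices, constraints, successCallback, errorCallback) {
  // Step 4: invoke the promise-based method.
  const p = mediaDevices.getUserMedia(constraints);
  p.then(
    (stream) => successCallback(stream), // step 5: upon fulfillment
    (r) => errorCallback(r)              // step 6: upon rejection
  );
  // Returns nothing, matching the void return type.
}
```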

MediaDevices Interface Extensions

The definition of getUserMedia() in this section reflects two major changes from the method definition that has existed under NavigatorUserMedia for many months.

First, the official definition for the getUserMedia() method, and the one which developers are encouraged to use, is now the one defined here under MediaDevices. This decision reflected consensus as long as the original API remained available at NavigatorUserMedia.getUserMedia() under the Navigator object for backwards compatibility reasons, since the working group acknowledges that early users of these APIs have been encouraged to define getUserMedia as "var getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia || navigator.mozGetUserMedia;" in order for their code to be functional both before and after official implementations of getUserMedia() in popular browsers. To ensure functional equivalence, the getUserMedia() method under NavigatorUserMedia is defined in terms of the method here.

Second, the method defined here is Promises-based, while the one defined under NavigatorUserMedia is currently still callback-based. Developers expecting to find getUserMedia() defined under NavigatorUserMedia are strongly encouraged to read the detailed Note given there.

The getSupportedConstraints method is provided to allow the application to determine which constraints the User Agent recognizes.

MediaTrackSupportedConstraints getSupportedConstraints()

Returns a dictionary whose members are the constrainable properties known to the User Agent. A supported constrainable property MUST be represented by a member whose name is the constraint name and whose value is true. Any constrainable properties not supported by the User Agent MUST NOT be present in the returned dictionary. The values returned represent what the browser implements and will not change during a session.
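
An application could use the returned dictionary to separate the constraints it wants to apply into those the User Agent recognizes and those it does not. The following sketch assumes the dictionary shape described above; partitionBySupport is a hypothetical helper.

```javascript
// Splits requested constraints by whether the User Agent recognizes them.
// `supported` has the shape returned by getSupportedConstraints().
function partitionBySupport(requested, supported) {
  const known = {};
  const unknown = [];
  for (const [name, value] of Object.entries(requested)) {
    if (supported[name] === true) known[name] = value;
    else unknown.push(name);
  }
  return { known, unknown };
}
```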

Promise<MediaStream> getUserMedia( MediaStreamConstraints constraints)

Prompts the user for permission to use their Web cam or other video or audio input.

The constraints argument is a dictionary of type MediaStreamConstraints.

Returns a promise. The promise will be fulfilled with a suitable MediaStream object if the user accepts valid tracks as described below.

The promise will be rejected if there is a failure in finding valid tracks or if the user denies permission, as described below.
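
A caller of the promise-based method might handle both outcomes as sketched below. The helper openCamera and the shape of its result are hypothetical; the mediaDevices object is passed in so that the sketch can run with a stand-in, whereas a page would typically pass navigator.mediaDevices.

```javascript
// Requests a video-only stream and normalizes the outcome.
function openCamera(mediaDevices) {
  return mediaDevices.getUserMedia({ video: true, audio: false }).then(
    (stream) => ({ ok: true, stream }),
    // The rejection reason implements MediaStreamError, so it has a name.
    (error) => ({ ok: false, reason: error.name })
  );
}
```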

When the getUserMedia() method is called, the User Agent MUST run the following steps:

  1. Let p be a new promise.

  2. Let constraints be the method's first argument.

  3. Run the following steps asynchronously:

    1. Let requestedMediaTypes be the set of media types in constraints with either a dictionary value or a value of "true".

    2. If requestedMediaTypes is the empty set, let error be a new MediaStreamError object whose name attribute has the value NotSupportedError and jump to the step labeled Error Task below.

    3. Let finalSet be an (initially) empty set.

    4. For each media type T in requestedMediaTypes,

      1. For each possible source for that media type, construct an unconstrained MediaStreamTrack with that source as its source.

        Call this set of tracks the candidateSet.

      2. If the value of the T entry of constraints is "true", set CS to the empty constraint set (no constraint). Otherwise, continue with CS set to the value of the T entry of constraints.

      3. Run the SelectSettings algorithm on each track in candidateSet with CS as the constraint set. If the algorithm does not return undefined, add the track to finalSet. This eliminates devices unable to satisfy the constraints, by verifying that at least one settings dictionary exists that satisfies the constraints.

      If finalSet is the empty set, let error be a new MediaStreamError object whose name attribute has the value NotFoundError and jump to the step labeled Error Task below.

    5. Optionally, e.g., based on a previously-established user preference, for security reasons, or due to platform limitations, jump to the step labeled Permission Failure below.

    6. Prompt the user in a User Agent specific manner for permission to provide the entry script's origin with a MediaStream object representing a media stream.

      The provided media MUST include precisely one track of each media type in requestedMediaTypes from the finalSet. The decision of which tracks to choose from the finalSet is completely up to the user agent and may be determined by asking the user. Once selected, the source of a MediaStreamTrack MUST NOT change.

      The user agent MAY use the value of the computed "fitness distance" from the SelectSettings algorithm, or any other internally-available information about the devices, as an input to the selection algorithm.

      User Agents are encouraged to default to using the user's primary or system default camera and/or microphone (when possible) to generate the media stream. User Agents MAY allow users to use any media source, including pre-recorded media files.

      If the user grants permission to use local recording devices, User Agents are encouraged to include a prominent indicator that the devices are "hot" (i.e. an "on-air" or "recording" indicator), as well as a "device accessible" indicator indicating that the page has been granted access to the source.

      If the user denies permission, jump to the step labeled Permission Failure below. If the user never responds, this algorithm stalls on this step.

      If the user grants permission but a hardware error such as an OS/program/webpage lock prevents access, jump to the step labeled Unavailable Failure below.

      If the user grants permission but device access fails for any reason other than those listed above, jump to the step labeled General Failure below.

    7. Let stream be the MediaStream object for which the user granted permission.

    8. Run the ApplyConstraints() algorithm on all tracks in stream with the appropriate constraints.

    9. Resolve p with stream.

    10. Abort these steps.

    11. Permission Failure: Let error be a new MediaStreamError object whose name attribute has the value PermissionDeniedError and jump to the step labeled Error Task below.

    12. Constraint Failure: Let error be a new MediaStreamError object whose name attribute has the value ConstraintNotSatisfiedError and whose constraintName attribute is set to the name of the constraint that caused the error, and jump to the step labeled Error Task below.

    13. Unavailable Failure: Let error be a new MediaStreamError object whose name attribute has the value SourceUnavailableError and jump to the step labeled Error Task below.

    14. General Failure: Let error be a new MediaStreamError object whose name attribute has the value AbortError and jump to the step labeled Error Task below.

    15. Error Task: Reject p with error.

  4. Return p.

In the algorithm above, constraints are checked twice: once at device selection, and once after access approval. Time may have passed between those checks, so it is conceivable that the selected device is no longer suitable. In this case, a SourceUnavailable error will result.
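
As a rough illustration of how a page might react to the failure labels in the algorithm above, the following sketch maps the name of the rejection value to a short message. The helper names describeCaptureError and requestCamera and all message strings are assumptions made for this example; only the error names are taken from the algorithm.

```javascript
// Hypothetical helper: map the error names used in the algorithm above
// to human-readable hints. The messages are illustrative only.
function describeCaptureError(error) {
  switch (error.name) {
    case "NotSupportedError":
      return "No media type was requested.";
    case "NotFoundError":
      return "No source can satisfy the request.";
    case "PermissionDeniedError":
      return "Permission was denied.";
    case "ConstraintNotSatisfiedError":
      return "Constraint not satisfied: " + error.constraintName;
    case "SourceUnavailableError":
      return "The source is locked or otherwise unavailable.";
    case "AbortError":
      return "Device access failed.";
    default:
      return "Unexpected error: " + error.name;
  }
}

// Usage sketch (browser only): report why the returned promise was rejected.
// onStream and onMessage are callbacks supplied by the caller.
function requestCamera(onStream, onMessage) {
  return navigator.mediaDevices.getUserMedia({video: true}).then(
    onStream,
    function (error) { onMessage(describeCaptureError(error)); }
  );
}
```

Because the algorithm stalls if the user never responds to the prompt, a page using such a helper should not assume that either callback will ever run.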

MediaStreamConstraints

The MediaStreamConstraints dictionary is used to instruct the User Agent what sort of MediaStreamTracks to include in the MediaStream returned by getUserMedia().

(boolean or MediaTrackConstraints) video = false

If true, it requests that the returned MediaStream contain a video track. If a Constraints structure is provided, it further specifies the nature and settings of the video Track. If false, the MediaStream MUST NOT contain a video Track.

(boolean or MediaTrackConstraints) audio = false

If true, it requests that the returned MediaStream contain an audio track. If a Constraints structure is provided, it further specifies the nature and settings of the audio Track. If false, the MediaStream MUST NOT contain an audio Track.
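
The way the two dictionary members determine which media types are requested can be sketched as follows. The function name requestedMediaTypes is an assumption borrowed from the variable used in the getUserMedia() algorithm; this is an illustration of the defaults described above, not normative text.

```javascript
// Hypothetical helper: list the media types requested by a
// MediaStreamConstraints-like object. Both members default to false.
function requestedMediaTypes(constraints) {
  var types = [];
  ["audio", "video"].forEach(function (type) {
    var value = constraints[type];
    // `true` or a constraints dictionary requests a track of this type;
    // `false` or an absent member does not.
    if (value === true || (typeof value === "object" && value !== null)) {
      types.push(type);
    }
  });
  return types;
}
```

An empty result corresponds to the case in which the algorithm rejects the promise with NotSupportedError.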

NavigatorUserMediaSuccessCallback

MediaStream stream

Add explanation of handleEvent

NavigatorUserMediaErrorCallback

MediaStreamError error

Add explanation of handleEvent

Implementation Suggestions

Resource reservation

The User Agent is encouraged to reserve resources when it has determined that a given call to getUserMedia() will be successful. It is preferable to reserve the resource prior to resolving the returned promise. Subsequent calls to getUserMedia() (in this page or any other) should treat the resource that was previously allocated, as well as resources held by other applications, as busy. Resources marked as busy should not be provided as sources to the current web page, unless specified by the user. Optionally, the user agent may choose to provide a stream sourced from a busy source but only to a page whose origin matches the owner of the original stream that is keeping the source busy.

This document recommends that in the permission grant dialog or device selection interface (if one is present), the user be allowed to select any available hardware as a source for the stream requested by the page (provided the resource is able to fulfill any specified mandatory constraints). Although not specifically recommended as best practice, note that some User Agents may support the ability to substitute a video or audio source with local files and other media. A file picker may be used to provide this functionality to the user.

This document also recommends that the user be shown all resources that are currently busy as a result of prior calls to getUserMedia() (in this page or any other page that is still alive) and be allowed to terminate that stream and utilize the resource for the current page instead. If possible in the current operating environment, it is also suggested that resources currently held by other applications be presented and treated in the same manner. If the user chooses this option, the track corresponding to the resource that was provided to the page whose stream was affected must be removed.

Stored Permissions

When permission is requested for a device, the User Agent may choose to store that permission, if granted, for later use by the same origin, so that the user does not need to grant permission again at a later time. Such storing MUST only be done when the page is secure (served over HTTPS and having no mixed content). It is a User Agent choice whether it offers functionality to store permission to each device separately, all devices of a given class, or all devices; the choice needs to be apparent to the user, and permission must have been granted for the entire set whose permission is being stored, e.g., to store permission to use all cameras the user must have given permission to use all cameras and not just one.

When permission is not stored, permission should last only until such time as all MediaStreamTracks sourced from that device have been stopped.

Handling multiple devices

A MediaStream may contain more than one video and audio track. This makes it possible to include video from two or more webcams in a single stream object, for example. However, the current API does not allow a page to express a need for multiple video streams from independent sources.

It is recommended that multiple calls to getUserMedia() from the same page be allowed as a way for pages to request multiple discrete video and/or audio streams.

Note also that if multiple getUserMedia() calls are done by a page, the order in which they request resources, and the order in which they complete, is not constrained by this specification.

A single call to getUserMedia() will always return a stream with either zero or one audio tracks, and either zero or one video tracks. If a script calls getUserMedia() multiple times before reaching a stable state, this document advises the UI designer that the permission dialogs should be merged, so that the user can give permission for the use of multiple cameras and/or media sources in one dialog interaction. The constraints on each getUserMedia call can be used to decide which stream gets which media sources.

Constrainable Pattern

The Constrainable pattern allows applications to inspect and adjust the properties of objects implementing it. It is broken out as a separate set of definitions so that it can be referred to by other specifications. The core concept is the Capability, which consists of a constrainable property of an object and the set of its possible values, which may be specified either as a range or as an enumeration. For example, a camera might be capable of framerates (a property) between 20 and 50 frames per second (a range) and may be able to be positioned (a property) facing towards the user, away from the user, or to the left or right of the user (an enumerated set). The application can examine a constrainable property's supported Capabilities via the getCapabilities() accessor.

The application can select the (range of) values it wants for an object's Capabilities by means of basic and/or advanced ConstraintSets and the applyConstraints() method. A ConstraintSet consists of the names of one or more properties of the object plus the desired value (or a range of desired values) for each property. Each of those property/value pairs can be considered to be an individual constraint. For example, the application may set a ConstraintSet containing two constraints, the first stating that the framerate of a camera be between 30 and 40 frames per second (a range) and the second that the camera should be facing the user (a specific value). How the individual constraints interact depends on whether and how they are given in the basic Constraint structure, which is a ConstraintSet with an additional 'advanced' property, or whether they are in a ConstraintSet in the advanced list. The behavior is as follows: all 'min', 'max', and 'exact' constraints in the basic Constraint structure are together treated as the 'required' set, and if it is not possible to satisfy simultaneously all of those individual constraints for the indicated property names, the User Agent MUST reject the returned promise. Otherwise, it must apply the required constraints. Next, it will consider any ConstraintSets given in the 'advanced' list, in the order in which they are specified, and will try to satisfy/apply each complete ConstraintSet (i.e., all constraints in the ConstraintSet together), but will skip a ConstraintSet if and only if it cannot satisfy/apply it in its entirety. Next, the User Agent MUST attempt to apply, individually, any 'ideal' constraints or a constraint given as a bare value for the property. Of these properties, it MUST satisfy the largest number that it can, in any order. Finally, the User Agent MUST resolve the returned promise.
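
To make the grouping described above concrete, the following sketch sorts the members of a basic Constraint structure into the 'required' set, the 'ideal' values, and the 'advanced' list. The function classifyConstraints is hypothetical and is not part of any API; it only mirrors the classification given in the preceding paragraph.

```javascript
// Hypothetical helper: sort the members of a basic Constraint structure
// into the groups described above. Not part of any API.
function classifyConstraints(constraints) {
  var result = {required: {}, ideal: {}, advanced: constraints.advanced || []};
  Object.keys(constraints).forEach(function (name) {
    if (name === "advanced") return;
    var value = constraints[name];
    var isDictionary = typeof value === "object" && value !== null &&
                       !Array.isArray(value);
    if (!isDictionary) {
      // A bare value in the basic structure is treated as an ideal value.
      result.ideal[name] = value;
      return;
    }
    // 'min', 'max', and 'exact' together form the required set.
    ["min", "max", "exact"].forEach(function (key) {
      if (key in value) {
        result.required[name] = result.required[name] || {};
        result.required[name][key] = value[key];
      }
    });
    if ("ideal" in value) result.ideal[name] = value.ideal;
  });
  return result;
}
```

For example, {width: {min: 640, ideal: 1280}, aspectRatio: 1.5} yields a required set containing only the width minimum, and ideal values for both width and aspectRatio.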

Any constraint provided via this API will only be considered if the given constrainable property is supported by the browser. JavaScript application code is expected to first check, via getSupportedConstraints(), that all the named properties that are used are supported by the browser. The reason for this is that WebIDL drops any unsupported names from the dictionary holding the constraints, so the browser does not see them and the unsupported names end up being silently ignored. This will cause confusing programming errors as the JavaScript code will be setting constraints but the browser will be ignoring them. Browsers that support (recognize) the name of a required constraint but cannot satisfy it will generate an error, while browsers that do not support the constrainable property will not generate an error.
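
One possible way to perform this check for every name in a Constraint structure, including names that appear only in the 'advanced' list, is sketched below. The helper unsupportedConstraintNames is an assumption for illustration; its second argument is expected to be the dictionary returned by getSupportedConstraints().

```javascript
// Hypothetical helper: return the constraint names used anywhere in
// `constraints` (basic or advanced) that `supported` does not list.
function unsupportedConstraintNames(constraints, supported) {
  var sets = [constraints].concat(constraints.advanced || []);
  var missing = [];
  sets.forEach(function (set) {
    Object.keys(set).forEach(function (name) {
      if (name !== "advanced" && !supported[name] &&
          missing.indexOf(name) === -1) {
        missing.push(name);
      }
    });
  });
  return missing;
}
```

In a browser, the call would look like unsupportedConstraintNames(constraints, navigator.mediaDevices.getSupportedConstraints()). A non-empty result means that some constraints would be silently ignored.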

The following examples may help to understand how constraints work. The first shows a basic Constraint structure. Three constraints are given, each of which the User Agent will attempt to satisfy individually. Depending upon the resolutions available for this camera, it is possible that not all three constraints can be satisfied at the same time. If so, the User Agent will satisfy two if it can, or only one if not even two constraints can be satisfied together. Note that if not all three can be satisfied simultaneously, it is possible that there is more than one combination of two constraints that could be satisfied. If so, the user agent will choose.

var supports = navigator.mediaDevices.getSupportedConstraints();
if(!supports["aspectRatio"]) {
  // Treat like an error.
}
var constraints = {
  width: 1280,
  height: 720,
  aspectRatio: 1.5
};

This next example adds a small bit of complexity. The ideal values are still given for width and height, but this time with minimum requirements on each as well that must be satisfied. If it cannot satisfy either the width or height minimum it will reject the promise. Otherwise, it will try to satisfy the width, height, and aspectRatio target values as well and then resolve the promise.

var supports = navigator.mediaDevices.getSupportedConstraints();
if(!supports["aspectRatio"]) {
  // Treat like an error.
}
var constraints = {
  width: {min: 640, ideal: 1280},
  height: {min: 480, ideal: 720},
  aspectRatio: 1.5
};

This example illustrates the full control possible with the Constraints structure by adding the 'advanced' property. In this case, the User Agent behaves the same way with respect to the required constraints, but before attempting to satisfy the ideal values it will process the 'advanced' list. In this example the 'advanced' list contains two ConstraintSets. The first specifies width and height constraints, and the second specifies an aspectRatio constraint. Note that in the advanced list, these bare values are treated as 'exact' values. This example represents the following: "I need my video to be at least 640 pixels wide and at least 480 pixels high. My preference is for precisely 1920x1280, but if you can't give me that, give me an aspectRatio of 4x3 if at all possible. If even that is not possible, give me a resolution as close to 1280x720 as possible."

var supports = navigator.mediaDevices.getSupportedConstraints();
if(!supports["width"] || !supports["height"] || !supports["aspectRatio"]) {
  // Treat like an error.
}
var constraints = {
  width: {min: 640, ideal: 1280},
  height: {min: 480, ideal: 720},
  advanced: [{width: 1920, height: 1280},
             {aspectRatio: 1.3333333333}]
};

The ordering of advanced ConstraintSets is significant. In the preceding example it is impossible to satisfy both the 1920x1280 ConstraintSet and the 4x3 aspect ratio ConstraintSet at the same time. Since the 1920x1280 occurs first in the list, the User Agent will attempt to satisfy it first. Application authors can therefore implement a backoff strategy by specifying multiple optional ConstraintSets for the same property. For example, an application might specify three optional ConstraintSets, the first asking for a framerate greater than 500, the second asking for a framerate greater than 400, and the third asking for one greater than 300. If the User Agent is capable of setting a framerate greater than 500, it will (and the subsequent two ConstraintSets will be trivially satisfied). However, if the User Agent cannot set the framerate above 500, it will skip that ConstraintSet and attempt to set the framerate above 400. If that fails, it will then try to set it above 300. If the User Agent cannot satisfy any of the three ConstraintSets, it will set the framerate to any value it can get. If the developer wanted to insist on 300 as a lower bound, they could provide that as a 'min' value in the basic ConstraintSet. In that case, the User Agent would fail altogether if it couldn't get a value over 300, but would choose a value over 500 if possible, then try for a value over 400.
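
The backoff strategy described above might be written as follows. The helper backoffConstraints is hypothetical. Note also that 'min' expresses "at least" rather than strictly "greater than", so this is an approximation of the wording in the paragraph.

```javascript
// Hypothetical helper: build a backoff list of advanced ConstraintSets,
// one per threshold, in the order given (most demanding first).
function backoffConstraints(property, thresholds, floor) {
  var constraints = {
    advanced: thresholds.map(function (threshold) {
      var set = {};
      set[property] = {min: threshold};
      return set;
    })
  };
  if (floor !== undefined) {
    // A 'min' in the basic set is required: if it cannot be met,
    // the returned promise is rejected.
    constraints[property] = {min: floor};
  }
  return constraints;
}

// Try for at least 500, then at least 400, and insist on at least 300.
var backoff = backoffConstraints("frameRate", [500, 400], 300);
```

Omitting the third argument gives the purely optional variant, in which the User Agent falls back to any value it can get.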

Note that, unlike basic constraints, the constraints within a ConstraintSet in the advanced list must be satisfied together or skipped together. Thus, {width: 1920, height: 1280} is a request for that specific resolution, not a request for that width or that height. One can think of the basic constraints as requesting an or (non-exclusive) of the individual constraints, while each advanced ConstraintSet is requesting an and of the individual constraints in the ConstraintSet. An application may inspect the full set of Constraints currently in effect via the getConstraints() accessor.

The specific value that the User Agent chooses for a constrainable property is referred to as a Setting. For example, if the application applies a ConstraintSet specifying that the framerate must be at least 30 frames per second, and no greater than 40, the Setting can be any intermediate value, e.g., 32, 35, or 37 frames per second. The application can query the current settings of the object's constrainable properties via the getSettings() accessor.

Interface Definition

Due to the limitations of the interface definition language used in this specification, it is not possible for other interfaces to inherit or implement ConstrainablePattern. Therefore the WebIDL definitions given are only templates to be copied. Each interface that wishes to make use of the functionality defined here will have to provide its own copy of the WebIDL for the functions and interfaces given here. However it can refer to the semantics defined here, which will not change. See MediaStreamTrack Interface Definition for an example of this.

Capabilities getCapabilities()

The getCapabilities() method returns the dictionary of the names of the constrainable properties that the object supports.

It is possible that the underlying hardware may not exactly map to the range defined in the registry entry. Where this is possible, the entry SHOULD define how to translate and scale the hardware's setting onto the values defined in the entry. For example, suppose that a registry entry defines a hypothetical fluxCapacitance property that ranges from -10 (min) to 10 (max), but there are common hardware devices that support only values of "off", "medium", and "full". The registry entry might specify that for such hardware, the user agent should map the range value of -10 to "off", 10 to "full", and 0 to "medium". It might also indicate that given a ConstraintSet imposing a strict value of 3, the User Agent should attempt to set the value of "medium" on the hardware, and that getSettings() should return a fluxCapacitance of 0, since that is the value defined as corresponding to "medium".
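
The fluxCapacitance mapping in the example above could be sketched like this. Choosing the level whose registry value is nearest to the requested value is an assumption made for the sketch; a real registry entry would state its own translation rule. The names FLUX_LEVELS and mapFluxCapacitance are likewise hypothetical.

```javascript
// Hypothetical registry rule for the fluxCapacitance example: hardware that
// only offers three levels is mapped onto the registry range [-10, 10].
var FLUX_LEVELS = [
  {hardware: "off", value: -10},
  {hardware: "medium", value: 0},
  {hardware: "full", value: 10}
];

// Pick the hardware level whose registry value is nearest to the request.
// Nearest-value matching is an assumption; an entry could define another rule.
function mapFluxCapacitance(requested) {
  return FLUX_LEVELS.reduce(function (best, level) {
    return Math.abs(level.value - requested) < Math.abs(best.value - requested)
      ? level : best;
  });
}
```

With this rule, a strict value of 3 selects the "medium" hardware level, and the value field of the result (0) is what getSettings() would report.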

Constraints getConstraints()

The getConstraints() method returns the Constraints that were the argument to the most recent successful call of applyConstraints(), maintaining the order in which they were specified. Note that some of the optional ConstraintSets returned may not be currently satisfied. To check which ConstraintSets are currently in effect, the application should use getSettings().

Settings getSettings()

The getSettings() method returns the current settings of all the constrainable properties of the object, whether they are platform defaults or have been set by applyConstraints(). Note that the actual setting of a property MUST be a single value.

Promise<void> applyConstraints()
Constraints constraints

A new constraint structure to apply to this object.

The applyConstraints() algorithm for applying constraints is stated below. Here are some preliminary definitions that are used in the statement of the algorithm:

We use the term settings dictionary for the set of values that might be applied as settings to the object.

We define the fitness distance between a settings dictionary and a constraint set CS as the sum, for each constraint provided for a constraint name in CS, of the following values:

  1. If the constraint is not supported by the browser, the fitness distance is 0.

  2. If the constraint is required ('min', 'max', or 'exact'), and the settings dictionary's value for the constraint does not satisfy the constraint, the fitness distance is positive infinity.

  3. If no ideal value is specified, the fitness distance is 0.
  4. For all positive numeric non-required constraints (such as height, width, frameRate, aspectRatio, sampleRate and sampleSize), the fitness distance is the result of the formula
    (actual == ideal) ? 0 : |actual - ideal| / max(|actual|, |ideal|)
  5. For all string and enum non-required constraints (sourceId, groupId, facingMode, echoCancellation), the fitness distance is the result of the formula
    (actual == ideal) ? 0 : 1
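
The five rules above can be sketched as a single function. This is a minimal illustration, not a normative implementation: it assumes single-valued constraints only (list-valued constraints are not handled), and the parameters supported and bareAs are assumptions introduced so that the caller can say which properties the browser supports and whether bare values are read as 'ideal' or 'exact'.

```javascript
// Sketch of the fitness distance between a settings dictionary and a
// ConstraintSet. `supported` lists the property names the browser supports;
// `bareAs` says whether bare values count as "ideal" or "exact".
function fitnessDistance(settings, constraintSet, supported, bareAs) {
  var total = 0;
  Object.keys(constraintSet).forEach(function (name) {
    if (name === "advanced" || !supported[name]) return; // rule 1
    var c = constraintSet[name];
    if (typeof c !== "object" || c === null) {
      c = bareAs === "exact" ? {exact: c} : {ideal: c};
    }
    var actual = settings[name];
    // Rule 2: an unsatisfied required constraint is infinitely far away.
    if (("min" in c && actual < c.min) || ("max" in c && actual > c.max) ||
        ("exact" in c && actual !== c.exact)) {
      total = Infinity;
      return;
    }
    if (!("ideal" in c) || actual === c.ideal) return; // rule 3, or a match
    if (typeof c.ideal === "number") {
      // Rule 4: relative difference for positive numeric properties.
      total += Math.abs(actual - c.ideal) /
               Math.max(Math.abs(actual), Math.abs(c.ideal));
    } else {
      total += 1; // rule 5: strings and enums either match or do not
    }
  });
  return total;
}
```

For a 640x480 setting measured against {width: {min: 640, ideal: 1280}, height: {min: 480, ideal: 720}}, the width contributes 640/1280 and the height contributes 240/720, so the distance is finite and the setting satisfies the ConstraintSet.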

More definitions:

  • We refer to each element of a ConstraintSet (other than the special term 'advanced') as a 'constraint' since it is intended to constrain the acceptable settings for the given property from the full list or range given in the corresponding Capability of the ConstrainablePattern object to a value that is within the range or list of values it specifies.
  • We refer to the "effective Capability" C of an object O as the possibly proper subset of the possible values of C (as returned by getCapabilities) taking into consideration environmental limitations and/or restrictions placed by other constraints. For example given a ConstraintSet that constrains the aspectRatio, height, and width properties, the values assigned to any two of the properties limit the effective Capability of the third. The set of effective Capabilities may be platform dependent. For example, on a resource-limited device it may not be possible to set properties P1 and P2 both to 'high', while on another less limited device, this may be possible.
  • A settings dictionary, which is a set of values for the constrainable properties of an object O, satisfies ConstraintSet CS if the fitness distance between the set and CS is less than infinity.
  • A set of ConstraintSets CS1...CSn (n >= 1) can be satisfied by an object O if it is possible to find a settings dictionary of O that satisfies CS1...CSn simultaneously.
  • To apply a set of ConstraintSets CS1...CSn to object O is to choose such a sequence of values that satisfy CS1...CSn and assign them as the settings for the properties of O.

We define the SelectSettings algorithm as follows:

  1. Each constraint specifies one or more values (or a range of values) for its property. A property MAY appear more than once in the list of 'advanced' ConstraintSets. If an empty object or list has been given as the value for a constraint, it MUST be interpreted as if the constraint were not specified (in other words, an empty constraint == no constraint).

    Note that unknown properties are discarded by WebIDL, which means that unknown/unsupported required constraints will silently disappear. To avoid this being a surprise, application authors are expected to first use the getSupportedConstraints() method as shown in the Examples below.

  2. Let object be the ConstrainablePattern object on which this algorithm is applied. Let copy be an unconstrained copy of object (i.e., copy should behave as if it were object with all ConstraintSets removed.)
  3. For every possible settings dictionary of copy compute its fitness distance, treating bare values of properties as ideal values. Let candidates be the set of settings dictionaries for which the fitness distance is finite.

  4. If candidates is empty, return undefined as the result of the function.

  5. Iterate over the 'advanced' ConstraintSets in newConstraints in the order in which they were specified. For each ConstraintSet:
    1. compute the fitness distance between it and each settings dictionary in candidates, treating bare values of properties as exact.

    2. If the fitness distance is finite for one or more settings dictionaries in candidates, keep those settings dictionaries in candidates, discarding others.

      If the fitness distance is infinite for all settings dictionaries in candidates, ignore this ConstraintSet.

  6. Select one settings dictionary from candidates, and return this as the result of the SelectSettings() algorithm. The UA SHOULD use the one with the smallest fitness distance, as calculated in step 3.
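
The steps above can be sketched over a finite list of settings dictionaries. Several simplifications are assumed here and are not taken from this specification: possibleSettings is a hypothetical explicit list standing in for "every possible settings dictionary", every named property is assumed to be supported, and only single-valued constraints are handled.

```javascript
// Fitness distance as defined earlier in this section (single-valued
// constraints only; every named property is assumed to be supported).
function fitnessDistance(settings, constraintSet, bareAs) {
  var total = 0;
  Object.keys(constraintSet).forEach(function (name) {
    if (name === "advanced") return;
    var c = constraintSet[name];
    if (typeof c !== "object" || c === null) {
      c = bareAs === "exact" ? {exact: c} : {ideal: c};
    }
    var actual = settings[name];
    if (("min" in c && actual < c.min) || ("max" in c && actual > c.max) ||
        ("exact" in c && actual !== c.exact)) {
      total = Infinity;
    } else if ("ideal" in c && actual !== c.ideal) {
      total += typeof c.ideal === "number"
        ? Math.abs(actual - c.ideal) /
          Math.max(Math.abs(actual), Math.abs(c.ideal))
        : 1;
    }
  });
  return total;
}

// Sketch of SelectSettings. `possibleSettings` is a hypothetical finite list
// standing in for "every possible settings dictionary" of the object.
function selectSettings(possibleSettings, constraints) {
  // Step 3: keep dictionaries at a finite distance from the basic set.
  var candidates = possibleSettings.map(function (settings) {
    return {settings: settings,
            distance: fitnessDistance(settings, constraints, "ideal")};
  }).filter(function (c) { return c.distance !== Infinity; });
  // Step 4: nothing satisfies the required constraints.
  if (candidates.length === 0) return undefined;
  // Step 5: apply each advanced ConstraintSet in order, or skip it.
  (constraints.advanced || []).forEach(function (set) {
    var kept = candidates.filter(function (c) {
      return fitnessDistance(c.settings, set, "exact") !== Infinity;
    });
    if (kept.length > 0) candidates = kept;
  });
  // Step 6: prefer the smallest distance computed in step 3.
  return candidates.reduce(function (best, c) {
    return c.distance < best.distance ? c : best;
  }).settings;
}
```

Applied to the earlier example with advanced sets {width: 1920, height: 1280} and {aspectRatio: 1.3333333333}, a device offering 1920x1280 gets that resolution and the aspect ratio set is skipped; a device without it falls through to a 4x3 mode if one meets the minimums.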

When applyConstraints is called, the User Agent MUST run the following steps:

  1. Let p be a new promise.

  2. Let newConstraints be the argument to this function.

  3. Run the following steps asynchronously:

    1. Let successfulSettings be the result of running the SelectSettings algorithm with newConstraints as the constraint set.

    2. If successfulSettings is undefined, reject p with a new MediaStreamError with name ConstraintNotSatisfied and constraintName set to any of the required constraints that could not be satisfied, and abort these steps. existingConstraints remain in effect in this case.

    3. In a single operation, remove existingConstraints from object, apply newConstraints, and apply successfulSettings as the current settings.
    4. Finally, resolve p. From this point on until applyConstraints() is called successfully again, getConstraints() must return the newConstraints that were passed as an argument to this call.

  4. Return p.

Any implementation that has the same result as the algorithm above is an allowed implementation. For instance, the implementation may choose to keep track of the maximum and minimum values for a setting that are OK under the constraints considered, rather than keeping track of all possible values for the setting.
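
The range-tracking alternative mentioned above might look like the following for a single numeric property. Both helpers are hypothetical, and the sketch deliberately ignores interactions between properties: a real implementation must accept or skip each advanced ConstraintSet as a whole.

```javascript
// Hypothetical helper: narrow the acceptable [min, max] interval of one
// numeric property by one constraint. Returns null if nothing remains.
function narrowRange(range, constraint) {
  var c = typeof constraint === "object" && constraint !== null
    ? constraint : {exact: constraint}; // bare values treated as exact here
  var min = range.min, max = range.max;
  if ("min" in c) min = Math.max(min, c.min);
  if ("max" in c) max = Math.min(max, c.max);
  if ("exact" in c) {
    min = Math.max(min, c.exact);
    max = Math.min(max, c.exact);
  }
  return min <= max ? {min: min, max: max} : null;
}

// Apply the constraints on `name` from the advanced list in order, skipping
// any ConstraintSet that would leave no acceptable value.
function narrowByAdvanced(range, advanced, name) {
  return advanced.reduce(function (current, set) {
    if (!(name in set)) return current;
    return narrowRange(current, set[name]) || current;
  }, range);
}
```

Starting from a width range of 640 to 700, an advanced set asking for a minimum of 800 is skipped, while a later set asking for a maximum of 650 narrows the range to 640 through 650.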

When picking a settings dictionary, the UA can use any information available to it. Examples of such information may be whether the selection is done as part of device selection in getUserMedia, whether the energy usage of the camera varies between the settings dictionaries, or whether using a settings dictionary will cause the device driver to apply resampling.

The User Agent MAY choose new settings for the constrainable properties of the object at any time. When it does so it MUST attempt to satisfy the current Constraints, in the manner described in the algorithm above.

attribute EventHandler onoverconstrained
This event handler, of type overconstrained, is executed when the User Agent is no longer able to satisfy the requiredConstraints from the currently valid Constraints.

When executed, the event handler is passed a MediaStreamErrorEvent as parameter, which references a MediaStreamError whose name is OverconstrainedError, and whose constraintName attribute is set to one of the requiredConstraints that can no longer be satisfied. The message attribute of the MediaStreamError SHOULD contain a string that is useful for debugging. The conditions under which this error might occur are platform and application-specific. For example, the user might physically manipulate a camera in a way that makes it impossible to provide a resolution that satisfies the constraints. The User Agent MAY take other actions as a result of the overconstrained situation.

An example of Constraints that could be passed into applyConstraints() or returned as a value of constraints is below. It uses the properties defined in the Track property registry.

var supports = navigator.mediaDevices.getSupportedConstraints();
if(!supports["facingMode"]) {
  // Treat like an error.
}
var constraints = {
  width: {
    min: 640
  },
  height: {
    min: 480
  },
  advanced: [{
      width: 650
    }, {
      width: {
        min: 650
      }
    }, {
      frameRate: 60
    }, {
      width: {
        max: 800
      }
    }, {
      facingMode: "user"
    }]
};

Here is another example, specifically for a video track where I must have a particular camera and have separate preferences for the width and height:

var supports = navigator.mediaDevices.getSupportedConstraints();
if(!supports["deviceId"]) {
  // Treat like an error.
}
var constraints = {
  deviceId: {exact: "20983-20o198-109283-098-09812"},
  advanced: [{
      width: {
        min: 800,
        max: 1200
      }
    }, {
      height: {
        min: 600
      }
    }]
};

And here's one for an audio track:

var supports = navigator.mediaDevices.getSupportedConstraints();
if(!supports["deviceId"] || !supports["volume"]) {
  // Treat like an error.
}
var constraints = {
  advanced: [{
      deviceId: "64815-wi3c89-1839dk-x82-392aa"
    }, {
      volume: 0.5
    }]
};

Here's an example of use of ideal:

var supports = navigator.mediaDevices.getSupportedConstraints();
if(!supports["aspectRatio"] || !supports["facingMode"]) {
  // Treat like an error.
}
var gotten = navigator.mediaDevices.getUserMedia({
  video: {
    width: {min: 320, ideal: 1280, max: 1920},
    height: {min: 240, ideal: 720, max: 1080},
    frameRate: 30,     // Shorthand for ideal.
    // facingMode: "environment" would be optional.
    facingMode: {exact: "environment"}
  }});

Here's an example of "I want 720p, but I can accept up to 1080p and down to VGA.":

var supports = navigator.mediaDevices.getSupportedConstraints();
if(!supports["width"] || !supports["height"]) {
  // Treat like an error.
}
var gotten = navigator.mediaDevices.getUserMedia({video: {
  width: {min: 640, ideal: 1280, max: 1920},
  height: {min: 480, ideal: 720, max: 1080},
}});

Here's an example of "I want a front-facing camera and it must be VGA.":

var supports = navigator.mediaDevices.getSupportedConstraints();
if(!supports["width"] || !supports["height"] || !supports["facingMode"]) {
  // Treat like an error.
}
var gotten = navigator.mediaDevices.getUserMedia({video: {
  facingMode: {exact: "user"},
  width: {exact: 640},
  height: {exact: 480}
}});

The Property Registry

There is a single IANA registry that defines the constrainable properties of all objects that implement the Constrainable pattern. The registry entries MUST contain the name of each property along with its set of legal values. The registry entries for MediaStreamTrack are defined below. The syntax for the specification of the set of legal values depends on the type of the values. In addition to the standard atomic types (boolean, long, double, DOMString), legal values include lists of any of the atomic types, plus min-max ranges, as defined below.

List values MUST be interpreted as disjunctions. For example, if a property 'facingMode' for a camera is defined as having legal values ["left", "right", "user", "environment"], this means that 'facingMode' can have the values "left", "right", "environment", and "user". Similarly Constraints restricting 'facingMode' to ["user", "left", "right"] would mean that the User Agent should select a camera (or point the camera, if that is possible) so that "facingMode" is either "user", "left", or "right". This Constraint would thus request that the camera not be facing away from the user, but would allow the User Agent to allow the user to choose other directions.
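
The disjunctive reading of list values can be illustrated with a one-line membership test. The helper matchesAny is hypothetical and only restates the rule above.

```javascript
// Hypothetical helper: a list-valued constraint is satisfied when the
// setting equals any one of the listed values (a disjunction).
function matchesAny(setting, allowed) {
  var list = Array.isArray(allowed) ? allowed : [allowed];
  return list.indexOf(setting) !== -1;
}
```

With the constraint ["user", "left", "right"], a camera whose facingMode is "left" qualifies, while one facing "environment" does not.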

double max

The maximum legal value of this property.

double min

The minimum value of this property.

double exact

The exact required value for this property.

double ideal

The ideal (target) value for this property.

long max

The maximum legal value of this property.

long min

The minimum value of this property.

long exact

The exact required value for this property.

long ideal

The ideal (target) value for this property.

boolean exact

The exact required value for this property.

boolean ideal

The ideal (target) value for this property.

(DOMString or sequence<DOMString>) exact

The exact required value for this property.

(DOMString or sequence<DOMString>) ideal

The ideal (target) value for this property.

Capabilities

Capabilities is a dictionary containing one or more key-value pairs, where each key MUST be a constrainable property defined in the registry, and each value MUST be a subset of the set of values defined for that property in the registry. The exact syntax of the value expression depends on the type of the property, and its type is as defined in the Values column of the registry. The Capabilities dictionary specifies the subset of the constrainable properties and values from the registry that the User Agent supports. Note that a User Agent MAY support only a subset of the properties that are defined in the registry, and MAY support a subset of the set values for those properties that it does support. Note that Capabilities are returned from the User Agent to the application, and cannot be specified by the application. However, the application can control the Settings that the User Agent chooses for constrainable properties by means of Constraints.

An example of a Capabilities dictionary is shown below. This example is not very realistic in that a browser would actually be required to support more constrainable properties than just these.

{
  frameRate: {
    min: 1.0,
    max: 60.0
  },
  facingMode: ["user", "environment"]
}

The next example below points out that capabilities for range values provide ranges for individual constrainable properties, not combinations. This is particularly relevant for video width and height, since the ranges for width and height are reported separately. In the example, if the User Agent can only provide 640x480 and 800x600 resolutions the relevant capabilities returned would be:

{
  width: {
    min: 640,
    max: 800
  },
  height: {
    min: 480,
    max: 600
  },
  aspectRatio: {
    min: 1.3333333333,
    max: 1.3333333333
  }
}

Note in the example above that the aspectRatio would make clear that arbitrary combinations of widths and heights are not possible, although it would still suggest that more than two resolutions were available.
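The point about per-property ranges can be sketched as follows. This is a minimal illustration, assuming a source whose only modes are the two resolutions mentioned above; the modes list and the helpers rangeCapabilities, roundAspect, and isSupportedMode are hypothetical names introduced for this example, not part of the API.

```javascript
// Hypothetical list of the only resolutions the source can produce.
var modes = [
  { width: 640, height: 480 },
  { width: 800, height: 600 }
];

// Round to the tenth decimal place, as the aspectRatio property is described.
function roundAspect(ratio) {
  return Math.round(ratio * 1e10) / 1e10;
}

// Derive one independent min/max range per property from the mode list.
function rangeCapabilities(modeList) {
  function range(values) {
    return { min: Math.min.apply(null, values), max: Math.max.apply(null, values) };
  }
  return {
    width: range(modeList.map(function (m) { return m.width; })),
    height: range(modeList.map(function (m) { return m.height; })),
    aspectRatio: range(modeList.map(function (m) { return roundAspect(m.width / m.height); }))
  };
}

// True only when the exact width/height pair is one of the real modes.
function isSupportedMode(modeList, width, height) {
  return modeList.some(function (m) { return m.width === width && m.height === height; });
}

var caps = rangeCapabilities(modes);
// 640x600 lies inside both reported ranges, yet it is not an available mode.
var insideRanges = 640 >= caps.width.min && 640 <= caps.width.max &&
                   600 >= caps.height.min && 600 <= caps.height.max;
var actuallySupported = isSupportedMode(modes, 640, 600);
```

In this sketch insideRanges is true while actuallySupported is false, which is exactly the gap between range capabilities and real combinations.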

Settings

Settings is a dictionary containing one or more key-value pairs. It MUST contain each key returned in getCapabilities(). There MUST be a single value for each key and the value MUST be a member of the set defined for that property by getCapabilities(). The Settings dictionary contains the actual values that the User Agent has chosen for the object's constrainable properties. The exact syntax of the value depends on the type of the property.

A conforming User Agent MUST support all the constrainable properties defined in this specification.

An example of a Settings dictionary is shown below. This example is not very realistic in that a browser would actually be required to support more constrainable properties than just these.

{
  frameRate: 30.0,
  facingMode: "user"
}
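The relationship between Settings and Capabilities stated above (every capability key present, each value a member of the advertised set) can be checked with a small sketch. The helper names valueAllowed and settingsMatchCapabilities are assumptions for this example; the sample dictionaries reuse the values from the examples in this section.

```javascript
// Hypothetical helper: is a single setting value within the advertised capability?
function valueAllowed(capability, value) {
  if (Array.isArray(capability)) {
    return capability.indexOf(value) !== -1;                  // enumerated values
  }
  if (capability !== null && typeof capability === "object") {
    return value >= capability.min && value <= capability.max; // numeric range
  }
  return capability === value;                                 // single fixed value
}

// Hypothetical helper: every capability key must appear in settings with an allowed value.
function settingsMatchCapabilities(settings, capabilities) {
  return Object.keys(capabilities).every(function (key) {
    return key in settings && valueAllowed(capabilities[key], settings[key]);
  });
}

var sampleCapabilities = {
  frameRate: { min: 1.0, max: 60.0 },
  facingMode: ["user", "environment"]
};
var sampleSettings = { frameRate: 30.0, facingMode: "user" };
var consistent = settingsMatchCapabilities(sampleSettings, sampleCapabilities);
```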

Constraints and ConstraintSet

Due to the limitations of WebIDL, interfaces implementing the Constrainable Pattern cannot simply subclass Constraints and ConstraintSet as they are defined here. Instead they must provide their own definitions that follow this pattern. See MediaTrackConstraints for an example of this.

Each member of a ConstraintSet corresponds to a constrainable property and specifies a subset of the property's legal Capability values. Applying a ConstraintSet instructs the User Agent to restrict the settings of the corresponding constrainable properties to the specified values or ranges of values. A given property MAY occur both in the basic Constraints set and in the advanced ConstraintSets list, and MAY occur at most once in each ConstraintSet in the advanced list.

sequence<ConstraintSet> advanced

The list of ConstraintSets that the User Agent MUST attempt to satisfy, in order, skipping only those that cannot be satisfied. The order of these ConstraintSets is significant. In particular, when they are passed as an argument to applyConstraints, the User Agent MUST try to satisfy them in the order that is specified. Thus if optional ConstraintSets C1 and C2 can be satisfied individually, but not together, then whichever of C1 and C2 is first in this list will be satisfied, and the other will not. The User Agent MUST attempt to satisfy all optional ConstraintSets in the list, even if some cannot be satisfied. Thus, in the preceding example, if optional constraint C3 is specified after C1 and C2, the User Agent will attempt to satisfy C3 even though C2 cannot be satisfied. Note that a given property name may occur only once in each ConstraintSet but may occur in more than one ConstraintSet.
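The ordering rule for the advanced list can be illustrated with a deliberately simplified sketch. It is not the SelectSettings algorithm: it assumes the source offers a fixed list of candidate setting combinations, treats every bare value as an exact requirement, and ignores ranges and ideal values. The names satisfies, applyAdvanced, candidates, C1, C2, and C3 are hypothetical.

```javascript
// True when the candidate matches every property in the ConstraintSet exactly.
function satisfies(candidate, constraintSet) {
  return Object.keys(constraintSet).every(function (name) {
    return candidate[name] === constraintSet[name];
  });
}

// Try each advanced ConstraintSet in order; skip only those that cannot be
// satisfied together with the ones already applied.
function applyAdvanced(candidateList, advanced) {
  var remaining = candidateList;
  var applied = [];
  advanced.forEach(function (constraintSet, index) {
    var narrowed = remaining.filter(function (c) { return satisfies(c, constraintSet); });
    if (narrowed.length > 0) {
      remaining = narrowed;
      applied.push(index);
    }
  });
  return { remaining: remaining, applied: applied };
}

var candidates = [
  { width: 640, height: 480, frameRate: 30 },
  { width: 800, height: 600, frameRate: 15 }
];
var C1 = { width: 640 };
var C2 = { height: 600 };   // satisfiable alone, but not together with C1
var C3 = { frameRate: 30 };

var inOrder = applyAdvanced(candidates, [C1, C2, C3]);   // applied: [0, 2]
var reversed = applyAdvanced(candidates, [C2, C1, C3]);  // applied: [0]
```

With C1 first, C2 is skipped and C3 is still attempted and satisfied. With C2 first, C1 is skipped, and C3 cannot be met by the one remaining candidate.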

Examples

This sample code exposes a button. When clicked, the button is disabled and the user is prompted to offer a stream. The user can cause the button to be re-enabled by providing a stream (e.g., giving the page access to the local camera) and then disabling the stream (e.g., revoking that access).

<input type="button" value="Start" onclick="start()" id="startBtn">
<script>
 var startBtn = document.getElementById('startBtn');

 function start() {
     navigator.mediaDevices.getUserMedia({
         audio: true,
         video: true
     }).then(gotStream).catch(logError);
     startBtn.disabled = true;
 }

 function gotStream(stream) {
     stream.oninactive = function () {
         startBtn.disabled = false;
     };
 }

 function logError(error) {
     // Report the failure; console.log stands in for the page's own logging.
     console.log(error.name + ": " + error.message);
 }
</script>

This example allows people to take photos of themselves from the local video camera. Note that the Image Capture specification [[image-capture]] provides a simpler way to accomplish this.

<article>
 <style scoped>
  video { transform: scaleX(-1); }
  p { text-align: center; }
 </style>
 <h1>Snapshot Kiosk</h1>
 <section id="splash">
  <p id="errorMessage">Loading...</p>
 </section>
 <section id="app" hidden>
  <p><video id="monitor" autoplay></video> <canvas id="photo"></canvas>
  <p><input type=button value="&#x1F4F7;" onclick="snapshot()">
 </section>
 <script>
 var video = document.getElementById('monitor');
 var canvas = document.getElementById('photo');

 navigator.mediaDevices.getUserMedia({
     video: true
 }).then(function (stream) {
     video.srcObject = stream;
     stream.oninactive = noStream;
     video.onloadedmetadata = function () {
         canvas.width = video.videoWidth;
         canvas.height = video.videoHeight;
         document.getElementById('splash').hidden = true;
         document.getElementById('app').hidden = false;
     };
 }).catch(function (reason) {
     document.getElementById('errorMessage').textContent = 'No camera available.';
 });

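 function snapshot() {
     canvas.getContext('2d').drawImage(video, 0, 0);
 }

 // Called when the stream becomes inactive: return to the splash message.
 function noStream() {
     document.getElementById('errorMessage').textContent = 'No camera available.';
     document.getElementById('app').hidden = true;
     document.getElementById('splash').hidden = false;
 }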
 </script>
</article>

Privacy and Security Considerations

This section is non-normative; it specifies no new behavior, but instead summarizes information already present in other parts of the specification.

This document extends the Web platform with the ability to manage input devices for media (in this iteration, microphones and cameras). It also allows the manipulation of audio output devices (speakers and headphones).

Without authorization (that is, to the "drive-by web"), it offers the ability to tell how many devices there are of each class. The identifiers for the devices are designed not to be useful for a fingerprint that can track the user between origins, but the number of devices adds to the fingerprint surface. This document recommends treating the per-origin persistent identifier deviceId in the same way as other persistent storage (e.g. cookies).

When authorization is given, this document describes how to get access to, and use, media data from the devices mentioned. This data may be sensitive; the document advises that indicators be supplied to show that devices are in use, but both the nature of authorization and the indicators of in-use devices are platform decisions.

Authorization may be given on a case-by-case basis, or be persistent. In the case of a case-by-case authorization, it is important that the user be able to say "no" in a way that prevents the UI from blocking user interaction until permission is given - either by offering a way to say a "persistent NO" or by not using a modal permissions dialog.

It is possible to use constraints so that the failure of a getUserMedia call will return information about devices on the system without prompting the user, which increases the surface available for fingerprinting. The User Agent should consider limiting the rate at which failed getUserMedia calls are allowed in order to limit this additional surface.

In the case of persistent authorization, it is important that it is easy to find the list of granted permissions and revoke permissions that the user wishes to revoke.

Once permission has been granted, the User Agent should make two things readily apparent to the user:

Developers of sites with persistent permissions should be careful that these permissions not be abused.

In particular, they should not make it possible to automatically send audio or video streams from authorized media devices to an end point that a third party can select.

Indeed, if a site offered URLs such as https://webrtc.example.org/?call=user that would automatically set up calls and transmit audio/video to user, it would be open for instance to the following abuse:

Users who have granted permanent permissions to https://webrtc.example.org/ could be tricked into sending their audio/video streams to an attacker EvilSpy by following a link or being redirected to https://webrtc.example.org/?call=EvilSpy.
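One possible mitigation, sketched below under stated assumptions, is for the site to treat a call target taken from the URL as a pending request that is only acted upon after explicit user confirmation. The call query parameter follows the example URL above; the functions pendingCallFromUrl and startCallIfConfirmed and the placeCall callback are hypothetical and are not defined by this specification.

```javascript
// Extract the callee from a "call-me" URL without starting any media transmission.
function pendingCallFromUrl(href) {
  var callee = new URL(href).searchParams.get("call");
  if (!callee) {
    return null;
  }
  return { callee: callee, requiresConfirmation: true };
}

// Only place the call when the user has explicitly confirmed the callee.
// placeCall is a hypothetical application callback that sets up the call.
function startCallIfConfirmed(pending, userConfirmed, placeCall) {
  if (pending === null || !userConfirmed) {
    return false;
  }
  placeCall(pending.callee);
  return true;
}
```

With this shape, following a crafted link yields only a pending request; no stream is sent until the user has seen and accepted the callee's name.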

IANA Registrations

Track Constrainable Property Registrations

IANA is requested to register the following constrainable properties as specified in [[!RTCWEB-CONSTRAINTS]]:

The following constrainable properties are defined to apply to both video and audio MediaStreamTrack objects:

Property Name Values Notes
sourceType SourceTypeEnum The type of the source of the MediaStreamTrack. Note that the setting of this property is uniquely determined by the source that is attached to the Track. In particular, getCapabilities() will return only a single value for sourceType. This property can therefore be used for initial media selection with getUserMedia(). However, it is not useful for subsequent media control with applyConstraints(), since any attempt to set a different value will result in an unsatisfiable ConstraintSet.
deviceId DOMString The application-unique identifier for the source of the MediaStreamTrack. The same identifier MUST be valid between sessions of this application, but MUST also be different for other applications. Some sort of GUID is recommended for the identifier. Note that the setting of this property is uniquely determined by the source that is attached to the Track. In particular, getCapabilities() will return only a single value for deviceId. This property can therefore be used for initial media selection with getUserMedia(). However, it is not useful for subsequent media control with applyConstraints(), since any attempt to set a different value will result in an unsatisfiable ConstraintSet.
groupId DOMString The group identifier for the source of the MediaStreamTrack. Two devices have the same group identifier if they belong to the same physical device; for example, the audio input and output devices representing the speaker and microphone of the same headset would have the same groupId.
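As a sketch of how deviceId and groupId might be used by an application, the helpers below build a constraints object that selects one source by its deviceId for the initial getUserMedia() call, and group a list of device descriptions by groupId. The helper names and the sample identifiers are assumptions for this example; the device objects are plain stand-ins carrying only the deviceId and groupId fields.

```javascript
// Build constraints that require one specific source for the given kind
// ("audio" or "video"). Intended for initial selection, not applyConstraints().
function constraintsForDevice(kind, deviceId) {
  var constraints = {};
  constraints[kind] = { deviceId: { exact: deviceId } };
  return constraints;
}

// Collect the deviceIds that share a groupId, i.e. belong to one physical device.
function groupByGroupId(devices) {
  var groups = {};
  devices.forEach(function (device) {
    if (!groups[device.groupId]) {
      groups[device.groupId] = [];
    }
    groups[device.groupId].push(device.deviceId);
  });
  return groups;
}

// Illustrative identifiers only.
var sampleDevices = [
  { deviceId: "mic-1", groupId: "headset" },
  { deviceId: "spk-1", groupId: "headset" },
  { deviceId: "cam-1", groupId: "webcam" }
];
var grouped = groupByGroupId(sampleDevices);
var cameraConstraints = constraintsForDevice("video", "cam-1");
// In a browser: navigator.mediaDevices.getUserMedia(cameraConstraints)
```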

The following constrainable properties are defined to apply only to video MediaStreamTrack objects:

Property Name Values Notes
width ConstrainLong The width or width range, in pixels. As a capability, the range should span the video source's pre-set width values with min being the smallest width and max being the largest width.
height ConstrainLong The height or height range, in pixels. As a capability, the range should span the video source's pre-set height values with min being the smallest height and max being the largest height.
frameRate ConstrainDouble The exact frame rate (frames per second) or frame rate range. If this frame rate cannot be determined (e.g. the source does not natively provide a frame rate, or the frame rate cannot be determined from the source stream), then this value MUST refer to the User Agent's vsync display rate.
aspectRatio ConstrainDouble The exact aspect ratio (width in pixels divided by height in pixels, represented as a double rounded to the tenth decimal place) or aspect ratio range.
facingMode ConstrainDOMString This string (or each string, when a list) should be one of the members of VideoFacingModeEnum. The members describe the directions that the camera can face, as seen from the user's perspective. Note that getConstraints may not return exactly the same string for strings not in this enum. This preserves the possibility of using a future version of WebIDL enum for this property.
user

The source is facing toward the user (a self-view camera).

environment

The source is facing away from the user (viewing the environment).

left

The source is facing to the left of the user.

right

The source is facing to the right of the user.

Below is an illustration of the video facing modes in relation to the user.
[Figure: illustration of video facing modes in relation to the user]

The following constrainable properties are defined to apply only to audio MediaStreamTrack objects:

Property Name Values Notes
volume ConstrainDouble The volume or volume range, as a multiplier of the linear audio sample values. A volume of 0.0 is silence, while a volume of 1.0 is the maximum supported volume. A volume of 0.5 will result in a reduction of approximately 6 dB in the sound pressure level relative to the maximum volume. Note that any ConstraintSet that specifies values outside of this range of 0 to 1 can never be satisfied.
sampleRate ConstrainLong The sample rate in samples per second for the audio data.
sampleSize ConstrainLong The linear sample size in bits. This constraint can only be satisfied for audio devices that produce linear samples.
echoCancellation boolean When one or more audio streams are being played in the presence of one or more microphones, it is often desirable to attempt to remove the sound being played from the input signals recorded by the microphones. This is referred to as echo cancellation. There are cases where it is not needed and it is desirable to turn it off so that no audio artifacts are introduced. This property allows applications to control this behavior.
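The figure of roughly 6 dB quoted for the volume property follows from the standard amplitude-to-decibel relation, 20 times the base-10 logarithm of the linear multiplier. The sketch below works this out; the function name volumeToDecibels is an assumption for this example.

```javascript
// Convert a linear volume multiplier (0.0 to 1.0) to a level in decibels
// relative to the maximum volume, using 20 * log10(multiplier).
function volumeToDecibels(volume) {
  if (typeof volume !== "number" || volume < 0 || volume > 1) {
    throw new RangeError("volume must be between 0 and 1");
  }
  return 20 * Math.log10(volume);
}

var halfVolumeDb = volumeToDecibels(0.5); // about -6.02
var fullVolumeDb = volumeToDecibels(1.0); // 0
```

A volume of 0.0 maps to negative infinity in this relation, which corresponds to silence.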

Change Log

This section will be removed before publication.

Changes since February 2, 2015

  1. Added getUserMedia() implementation suggestion as discussed in issue #67
  2. Issue 139: Clarified in SelectSettings algorithm that an empty constraint is to be treated as no constraint.
  3. Issue 128: Clarified in SelectSettings that bare values mean exact in the advanced array and ideal otherwise.
  4. [PR#37/Bug 26654] Webidl in Constrainable
  5. [Issue #141] MediaDeviceInfo.label and .groupId should not be nullable
  6. [PR #150] Update text for how constraint dictionaries get extended

Changes since October 27, 2014

  1. Bug 26953: Added more detail to the definition of volume.
  2. Clarified in section 4.1 that synchronization is only an intention because some tracks cannot be synchronized.
  3. Introduced and made consistent use of the term 'constrainable property' everywhere we refer to a property which can have Capabilities, Constraints, and Settings.
  4. Changed constraint definition text using concepts and some direct text from PR 61.
  5. Bug 113 (old 25771): Explanation of constraints in GUM call. Rewrote algorithm with separate SelectSettings step, used both in GUM and in applyConstraints.

Changes since September 24, 2014

  1. Bug 25809: Added note warning about abuse of call-me URLs.
  2. Bug 26918: Added note on clearing deviceId when clearing cookies.
  3. Bug 25777: Added example of capabilities when only two video sizes are available.
  4. Bug 26654: Added ConstrainBoolean.
  5. Bug 26810: All callback-based methods have been converted to use Promises, except for the version of getUserMedia() defined under NavigatorUserMedia.

Changes since September 9, 2014

  1. Bug 22214: How long do permissions persist?
  2. Define algorithm for processing non-required constraints.
  3. Bug 24933: deviceId is not registered as constraints, so apps can't choose device based on the device enumeration
  4. Bug 25609: MediaStreamErrorEvent is incomplete

Changes since August 17, 2014

  1. Bug 25988: Need a list of MediaStreamError "name" values
  2. Bug 26623: Use commonest spelling of "cancellation"
  3. Bug 25767: Missing Ref to Image Capture spec
  4. Bug 22271: Terminology section should not have conformance requirements

Changes since July 4, 2014

  1. Bug 22251: Added new NotFoundError, AbortError, SourceUnavailable errors to gUM call.
  2. Bug 25786: User Agent allowance of files to be substituted for any input device is now permitted but not listed as best practice, i.e., no longer specifically recommended.

Changes since June 19, 2014

  1. Bug 22354: Added privacy and security section.
  2. Bug 25784: "on air" indication is underspecified - separated "access granted" and "on air" indicators.
  3. Bug 26192: add onoverconstrained to MediaStreamTrack
  4. Bug 25776: add groupID to MediaTrackConstraintSet
  5. Bug 25780: Clarify step 3 of MediaStream.clone
  6. Bug 25804: Change 'remote' attribute definition
  7. Bug 25650: In getUserMedia algorithm if user denies permission spec is wrongly redirecting to Constraint Failure.
  8. Bug 25605: Definition of MediaStreamTrackEvent is not complete
  9. Bug 25651: All the links in spec should redirect to specified contents without failure.
  10. Bug 25725: getUserMedia constraints should be non-nullable
  11. Bug 25763: does the ID really have to be exactly 36 char long?
  12. Bug 24934: invalid definition for the "seekable" attribute when MediaStream is set to srcObject.
  13. Removed MediaStreamTrack new state (sourceType none removed as a consequence) (as discussed in bug 25787).
  14. Bug 25801: Remove getNativeSettings()

Changes since May 7, 2014

  1. Clarified that skipping of optional/advanced ConstraintSets is only permitted if they cannot be satisfied, not merely because the User Agent wishes to.
  2. Bug 25855: Clarification about conformance requirements phrased as algorithms
  3. Bug 25803: Mark section entitled "The model: sources, sinks, constraints, and settings" as non-normative
  4. Bug 24015: Add callback to indicate when available media devices change (introduced Navigator.mediaDevices)
  5. Bug 25860: make sure we have a bug to have a getTracks that gives you all the tracks
  6. Bug 25884: applied constraint syntax consensus as realized in June 9 WG email from Peter Thatcher.
  7. Moved getSupportedConstraints() method to MediaDevices object.
  8. Added stricter requirements on the getSupportedConstraints() return value.
  9. Added issue note in Constrainable Pattern section that ideal is not yet defined.
  10. Added issue note for applyConstraints that how multiple unorderedConstraints are to be satisfied together is not yet defined.
  11. Added informative notes that WebIDL discards unknown required properties and that application authors need to use the getSupportedConstraints() method.
  12. Cleaned up the MediaStream API intro section (mainly MediaStream behavior that have moved to MediaStreamTrack).
  13. The concept of MediaStreamTrack with a detachable source is now used throughout the spec (removed language saying that a MST could be disassociated from its track).
  14. Moved peerIdentity related text to WebRTC.

Changes since March 21, 2014

  1. New webIDL for Constrainable and Constraints.
  2. Bug 24931: changed MediaError to MediaStreamError.
  3. Bug 23817: Redundant TOC headers 8.1 & 9.1
  4. Bug 25230: readyState attribute must be inherited while cloning a MediaStreamTrack
  5. Bug 25249: Source should be detached when a MediaStreamTrack stops for any reason other than stop
  6. Updated Event Summary section to match the spec regarding MediaStreamTrack.stop() (as discussed in bug 25248)
  7. Made the MediaStream() constructor behave like addTrack() WRT adding ended tracks (as discussed in bug 25250).
  8. Bug 25262: MediaStream Constructor algorithm must also check for MediaStreamTracks "ended" state while initializing "active" state.
  9. Bug 25276: Initialization for VideoTrack.selected attribute is missing while specifying steps for "Loading and Playing a MediaStream in a Media Element"
  10. Changed syntax of constraints to use 'require' and 'advanced' and support non-required, non-advanced constraints.
  11. Bug 25360: MediaStreamTrack should not be considered as ended just because remote peer stopped sending data.
  12. Bug 25275: VideoTrackList.selectedIndex initialization conflicts with HTML5 spec, "if no track is selected".
  13. Removed mentioning of MediaStream received from other peer (as discussed in bug 25361).
  14. Bug 22263: Clarify synchronization of tracks in a MediaStream
  15. Bug 25441: Overconstrained muted state should not link with MediaStreamTrack.readyState

Changes since February 18, 2014

  1. Bug 24928: Remove MediaStream state check from addTrack() algorithm.
  2. Bug 24930: Remove MediaStream state check from the removeTrack() algorithm.
  3. Added native settings to tracks.
  4. Removed videoMediaStreamTrack and audioMediaStreamTrack since they are no longer necessary.

Changes since December 25, 2013

  1. Make optional constraints a list of ConstraintSets. Make ConstraintSet an object.
  2. Remove noaccess, move peerIdentity
  3. Add constraints for sampleRate, sampleSize, and echoCancellation.
  4. Aligned text in remainder of document with Constrainable changes.
  5. Removed statements that constraints are not applied to read-only sources

Changes since November 5, 2013

  1. ACTION-25: Switch mediastream.inactive to mediastream.active.
  2. ACTION-26: Rewrite stop to only detach the track's source.
  3. Bug 22338: Arbitrary changing of tracks.
  4. Bug 23125: Use double rather than float.
  5. Bug 22712: VideoFacingMode enum needs an illustration.
  6. Moved constraints into a separate Constrainable interface.
  7. Created a separate section on error handling.

Changes since October 17, 2013

  1. Bug 23263: Add output device enumeration to GetSources
  2. Introduced the Constrainable interface.
  3. Change consensus note on constraints in IANA section.
  4. Removed createObjectURL.
  5. Bug 22209: Should not use MUST requirements on values provided by the developer.

Changes since August 24, 2013

  1. Bug 22269: Renamed getSourceInfos() to getSources() and made the result async.
  2. Bug 22229: Editorial input
  3. Bug 22243: Clarify readonly track
  4. Bug 22259: Disabled mediastreamtrack and state of media element
  5. Bug 22226: Remove check of same source from MediaStream constructor algorithm
  6. Replaced ended with inactive for MediaStream (resolves bug 21618).
  7. Bug 22264: MediaStream.ended set to true on creation
  8. Bug 22272: Permission revocation via MediaStreamTrack.stop()
  9. Bug 22248: Relationship between MediaStreamTrack and HTML5 VideoTrack/AudioTrack after MediaStream assignment
  10. Bug 22247: Setting loop attribute on a media element reading from a MediaStream

Changes since July 4, 2013

  1. Bug 21967: Added paragraph on MediaStreamTrack enabled state and updated cloning algorithm.
  2. Bug 22210: Make getUserMedia() algorithm use all numbered items.
  3. Bug 22250: Fixed accidentally overridden error.
  4. Bug 22211: Added async error when no valid media type is requested.
  5. Bug 22216: Made NavigatorUserMediaError extend DOMError.
  6. Bug 22249: Throw on attempts to set currentTime on media elements playing MediaStream objects.
  7. Bug 22246: Made media.buffered have length 0.
  8. Bug 22692: Updated media element to use HAVE_NOTHING state before media arrives on the played MediaStream and HAVE_ENOUGH_DATA as soon as media arrives.

May 29, 2013

  1. Bug 22252: fixed usage of MUST in MediaStream() constructor description.
  2. Bug 22215: made MediaStream.ended readonly.
  3. Bug 21967: clarified MediaStreamTrack.enabled state initial value.
  4. Added aspectRatio constraint, capability, and state.
  5. Updated usage of MediaStreams in media elements.

May 15, 2013

  1. Added explanatory section for constraints, capabilities, and states.
  2. Added VideoFacingModeEnum (including left and right options).
  3. Added getSourceInfos() and SourceInfo dictionary.
  4. Added isolated streams.

April 29, 2013

  1. Removed remaining photo APIs and references (since we have a separate Image Capture Spec).

March 20, 2013

  1. Added readonly and remote attributes to MediaStreamTrack
  2. Removed getConstraint(), setConstraint(), appendConstraint(), and prependConstraint().
  3. Added source states. Added states() method on tracks. Moved sourceType and sourceId to be states.
  4. Added source capabilities. Added capabilities() method on tracks.
  5. Added clarifying text about MediaStreamTrack lifecycle and mediaflow.
  6. Made MediaStreamTrack cloning explicit.
  7. Removed takePhoto() and friends from VideoStreamTrack (we have a separate Image Capture Spec).
  8. Made getUserMedia() error callback mandatory.

December 12, 2012

  1. Changed error code to be string instead of number.
  2. Added core of settings proposal allowing for constraint changes after stream/track creation.

November 15 2012

  1. Introduced new representation of tracks in a stream (removed MediaStreamTrackList).
  2. Updated MediaStreamTrack.readyState to use an enum type (instead of unsigned short constants).
  3. Renamed MediaStream.label to MediaStream.id (the definition needs some more work).

October 1 2012

  1. Limited the track kind values to "audio" and "video" only (could previously be user defined as well).
  2. Made MediaStream extend EventTarget.
  3. Simplified the MediaStream constructor.

June 23 2012

  1. Rename title to "Media Capture and Streams".
  2. Update document to comply with HTML5.
  3. Update image describing a MediaStream.
  4. Add known issues and various other editorial changes.

June 22 2012

  1. Update wording for constraints algorithm.

June 19 2012

  1. Added "Media Streams as Media Elements" section.

June 12 2012

  1. Switch to respec v3.

June 5 2012

  1. Added non-normative section "Implementation Suggestions".
  2. Removed stray whitespace.

June 1 2012

  1. Added media constraint algorithm.

Apr 23 2012

  1. Remove MediaStreamRecorder.

Apr 20 2012

  1. Add definitions of MediaStreams and related objects.

Dec 21 2011

  1. Changed to make wanted media opt in (rather than opt out). Minor edits.

Nov 29 2011

  1. Changed examples to use MediaStreamOptions objects rather than strings. Minor edits.

Nov 15 2011

  1. Removed MediaStream stuff. Refers to webrtc 1.0 spec for that part instead.

Nov 9 2011

  1. Created first version by copying the webrtc spec and ripping out stuff. Put it on github.

Acknowledgements

The editors wish to thank the Working Group chairs and Team Contact, Harald Alvestrand, Stefan Håkansson, and Dominique Hazaël-Massieux, for their support. Substantial text in this specification was provided by many people including Jim Barnett, Harald Alvestrand, Travis Leithead, and Stefan Håkansson.