This document defines a set of JavaScript APIs that let a Web application manage how audio is rendered on the user's audio output devices.
The WebRTC and Devices and Sensors Working Groups intend to publish this specification as a Candidate Recommendation soon. Consequently, this is a request for wide review of this document.
This proposal allows JavaScript to direct the audio output of a media element to permitted devices other than the system or user agent default. This can be helpful in a variety of real-time communication scenarios as well as general media applications. For example, an application can use this API to programmatically direct output to a device such as a Bluetooth headset or speakerphone.
This section specifies additions to the {{HTMLMediaElement}} [[HTML]] when the Audio Output Devices API is supported.
When the {{HTMLMediaElement}} constructor is invoked, the user agent MUST add the following initializing step:
Let the element have a [[\SinkId]] internal slot, initialized to "".
partial interface HTMLMediaElement {
  [SecureContext] readonly attribute DOMString sinkId;
  [SecureContext] Promise<undefined> setSinkId (DOMString sinkId);
};
This attribute contains the ID of the audio device through which output is being delivered, or the empty string if output is delivered through the user-agent default device. If nonempty, this ID should be equal to the {{MediaDeviceInfo/deviceId}} attribute of one of the {{MediaDeviceInfo}} values returned from {{MediaDevices/enumerateDevices()}}.
On getting, the attribute MUST return the value of the {{HTMLMediaElement/[[SinkId]]}} slot.
Sets the ID of the audio device through which audio output should be rendered if the application is permitted to play out of a given device.
When this method is invoked, the user agent must run the following steps:
Let document be the current settings object's relevant global object's associated Document.
If document is not allowed to use the feature identified by "speaker-selection", return a promise rejected with a new {{DOMException}} whose name is {{NotAllowedError}}.
Let element be the {{HTMLMediaElement}} object on which this method was invoked.
Let sinkId be the method's first argument.
If sinkId is equal to element's {{HTMLMediaElement/[[SinkId]]}}, return a promise resolved with undefined.
Let p be a new promise.
Run the following substeps in parallel:
If sinkId is not the empty string and does not match any audio output device identified by the result that would be provided by {{MediaDevices/enumerateDevices()}}, reject p with a new {{DOMException}} whose name is {{NotFoundError}} and abort these substeps.
If sinkId is not the empty string, and the application would not be permitted to play audio through the device identified by sinkId if it weren't the current user agent default device, reject p with a new {{DOMException}} whose name is {{NotAllowedError}} and abort these substeps.
Switch the underlying audio output device for element to the audio device identified by sinkId.
If this substep is successful and the media element's {{HTMLMediaElement/paused}} attribute is false, audio MUST stop playing out of the device represented by the element's {{HTMLMediaElement/sinkId}} attribute and will start playing out of the device identified by sinkId.
If the preceding substep failed, reject p with a new {{DOMException}} whose name is {{AbortError}}, and abort these substeps.
Queue a task that runs the following steps:
Set element's {{HTMLMediaElement/[[SinkId]]}} to sinkId.
Resolve p.
Return p.
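This example is non-normative. The sketch below shows one way an application might call {{HTMLMediaElement/setSinkId}} and distinguish the rejections described in the steps above. The helper name `routeAudioTo` and the strings it returns are assumptions made for this illustration and are not part of the API.

```javascript
// Hypothetical helper (not defined by this specification): tries to route a
// media element's audio to `deviceId` and reports the outcome as a string.
async function routeAudioTo(element, deviceId) {
  if (typeof element.setSinkId !== "function") {
    return "unsupported"; // the user agent does not implement setSinkId()
  }
  try {
    await element.setSinkId(deviceId);
    return "ok"; // element.sinkId now equals deviceId
  } catch (err) {
    switch (err.name) {
      case "NotFoundError": // deviceId matches no audio output device
        return "not-found";
      case "NotAllowedError": // blocked by permissions policy or missing permission
        return "not-allowed";
      case "AbortError": // switching the underlying output device failed
        return "switch-failed";
      default:
        throw err; // anything else is unexpected; let the caller see it
    }
  }
}
```

Passing the empty string as `deviceId` asks for the user-agent default device, which requires no permission.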
New audio devices may become available to the user agent, or an audio device (identified by a media element's {{HTMLMediaElement/sinkId}} attribute) that had previously become [= unavailable =] may become available again, for example, if it is unplugged and later plugged back in.
In this scenario, the user agent must run the following steps:
Let sinkId be the identifier for the newly available device.
For each media element whose {{HTMLMediaElement/sinkId}} attribute is equal to sinkId:
If the media element's {{HTMLMediaElement/paused}} attribute is false, start rendering this object's audio out of the device represented by the {{HTMLMediaElement/sinkId}} attribute.
The following paragraph is non-normative.
If the application wishes to react to the device change, the application can listen to the devicechange event and query {{MediaDevices/enumerateDevices()}} for the list of updated devices.
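This example is non-normative. A minimal sketch of that pattern follows. The helper name `watchAudioOutputs` and its callback parameter are assumptions made for this illustration; the first argument is expected to behave like `navigator.mediaDevices`.

```javascript
// Hypothetical helper: calls `onOutputs` with the current audio output
// devices every time the user agent fires a "devicechange" event.
function watchAudioOutputs(mediaDevices, onOutputs) {
  const handler = async () => {
    const devices = await mediaDevices.enumerateDevices();
    onOutputs(devices.filter((d) => d.kind === "audiooutput"));
  };
  mediaDevices.addEventListener("devicechange", handler);
  // Return a function that stops watching.
  return () => mediaDevices.removeEventListener("devicechange", handler);
}
```

An application could call `watchAudioOutputs(navigator.mediaDevices, updateSpeakerMenu)`, where `updateSpeakerMenu` is whatever application code refreshes its device list.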
This section specifies additions to the {{MediaDevices}} when the Audio Output Devices API is supported.
partial interface MediaDevices {
  Promise<MediaDeviceInfo> selectAudioOutput(optional AudioOutputOptions options = {});
};
Prompts the user to select a specific audio output device.
When the {{selectAudioOutput}} method is called, the [=user agent=] MUST run the following steps:
If the [=relevant global object=] of [=this=] does not have [=transient activation=], return a promise rejected with a {{DOMException}} object whose {{DOMException/name}} attribute has the value {{InvalidStateError}}.
Let options be the method's first argument.
Let deviceId be options.deviceId.
Let p be a new promise.
Run the following steps in parallel:
Let descriptor be a {{PermissionDescriptor}} with its [=powerful feature/name=] set to "speaker-selection".
If descriptor's [=permission state=] is {{PermissionState/"denied"}}, reject p with a new {{DOMException}} whose {{DOMException/name}} attribute has the value {{NotAllowedError}}, and abort these steps.
Probe the [=user agent=] for available audio output devices.
If there is no audio output device, reject p with a new {{DOMException}} whose {{DOMException/name}} attribute has the value {{NotFoundError}} and abort these steps.
If deviceId is not "", run the following sub steps:
If deviceId matches a device id previously exposed by {{MediaDevices/selectAudioOutput}} in this or an earlier browsing session, or matches a device id of an audio output device with the same groupId as an audio input device previously exposed by {{MediaDevices/getUserMedia()}} in this or an earlier browsing session, the user agent MAY decide, based on its previous decision of whether to persist this id or not for this set of origins, to run the following sub steps:
Let device be the device identified by deviceId, if available.
If device is available, resolve p with either deviceId or a freshly rotated device id for device, and abort the in-parallel steps.
[=Prompt the user to choose=] an audio output device, with descriptor.
If the result of the request is {{PermissionState/"denied"}}, reject p with a new {{DOMException}} whose {{DOMException/name}} attribute has the value {{NotAllowedError}} and abort these steps.
Let selectedDevice be the user-selected audio output device.
Let deviceInfo be the result of [=creating a device info object=] to represent selectedDevice, with mediaDevices.
Add deviceInfo.{{MediaDeviceInfo/deviceId}} to [[\explicitlyGrantedAudioOutputDevices]].
Resolve p with deviceInfo.
Return p.
Once a device is exposed after a call to {{MediaDevices/selectAudioOutput}}, it MUST be listed by {{MediaDevices/enumerateDevices()}} for the current browsing context.
If the promise returned by {{MediaDevices/selectAudioOutput}} is resolved, then the user agent MUST ensure the document is both immediately allowed to play media in an {{HTMLMediaElement}}, and immediately allowed to start an {{AudioContext}}, without needing any additional user gesture.
This is imprecise due to the current lack of standardization of autoplay in browsers.
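This example is non-normative. The sketch below shows how an application might prompt for an output device and then route a media element to it. The helper name `chooseSpeakerFor` is an assumption made for this illustration; its first argument is expected to behave like `navigator.mediaDevices`. Because {{MediaDevices/selectAudioOutput}} requires transient activation, the helper is meant to be called from a user gesture such as a click handler.

```javascript
// Hypothetical helper: prompts the user for an audio output device and
// routes `element` to the chosen device. Returns the chosen device info,
// or null if the request was refused or could not be satisfied.
async function chooseSpeakerFor(mediaDevices, element) {
  try {
    const device = await mediaDevices.selectAudioOutput();
    await element.setSinkId(device.deviceId);
    return device;
  } catch (err) {
    const expected = ["NotAllowedError", "NotFoundError", "InvalidStateError"];
    if (expected.includes(err.name)) {
      // Permission denied, no matching device, or no user activation.
      return null;
    }
    throw err;
  }
}
```

A typical call site might be `button.addEventListener("click", () => chooseSpeakerFor(navigator.mediaDevices, audioElement))`, where `button` and `audioElement` are application-defined.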
This dictionary describes the options that can be used to obtain access to an audio output device.
dictionary AudioOutputOptions {
  DOMString deviceId = "";
};
""
When the value of this dictionary member is not "", and matches the id previously exposed by {{MediaDevices/selectAudioOutput}} or a device id of an audio output device with the same groupId as an audio input device previously exposed by {{MediaDevices/getUserMedia()}} in this or an earlier session, the user agent MAY opt to skip prompting the user in favor of resolving with this id or a new rotated id for the same device, assuming that device is currently available.
Applications that wish to rely on user agents supporting persisted device ids must pass these through {{MediaDevices/selectAudioOutput}} successfully before they will work with {{HTMLMediaElement/setSinkId}}. This requirement exists because honoring a persisted device id exposes fingerprinting information; its cost is that the user may be prompted if the device is not available or the user agent decides not to honor the device id.
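This example is non-normative. The sketch below shows one way an application might reuse a device id saved in an earlier session. The helper name `restoreSpeaker`, the storage key, and the `storage` parameter (any object with `getItem` and `setItem`, such as `window.localStorage`) are assumptions made for this illustration. As with any call to {{MediaDevices/selectAudioOutput}}, it needs to run from a user gesture.

```javascript
// Hypothetical storage key, chosen for this sketch only.
const STORAGE_KEY = "preferredSpeakerId";

// Hypothetical helper: passes a previously saved id through
// selectAudioOutput() before using it with setSinkId().
async function restoreSpeaker(mediaDevices, element, storage) {
  const savedId = storage.getItem(STORAGE_KEY) ?? "";
  // The user agent MAY resolve without a prompt if it honors savedId;
  // otherwise it prompts the user to choose a device.
  const device = await mediaDevices.selectAudioOutput({ deviceId: savedId });
  // The resolved id may be a rotated id, so store what came back.
  storage.setItem(STORAGE_KEY, device.deviceId);
  await element.setSinkId(device.deviceId);
  return device.deviceId;
}
```

Storing the id that the promise resolves with, rather than the one that was passed in, keeps the saved value usable if the user agent rotates ids.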
This document extends the Web platform with the ability to direct audio output to non-default devices, when user permission is given. User permission is necessary because playing audio out of a non-default device may be unexpected behavior to the user, and may cause a nuisance. For example, suppose a user is in a library or other quiet public place where she is using a laptop with system audio directed to a USB headset. Her expectation is that the laptop’s audio is private and she will not disturb others. If any Web application can direct audio output through arbitrary output devices, a mischievous website may play loud audio out of the laptop’s external speakers without the user’s consent.
To prevent these kinds of nuisance scenarios, the user agent must acquire the user’s consent to access non-default audio output devices. This would prevent the library example outlined earlier, because the application would not be permitted to play out audio from the system speakers.
The specification adds no permission requirement to the default audio output device.
The user agent may explicitly obtain user consent to play audio out of non-default output devices using {{MediaDevices/selectAudioOutput}}.
Implementations MUST also support implicit consent via the {{MediaDevices/getUserMedia()}} permission prompt; when an audio input device is permitted and opened via {{MediaDevices/getUserMedia()}} , this also permits access to any associated audio output devices (i.e., those with the same {{MediaDeviceInfo/groupId}}). This conveniently handles the common case of wanting to route both input and output audio through a headset or speakerphone device.
On page load, run the following step:
On the relevant global object, create an internal slot: [[\explicitlyGrantedAudioOutputDevices]], used to store devices that the user grants explicitly through {{MediaDevices/selectAudioOutput}}, initialized to an empty set.
This specification specifies the exposure decision algorithm for devices other than camera and microphone. The algorithm runs as follows, with device, microphoneList and cameraList as input:
Let document be the current settings object's relevant global object's associated Document.
Let deviceInfo be the result of [=creating a device info object=] to represent device.
If document is not allowed to use the feature identified by "speaker-selection", or deviceInfo.{{MediaDeviceInfo/kind}} is not {{ MediaDeviceKind/"audiooutput" }}, return false.
If deviceInfo.{{MediaDeviceInfo/deviceId}} is in [[\explicitlyGrantedAudioOutputDevices]], return true.
If deviceInfo.{{MediaDeviceInfo/groupId}} is the same as the groupId of any microphone in microphoneList, return true.
Return false.
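This example is non-normative. The exposure decision above can be sketched as a pure function over plain objects. The function name `shouldExposeAudioOutput` and the parameters `policyAllowsSpeakerSelection` and `explicitlyGranted` are assumptions made for this illustration; they stand in for state that only the user agent holds, and cameraList is omitted because the steps above do not consult it.

```javascript
// Non-normative sketch of the exposure decision for one device.
// - deviceInfo: object with kind, deviceId and groupId
// - microphoneList: array of objects with a groupId
// - policyAllowsSpeakerSelection: boolean stand-in for the permissions policy check
// - explicitlyGranted: Set of device ids granted through selectAudioOutput()
function shouldExposeAudioOutput(
  deviceInfo,
  microphoneList,
  policyAllowsSpeakerSelection,
  explicitlyGranted
) {
  if (!policyAllowsSpeakerSelection || deviceInfo.kind !== "audiooutput") {
    return false;
  }
  if (explicitlyGranted.has(deviceInfo.deviceId)) {
    return true;
  }
  // Implicit grant: the output shares a groupId with an exposed microphone.
  return microphoneList.some((mic) => mic.groupId === deviceInfo.groupId);
}
```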
The Audio Output Devices API is a [=powerful feature=] that is identified by the [=powerful feature/name=] "speaker-selection".
It defines the following types and algorithms:
A permission covers access to at least one non-default speaker output device.
The semantics of the descriptor is that it queries for access to any non-default speaker output device. Thus, if a query for the "speaker-selection" [=powerful feature=] returns {{PermissionState/"granted"}}, the client knows that at least one of the {{AudioOutputOptions/deviceId}}s previously shared with it can be passed to {{MediaDevices/selectAudioOutput}} without incurring a permission prompt, and if {{PermissionState/"denied"}} is returned, it knows that no {{MediaDevices/selectAudioOutput}} request for an audio output device will succeed.
If the User Agent considers permission given to some, but not all, audio output devices, a query will return {{PermissionState/"granted"}}.
If the User Agent considers permission denied to all audio output devices, a query will return {{PermissionState/"denied"}}.
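This example is non-normative. The sketch below shows how an application might query the state of the "speaker-selection" permission. The helper name `speakerSelectionState` and the "unknown" return value are assumptions made for this illustration; the argument is expected to behave like `navigator.permissions`.

```javascript
// Hypothetical helper: resolves with the permission state for
// "speaker-selection", or "unknown" if the query itself fails.
async function speakerSelectionState(permissions) {
  try {
    const status = await permissions.query({ name: "speaker-selection" });
    return status.state; // "granted", "denied" or "prompt"
  } catch {
    // For example, a user agent that does not recognize the permission
    // name rejects the query.
    return "unknown";
  }
}
```

A result of "granted" only indicates that at least one previously shared device id can be used without a prompt; it does not identify which one.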
This specification defines one [=policy-controlled feature=] identified by the string "speaker-selection". It has a [=policy-controlled feature/default allowlist=] of "self".
A [=document=]'s [=Document/permissions policy=] determines whether any content in that document is [=allowed to use=] {{MediaDevices/selectAudioOutput}} to prompt the user for an audio output device, or [=allowed to use=] {{HTMLMediaElement/setSinkId}} to change the device through which audio output should be rendered, to a non-system-default user-permitted device. For {{MediaDevices/selectAudioOutput}} this is enforced by the [=prompt the user to choose=] algorithm.
This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.
Conformance requirements phrased as algorithms or specific steps may be implemented in any manner, so long as the end result is equivalent. (In particular, the algorithms defined in this specification are intended to be easy to follow, and not intended to be performant.)
Implementations that use ECMAScript to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [[!WEBIDL]], as this specification uses that specification and terminology.
The following people have contributed directly to the development of this specification: Harald Alvestrand, Rick Byers, Dominique Hazael-Massieux (via the HTML5Apps project), Philip Jägenstedt, Victoria Kirst, Shijun Sun, Martin Thomson, Chris Wilson.