This document refines behaviors of the {{MediaDevices}} functions {{MediaDevices/enumerateDevices()}} and {{MediaDevices/getUserMedia()}} to reduce their fingerprinting surface and enable user selection of camera and microphone through device pickers implemented in the user agent.

This is an unofficial proposal.

Introduction

This document contains proposed extensions and modifications to the [[GETUSERMEDIA]] specification.

New features and modifications to existing features proposed here may be considered for addition into the main specification post Recommendation. Deciding factors will include maturity of the extension or modification, consensus on adding it, and implementation experience.

A concrete long-term goal is reducing the fingerprinting surface of {{MediaDevices/enumerateDevices()}} by deprecating exposure of the device {{MediaDeviceInfo/label}} in its results. This requires relieving applications of the burden of building user interfaces to select cameras and microphones in-content, by offering this in user agents as part of {{MediaDevices/getUserMedia()}} instead.

Miscellaneous other smaller features are under consideration as well, such as constraints to control multi-channel audio beyond stereo.

Terminology

This document uses the definitions {{MediaDevices}}, {{MediaStreamTrack}}, {{MediaStreamConstraints}} and {{ConstrainablePattern}} from [[!GETUSERMEDIA]].

The terms [=permission state=], [=request permission to use=], and [=prompt the user to choose=] are defined in [[!permissions]].

In-browser camera and microphone picker

The existing {{MediaDevices/enumerateDevices()}} function exposes camera and microphone {{MediaDeviceInfo/label}}s to let applications build in-content user interfaces for camera and microphone selection. Applications have had to do this because {{MediaDevices/getUserMedia()}} did not offer a web compatible in-agent device picker. This specification aims to rectify that.

Due to the significant fingerprinting vector caused by device {{MediaDeviceInfo/label}}s, and the well-established nature of the existing APIs, the scope of this particular effort is limited to removing {{MediaDeviceInfo/label}}, leaving the overall constraints-based model intact. This keeps the migration path more viable than a move to a less powerful API would be.

For the same reason, this specification augments the existing {{MediaDevices/getUserMedia()}} function instead of introducing a new, less powerful API to compete with it.

getUserMedia "user-chooses" semantics

This specification introduces slightly altered semantics to the {{MediaDevices/getUserMedia()}} function called "user-chooses" that guarantee a picker will be shown to the user in cases where the user agent would otherwise choose for the user (that is, when application constraints do not narrow down the choices to a single device). This is orthogonal to permission, and offers a better and more consistent user experience across applications and user agents.

Unfortunately, since the "user-chooses" semantics may produce user agent prompts at different times and in different situations compared to the old semantics, they are somewhat incompatible with expectations in some existing web applications that tend to call {{MediaDevices/getUserMedia()}} repeatedly and lazily instead of using e.g. stream.clone().
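As an illustration of the alternative mentioned above, the following sketch acquires the camera once and hands out clones of that stream, so that later consumers do not trigger another {{MediaDevices/getUserMedia()}} call (and therefore no additional picker under "user-chooses"). The helper name `createCameraSource` is hypothetical, and the `mediaDevices` object is passed in as a parameter purely so the sketch can be exercised outside a browser.

```javascript
// Hypothetical helper: wraps a MediaDevices-like object so that
// getUserMedia() is invoked at most once while the request succeeds.
function createCameraSource(mediaDevices) {
  let pending = null; // promise for the single underlying stream

  return async function getCameraStream() {
    if (!pending) {
      pending = mediaDevices.getUserMedia({ video: true });
      // Forget a failed request so a later call may try again.
      pending.catch(() => { pending = null; });
    }
    const stream = await pending;
    // Each consumer receives its own clone of the original stream.
    return stream.clone();
  };
}
```

In a page, this would be set up once with `createCameraSource(navigator.mediaDevices)` and the returned function called wherever a camera stream is needed.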

Web compatibility and migration

User agents are encouraged to provide the new semantics as opt-in initially for web compatibility. User agents MUST deprecate (remove) {{MediaDeviceInfo/label}} from {{MediaDeviceInfo}} over time, though specific migration strategies are left to user agents. User agents SHOULD migrate to offering the new semantics by default (opt-out) over time.

Since the constraints-model remains intact, web compatibility problems are expected to be limited to:

MediaDevices Interface Extensions

partial interface MediaDevices {
  readonly attribute GetUserMediaSemantics defaultSemantics;
};

Attributes

defaultSemantics of type GetUserMediaSemantics, readonly

The default semantics of {{MediaDevices/getUserMedia()}} in this user agent.

User agents SHOULD default to "browser-chooses" for backwards compatibility, until a transition plan has been enacted where a majority of user agents collectively switch their defaults to "user-chooses" for improved user privacy, and usage metrics suggest this transition is feasible without major breakage.
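An application might read this attribute to learn which behavior to expect before calling {{MediaDevices/getUserMedia()}}. The sketch below is a minimal feature check; the function name `effectiveDefaultSemantics` is hypothetical, and treating a missing attribute as "browser-chooses" is an assumption based on the premise that user agents without this extension retain the existing behavior.

```javascript
// Hypothetical helper: reports the default semantics of a
// MediaDevices-like object, falling back to "browser-chooses"
// (an assumption) when the attribute is absent or unrecognized.
function effectiveDefaultSemantics(mediaDevices) {
  const value = mediaDevices ? mediaDevices.defaultSemantics : undefined;
  if (value === "user-chooses" || value === "browser-chooses") {
    return value;
  }
  return "browser-chooses";
}
```

In a page, the argument would be `navigator.mediaDevices`.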

MediaStreamConstraints dictionary extensions

partial dictionary MediaStreamConstraints {
  GetUserMediaSemantics semantics;
};

Dictionary {{MediaStreamConstraints}} Members

semantics of type {{GetUserMediaSemantics}}

In cases where the specified constraints do not narrow multiple choices between devices down to one per kind, specifies how the final determination of which devices to pick from the remaining choices MUST be made. If not specified, then the defaultSemantics are used.
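The fallback rule described here can be sketched as a small function. This is illustrative only: `resolveSemantics` is a hypothetical name, and the returned object shape is an assumption chosen to expose both the resolved value and whether the member was present.

```javascript
// Hypothetical helper: picks the semantics that apply to one call.
// `constraints` is a MediaStreamConstraints-like dictionary and
// `defaultSemantics` is the value of mediaDevices.defaultSemantics.
function resolveSemantics(constraints, defaultSemantics) {
  const semanticsPresent = constraints.semantics !== undefined;
  return {
    semanticsPresent,
    semantics: semanticsPresent ? constraints.semantics : defaultSemantics,
  };
}
```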

GetUserMediaSemantics enum

enum GetUserMediaSemantics {
  "browser-chooses",
  "user-chooses"
};
GetUserMediaSemantics Enumeration description
browser-chooses

When application-specified constraints do not narrow multiple choices between devices down to one per kind, the user agent is allowed to make the final determination between the remaining choices.

user-chooses

When application-specified constraints do not narrow multiple choices between devices down to one per kind, the user agent MUST prompt the user to choose between the remaining choices, even if the application already has permission to some or all of them.

Algorithms

When the {{MediaDevices/getUserMedia()}} method is invoked, run the following steps before invoking the {{MediaDevices/getUserMedia()}} algorithm:

  1. Let mediaDevices be the object on which this method was invoked.

  2. Let constraints be the method's first argument.

  3. Let semanticsPresent be true if constraints.semantics [= map/exists =], otherwise false.

  4. Let semantics be constraints.semantics if present, or the value of mediaDevices.defaultSemantics otherwise.

  5. Replace step 6.5.1. of the {{MediaDevices/getUserMedia()}} algorithm in its entirety with the following two steps:

    1. Let descriptor be a {{PermissionDescriptor}} with its {{PermissionDescriptor/name}} member set to the permission name associated with kind (e.g. {{PermissionName/"camera"}} for "video", {{PermissionName/"microphone"}} for "audio"), and, optionally, consider its {{DevicePermissionDescriptor/deviceId}} member set to any appropriate device's deviceId.

    2. If the number of unique devices sourcing tracks of media type kind in candidateSet is greater than 1 and semantics is "user-chooses", then prompt the user to choose a device with descriptor, resulting in provided media. Otherwise, request permission to use a device with descriptor, while considering all devices being attached to a live and same-permission MediaStreamTrack in the current [=browsing context=] to mean having permission status {{PermissionState/"granted"}}, resulting in provided media.

      Same-permission in this context means a {{MediaStreamTrack}} that required the same level of permission to obtain as what is being requested.

      When asking the user’s permission, the user agent MUST disclose whether permission will be granted only to the device chosen, or to all devices of that kind.

      Let track be the provided media, which MUST be precisely one track of type kind from finalSet. If semantics is "browser-chooses" then the decision of which track to choose from finalSet is up to the User Agent, which MAY use the value of the computed "fitness distance" from the SelectSettings algorithm, the value of semanticsPresent, or any other internally-available information about the devices, as inputs to its decision. If semantics is "user-chooses", and the application has not narrowed down the choices to one, then the user agent MUST ask the user to make the final selection.

      Once selected, the source of the {{MediaStreamTrack}} MUST NOT change.

      User Agents are encouraged to default to or present a default choice based primarily on fitness distance, and secondarily on the user's primary or system default device for kind (when possible). User Agents MAY allow users to use any media source, including pre-recorded media files.
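The branch in the second replacement step above can be sketched as follows. This is not user agent code: `selectionStep` is a hypothetical name, the `{ kind, deviceId }` records are an assumed stand-in for the tracks in candidateSet, and the returned strings are labels for the two outcomes rather than anything defined by this specification.

```javascript
// Hypothetical helper: decides between prompting the user to choose
// and an ordinary permission request for one media kind.
function selectionStep(candidates, kind, semantics) {
  // Count the unique devices sourcing candidate tracks of this kind.
  const uniqueDevices = new Set(
    candidates.filter((c) => c.kind === kind).map((c) => c.deviceId)
  );
  if (uniqueDevices.size > 1 && semantics === "user-chooses") {
    return "prompt-user-to-choose";
  }
  return "request-permission";
}
```

With two distinct cameras in the candidate set, "user-chooses" leads to the picker, while "browser-chooses" or a single remaining device leads to the ordinary permission request.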

Examples

This example shows a setup with a start button and a camera selector using the new semantics (microphone is not shown for brevity but is equivalent).

<button id="start">Start</button>
<button id="chosenCamera" disabled>Camera: none</button>
<script>

let cameraTrack = null;

start.onclick = async () => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      video: {deviceId: localStorage.cameraId}
    });
    setCameraTrack(stream.getVideoTracks()[0]);
  } catch (err) {
    console.error(err);
  }
}

chosenCamera.onclick = async () => {
  try {
    const stream = await navigator.mediaDevices.getUserMedia({
      video: true,
      semantics: "user-chooses"
    });
    setCameraTrack(stream.getVideoTracks()[0]);
  } catch (err) {
    console.error(err);
  }
}

function setCameraTrack(track) {
  cameraTrack = track;
  const {deviceId} = track.getSettings();
  localStorage.cameraId = deviceId;
  // The label is exposed on the track itself, not in its settings.
  chosenCamera.innerText = `Camera: ${track.label}`;
  chosenCamera.disabled = false;
}
</script>