This specification extends the Media Capture and Streams specification [[GETUSERMEDIA]] to allow a depth-only stream or combined depth+color stream to be requested from the web platform using APIs familiar to web authors.
The Working Group has decided to discontinue work on this specification due to lack of implementation momentum.
Depth cameras are increasingly being integrated into devices such as phones, tablets, and laptops. Depth cameras provide a depth map, which conveys the distance information between points on an object's surface and the camera. With depth information, web content and applications can be enhanced by, for example, the use of hand gestures as an input mechanism, or by creating 3D models of real-world objects that can interact and integrate with the web platform. Concrete applications of this technology include more immersive gaming experiences, more accessible 3D video conferences, and augmented reality, to name a few.
To bring depth capability to the web platform, this specification extends the MediaStream interface [[!GETUSERMEDIA]] to enable it to also contain depth-based MediaStreamTracks. A depth-based MediaStreamTrack, referred to as a depth stream track, represents an abstraction of a stream of frames that can each be converted to objects which contain an array of pixel data, where each pixel represents the distance between the camera and the objects in the scene for that point in the array. A MediaStream object that contains one or more depth stream tracks is referred to as a depth-only stream or depth+color stream.
This specification attempts to address the Use Cases and Requirements for accessing a depth stream from a depth camera. See also the Examples section for concrete usage examples.
This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.
Implementations that use ECMAScript to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [[!WEBIDL]], as this specification uses that specification and terminology.
The MediaStreamTrack and MediaStream interfaces this specification extends are defined in [[!GETUSERMEDIA]].
The concepts Constraints, Capabilities, ConstraintSet, and Settings, and the types of constrainable properties are defined in [[!GETUSERMEDIA]].
The ConstrainDOMString type is defined in [[!GETUSERMEDIA]].
The MediaTrackSettings, MediaTrackConstraints, MediaTrackSupportedConstraints, MediaTrackCapabilities, and MediaTrackConstraintSet dictionaries this specification extends are defined in [[!GETUSERMEDIA]].
The getUserMedia() method is defined in [[!GETUSERMEDIA]].
The concepts muted and disabled as applied to MediaStreamTrack are defined in [[!GETUSERMEDIA]].
The terms source and consumer are defined in [[!GETUSERMEDIA]].
The MediaDeviceKind enumeration is defined in [[!GETUSERMEDIA]].
The video element and Canvas Pixel ArrayBuffer interfaces are defined in [[!HTML]].
The meaning of a dictionary member being present or not present is defined in [[WEBIDL]].
The term depth+color stream means a MediaStream object that contains one or more MediaStreamTrack objects whose videoKind of Settings is "depth" (depth stream track) and one or more MediaStreamTrack objects whose videoKind of Settings is "color" (color stream track).
The term depth-only stream means a MediaStream object that contains one or more MediaStreamTrack objects whose videoKind of Settings is "depth" (depth stream track) only.
The term color-only stream means a MediaStream object that contains one or more MediaStreamTrack objects whose videoKind of Settings is "color" (color stream track) only, and optionally of kind "audio".
The term depth stream track means a MediaStreamTrack object whose videoKind of Settings is "depth". It represents a media stream track whose source is a depth camera.
The term color stream track means a MediaStreamTrack object whose videoKind of Settings is "color". It represents a media stream track whose source is a color camera.
A depth map is an abstract representation of a frame of a depth stream track. A depth map is a two-dimensional array that contains information relating to the perpendicular distance of the surfaces of scene objects to the camera's near plane. The numeric values in the depth map are referred to as depth map values and represent distances to the near plane normalized against the distance between the far and near planes.
A normalized depth map value means that its range is from 0 to 1, where the maximum depth map value of 1 corresponds to distances equal to the far plane. A normalized depth map value is represented using floating-point or unsigned fixed-point formats (see [OpenGL ES 3.0.5], section 2.1.6).
A depth map's near plane and far plane are concepts from 3D graphics that define the camera's viewing volume (view frustum). Their definition is outside the scope of this specification.
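For illustration only, the following non-normative sketch shows how a distance reported by a depth camera could map to a normalized depth map value; the function name and the near and far plane values are hypothetical.

// Hypothetical sketch: map a distance to a normalized depth map value.
// `near` and `far` are the depth map's near and far plane distances,
// in the same units as `distance`.
function normalizeDepth(distance, near, far) {
  // Distances at the near plane map to 0, distances at the far plane map to 1.
  var value = (distance - near) / (far - near);
  // Clamp to the valid [0, 1] range of normalized depth map values.
  return Math.min(Math.max(value, 0), 1);
}

// Example: with near = 0.2 and far = 5.0 (meters), a point 2.6 m away
// normalizes to (2.6 - 0.2) / (5.0 - 0.2) = 0.5.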
If the implementation is unable to report the value represented by any of the dictionary members, that member is not present in the dictionary.
MediaTrackSupportedConstraints dictionary represents the list of Constraints recognized by a user agent for controlling the Capabilities of a MediaStreamTrack object.
Partial dictionary MediaTrackSupportedConstraints extends the original dictionary defined in [[!GETUSERMEDIA]]. The dictionary value true represents an applicable constraint.
An applicable constraint is not omitted by the user agent in step 6.2.2 in the getUserMedia() algorithm.
partial dictionary MediaTrackSupportedConstraints {
  // Applies to both depth stream track and color stream track:
  boolean videoKind = true;
};
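As a non-normative sketch, a page could feature-detect the videoKind constraint before relying on it:

// Check whether the user agent recognizes the videoKind constraint.
if (navigator.mediaDevices.getSupportedConstraints().videoKind) {
  // videoKind is an applicable constraint and will not be omitted
  // when selecting a source in getUserMedia().
}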
MediaTrackCapabilities dictionary represents the Capabilities of a MediaStreamTrack object.
Partial dictionary MediaTrackCapabilities extends the original MediaTrackCapabilities dictionary defined in [[!GETUSERMEDIA]].
partial dictionary MediaTrackCapabilities {
  // Applies to both depth stream track and color stream track:
  DOMString videoKind;
};
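For illustration, a brief sketch of reading this capability from a track, assuming a `stream` already obtained from getUserMedia():

// Inspect the videoKind capability of the first video track.
var track = stream.getVideoTracks()[0];
var capabilities = track.getCapabilities();
// capabilities.videoKind is, for example, "depth" for a depth stream track.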
The MediaTrackConstraintSet dictionary is a ConstraintSet dictionary that specifies each member's set of allowed values.
The allowed values for ConstrainDOMString type are defined in [[!GETUSERMEDIA]].
partial dictionary MediaTrackConstraintSet {
  // Applies to both depth stream track and color stream track:
  ConstrainDOMString videoKind;
};
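A non-normative sketch of ConstrainDOMString forms for videoKind in a getUserMedia() constraint set:

var constraints = {
  video: {
    // Required: only a depth stream track satisfies this constraint.
    videoKind: { exact: "depth" }
    // A bare value, videoKind: "depth", is treated as ideal in the basic
    // constraint set, i.e. preferred but not required.
  }
};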
MediaTrackSettings dictionary represents the Settings of a MediaStreamTrack object.
Partial dictionary MediaTrackSettings extends the original MediaTrackSettings dictionary defined in [[!GETUSERMEDIA]].
partial dictionary MediaTrackSettings {
  // Applies to both depth stream track and color stream track:
  DOMString videoKind;
};
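For illustration, a brief sketch of reading back this setting, assuming a `track` obtained from getUserMedia():

// Determine whether the obtained track is a depth stream track.
if (track.getSettings().videoKind === "depth") {
  // The source of this track is a depth camera.
}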
The videoKind constrainable property is defined to apply to both color stream track and depth stream track. The videoKind member specifies the video kind of the source.
enum VideoKindEnum { "color", "depth" };
The VideoKindEnum enumeration defines the valid video kinds: color for color stream track whose source is a color camera, and depth for depth stream track whose source is a depth camera.
The MediaStream consumer for the depth-only stream and depth+color stream is the video element [[!HTML]].
If a MediaStreamTrack whose videoKind is depth is muted or disabled, it MUST render frames as if all the pixels were 0.
Depth map values that the camera produces are often in 16-bit normalized unsigned fixed-point format. An application developer can access the data using the canvas pixel ArrayBuffer's red color component, but that would cause a precision loss, given that it is in 8-bit normalized unsigned fixed-point format.
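A non-normative sketch of such canvas-based access, assuming a playing `depthVideo` element as in the examples below; each red component holds the normalized depth quantized to 8 bits, which is where the precision loss occurs:

var canvas = document.createElement('canvas');
canvas.width = depthVideo.videoWidth;
canvas.height = depthVideo.videoHeight;
var ctx = canvas.getContext('2d');
ctx.drawImage(depthVideo, 0, 0);
// data is a Uint8ClampedArray in RGBA order; every 4th byte is the
// red component, holding the 8-bit normalized depth map value.
var data = ctx.getImageData(0, 0, canvas.width, canvas.height).data;
var depth8bit = new Uint8Array(canvas.width * canvas.height);
for (var i = 0; i < depth8bit.length; i++) {
  depth8bit[i] = data[i * 4];
}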
The same precision loss applies to the use of [[WEBGL]] UNSIGNED_BYTE textures. In order to access the full precision, the application developer can use [[WEBGL]] floating-point textures.
There are several use cases that are a good fit to be, at least partially, implemented on the GPU, such as motion recognition, pattern recognition, background removal, and 3D point cloud generation. This section explains which APIs can be used for some of these use cases; concrete examples are provided in the Examples section.
A video element whose source is a MediaStream object containing a depth stream track may be uploaded to a [[WEBGL]] texture of format RGBA or RED and type FLOAT. See the [[WEBGL]] specification and the upload to float texture example code. For each pixel of this WebGL texture, the R component represents the normalized floating-point depth map value.
Here we list some of the possible approaches.
navigator.mediaDevices.getUserMedia({
  video: {videoKind: {exact: "color"}, groupId: {exact: id}}
}).then(function (stream) {
  // Wire the media stream into a <video> element for playback.
  // The RGB video is rendered.
  var video = document.querySelector('#video');
  video.srcObject = stream;
  video.play();
});

navigator.mediaDevices.getUserMedia({
  video: {videoKind: {exact: "depth"}, groupId: {exact: id}}
}).then(function (stream) {
  // Wire the depth-only stream into another <video> element for playback.
  // The depth information is rendered in its grayscale representation.
  var depthVideo = document.querySelector('#depthVideo');
  depthVideo.srcObject = stream;
  depthVideo.play();
});
This code sets up a video element from a depth stream and uploads it to a WebGL 2.0 float texture.
navigator.mediaDevices.getUserMedia({
  video: {videoKind: {exact: "depth"}}
}).then(function (stream) {
  // Wire the stream into a <video> element for playback.
  var depthVideo = document.querySelector('#depthVideo');
  depthVideo.srcObject = stream;
  depthVideo.play();
}).catch(function (reason) {
  // Handle getUserMedia error here.
});

let gl = canvas.getContext("webgl2");
// Activate the standard WebGL 2.0 extension for using the single component
// R32F texture format.
gl.getExtension('EXT_color_buffer_float');

// Later, in the rendering loop ...
gl.bindTexture(gl.TEXTURE_2D, depthTexture);
gl.texImage2D(
  gl.TEXTURE_2D,
  0,
  gl.R32F,
  gl.RED,
  gl.FLOAT,
  depthVideo);
This example extends the upload to float texture example.
This code creates the texture to which we will upload the depth video frame. Then, it sets up a named framebuffer, attaches the texture as a color attachment and, after uploading the depth video frame to the texture, reads the texture content back into a Float32Array.
// Initialize texture and framebuffer for reading back the texture.
let depthTexture = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D, depthTexture);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_T, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_WRAP_S, gl.CLAMP_TO_EDGE);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);

let framebuffer = gl.createFramebuffer();
gl.bindFramebuffer(gl.FRAMEBUFFER, framebuffer);
gl.framebufferTexture2D(
  gl.FRAMEBUFFER,
  gl.COLOR_ATTACHMENT0,
  gl.TEXTURE_2D,
  depthTexture,
  0);

let buffer;

// Later, in the rendering loop ...
gl.bindTexture(gl.TEXTURE_2D, depthTexture);
gl.texImage2D(
  gl.TEXTURE_2D,
  0,
  gl.R32F,
  gl.RED,
  gl.FLOAT,
  depthVideo);
if (!buffer) {
  buffer = new Float32Array(depthVideo.videoWidth * depthVideo.videoHeight);
}
gl.readPixels(
  0,
  0,
  depthVideo.videoWidth,
  depthVideo.videoHeight,
  gl.RED,
  gl.FLOAT,
  buffer);
Use gl.getParameter(gl.IMPLEMENTATION_COLOR_READ_FORMAT) to check whether readPixels to gl.RED or gl.RGBA float is supported.
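A brief sketch of that check; the fallback to gl.RGBA with gl.FLOAT is an assumption about what the implementation accepts when a floating-point color buffer is bound:

var readFormat = gl.getParameter(gl.IMPLEMENTATION_COLOR_READ_FORMAT);
var readType = gl.getParameter(gl.IMPLEMENTATION_COLOR_READ_TYPE);
if (readFormat === gl.RED && readType === gl.FLOAT) {
  // readPixels with gl.RED / gl.FLOAT into a Float32Array is supported.
} else {
  // Fall back to gl.RGBA / gl.FLOAT readback (four floats per pixel).
}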
The privacy and security considerations discussed in [[!GETUSERMEDIA]] apply to this extension specification.
Thanks to everyone who contributed to the Use Cases and Requirements and sent feedback and comments. Special thanks to Ningxin Hu for experimental implementations, as well as to the Project Tango team for their experiments.