Copyright © 2015 W3C® (MIT, ERCIM, Keio, Beihang), All Rights Reserved. W3C liability, trademark and document use rules apply.
This specification extends the Media Capture and Streams specification [GETUSERMEDIA] to allow a depth stream to be requested from the web platform using APIs familiar to web authors.
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
The following changes were made since the W3C First Public Working Draft 07 October 2014:
DepthData interface
ImageData interface
CanvasImageSource typedef
MediaStream object containing a depth track
Settings dictionary
MediaStream object
WebGLRenderingContext interface
This document is not complete and is subject to change. Early experimentation is encouraged to allow the Media Capture Task Force to evolve the specification based on technical discussions within the Task Force, implementation experience gained from early implementations, and feedback from other groups and individuals.
This document was published by the Device APIs Working Group and Web Real-Time Communications Working Group as a Working Draft. This document is intended to become a W3C Recommendation. If you wish to make comments regarding this document, please send them to public-media-capture@w3.org (subscribe, archives). All comments are welcome.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. W3C maintains a public list of any patent disclosures (Device APIs Working Group, Web Real-Time Communications Working Group) made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 1 August 2014 W3C Process Document.
Depth cameras are increasingly being integrated into devices such as phones, tablets, and laptops. Depth cameras provide a depth map, which conveys the distance information between points on an object's surface and the camera. With depth information, web content and applications can be enhanced by, for example, the use of hand gestures as an input mechanism, or by creating 3D models of real-world objects that can interact and integrate with the web platform. Concrete applications of this technology include more immersive gaming experiences, more accessible 3D video conferences, and augmented reality, to name a few.
To bring depth capability to the web platform, this specification extends the MediaStream interface [GETUSERMEDIA] to enable it to also contain depth-based MediaStreamTracks. A depth-based MediaStreamTrack, referred to as a depth track, represents an abstraction of a stream of frames that can each be converted to objects which contain an array of pixel data, where each pixel represents the distance between the camera and the objects in the scene for that point in the array. A MediaStream object that contains one or more depth tracks is referred to as a depth stream.
This specification attempts to address the Use Cases and Requirements for accessing a depth stream from a depth camera. See also the Examples section for concrete usage examples.
As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.
The key words MAY and MUST are to be interpreted as described in [RFC2119].
This specification defines conformance criteria that apply to a single product: the user agent that implements the interfaces that it contains.
Implementations that use ECMAScript to implement the APIs defined in this specification must implement them in a manner consistent with the ECMAScript Bindings defined in the Web IDL specification [WEBIDL], as this specification uses that specification and terminology.
The MediaStreamTrack and MediaStream interfaces this specification extends are defined in [GETUSERMEDIA].
The Constraints, Settings, MediaStreamConstraints, and MediaTrackConstraints dictionaries this specification extends are based upon the Constrainable pattern defined in [GETUSERMEDIA].
The NavigatorUserMediaSuccessCallback callback is defined in [GETUSERMEDIA].
The CanvasRenderingContext2D and ImageData interfaces, and the CanvasImageSource typedef are defined in [2DCONTEXT2].
The ArrayBuffer, ArrayBufferView and Uint16Array types are defined in [TYPEDARRAY].
A depth stream is a MediaStream object that contains one or more depth tracks.
A depth track represents media sourced from a depth camera or other similar source.
A video track represents media sourced from an RGB camera or other similar source.
MediaStreamConstraints dictionary
partial dictionary MediaStreamConstraints {
(boolean or MediaTrackConstraints) depth = false;
};
The depth attribute MUST return the value it was initialized to. When the object is created, this attribute MUST be initialized to false. If true, the attribute represents a request that the MediaStream object returned as an argument of the NavigatorUserMediaSuccessCallback contains a depth track. If a Constraints structure is provided, it further specifies the nature and settings of the depth track.
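As a non-normative illustration, the sketch below builds a MediaStreamConstraints-shaped value whose depth member is either a boolean or a MediaTrackConstraints dictionary, mirroring the union type above. The helper name buildDepthConstraints is hypothetical, and the use of frameRate as a constraint on a depth track is an assumption made only for the example.

```javascript
// Hypothetical helper (not defined by this specification): builds a
// MediaStreamConstraints-shaped object suitable for getUserMedia().
function buildDepthConstraints(depth, video) {
  // The depth member falls back to false, matching the dictionary default.
  var constraints = { depth: depth === undefined ? false : depth };
  if (video !== undefined) {
    constraints.video = video;
  }
  return constraints;
}

// Request a depth track only.
var depthOnly = buildDepthConstraints(true);

// Request a video track plus a depth track with further constraints.
// frameRate is an assumed example constraint, not mandated by this document.
var both = buildDepthConstraints({ frameRate: 30 }, true);

// In a user agent that implements this specification, either value could
// then be passed to navigator.mediaDevices.getUserMedia(constraints).
```

Keeping the constraints in a plain object makes it easy to log or adjust the request before handing it to the user agent.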
MediaStream interface
partial interface MediaStream {
sequence<MediaStreamTrack> getDepthTracks ();
};
The getDepthTracks() method, when invoked, MUST return a sequence of MediaStreamTrack objects representing the depth tracks in this stream.
The getDepthTracks() method MUST return a sequence that represents a snapshot of all the MediaStreamTrack objects in this stream's track set whose kind is equal to "depth".
The conversion from the track set to the sequence is user agent defined and the order does not have to be stable between calls.
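The snapshot behaviour described above can be sketched, non-normatively, as a filter over the stream's tracks. The helper name depthTracksOf is hypothetical; it prefers getDepthTracks() where available and otherwise falls back to filtering getTracks() by kind, which is an assumed fallback rather than something this specification requires.

```javascript
// Hypothetical helper: returns a snapshot array of the depth tracks in a
// stream. It uses getDepthTracks() when the user agent provides it and
// otherwise filters the full track list by kind.
function depthTracksOf(stream) {
  if (typeof stream.getDepthTracks === "function") {
    return Array.from(stream.getDepthTracks());
  }
  return stream.getTracks().filter(function (track) {
    return track.kind === "depth";
  });
}
```

Because the order of the returned sequence need not be stable between calls, callers that care about a particular track may prefer to identify it by its id rather than by position.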
This section is non-normative.
A MediaStreamTrack object representing a video track and a MediaStreamTrack object representing a depth track can be combined into one MediaStream object. The rendering of the two tracks is intended to be synchronized, the resolution of the two tracks is intended to be the same, and the coordinates of the two tracks are intended to be calibrated. These are not hard requirements, since it might not be possible to synchronize tracks from different sources.
MediaStreamTrack interface
The kind attribute MUST, on getting, return the string "depth" if the object represents a depth track.
User agents that support direct assignment to media elements MUST allow a MediaStream object containing a depth track to be assigned directly to a media element.
For each MediaStreamTrack representing a depth track in the MediaStream, the user agent MUST create a corresponding VideoTrack as defined in [HTML5].
CanvasImageSource typedef
Several methods in the CanvasRenderingContext2D API take the union type CanvasImageSource as an argument. This specification extends the list of image sources for 2D rendering contexts defined in [2DCONTEXT2].
A video element whose source is a MediaStream object containing a depth track is said to be a depth video element.
A depth video element may be used as a CanvasImageSource.
ImageData interface
Depth cameras usually produce 16-bit depth values per pixel. However, the canvas drawing surface used to draw and manipulate 2D graphics on the web platform does not currently support 16bpp.
To address the issue, this specification defines a new data representation for the existing Canvas Pixel ArrayBuffer of the ImageData interface to represent the 16bpp depth image produced by depth cameras.
An ImageData object is said to represent depth data when the CanvasImageSource used as the image source for the CanvasRenderingContext2D is a depth video element.
When representing a depth image, the Canvas Pixel ArrayBuffer is an ArrayBuffer whose data is represented in left-to-right order, row by row top to bottom, starting with the top left, with each pixel's components given in the following order: the lower 8 bits of the 16-bit depth value, the upper 8 bits of the 16-bit depth value, 8 bits of reserved data, and another 8 bits of reserved data. Each component of each pixel represented in this array must be in the range 0..255, representing the 8-bit value for that component. The components must be assigned consecutive indices starting with 0 for the lower 8 bits of the top left pixel's depth value.
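The byte layout described above can be illustrated with a short, non-normative sketch that reassembles 16-bit depth values from a Canvas Pixel ArrayBuffer view. The function names depthAt and unpackDepth and the hand-built sample frame are assumptions introduced for this example only.

```javascript
// Reads the 16-bit depth value of pixel (x, y) from a pixel array laid out
// as described above: 4 bytes per pixel, low byte first, high byte second,
// followed by 2 reserved bytes.
function depthAt(data, width, x, y) {
  var i = (y * width + x) * 4;
  return data[i] | (data[i + 1] << 8);
}

// Unpacks a whole frame into a Uint16Array with one entry per pixel.
function unpackDepth(data, width, height) {
  var out = new Uint16Array(width * height);
  for (var p = 0; p < out.length; ++p) {
    out[p] = data[p * 4] | (data[p * 4 + 1] << 8);
  }
  return out;
}

// Hand-built 2x1 frame holding the depth values 0x0102 and 0xFFFF.
var sample = new Uint8ClampedArray([0x02, 0x01, 0, 0, 0xff, 0xff, 0, 0]);
var depths = unpackDepth(sample, 2, 1); // depths[0] is 258, depths[1] is 65535
```

Unpacking into a Uint16Array once per frame avoids repeating the bit arithmetic wherever the depth values are consumed.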
Settings dictionary
When the getSettings() method is invoked on a MediaStreamTrack object that represents a depth track, the user agent MUST return a Settings dictionary with the additional properties listed below. When the getSettings() method is invoked on a MediaStreamTrack object that represents a video track, the user agent MAY return a Settings dictionary with the additional properties listed below:
partial dictionary Settings {
double? focalLength = null;
double? horizontalFieldOfView = null;
double? verticalFieldOfView = null;
};
The focalLength attribute MUST return the value it was initialized to. When the object is created, this attribute MUST be initialized to null. It represents the focal length of the camera in millimeters.
The horizontalFieldOfView attribute MUST return the value it was initialized to. When the object is created, this attribute MUST be initialized to null. It represents the horizontal angle of view in degrees.
The verticalFieldOfView attribute MUST return the value it was initialized to. When the object is created, this attribute MUST be initialized to null. It represents the vertical angle of view in degrees.
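A non-normative sketch of how a page might read these properties follows. The helper name cameraIntrinsics is hypothetical; it treats a null (or missing) value as "not available", which is an assumption about how authors would want to handle the initial null values described above.

```javascript
// Hypothetical helper: collects the three additional Settings properties,
// or returns null when any of them is unavailable.
function cameraIntrinsics(settings) {
  var keys = ["focalLength", "horizontalFieldOfView", "verticalFieldOfView"];
  var result = {};
  for (var k = 0; k < keys.length; ++k) {
    var value = settings[keys[k]];
    if (value === null || value === undefined) {
      return null;
    }
    // focalLength is in millimeters; the two field-of-view values are in degrees.
    result[keys[k]] = value;
  }
  return result;
}

// Usage, assuming depthTrack is a MediaStreamTrack whose kind is "depth":
//   var intrinsics = cameraIntrinsics(depthTrack.getSettings());
```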
WebGLRenderingContext interface
This section is non-normative.
A video element whose source is a MediaStream object containing a depth track may be uploaded to a WebGL texture of format RGB and type UNSIGNED_BYTE. [WEBGL]
For each pixel of this WebGL texture, the R component represents the lower 8 bits of the 16-bit depth value, the G component represents the upper 8 bits of the 16-bit depth value, and the value of the B component is not defined.
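When such a texture is sampled in a shader, each UNSIGNED_BYTE channel is delivered as a normalized float in the range 0 to 1. The non-normative sketch below reproduces that arithmetic in script to show how the original 16-bit value can be recovered from the R and G components; the function name depthFromNormalizedRG is hypothetical.

```javascript
// Emulates what a shader sees: a byte b stored in an UNSIGNED_BYTE texture
// channel is sampled as b / 255. Scaling by 255 and rounding recovers b.
function depthFromNormalizedRG(r, g) {
  var low = Math.round(r * 255);  // lower 8 bits, from the R component
  var high = Math.round(g * 255); // upper 8 bits, from the G component
  return low + 256 * high;
}

// A pixel whose bytes are R = 0x02 and G = 0x01 carries the depth value 258.
var exampleDepth = depthFromNormalizedRG(2 / 255, 1 / 255);
```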
This section is non-normative.
var canvasContext = document.createElement("canvas").getContext("2d");
// a second 2D context used to read back the depth frames
var depthCanvasContext = document.createElement("canvas").getContext("2d");
var fps = 60;

navigator.mediaDevices.getUserMedia({
  video: true,
  depth: true
}).then(function (stream) {
  // wire the stream into a <video> element for playback
  var video = document.querySelector('#video');
  video.srcObject = stream;
  video.play();

  // wire the depth stream into another <video> element to convert kind
  // NOTE: Only the R and G bytes are set to carry 16 bits of data
  var depthVideo = document.querySelector('#depthVideo');
  // construct a new MediaStream out of the existing depth track
  var depthStream = new MediaStream([stream.getDepthTracks()[0]]);
  depthVideo.srcObject = depthStream;
  depthVideo.play();

  depthVideo.onloadedmetadata = function () {
    // size both canvases to the depth video; this assumes the video and
    // depth tracks share the same resolution
    var w = depthVideo.videoWidth;
    var h = depthVideo.videoHeight;
    canvasContext.canvas.width = depthCanvasContext.canvas.width = w;
    canvasContext.canvas.height = depthCanvasContext.canvas.height = h;

    setInterval(function () {
      canvasContext.drawImage(video, 0, 0, w, h);
      var rgbImageData = canvasContext.getImageData(0, 0, w, h);
      var pixels = rgbImageData.data;

      depthCanvasContext.drawImage(depthVideo, 0, 0, w, h);
      var depthImageData = depthCanvasContext.getImageData(0, 0, w, h);
      var dexels = depthImageData.data;

      // iterate through depth pixels to convert 2 bytes into 1 Uint16
      for (var x = 0; x < w; ++x) {
        for (var y = 0; y < h; ++y) {
          var i = (x + y * w) * 4;
          // combine the R & G bytes at (x, y) to get
          // the 16 bit depth pixel value
          var depth = dexels[i] | dexels[i + 1] << 8;
        }
      }
      // do things with pixels and dexels here
    }, 1000 / fps);
  };
}).catch(function (reason) {
  // handle gUM error here
});
// This code sets up a video element from a depth stream, uploads it to a WebGL
// texture, and samples that texture in the fragment shader, reconstructing the
// 16-bit depth values from the red and green channels.
// The element is looked up here so that it is also in scope in the rendering loop.
var depthVideo = document.querySelector('#depthVideo');

navigator.mediaDevices.getUserMedia({
  depth: true
}).then(function (stream) {
  // wire the stream into a <video> element for playback
  depthVideo.srcObject = stream;
  depthVideo.play();
}).catch(function (reason) {
  // handle gUM error here
});

// ... later, in the rendering loop ...
gl.texImage2D(
  gl.TEXTURE_2D,
  0,
  gl.RGB,
  gl.RGB,
  gl.UNSIGNED_BYTE,
  depthVideo
);

<script id="fragment-shader" type="x-shader/x-fragment">
  varying vec2 v_texCoord;
  // u_tex points to the texture unit containing the depth texture.
  uniform sampler2D u_tex;
  void main() {
    vec4 floatColor = texture2D(u_tex, v_texCoord);
    vec3 rgb = floatColor.rgb;
    // ...
    // channels are sampled as normalized floats in [0, 1], so this is
    // the 16-bit depth value divided by 255
    float depth = rgb.r + 256. * rgb.g;
    // ...
  }
</script>
Thanks to everyone who contributed to the Use Cases and Requirements and sent feedback and comments. Special thanks to Ningxin Hu for experimental implementations, as well as to Project Tango for their experiments.