WebRTC Insertable Media using Streams

Editor’s Draft

This version:
https://w3c.github.io/webrtc-insertable-streams/
Feedback:
public-webrtc@w3.org with subject line “[webrtc-media-streams] … message topic …” (archives)
Issue Tracking:
GitHub
Editors:
(Google)
(Google)

Abstract

This specification defines an API surface for manipulating the bits on MediaStreamTracks being sent via an RTCPeerConnection.

Status of this document

This is a public copy of the editors’ draft. It is provided for discussion only and may change at any moment. Its publication here does not imply endorsement of its contents by W3C. Don’t cite this document other than as work in progress.

If you wish to make comments regarding this document, please send them to public-webrtc@w3.org (subscribe, archives). When sending e-mail, please put the text “webrtc-media-streams” in the subject, preferably like this: “[webrtc-media-streams] …summary of comment…”. All comments are welcome.

This document was produced by the Web Real-Time Communications Working Group.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 1 March 2019 W3C Process Document.

1. Introduction

The [WEBRTC-NV-USE-CASES] document describes several functions that can only be achieved by access to media (requirements N20-N22), including, but not limited to:

These use cases further require that processing can be done in worker threads (requirements N23-N24).

Furthermore, the "trusted JavaScript cloud conferencing" use case requires such processing to be done on encoded media, not just the raw media.

This specification gives an interface inspired by [WEB-CODECS] to provide access to such functionality while retaining the setup flow of RTCPeerConnection.

This iteration of the specification provides access to encoded media, which is the output of the encoder part of a codec and the input to the decoder part of a codec.

2. Terminology

3. Specification

The Streams Standard [STREAMS] makes little use of WebIDL, whereas the WebRTC specification [WEBRTC-1] relies on it. This specification therefore presents its additions as WebIDL extensions to WebRTC.

An extension to RTCConfiguration notifies the RTCPeerConnection that insertable streams will be used, and an additional method on RTCRtpSender and RTCRtpReceiver inserts the processing into the pipeline.

// New dictionary.
dictionary RTCInsertableStreams {
    ReadableStream readable;
    WritableStream writable;
};

// New enum for video frame types. Will eventually re-use the equivalent defined
// by WebCodecs.
enum RTCEncodedVideoFrameType {
    "empty",
    "key",
    "delta",
};

dictionary RTCEncodedVideoFrameMetadata {
    long long frameId;
    sequence<long long> dependencies;
    unsigned short width;
    unsigned short height;
    long spatialIndex;
    long temporalIndex;
    long synchronizationSource;
    sequence<long> contributingSources;
};

// New interfaces to define encoded video and audio frames. Will eventually
// re-use or extend the equivalent defined in WebCodecs.
[Exposed=Window]
interface RTCEncodedVideoFrame {
    readonly attribute RTCEncodedVideoFrameType type;
    readonly attribute unsigned long long timestamp;
    attribute ArrayBuffer data;
    RTCEncodedVideoFrameMetadata getMetadata();
};

dictionary RTCEncodedAudioFrameMetadata {
    long synchronizationSource;
    sequence<long> contributingSources;
};

[Exposed=Window]
interface RTCEncodedAudioFrame {
    readonly attribute unsigned long long timestamp;
    attribute ArrayBuffer data;
    RTCEncodedAudioFrameMetadata getMetadata();
};


// New fields in RTCConfiguration
partial dictionary RTCConfiguration {
    boolean encodedInsertableStreams = false;
};

// New methods for RTCRtpSender and RTCRtpReceiver
partial interface RTCRtpSender {
    RTCInsertableStreams createEncodedStreams();
};

partial interface RTCRtpReceiver {
    RTCInsertableStreams createEncodedStreams();
};
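
The following is a non-normative sketch of how an application might use the IDL above. The names `InsertableStreams`, `EncodedStreamsEndpoint` and `insertTransform` are hypothetical and introduced only for illustration. The structural types mirror the IDL so that the sketch does not depend on any particular browser's type definitions, and it assumes a runtime that provides the Streams Standard globals.

```typescript
// Structural mirror of the RTCInsertableStreams dictionary above.
interface InsertableStreams<F> {
  readable: ReadableStream<F>;
  writable: WritableStream<F>;
}

// Anything exposing createEncodedStreams(), i.e. an RTCRtpSender or
// RTCRtpReceiver as extended by this specification (hypothetical type name).
interface EncodedStreamsEndpoint<F> {
  createEncodedStreams(): InsertableStreams<F>;
}

// Hypothetical helper: routes every encoded frame through `transform`
// before it continues down the pipeline. The returned promise settles
// when the pipe finishes or fails.
function insertTransform<F>(
  endpoint: EncodedStreamsEndpoint<F>,
  transform: TransformStream<F, F>,
): Promise<void> {
  const { readable, writable } = endpoint.createEncodedStreams();
  return readable.pipeThrough(transform).pipeTo(writable);
}
```

Assuming a peer connection was constructed with `encodedInsertableStreams` set to true, calling `insertTransform(sender, new TransformStream())` would give a pass-through pipeline, since a TransformStream constructed without a transformer forwards chunks unchanged.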

3.1. Extension operation

When a codec is initialized as part of the encoder and the encodedInsertableStreams flag is set in the RTCPeerConnection's RTCConfiguration argument, ensure that the codec is disabled and produces no output.
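
As a non-normative illustration of the flag mentioned above, the sketch below builds a configuration object with `encodedInsertableStreams` set. The interface `EncodedStreamsConfiguration` and both helper functions are hypothetical names. The interface models only the member added by this specification, because existing type definitions for RTCConfiguration may not include it.

```typescript
// Local mirror of the RTCConfiguration extension defined in the IDL;
// only the new member is modelled explicitly.
interface EncodedStreamsConfiguration {
  encodedInsertableStreams?: boolean;
  [otherMember: string]: unknown;
}

// Hypothetical helper: returns a copy of `base` with the flag set,
// leaving the caller's object untouched.
function withEncodedInsertableStreams(
  base: EncodedStreamsConfiguration = {},
): EncodedStreamsConfiguration {
  return { ...base, encodedInsertableStreams: true };
}

// The IDL default is false, so a configuration without the member opts out.
function usesEncodedInsertableStreams(
  config: EncodedStreamsConfiguration,
): boolean {
  return config.encodedInsertableStreams ?? false;
}
```

The object returned by `withEncodedInsertableStreams()` is what an application would pass to the RTCPeerConnection constructor.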

3.1.1. Stream creation

Let the RTCRtpSender or RTCRtpReceiver have an internal slot, [[Streams]], initialized to null.

When createEncodedStreams() is called, run the following steps:

3.1.2. Stream processing

When a frame is produced from the encoded data source, place it on the [[Streams]].readable.
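
From the application's side, frames placed on the readable can be processed with a TransformStream. The non-normative sketch below XORs every byte of a frame's `data` with a key; it is a toy stand-in for real processing such as encryption, not a secure scheme. `EncodedFrameLike`, `xorFrameData` and `createXorTransform` are hypothetical names, and the frame type models only the writable `data` attribute from the IDL.

```typescript
// Minimal structural view of RTCEncodedVideoFrame / RTCEncodedAudioFrame:
// only the writable `data` attribute is needed here.
interface EncodedFrameLike {
  data: ArrayBuffer;
}

// XORs every byte of the frame payload with the low 8 bits of `key`.
// Applying it twice with the same key restores the original bytes.
function xorFrameData(frame: EncodedFrameLike, key: number): void {
  const mask = key & 0xff;
  const input = new Uint8Array(frame.data);
  const output = new Uint8Array(input.length);
  for (let i = 0; i < input.length; i++) {
    output[i] = input[i] ^ mask;
  }
  // Assign a fresh buffer rather than mutating the one the frame arrived with.
  frame.data = output.buffer;
}

// Wraps xorFrameData in a TransformStream that forwards each modified frame.
function createXorTransform<F extends EncodedFrameLike>(
  key: number,
): TransformStream<F, F> {
  return new TransformStream<F, F>({
    transform(frame, controller) {
      xorFrameData(frame, key);
      controller.enqueue(frame);
    },
  });
}
```

The resulting stream can be placed between the two ends returned by createEncodedStreams(), for example `readable.pipeThrough(createXorTransform(key)).pipeTo(writable)`. Using the same key on the receiving side would undo the change.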

When a frame appears on the [[Streams]].writable, do the following:

4. Privacy and security considerations

This API gives JavaScript access to the content of media streams. This content is also available from other sources, such as Canvas and WebAudio.

However, streams that are isolated (as specified in [WEBRTC-IDENTITY]) or tainted with another origin cannot be accessed using this API, since that would break the isolation rule.

The API will allow access to some aspects of timing information that are otherwise unavailable, which creates some fingerprinting surface.

5. Examples

See the explainer document.

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[MEDIACAPTURE-STREAMS]
Daniel Burnett; et al. Media Capture and Streams. 2 July 2019. CR. URL: https://www.w3.org/TR/mediacapture-streams/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[STREAMS]
Adam Rice; Domenic Denicola; 吉野剛史 (Takeshi Yoshino). Streams Standard. Living Standard. URL: https://streams.spec.whatwg.org/
[WebIDL]
Boris Zbarsky. Web IDL. 15 December 2016. ED. URL: https://heycam.github.io/webidl/
[WEBRTC-1]
WebRTC 1.0: Real-time Communication Between Browsers. URL: https://www.w3.org/TR/webrtc/

Informative References

[WEB-CODECS]
Web Codecs explainer. URL: https://github.com/WICG/web-codecs/blob/master/explainer.md
[WEBRTC-IDENTITY]
Cullen Jennings; Martin Thomson. Identity for WebRTC 1.0. 27 September 2018. CR. URL: https://www.w3.org/TR/webrtc-identity/
[WEBRTC-NV-USE-CASES]
Bernard Aboba. WebRTC Next Version Use Cases. 11 December 2018. WD. URL: https://www.w3.org/TR/webrtc-nv-use-cases/
