This is an unofficial proposal.

Introduction

This document contains proposed extensions to the [[WEBRTC]] specification that were considered inappropriate to incorporate directly into that specification when they were written, but too small to warrant a separate document.

If it later becomes appropriate to add them to the base specification, that will be done. Deciding factors will include the maturity of the extension, consensus on adding it, and implementation experience.

This document contains a number of sections that have extensions for one specific interface or dictionary in the base specification. When an extension only affects one interface or dictionary, it will only be described there. If an extension affects multiple interfaces or dictionaries, there will be a subsection in the informative "Overviews" section that describes the extension as a whole, while the normative changes are in the sections for the individual interfaces.

Terminology

The following terms are defined in [[!WEBRTC]]:

The term MediaStreamTrack is defined in [[!GETUSERMEDIA]].

The term DOMHighResTimeStamp is defined in [[!hr-time]].

The following terms are defined in RTP Header Extension for Absolute Capture Time:

The following terms are defined in [[!WEBRTC-STATS]]:

When referring to exceptions, the terms throw and create are defined in [[WEBIDL]].

The OAuth Client and Authorization Server roles are defined in [[RFC6749]] Section 1.1.

Overviews

RTP header control

The extension for RTP header extension control consists of 2 parts:

The RTP header extension mechanism is defined in [[RFC8285]], with the SDP negotiation mechanism defined in section 5. It goes into some detail on the meaning of "direction" with regard to RTP header extensions, and gives a detailed procedure for negotiating RTP header extension IDs.

This API extension gives the means to control the use and direction of RTP header extensions as documented in [[RFC8285]]. It does not influence the ID negotiation mechanism, except for being able to control the number of extensions offered.

RTCPeerConnection extensions

The RTCPeerConnection interface is defined in [[WEBRTC]]. This document extends that interface by adding an additional static method.

partial interface RTCPeerConnection {
  static sequence<RTCIceServer> getDefaultIceServers();
};

Methods

getDefaultIceServers

Returns a list of ICE servers that are configured into the browser. A browser might be configured to use local or private STUN or TURN servers. This method allows an application to learn about these servers and optionally use them.
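A minimal TypeScript sketch of how an application might combine the browser's defaults with its own servers. It assumes a structural IceServerLike type and feature-detects getDefaultIceServers(), since user agents may not implement it; dropping defaults that share a URL with the application's servers is this sketch's own choice, not something the method requires.

```typescript
// Structural stand-in for RTCIceServer (assumption of this sketch).
interface IceServerLike {
  urls: string | string[];
  username?: string;
  credential?: string;
}

function toList(urls: string | string[]): string[] {
  return typeof urls === "string" ? [urls] : urls;
}

// Returns the browser-configured servers, or [] when the method is unavailable.
function readDefaultIceServers(): IceServerLike[] {
  const ctor = (globalThis as unknown as {
    RTCPeerConnection?: { getDefaultIceServers?: () => IceServerLike[] };
  }).RTCPeerConnection;
  const getDefaults = ctor?.getDefaultIceServers;
  return typeof getDefaults === "function" ? getDefaults.call(ctor) : [];
}

// Keeps the application's servers first and appends only those defaults
// that share no URL with them.
function mergeIceServers(appServers: IceServerLike[], defaults: IceServerLike[]): IceServerLike[] {
  const seen = new Set<string>();
  for (const server of appServers) {
    for (const url of toList(server.urls)) seen.add(url);
  }
  const extra = defaults.filter((server) => toList(server.urls).every((url) => !seen.has(url)));
  return [...appServers, ...extra];
}
```

An application would typically pass mergeIceServers(ownServers, readDefaultIceServers()) as the iceServers member of its configuration.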

This list is likely to be persistent and is the same across origins. It thus increases the fingerprinting surface of the browser. In privacy-sensitive contexts, browsers can consider mitigations such as only providing this data to whitelisted origins (or not providing it at all).

Since the use of this information is left to the discretion of application developers, configuring a user agent with these defaults does not per se increase a user's ability to limit the exposure of their IP addresses.

If set, the configured default ICE servers exposed by the static getDefaultIceServers() method of RTCPeerConnection provide persistent information across time and origins, which increases the fingerprinting surface of a given browser.

getDefaultIceServers() was moved from [[WEBRTC]] to this extension spec due to lack of support from implementers and concerns discussed in webrtc-pc#2023.

RTCRtpTransceiver extensions

The RTCRtpTransceiver interface is defined in [[WEBRTC]]. This document extends that interface by adding a method and two attributes, and extends the RTCRtpHeaderExtensionCapability dictionary with a direction member, in order to control negotiation of RTP header extensions.

partial dictionary RTCRtpHeaderExtensionCapability {
  RTCRtpTransceiverDirection direction = "sendrecv";
};

partial interface RTCRtpTransceiver {
  undefined setOfferedRtpHeaderExtensions(
      sequence<RTCRtpHeaderExtensionCapability> headerExtensionsToOffer);
  readonly attribute FrozenArray<RTCRtpHeaderExtensionCapability> headerExtensionsToOffer;
  readonly attribute FrozenArray<RTCRtpHeaderExtensionCapability> headerExtensionsNegotiated;
};

Let [[\HeaderExtensionsToOffer]] be an internal slot of the RTCRtpTransceiver, initialized to the platform-specific list of implemented RTP header extensions. The direction attribute for all extensions that are mandatory to use MUST be initialized to an appropriate value other than "stopped". The direction attribute for extensions that will not be offered by default in an initial offer MUST be initialized to "stopped".

The header extensions that MUST or SHOULD be supported are listed in [[RTCWEB-RTP]], section 5.2. The "mid" extension is mandatory to use when BUNDLE is in use, per [[BUNDLE]] section 9.1.

Let [[\HeaderExtensionsNegotiated]] be an internal slot of the RTCRtpTransceiver, initialized to an empty list.

Modifications to existing procedures

In the set an RTCSessionDescription algorithm, add a step right after the step that sets transceiver.[[\Sender]].[[\SendCodecs]], saying "For each transceiver, set [[\HeaderExtensionsNegotiated]] to an empty list, and then for each "a=extmap" line, add an appropriate RTCRtpHeaderExtensionCapability to the list."
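The new step can be sketched in TypeScript as follows. This is a simplified model, assuming the input holds the lines of a single media section; it reads the optional direction from each "a=extmap" line as written ([[RFC8285]] treats an omitted direction as "sendrecv") and does not translate a remote description's directions into the local perspective.

```typescript
type Direction = "sendrecv" | "sendonly" | "recvonly" | "inactive";

interface NegotiatedExtension {
  uri: string;
  direction: Direction;
}

// a=extmap:<id>["/"<direction>] <uri> [<extensionattributes>]  (RFC 8285, section 5)
const EXTMAP = /^a=extmap:(\d+)(?:\/(sendrecv|sendonly|recvonly|inactive))?\s+(\S+)/;

function headerExtensionsNegotiated(mediaSection: string): NegotiatedExtension[] {
  const result: NegotiatedExtension[] = []; // the slot is reset to an empty list first
  for (const line of mediaSection.split(/\r?\n/)) {
    const match = EXTMAP.exec(line);
    if (match === null) continue;
    // An omitted direction means "sendrecv".
    const direction = (match[2] as Direction | undefined) ?? "sendrecv";
    result.push({ uri: match[3], direction });
  }
  return result;
}
```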

In the algorithms for generating initial offers in [[RTCWEB-JSEP]] section 5.2.1, replace "for each supported RTP header extension, an "a=extmap" line, as specified in [[RFC5285]], section 5" with "For each RTP header extension "e" listed in [[\HeaderExtensionsToOffer]] where direction is not "stopped", an "a=extmap" line, as specified in [[RFC5285]], section 5, with direction taken from "e"'s direction attribute."
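The replacement rule for initial offers can be sketched as below. The sequential ID assignment starting at 1 is a hypothetical choice of this sketch, since the offerer may pick any valid IDs. A "sendrecv" direction is written without a direction suffix, because an omitted direction means "sendrecv".

```typescript
type OfferDirection = "sendrecv" | "sendonly" | "recvonly" | "inactive" | "stopped";

interface OfferedExtension {
  uri: string;
  direction: OfferDirection;
}

// Emits one a=extmap line per extension whose direction is not "stopped".
function initialOfferExtmapLines(headerExtensionsToOffer: OfferedExtension[]): string[] {
  const lines: string[] = [];
  let nextId = 1; // hypothetical: sequential IDs
  for (const ext of headerExtensionsToOffer) {
    if (ext.direction === "stopped") continue;
    // "sendrecv" is the default, so it is written without a "/direction" suffix.
    const suffix = ext.direction === "sendrecv" ? "" : `/${ext.direction}`;
    lines.push(`a=extmap:${nextId}${suffix} ${ext.uri}`);
    nextId += 1;
  }
  return lines;
}
```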

In the algorithm for generating subsequent offers in [[RTCWEB-JSEP]] section 5.2.2, replace "The RTP header extensions MUST only include those that are present in the most recent answer" with "For each RTP header extension listed in [[\HeaderExtensionsToOffer]] that is also present in the most recent answer, and where direction is not "stopped", generate an appropriate "a=extmap" line with "direction" set according to the rules of [[RFC5285]] section 6, considering the direction in [[\HeaderExtensionsToOffer]] to indicate the answerer's desired usage". This preserves the property that the set of extensions may not grow with subsequent offers.

In the algorithm for generating initial answers in [[RTCWEB-JSEP]] section 5.3.1, replace "For each supported RTP header extension that is present in the offer" with "For each supported RTP header extension that is present in the offer and is also present in [[\HeaderExtensionsToOffer]] with a direction different from "stopped"".

Since JSEP does not know about WebRTC internal slots, merging this change requires more work on a JSEP revision.

Methods

setOfferedRtpHeaderExtensions
Execute the following steps:
  1. Let extlist be the list of extensions in the argument.
  2. For each extension ext in extlist, do the following:
    1. If ext's "uri" value is missing, throw a TypeError.
    2. If ext's "uri" value is not present in [[\HeaderExtensionsToOffer]], throw a NotSupportedError.
    3. Let entry be the entry with the same "uri" value in [[\HeaderExtensionsToOffer]].
    4. If ext's "direction" field is not "sendrecv", and "uri" indicates a mandatory-to-use extension that is required to be both sent and received, throw an IllegalArgument error.
    5. If ext's "direction" field is "stopped", and "uri" indicates a mandatory-to-implement extension, throw an IllegalArgument error.
    6. Set the "direction" field of entry to ext's "direction" field.
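The steps above translate roughly into the following TypeScript sketch. Which extensions count as mandatory-to-use or mandatory-to-implement is an assumption here: the "mid" URI is the real [[BUNDLE]] one, but the second set holds a hypothetical placeholder URI. Error names mirror the step text, and namedError is a stand-in for creating DOMExceptions.

```typescript
type Direction = "sendrecv" | "sendonly" | "recvonly" | "inactive" | "stopped";

// Argument entries: "uri" may be missing; "direction" defaults to "sendrecv" per the IDL.
interface HeaderExtensionCapability {
  uri?: string;
  direction?: Direction;
}

// An entry of the [[HeaderExtensionsToOffer]] slot.
interface OfferEntry {
  uri: string;
  direction: Direction;
}

// Assumed classification; the second set holds a hypothetical placeholder.
const MANDATORY_SENDRECV = new Set<string>(["urn:ietf:params:rtp-hdrext:sdes:mid"]);
const MANDATORY_TO_IMPLEMENT = new Set<string>(["urn:example:hypothetical-mti-extension"]);

// Stand-in for creating a DOMException with the given name.
function namedError(name: string, message: string): Error {
  const error = new Error(message);
  error.name = name;
  return error;
}

function setOfferedRtpHeaderExtensions(
  headerExtensionsToOffer: OfferEntry[],
  extlist: HeaderExtensionCapability[],
): void {
  for (const ext of extlist) {
    const uri = ext.uri;
    if (uri === undefined) throw new TypeError("uri is missing"); // step 1
    const entry = headerExtensionsToOffer.find((e) => e.uri === uri); // steps 2 and 3
    if (entry === undefined) throw namedError("NotSupportedError", `not offered: ${uri}`);
    const direction = ext.direction ?? "sendrecv";
    if (direction !== "sendrecv" && MANDATORY_SENDRECV.has(uri)) {
      throw namedError("IllegalArgument", `${uri} must stay sendrecv`); // step 4
    }
    if (direction === "stopped" && MANDATORY_TO_IMPLEMENT.has(uri)) {
      throw namedError("IllegalArgument", `${uri} cannot be stopped`); // step 5
    }
    entry.direction = direction; // step 6
  }
}
```

As in the steps, entries updated before a later entry fails validation keep their new direction.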

Attributes

headerExtensionsToOffer, readonly
Returns the value of the [[\HeaderExtensionsToOffer]] slot.
headerExtensionsNegotiated, readonly
Returns the value of the [[\HeaderExtensionsNegotiated]] slot.

RTCRtpHeaderExtensionParameters extensions

The RTCRtpHeaderExtensionParameters dictionary is defined in [[WEBRTC]]. This document extends that dictionary by adding an additional member.

partial dictionary RTCRtpHeaderExtensionParameters {
  boolean enabled;
};

Dictionary RTCRtpHeaderExtensionParameters Members

enabled, of type boolean

When returned from getParameters() on an RTCRtpSender, "enabled" is true when the RTP sender is configured to send this header extension when appropriate. If the member is missing, the extension is enabled and cannot be disabled.

When passed to setParameters(), setting "enabled" to true instructs the RTP sender to send this header extension when appropriate. Setting "enabled" to false instructs the RTP sender to never send this extension.

When calling getParameters() on an RTCRtpReceiver, the "enabled" member will always be missing.

The list of extensions returned from getParameters() on a sender will only include the extensions that have been negotiated for sending.

The inclusion of a settable member of RTCRtpHeaderExtensionParameters means that "headerExtensions" is no longer a read-only member of RTCRtpParameters. The "uri", "id" and "encrypted" members of RTCRtpHeaderExtensionParameters remain read-only.

Changing the list, apart from setting the "enabled" member, MUST cause setParameters() to reject with an InvalidModificationError.
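An application that wants to stop sending one extension might transform the result of getParameters() as sketched below before passing it back to setParameters(). The structural types and the withExtensionEnabled name are assumptions of this sketch; only the "enabled" member is changed, so the list keeps its entries, order, and read-only members.

```typescript
// Structural stand-ins for RTCRtpHeaderExtensionParameters and the parameters object.
interface HeaderExtensionParams {
  uri: string;
  id: number;
  encrypted?: boolean;
  enabled?: boolean;
}

interface ParamsWithHeaderExtensions {
  headerExtensions: HeaderExtensionParams[];
}

// Returns a copy with "enabled" changed for the given URI. Entries whose
// "enabled" member is missing are always enabled and are left untouched.
function withExtensionEnabled<T extends ParamsWithHeaderExtensions>(params: T, uri: string, enabled: boolean): T {
  return {
    ...params,
    headerExtensions: params.headerExtensions.map((ext) =>
      ext.uri === uri && ext.enabled !== undefined ? { ...ext, enabled } : ext,
    ),
  };
}
```

In a browser this would be used roughly as sender.setParameters(withExtensionEnabled(sender.getParameters(), uri, false)), where the URI names an extension that was negotiated for sending.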

{{RTCRtpReceiver}} extensions

The {{RTCRtpReceiver}} interface is defined in [[WEBRTC]]. This document extends that interface by adding an additional internal slot and attribute.

Let RTCRtpReceiver objects have a [[\PlayoutDelay]] internal slot, initialized to null.

partial interface RTCRtpReceiver {
  attribute double? playoutDelay;
};

Attributes

playoutDelay of type double, nullable

This attribute allows the application to specify a target duration, in seconds, between network reception of media and playout. The User Agent SHOULD NOT play out received audio or video until this amount of time has passed, allowing the User Agent to perform more or less buffering than it might otherwise do. This lets the application influence the trade-off between higher delay and the risk that buffers, such as the jitter buffer, run out of audio or video frames to play due to network jitter.

The User Agent may have a minimum allowed delay and a maximum allowed delay reflecting what the User Agent is able or willing to provide based on network conditions and memory constraints.

The playout delay hint applies even if DTX is used. For example, if DTX is used and packets start flowing after silence, the hint can influence the User Agent to buffer these packets rather than playing them out.

If the track is paired with other tracks through the RTCRtpReceiver's [[\AssociatedRemoteMediaStreams]] internal slot, it will be synchronized with those tracks (e.g. for audio-video synchronization). This means that even if only one of the paired tracks is delayed through [[\PlayoutDelay]], the User Agent's synchronization mechanism will automatically delay all other paired tracks. If multiple paired tracks are delayed through [[\PlayoutDelay]] by different amounts, the largest of those hints takes precedence in the synchronization mechanism.

The receiver's average delay over an interval can be measured as the change in jitterBufferDelay divided by the change in jitterBufferEmittedCount over that interval.
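A sketch of that measurement, assuming two snapshots of the receiver's inbound RTP stats taken some time apart (for example from two calls to RTCRtpReceiver.getStats()). jitterBufferDelay is a cumulative sum in seconds, so the result is an average delay in seconds.

```typescript
// The two cumulative counters from an "inbound-rtp" stats object.
interface JitterBufferSample {
  jitterBufferDelay: number; // seconds, cumulative
  jitterBufferEmittedCount: number; // samples or frames, cumulative
}

// Average jitter buffer delay in seconds between two snapshots,
// or undefined if nothing was emitted in between.
function averageJitterBufferDelay(previous: JitterBufferSample, current: JitterBufferSample): number | undefined {
  const emitted = current.jitterBufferEmittedCount - previous.jitterBufferEmittedCount;
  if (emitted <= 0) return undefined;
  return (current.jitterBufferDelay - previous.jitterBufferDelay) / emitted;
}
```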

On getting, this attribute MUST return the value of the [[\PlayoutDelay]] internal slot.

On setting, the User Agent MUST run the following steps:

  1. Let receiver be the RTCRtpReceiver object on which the setter is invoked.

  2. Let delay be the argument to the setter.

  3. If delay is not null and is negative or larger than 4 seconds, throw a RangeError and abort these steps.

  4. Set the value of receiver's [[\PlayoutDelay]] internal slot to delay.

  5. In parallel, begin executing the following steps:

    1. Update the underlying system about the new delay request, or that there is no hint if delay is null.

      If the given delay value is below the minimum allowed delay or above the maximum allowed delay, the value used MUST be clamped to the minimum or maximum allowed delay, whichever is closer to the requested value.

      If the User Agent chooses a delay different from the requested one, e.g. due to network conditions or physical memory constraints, this is not reflected in the [[\PlayoutDelay]] internal slot.

    2. Modifying the delay of the underlying system SHOULD affect the internal audio or video buffering gradually in order not to hurt user experience. Audio samples or video frames SHOULD be accelerated or decelerated before playout, similarly to how it is done for audio/video synchronization or in response to congestion control.

      The acceleration or deceleration rate may vary depending on network conditions or the type of audio received (e.g. speech or background noise). It MAY take several seconds to achieve 1 second of buffering but SHOULD NOT take more than 30 seconds, assuming packets are being received. The speed MAY be different for audio and video.

      For audio, acceleration and deceleration can be measured with insertedSamplesForDeceleration and removedSamplesForAcceleration. For video, this may result in the same frame being rendered multiple times or frames may be dropped.
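The getter and setter steps above can be modeled as follows. MIN_ALLOWED_DELAY and MAX_ALLOWED_DELAY are hypothetical user agent limits, and the "in parallel" update of the underlying system is reduced to a synchronous assignment to appliedDelay; gradual acceleration or deceleration is not modeled.

```typescript
// Hypothetical user agent limits, in seconds.
const MIN_ALLOWED_DELAY = 0;
const MAX_ALLOWED_DELAY = 2;

class PlayoutDelayModel {
  private playoutDelaySlot: number | null = null; // models [[PlayoutDelay]]
  appliedDelay: number | null = null; // what the underlying system was asked to use

  get playoutDelay(): number | null {
    return this.playoutDelaySlot;
  }

  set playoutDelay(delay: number | null) {
    if (delay !== null && (delay < 0 || delay > 4)) {
      throw new RangeError("playoutDelay must be between 0 and 4 seconds");
    }
    this.playoutDelaySlot = delay;
    // Runs "in parallel" in the spec; applied synchronously here.
    // A clamped value is not written back to the slot.
    this.appliedDelay =
      delay === null ? null : Math.min(Math.max(delay, MIN_ALLOWED_DELAY), MAX_ALLOWED_DELAY);
  }
}
```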

RTCRtpEncodingParameters extensions

The RTCRtpEncodingParameters dictionary is defined in [[WEBRTC]]. This document extends that dictionary by adding a boolean adaptivePtime flag and a maxFramerate member.

partial dictionary RTCRtpEncodingParameters {
  boolean adaptivePtime = false;
  double maxFramerate;
};

Dictionary RTCRtpEncodingParameters Members

adaptivePtime of type boolean, defaulting to false.

Indicates whether this encoding MAY dynamically change the frame length. If the value is true, the user agent MAY use any valid frame length for any of its frames, and MAY change this at any time. Valid values are multiples of 10 ms. If the maxptime attribute (defined in [[RFC4566]] Section 6) is specified, that maximum applies. Read-only parameter.

Using a longer frame length reduces the bandwidth consumption due to overhead, but does so at the cost of increased latency. Changing the frame length dynamically allows the user agent to adapt its bandwidth allocation strategy based on the current network conditions.

If adaptivePtime is set to true, ptime MUST NOT be set; if both are set, an InvalidModificationError MUST be thrown.
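A minimal sketch of the two rules for adaptivePtime, under the assumption that frame lengths and maxptime are expressed in whole milliseconds: a frame length is valid if it is a positive multiple of 10 ms not exceeding maxptime, and setting ptime together with adaptivePtime is rejected. namedError stands in for creating a DOMException.

```typescript
interface EncodingLike {
  adaptivePtime?: boolean;
  ptime?: number;
}

// Stand-in for creating a DOMException with the given name.
function namedError(name: string, message: string): Error {
  const error = new Error(message);
  error.name = name;
  return error;
}

// ptime must not be combined with adaptivePtime.
function checkAdaptivePtime(encoding: EncodingLike): void {
  if (encoding.adaptivePtime === true && encoding.ptime !== undefined) {
    throw namedError("InvalidModificationError", "ptime must not be set when adaptivePtime is true");
  }
}

// A frame length (ms) the UA may pick: a positive multiple of 10 ms, capped by maxptime if given.
function isValidFrameLength(frameLengthMs: number, maxptimeMs?: number): boolean {
  return (
    Number.isInteger(frameLengthMs) &&
    frameLengthMs > 0 &&
    frameLengthMs % 10 === 0 &&
    (maxptimeMs === undefined || frameLengthMs <= maxptimeMs)
  );
}
```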

maxFramerate of type double

When present, indicates the maximum frame rate that can be used to send this encoding, in frames per second. The user agent is free to allocate bandwidth between the encodings, as long as the maxFramerate value is not exceeded.

When set with addTransceiver, the frame rate is required to be 0.0 or greater. If this is not the case, throw a RangeError.

If changed with setParameters, the new frame rate takes effect after the current picture is completed; setting the max frame rate to zero thus has the effect of freezing the video on the next frame. Upon setting with setParameters, if the value is less than 0.0, reject with a RangeError.
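A sketch of the range check, using a hypothetical capFramerate helper that applies one cap to every encoding (for example all simulcast layers). With addTransceiver the RangeError would be thrown directly; with setParameters the returned promise would reject with it.

```typescript
interface EncodingWithFramerate {
  rid?: string;
  maxFramerate?: number;
}

// Throws a RangeError if any encoding has a negative maxFramerate.
function checkMaxFramerate(encodings: EncodingWithFramerate[]): void {
  for (const encoding of encodings) {
    if (encoding.maxFramerate !== undefined && encoding.maxFramerate < 0) {
      throw new RangeError(`maxFramerate must be 0.0 or greater (rid: ${encoding.rid ?? "none"})`);
    }
  }
}

// Hypothetical helper: applies the same cap to every encoding.
// A cap of 0 freezes the video on the next frame once applied via setParameters.
function capFramerate(encodings: EncodingWithFramerate[], fps: number): EncodingWithFramerate[] {
  const capped = encodings.map((encoding) => ({ ...encoding, maxFramerate: fps }));
  checkMaxFramerate(capped);
  return capped;
}
```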

maxFramerate was moved from [[WEBRTC]] to this extension spec due to lack of support from implementers.

RTCIceCredentialType extensions

The following enum value is added to the RTCIceCredentialType enum defined in [[WEBRTC]].

// This is a partial enum, but this is not yet expressible in WebIDL.
enum RTCIceCredentialType {
    "oauth"
};
Enumeration description
oauth

An OAuth 2.0 based authentication method, as described in [[RFC7635]].

For OAuth authentication, the ICE Agent requires three pieces of credential information. The credential is composed of a kid, which is carried in the RTCIceServer username member, and a macKey and an accessToken, which are placed in the RTCOAuthCredential dictionary.

This specification does not define how an application (acting as the OAuth Client) obtains the accessToken, kid and macKey from the Authorization Server, as WebRTC only handles the interaction between the ICE agent and TURN server. For example, the application may use the OAuth 2.0 Implicit Grant type, with PoP (Proof-of-Possession) Token type, as described in [[RFC6749]] and [[OAUTH-POP-KEY-DISTRIBUTION]]; an example of this is provided in [[RFC7635]], Appendix B.

The application, acting as the OAuth Client, is responsible for refreshing the credential information and updating the ICE Agent with fresh credentials before the accessToken expires. The OAuth Client can use the RTCPeerConnection.setConfiguration method to periodically refresh the TURN credentials.
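One way an application might build the updated configuration is sketched below. The structural types and the withFreshOAuthCredential name are assumptions; how the new kid and credential are obtained from the Authorization Server is outside this specification.

```typescript
interface OAuthCredential {
  macKey: string;
  accessToken: string;
}

// Structural stand-ins for RTCIceServer and RTCConfiguration.
interface IceServerConfig {
  urls: string | string[];
  username?: string;
  credential?: string | OAuthCredential;
  credentialType?: "password" | "oauth";
}

interface ConfigurationLike {
  iceServers?: IceServerConfig[];
}

// Returns a copy of the configuration in which every "oauth" server uses the
// new kid (carried in username) and the new RTCOAuthCredential.
function withFreshOAuthCredential(config: ConfigurationLike, kid: string, credential: OAuthCredential): ConfigurationLike {
  return {
    ...config,
    iceServers: (config.iceServers ?? []).map((server) =>
      server.credentialType === "oauth" ? { ...server, username: kid, credential } : server,
    ),
  };
}
```

In a browser, the result would be passed to pc.setConfiguration(), typically starting from pc.getConfiguration(), before the current accessToken expires.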

The length of the HMAC key (RTCOAuthCredential.macKey) MAY be any integer number of bytes greater than 20 (160 bits).

According to [[RFC7635]] Section 4.1, the HMAC key MUST be a symmetric key, as asymmetric keys would result in large access tokens which may not fit in a single STUN message.

Currently the STUN/TURN protocols use only SHA-1 and SHA-2 family hash algorithms for Message Integrity Protection, as defined in [[RFC5389]] Section 15.4, and [[STUN-BIS]] Section 14.6.

When setting a configuration and evaluating urls, also run the following step:

  1. If scheme name is turn or turns, and server.credentialType is "oauth", and server.credential is not an RTCOAuthCredential, then throw an InvalidAccessError.
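The added step can be sketched as a validation function. The scheme extraction is deliberately simple (everything before the first ":"), isOAuthCredential is a structural check of this sketch, and namedError stands in for creating a DOMException.

```typescript
interface OAuthCredential {
  macKey: string;
  accessToken: string;
}

interface IceServerInput {
  urls: string | string[];
  credential?: unknown;
  credentialType?: string;
}

// Stand-in for creating a DOMException with the given name.
function namedError(name: string, message: string): Error {
  const error = new Error(message);
  error.name = name;
  return error;
}

// Structural check for an RTCOAuthCredential-shaped value.
function isOAuthCredential(value: unknown): value is OAuthCredential {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["macKey"] === "string" && typeof record["accessToken"] === "string";
}

function checkOAuthTurnServer(server: IceServerInput): void {
  const urls = typeof server.urls === "string" ? [server.urls] : server.urls;
  for (const url of urls) {
    const colon = url.indexOf(":");
    const scheme = colon < 0 ? "" : url.slice(0, colon).toLowerCase();
    const isTurn = scheme === "turn" || scheme === "turns";
    if (isTurn && server.credentialType === "oauth" && !isOAuthCredential(server.credential)) {
      throw namedError("InvalidAccessError", `${url}: "oauth" requires an RTCOAuthCredential`);
    }
  }
}
```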

Support for the oauth value of RTCIceCredentialType is marked as a feature at risk, since there is no clear commitment from implementers.

RTCOAuthCredential Dictionary

The RTCOAuthCredential dictionary is used to describe the OAuth credential information which is used by the STUN/TURN client (inside the ICE Agent) to authenticate against a STUN/TURN server, as described in [[RFC7635]]. Note that the kid parameter is not located in this dictionary, but in RTCIceServer's username member.

Support for the RTCOAuthCredential dictionary is marked as a feature at risk, since there is no clear commitment from implementers.

dictionary RTCOAuthCredential {
    required DOMString macKey;
    required DOMString accessToken;
};

Dictionary RTCOAuthCredential Members

macKey of type DOMString, required

The "mac_key", as described in [[RFC7635]], Section 6.2, in a base64-url encoded format. It is used in STUN message integrity hash calculation (as the password is used in password based authentication). Note that the OAuth response "key" parameter is a JSON Web Key (JWK) or a JWK encrypted with a JWE format. Also note that this is the only OAuth parameter whose value is not used directly, but must be extracted from the "k" parameter value from the JWK, which contains the needed base64-encoded "mac_key".
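Extracting the mac_key from an unencrypted JWK can be sketched as follows, assuming the OAuth response "key" parameter is available either as JSON text or as a parsed object. A symmetric JWK has kty "oct" and carries the base64url-encoded key in "k" (RFC 7518); a JWE-encrypted key would have to be decrypted first, which this sketch does not attempt.

```typescript
// Minimal shape of a symmetric JSON Web Key.
interface SymmetricJwk {
  kty: string;
  k?: string;
}

// Returns the base64url "k" value of a symmetric JWK, given as JSON text or as an object.
function macKeyFromKeyParameter(key: string | SymmetricJwk): string {
  const jwk: SymmetricJwk = typeof key === "string" ? (JSON.parse(key) as SymmetricJwk) : key;
  if (jwk.kty !== "oct" || typeof jwk.k !== "string") {
    throw new Error('expected a symmetric JWK (kty "oct") with a "k" member');
  }
  return jwk.k;
}
```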

accessToken of type DOMString, required

The "access_token", as described in [[RFC7635]], Section 6.2, in a base64-encoded format. This is an encrypted self-contained token that is opaque to the application. Authenticated encryption is used for message encryption and integrity protection. The access token contains a non-encrypted nonce value, which is used by the Authorization Server for unique mac_key generation. The second part of the token is protected by Authenticated Encryption. It contains the mac_key, a timestamp and a lifetime. The timestamp combined with lifetime provides expiry information; this information describes the time window during which the token credential is valid and accepted by the TURN server.

An example of an RTCOAuthCredential dictionary is:

{
  macKey: 'WmtzanB3ZW9peFhtdm42NzUzNG0=',
  accessToken: 'AAwg3kPHWPfvk9bDFL936wYvkoctMADzQ5VhNDgeMR3+ZlZ35byg972fW8QjpEl7bx91YLBPFsIhsxloWcXPhA=='
}

RTCIceServer extensions

The RTCIceServer dictionary is defined in [[WEBRTC]]. This document extends that dictionary by adding the credential attribute, and adds a paragraph on how to interpret the existing username attribute.

partial dictionary RTCIceServer {
    // This attribute is not new in this extension spec, but how to interpret it
    // in the case of credentialType being "oauth" is described here.
    DOMString username;
    (DOMString or RTCOAuthCredential) credential;
};

Dictionary RTCIceServer Members

username of type DOMString

When this RTCIceServer object represents a TURN server and credentialType is "password", the interpretation of username is specified in the RTCIceServer section of [[WEBRTC]].

If this RTCIceServer object represents a TURN server, and credentialType is "oauth", then this attribute specifies the Key ID (kid) of the shared symmetric key, which is shared between the TURN server and the Authorization Server, as described in [[RFC7635]]. It is an ephemeral and unique key identifier. The kid allows the TURN server to select the appropriate keying material for decryption of the Access-Token, so the key identified by this kid is used in the Authenticated Encryption of the "access_token". The kid value is equal to the OAuth response "kid" parameter, as defined in [[RFC7515]] Section 4.1.4.

credential of type (DOMString or RTCOAuthCredential)

If this RTCIceServer object represents a TURN server, then this attribute specifies the credential to use with that TURN server.

If credentialType is "password", credential is a DOMString, and represents a long-term authentication password, as described in [[RFC5389]], Section 10.2.

If credentialType is "oauth", credential is an RTCOAuthCredential, which contains the OAuth access token and MAC key.

An example RTCIceServer object is:

{
  urls: 'turns:turn2.example.net',
  username: '22BIjxU93h/IgwEb',
  credential: {
    macKey: 'WmtzanB3ZW9peFhtdm42NzUzNG0=',
    accessToken: 'AAwg3kPHWPfvk9bDFL936wYvkoctMADzQ5VhNDgeMR3+ZlZ35byg972fW8QjpEl7bx91YLBPFsIhsxloWcXPhA=='
  },
  credentialType: 'oauth'
}

{{RTCRtpContributingSource}} extensions

The {{RTCRtpContributingSource}} dictionary is defined in [[WEBRTC]]. This document extends that dictionary by adding two additional members.

In this section, the capture system refers to the system where media is sourced from and the sender system refers to the system that is sending RTP and RTCP packets to the receiver system where {{RTCRtpContributingSource}} data is populated.

In a direct connection, the capture system is the same as the sender system. But when one or more RTCP-terminating intermediate systems (e.g. mixers) are involved this is not the case. In such cases, media is sourced from the capture system, may be relayed through a number of intermediate systems and is then finally sent from the sender system to the receiver system. The sender system-receiver system path only represents the "last hop".

Despite RTCRemoteInboundRtpStreamStats.roundTripTime measurements only accounting for the "last hop", the one-way delay from the capture system's time of capture to the receiver system's time of playout can be estimated if the RTP Header Extension for Absolute Capture Time is used on every hop, with each RTCP-terminating intermediate system appropriately updating the estimated capture clock offset.

partial dictionary RTCRtpContributingSource {
  DOMHighResTimeStamp captureTimestamp;
  DOMHighResTimeStamp senderCaptureTimeOffset;
};

Dictionary {{RTCRtpContributingSource}} Members

captureTimestamp of type {{DOMHighResTimeStamp}}.

The {{captureTimestamp}} is the time at which the most recent frame (from an RTP packet originating from this source) delivered to the {{RTCRtpReceiver}}'s {{MediaStreamTrack}} was originally captured. Its reference clock is the capture system's NTP clock (the same clock used to generate NTP timestamps for RTCP sender reports on that system).

On populating this member, the user agent MUST run the following steps:

  1. If the relevant RTP packet contains the RTP Header Extension for Absolute Capture Time, return the value of the absolute capture timestamp field and abort these steps.

  2. Otherwise, if the relevant RTP packet does not contain the RTP Header Extension for Absolute Capture Time but a previous RTP packet did, return the result of calculating the absolute capture timestamp according to timestamp interpolation and abort these steps.

  3. Otherwise, return undefined.
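The three steps can be sketched as a small per-source tracker. The interpolation formula (the last absolute capture timestamp plus the RTP timestamp delta divided by the RTP clock rate) is our reading of timestamp interpolation in the Absolute Capture Time document, and timestamps are in milliseconds here rather than in the extension's NTP wire format.

```typescript
interface ReceivedPacket {
  rtpTimestamp: number; // 32-bit RTP timestamp
  absoluteCaptureTimestamp?: number; // ms; present only if the packet carries the extension
}

class CaptureTimestampEstimator {
  private readonly clockRate: number; // RTP clock rate in Hz
  private last: { rtpTimestamp: number; captureTimestamp: number } | undefined = undefined;

  constructor(clockRate: number) {
    this.clockRate = clockRate;
  }

  captureTimestampFor(packet: ReceivedPacket): number | undefined {
    // Step 1: the packet carries the extension.
    if (packet.absoluteCaptureTimestamp !== undefined) {
      this.last = { rtpTimestamp: packet.rtpTimestamp, captureTimestamp: packet.absoluteCaptureTimestamp };
      return packet.absoluteCaptureTimestamp;
    }
    // Step 2: interpolate from the last packet that carried it.
    if (this.last !== undefined) {
      // "| 0" yields the signed 32-bit difference, which handles RTP timestamp wraparound.
      const delta = (packet.rtpTimestamp - this.last.rtpTimestamp) | 0;
      return this.last.captureTimestamp + (delta * 1000) / this.clockRate;
    }
    // Step 3: no information yet.
    return undefined;
  }
}
```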

If multiple receiving tracks are sourced from the same capture system, two {{captureTimestamp}}s can be used to accurately measure audio-video synchronization since both timestamps are based on the same system's clock.

senderCaptureTimeOffset of type {{DOMHighResTimeStamp}}.

The {{senderCaptureTimeOffset}} is the sender system's estimate of the offset between its own NTP clock and the capture system's NTP clock, for the same frame from which the {{captureTimestamp}} originated.

On populating this member, the user agent MUST run the following steps:

  1. If the relevant RTP packet contains the RTP Header Extension for Absolute Capture Time and the estimated capture clock offset field is present, return the value of the estimated capture clock offset field and abort these steps.

  2. Otherwise, if the relevant RTP packet does not contain the RTP Header Extension for Absolute Capture Time's estimated capture clock offset field, but a previous RTP packet did, return the most recent value that was present and abort these steps.

  3. Otherwise, return undefined.

The time of capture can be estimated in the sender system's clock as follows: senderCaptureTimestamp = {{captureTimestamp}} + {{senderCaptureTimeOffset}}.

The offset between the sender system's clock and the receiver system's clock can be estimated as follows: senderReceiverTimeOffset = RTCRemoteOutboundRtpStreamStats.timestamp - (RTCRemoteOutboundRtpStreamStats.remoteTimestamp + RTCRemoteInboundRtpStreamStats.roundTripTime * 1000 / 2), where roundTripTime, reported in seconds, is converted to milliseconds to match the timestamps.

The time of capture can be estimated in the receiver system's clock as follows: receiverCaptureTimestamp = senderCaptureTimestamp + senderReceiverTimeOffset.

The one-way delay between the capture system's time of capture and the receiver system's time of playout can be estimated as follows: RTCRtpContributingSource.timestamp - receiverCaptureTimestamp.
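The estimation chain above can be combined into one function. Note the unit conversion: [[WEBRTC-STATS]] reports roundTripTime in seconds while the timestamps are DOMHighResTimeStamps in milliseconds, so the sketch converts roundTripTime before halving it. The DelayInputs field names are hypothetical.

```typescript
interface DelayInputs {
  captureTimestamp: number; // ms, capture system's NTP clock
  senderCaptureTimeOffset: number; // ms
  remoteOutboundTimestamp: number; // ms, RTCRemoteOutboundRtpStreamStats.timestamp
  remoteOutboundRemoteTimestamp: number; // ms, RTCRemoteOutboundRtpStreamStats.remoteTimestamp
  roundTripTime: number; // seconds, RTCRemoteInboundRtpStreamStats.roundTripTime
  contributingSourceTimestamp: number; // ms, RTCRtpContributingSource.timestamp
}

// Estimated one-way delay (ms) from capture to playout.
function estimateOneWayDelayMs(i: DelayInputs): number {
  const senderCaptureTimestamp = i.captureTimestamp + i.senderCaptureTimeOffset;
  const senderReceiverTimeOffset =
    i.remoteOutboundTimestamp - (i.remoteOutboundRemoteTimestamp + (i.roundTripTime * 1000) / 2);
  const receiverCaptureTimestamp = senderCaptureTimestamp + senderReceiverTimeOffset;
  return i.contributingSourceTimestamp - receiverCaptureTimestamp;
}
```

For example, a capture timestamp of 1000 ms, an offset of 5 ms, remote outbound timestamps of 2000 ms and 1900 ms, a round-trip time of 0.5 s and a contributing source timestamp of 1200 ms give an estimated one-way delay of 345 ms.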