Open Screen Application Protocol

Editor’s Draft,

More details about this document
This version:
https://w3c.github.io/openscreenprotocol/application.html
Latest published version:
https://www.w3.org/TR/openscreen-application/
Feedback:
GitHub
Inline In Spec
Editor:
Mark Foltz (Google)

Abstract

The Open Screen Application Protocol allows user agents to implement the Presentation API and the Remote Playback API in an interoperable fashion.

Status of this document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document was published by the Second Screen Working Group as an Editor’s Draft. This document is intended to become a W3C Recommendation.

Feedback and comments on this specification are welcome. Please use Github issues.

Publication as an Editor’s Draft does not imply endorsement by W3C and its Members. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 03 November 2023 W3C Process Document.

1. Introduction

The Open Screen Application Protocol connects browsers to devices capable of rendering Web content for a shared audience. Typically, these are devices like Internet-connected TVs, HDMI dongles, and smart speakers.

The Open Screen Application Protocol is intended to be supported by a variety of connection technologies between browsers and devices. There are specific requirements to allow browsers and devices to discover and authenticate to one another, but these requirements may be met by multiple possible implementations. To maximize interoperability, browsers and devices should support the Open Screen Network Protocol, which provides a way for browsers and devices to discover, connect, and authenticate each other on a local area network.

The Open Screen Application Protocol allows a browser to present a URL, initiate remote playback of an HTML media element, and stream media data to another device.

The Open Screen Application Protocol is intended to be extensible, so that additional capabilities can be added over time. This may include additions to existing Web APIs or new Web APIs.

The accompanying explainer provides more background on the protocol.

1.1. Terminology

An Open Screen Protocol agent (or OSP agent) is any implementation of this protocol (browser, display, speaker, or other software).

Some OSP agents support the Presentation API. The API allows a controlling user agent to initiate presentation of Web content on another device. We call this agent a controller for short. A receiving user agent is responsible for rendering the Web content, which we will call a receiver for short. The the Web content itself is called a presentation.

Some OSP agents also support the Remote Playback API. That API allows an agent to render a media element on a remote playback device. In this document, we refer to it as a receiver because it is shorter and keeps terminology consistent between presentations and remote playbacks. Similarly, we use the term controller to refer to the agent that starts, terminates, and controls the remote playback.

For media streaming, we refer to an OSP agent that sends a media stream as a media sender and an agent that receives a media stream as a media receiver. Note that an agent can be both a media sender and a media receiver.

For additional terms and idioms specific to the Presentation API or Remote Playback API, please consult the respective specifications.

2. Requirements

2.1. Agent Discovery Requirements

Define Discovery Requirements [Issue #342]

2.2. Transport Layer Requirements

Define Transport Layer Requirements [Issue #342]

2.3. Presentation API Requirements

  1. A controller must be able to determine if a receiver is reasonably capable of rendering a specific presentation request URL.

  2. A controller must be able to start a new presentation on a receiver given a presentation request URL and presentation identifier.

  3. A controller must be able to create a new PresentationConnection to an existing presentation on the receiver, given its presentation request URL and presentation identifier.

  4. It must be possible to close a PresentationConnection between a controller and a presentation, and signal both parties with the reason why the connection was closed.

  5. Multiple controllers must be able to connect to a single presentation simultaneously.

  6. Messages sent by the controller must be delivered to the presentation (or vice versa) in a reliable and in-order fashion.

  7. If a message cannot be delivered, then the controller must be able to signal the receiver (or vice versa) that the connection should be closed with reason error.

  8. The controller and presentation must be able to send and receive DOMString messages (represented as string type in ECMAScript).

  9. The controller and presentation must be able to send and receive binary messages (represented as Blob objects in HTML5, or ArrayBuffer or ArrayBufferView types in ECMAScript).

  10. The controller must be able to signal to the receiver to terminate a presentation, given its presentation request URL and presentation identifier.

  11. The receiver must be able to signal all connected controllers when a presentation is terminated.

2.4. Remote Playback API Requirements

  1. A controller must be able to find out whether there is at least one compatible receiver for a given HTMLMediaElement, both instantaneously and continuously.

  2. A controller must be able to to initiate remote playback of an HTMLMediaElement to a compatible receiver.

  3. The controller must be able send media sources as URLs and text tracks from an HTMLMediaElement to a compatible receiver.

  4. The controller must be able send media data from an HTMLMediaElement to a compatible receiver.

  5. During remote playback, the controller and the remote playback device must be able to synchronize the media element state of the HTMLMediaElement.

  6. During remote playback, either the controller or the receiver must be able to disconnect from the other party.

  7. The controller should be able to pass locale and text direction information to the receiver to assist in rendering text during remote playback.

2.5. Non-Functional Requirements

  1. It should be possible to implement an OSP agent using modest hardware requirements, similar to what is found in a low end smartphone, smart TV or streaming device. See the Device Specifications document for agent hardware specifications.

  2. Agents should present sensible information to the user when a protocol operation fails. For example, if a controller is unable to start a presentation, it should be possible to report in the controller interface if it was a network error, authentication error, or the presentation content failed to load.

  3. Message latency between agents should be minimized to permit interactive use. For example, it should be comfortable to type in a form in one agent and have the text appear in the presentation in real time. Real-time latency for gaming or mouse use is ideal, but not a requirement.

  4. The controller initiating a presentation or remote playback should communicate its preferred locale to the receiver, so it can render the content in that locale.

  5. It should be possible to extend the application protocol with optional features not defined explicitly by the specification, to facilitate experimentation and enhancement of the base APIs.

3. Metadata Discovery

To learn further metadata, an agent may send an agent-info-request message and receive back an agent-info-response message. Any agent may send this request at any time to learn about the state and capabilities of another device, which are described by the agent-info message in the agent-info-response.

If an agent changes any information in its agent-info message, it should send an agent-info-event message to all other connected agents with the new agent-info (without waiting for an agent-info-request).

The agent-info message contains the following fields:

display-name (required)

The display name of the agent, intended to be displayed to a user by the requester. The requester should indicate through the UI if the responder is not authenticated or if the display name changes.

model-name (optional)

If the agent is a hardware device, the model name of the device. This is used mainly for debugging purposes, but may be displayed to the user of the requesting agent.

capabilities (required)

The control protocols, roles, and media types the agent supports. Presence indicates a capability and absence indicates lack of a capability. Capabilities should should affect how an agent is presented to a user, such as drawing a different icon depending on the whether it receives audio, video or both.

state-token (required)

A random alphanumeric value consisting of 8 characters in the range [0-9A-Za-z]. This value is set before the agent makes its first connection and must be set to a new value when the agent is reset or otherwise lost all of its state related to this protocol.

locales (required)

The agent’s preferred locales for display of localized content, in the order of user preference. Each entry is an RFC5646 language tag.

The various capabilities have the following meanings:

receive-audio

The agent can render audio via the other protocols it supports. Those other protocols may report more specific capabilities, such as support for certain audio codecs in the streaming protocol.

receive-video

The agent can receive video via the other protocols it supports. Those other protocols may report more specific capabilities, such as support for certain video codecs in the streaming protocol.

receive-presentation

The agent can receive presentations using the presentation protocol.

control-presentation

The agent can control presentations using the presentation protocol.

receive-remote-playback

The agent can receive remote playback using the remote playback protocol.

control-remote-playback

The agent can control remote playback using the remote playback protocol.

receive-streaming

The agent can receiving streaming using the streaming protocol.

send-streaming

The agent can send streaming using the streaming protocol.

NOTE: See the Capabilities Registry for a list of all known capabilities (both defined by this specification, and through § 10 Protocol Extensions).

Rewrite to not depend on QUIC, or move agent-status messges to the network spec [Issue #343]

If a listening agent wishes to receive messages from an advertising agent or an advertising agent wishes to send messages to a listening agent, it may wish to keep the QUIC connection alive. Once neither side needs to keep the connection alive for the purposes of sending or receiving messages, the connection should be closed with an error code of 5139. In order to keep a QUIC connection alive, an agent may send an agent-status-request message, and any agent that receives an agent-status-request message should send an agent-status-response message. Such messages should be sent more frequently than the QUIC idle_timeout transport parameter (see Transport Parameter Encoding in QUIC) and QUIC PING frames should not be used. An idle_timeout transport parameter of 25 seconds is recommended. The agent should behave as though a timer less than the idle_timeout were reset every time a message is sent on a QUIC stream. If the timer expires, a agent-status-request message should be sent.

If a listening agent wishes to send messages to an advertising agent, the listening agent can connect to the advertising agent "on demand"; it does not need to keep the connection alive.

If an OSP agent suspends its network connectivity (e.g. for power saving reasons), it should attempt to resume QUIC connections to the OSP agents to which it was previously connected once network connectivity is restored. Once reconnected, it should send agent-status-request messages to those agents.

The agent-info and agent-status-response messages may be extended to include additional information not defined in this spec, as described in § 10.1 Protocol Extension Fields.

4. Message delivery using CBOR

Rewrite to not depend on QUIC [Issue #343]

Messages are serialized using CBOR. To send a group of messages in order, that group of messages must be sent in one QUIC stream. Independent groups of messages (with no ordering dependency across groups) should be sent in different QUIC streams. In order to put multiple CBOR-serialized messages into the the same QUIC stream, the following is used.

For each message, the OSP agent must write into a unidirectional QUIC stream the following:

  1. A type key representing the type of the message, encoded as a variable-length integer (see Appendix A: Messages for type keys)

  2. The message encoded as CBOR.

If an agent receives a message for which it does not recognize a type key, it must close the QUIC connection with an application error code of 404 and should include the unknown type key in the reason phrase of the CONNECTION_CLOSE frame.

Variable-length integers are encoded in the Variable-Length Integer Encoding used by QUIC.

Many messages are requests and responses, so a common format is defined for those. A request and a response includes a request ID which is an unsigned integer chosen by the requester. Responses must include the request ID of the request they are associated with.

4.1. Type Key Backwards Compatibility

As messages are modified or extended over time, certain rules must be followed to maintain backwards compatibiilty with agents that understand older versions of messages.

  1. If a required field is added to or removed from a message (either to/from the message directly or indirectly through the field of a field), a new type key must be assigned to the message. Is is effectively a new message and must not be sent unless the receiving agent is known to understand the new type key.

  2. If an optional field is added to a message (either to the message directly or indirectly through the field of a field), the type key may remain unchanged if the behavior of older receiving agents that do not understand the added field is compatible with newer sending agents that include the field. Otherwise, a new type key must be assigned.

  3. If an optional field is removed from a message (either from the message directly or indirectly through the field of a field), the type key may remain unchanged if the behavior of newer receiving agents that do not understand the removed field is compatible with older sending agents that include the field. Otherwise, a new type key must be assigned.

  4. Required fields may not be added or removed from array-based messages, such as audio-frame.

5. Presentation Protocol

This section defines the use of the Open Screen Protocol for starting, terminating, and controlling presentations as defined by Presentation API. § 5.1 Presentation API defines how APIs in Presentation API map to the protocol messages defined in this section.

To learn which receivers are available presentation displays for a particular presentation request URL or set of URLs, the controller may send a presentation-url-availability-request message with the following values:

urls

A list of presentation URLs. Must not be empty.

watch-duration

The period of time that the controller is interested in receiving updates about their URLs, should the availability change.

watch-id

An identifier the receiver must use when sending updates about URL availability so that the controller knows which URLs the receiver is referring to.

In response, the receiver should send one presentation-url-availability-response message with the following values:

url-availabilities

A list of URL availability states. Each state must correspond to the matching URL from the request by list index.

While the watch is valid (the watch-duration has not expired), the receivers should send presentation-url-availability-event messages when URL availabilities change. Such events contain the following values:

watch-id

The watch-id given in the presentation-url-availability-response, used to refer to the presentation URLs whose availability has changed.

url-availabilities

A list of URL availability states. Each state must correspond to the URLs from the request referred to by the watch-id.

Note that these messages are not broadcasted to all controllers. They are sent individually to controllers that have requested availability for the URLs that have changed in availability state within the watch duration of the original availability request.

Add power saving as a transport requirement and remove the following. [Issue #342]

To save power, the controller may disconnect the QUIC connection and later reconnect to send availability requests and receive availability responses and updates. The QUIC connection ID may or may not be the same when reconnecting.

To start a presentation, the controller may send a presentation-start-request message to the receiver with the following values:

presentation-id

The presentation identifier

url

The selected presentation URL

headers

headers that the receiver should use to fetch the presentation URL. For example, section 6.6.1 of the Presentation API says that the HTTP Accept-Language header should be provided.

The presentation identifier must follow the restrictions defined by section 6.1 of the Presentation API, in that it must consist of at least 16 ASCII characters.

When the receiver receives the presentation-start-request, it should send back a presentation-start-response message after either the presentation URL has been fetched and loaded, or the receiver has failed to do so. If it has failed, it must respond with the appropriate result (such as invalid-url or timeout). If it has succeeded, it must reply with a success result.

Additionally, the response must include the following:

connection-id

An ID that both agents can use to send connection messages to each other. It is chosen by the receiver for ease of implementation: if the message receiver chooses the connection-id, it may keep the ID unique across connections, thus making message demuxing/routing easier.

The response should include the following:

http-response-code

The numeric HTTP response code that was returned from fetching the presentation URL (after redirects).

To send a presentation message, the controller or receiver may send a presentation-connection-message with the following values:

connection-id

The ID from the presentation-start-response or presentation-connection-open-response messages.

message

The presentation message data.

Rewrite the following NOTE as a transport requirement for any message transport. [Issue #342]

NOTE: An OSP agent should minimize buffering and processing of messages sent or received beyond what is strictly necessary (i.e., CBOR serialization). Message payloads should be treated as real-time data, as they may be used to synchronize playback of media streams between agents or other low latency use cases. The synchronization thresholds recommended in [ITU-R-BT.1359-1] imply that the total agent-to-agent processing latency (including serialization, buffering, and network latency) must be no greater than 45 ms to permit effective lip sync during media playback.

To terminate a presentation, the controller may send a presentation-termination-request message with the following values:

presentation-id

The ID of the presentation to terminate.

reason

Set to application-request if the application requested termination, or user-request if the user requested termination. (These are the only valid values for reason in a presentation-termination-request.)

When a receiver receives a presentation-termination-request, it should send back a presentation-termination-response message to the requesting controller.

It should also notify other controllers about the termination by sending a presentation-termination-event message. And it can send the same message if it terminates a presentation without a request from a controller to do so. This message contains the following values:

presentation-id

The ID of the presentation that was terminated.

source

Set to controller when the termination was in response to a presentation-termination-request, or receiver otherwise.

reason

The detailed reason why the presentation was terminated.

To accept incoming connection requests from controller, a receiver must receive and process the presentation-connection-open-request message which contains the following values:

presentation-id

The ID of the presentation to connect to.

url

The URL of the presentation to connect to.

The receiver should, upon receipt of a presentation-connection-open-request message, send back a presentation-connection-open-response message which contains the following values:

result

a code indicating success or failure, and the reason for the failure

connection-id

An ID that both agents can use to send connection messages to each other. It is chosen by the receiver for ease of implementation (if the message receiver chooses the connection-id, it may keep the ID unique across connections, thus making message demuxing/routing easier).

connection-count

The new number of open connections to the presentation that received the incoming connection request.

If the presentation-connection-open-response message indicates success, the receiver should also send a presentation-change-event to all other endpoints that have an active presentation connection to that presentation with the values:

presentation-id

The ID of the presentation that just received a new presentation connection.

connection-count

The new total number of open connections to that presentation.

A controller may close a connection without terminating the presentation by sending a presentation-connection-close-event message to the receiver with the following values:

connection-id

The ID of the connection that was closed.

reason

Set to close-method-called or connection-object-discarded.

The receiver may also close a connection without terminating a presentation. If it does so, it should send a presentation-connection-close-event message to the controller with the following values:

connection-id

The ID of the connection that was closed.

reason

Set to close-method-called or connection-object-discarded.

connection-count

The number of open presentation connections that remain.

If a receiver closes a presentation connection (for any reason), it should send a presentation-change-event to all other controllers with an open connection to that presentation with the values:

presentation-id

The ID of the presentation that just closed a connection.

connection-count

The number of open presentation connections that remain.

Note: When an agent closes a presentation connection, it is always successful, so request and response messages are not needed. A request to terminate a presentation may succeed or fail, so a response message is required.

5.1. Presentation API

An Open Screen Protocol agent that is a controlling user agent for the Presentation API must support the control-presentation capability. An OSP agent that is a receiving user agent for the Presentation API must support the receive-presentation capability. The same OSP agent may be a controlling user agent and a receiving user agent.

This is how the Presentation API uses the § 5 Presentation Protocol:

When section 6.4.2 says "This list of presentation displays ... is populated based on an implementation specific discovery mechanism", the controller may use the mDNS, QUIC, agent-info-request, and presentation-url-availability-request messages defined previously in this spec to discover receivers.

When section 6.4.2 says "To further save power, ... implementation specific discovery of presentation displays can be resumed or suspended.", the agent may use the power saving mechanism defined in the previous section.

When section 6.3.4 says "Using an implementation specific mechanism, tell U to create a receiving browsing context with D, presentationUrl, and I as parameters.", U (the controller) may send a presentation-start-request message to D (the receiver), with I for the presentation identifier and presentationUrl for the selected presentation URL.

When section 6.3.5 says to "establish a presentation connection with newConnection," let U be the presentationURL of newConnection and I the presentation identifier of newConnection. The agent should send a presentation-connection-open-request message with U for the url and I for the presentation-id.

When section 6.5.2 says "Using an implementation specific mechanism, transmit the contents of messageOrData as the presentation message data and messageType as the presentation message type to the destination browsing context", the controller may send a presentation-connection-message with messageOrData for the presentation message data. Note that the messageType is embedded in the encoded CBOR type and does not need an additional value in the message.

When section 6.5.5 says "Start to signal to the destination browsing context the intention to close the corresponding PresentationConnection", the agent may send a presentation-connection-close-event message to the other agent with the destination browsing context and a presentation-change-event when required.

When section 6.5.6 says "Send a termination request for the presentation to its receiving user agent using an implementation specific mechanism", the controller may send a presentation-termination-request message to the receiver with a reason of application-request.

When section 6.7.1 says "it MUST listen to and accept incoming connection requests from a controlling browsing context using an implementation specific mechanism", the receiver must receive and process the presentation-connection-open-request.

When section 6.7.1 says "Establish the connection between the controlling and receiving browsing contexts using an implementation specific mechanism.", the receiver must send a presentation-connection-open-response message and presentation-change-event messages when required.

6. Representation Of Time

The § 7 Remote Playback Protocol and the § 8 Streaming Protocol represent points of time and durations in terms of a time scale. A time scale is a common denominator for time values that allows values to be expressed as rational numbers without loss of precision. The time scale is represented in hertz, such as 90000 for 90000 Hz, a common time scale for video.

7. Remote Playback Protocol

This section defines the use of the Open Screen Protocol for starting, terminating, and controlling remote playback of media as defined by the Remote Playback API. § 7.2 Remote Playback API defines how APIs in Remote Playback API map to the protocol messages defined in this section.

For all messages defined in this section, see Appendix A: Messages for the full CDDL definitions.

To learn which receivers are compatible remote playback devices for a particular URL or set of URLs, the controller may send a remote-playback-availability-request message with the following values:

sources

A list of media resources, the same as specified in the remote-playback-start-request message. Must not be empty.

headers

headers that the receiver should use to fetch the urls. For example, section 6.2.4 of the Remote Playback API says that the Accept-Language header should be provided.

watch-duration

The period of time that the controller is interested in receiving updates about their URLs, should the availability change.

watch-id

An identifier the receiver must use when sending updates about URL availability so that the controller knows which URLs the receiver is referring to.

In response, the receiver should send a remote-playback-availability-response message with the following values:

url-availabilities

A list of URL availability states. Each state must correspond to the matching URL from the request by list index.

The receivers should later (up to the current time plus request watch-duration) send remote-playback-availability-event messages if URL availabilities change. Such events contain the following values:

watch-id

The watch-id given in the remote-playback-availability-response used to refer to the remote playback URLs whose availability has changed.

url-availabilities

A list of URL availability states. Each state must correspond to the URLs from the request referred to by the watch-id.

Note that these messages are not broadcasted to all controllers. They are sent individually to controllers that have requested availability for the URLs that have changed in availability state within the watch duration of the original availability request.

Add power saving as a transport requirement and remove the following. [Issue #342]

To save power, the controller may disconnect the QUIC connection and later reconnect to send availability requests and receive availability responses and updates. The QUIC connection ID may or may not be the same when reconnecting.

To start remote playback, the controller may send a remote-playback-start-request message to the receiver with the following values:

remote-playback-id

An identifier for this remote playback. It should be universally unique among all remote playbacks.

Note: A version 4 (pseudorandom) UUID is recommended as it meets the requirements for a remote-playback-id.

sources (optional)

The media resources that the controller has selected for playback on the receiver. Each source must include a source URL and should include an extended MIME type when available for the media resource. If sources is missing or empty, the remoting field must be populated, as the controller will use a streaming session to send encoded media.

text-track-urls

URLs of text tracks associated with the media resources.

controls

Initial controls for modifying the initial state of the remote playback, as defined in § 7.1 Remote Playback State and Controls. The controller may send controls that are optional for the receiver to support before it knows the receiver supports them. If the receiver does not support them, it will ignore them and the controller will learn that it does not support them from the remote-playback-start-response message.

remoting (optional)

Parameters for starting a streaming session associated with this remote playback. If not included, no streaming session is started. Required when sources is missing or empty.

When the receiver receives a remote-playback-start-request message, it should send back a remote-playback-start-response message. It should do so quickly, usually before the media resource has been loaded and instead give updates of the progress of loading with remote-playback-state-event messages, unless the receiver decides to not attempt to load the resource at all. If it chooses not to, it must respond with the appropriate failure result (such as timeout or invalid-url). Additionally, the response must include the following:

state

The initial state of the remote playback, as defined in § 7.1 Remote Playback State and Controls.

remoting (optional)

A response to the started streaming session associated with this remote playback. If not included, no streaming session is started.

If a streaming session is started, streaming messages such a streaming-session-modify-request and video-frame can be used for the streaming session as if the streaming session had been started with streaming-session-start-request and streaming-session-start-response. The streaming session may be terminated before the remote playback is terminated, but if the remote playback is terminated first, the streaming session associated with it is automatically terminated.

If the controller wishes to modify the state of the remote playback (for example, to pause, resume, skip, etc), it may send a remote-playback-modify-request message with the following values:

remote-playback-id

The ID of the remote playback to be modified.

controls

Updated controls as defined in § 7.1 Remote Playback State and Controls.

When a receiver receives a remote-playback-modify-request it should send a remote-playback-modify-response message in reply with the following values:

state

The updated state of the remote playback as defined in § 7.1 Remote Playback State and Controls.

When the state of remote playback changes without request for modification from the controller (such as when the skips or pauses due to user user interaction on the receiver), the receiver may send a remote-playback-state-event to the controller.

The receiver should send a remote-playback-state-event message whenever:

Any of the following methods are called:

Any of the following attributes observably change since the last sent remote-playback-state-event message:

The timeline offset associated with the playback changes since the last sent remote-playback-state-event message:

The stalled event needs to fire at the associated HTMLMediaElement instance.

More than 250ms pass since the last remote-playback-state-event message and any of the following attributes observably change since the last remote-playback-state-event message. Any new continuously changing attributes fall under this rule.

NOTE: A media element is required to fire a timeupdate event every 250ms or sooner.

remote-playback-id

The ID of the remote playback whose state has changed.

state

The updated state of the remote playback, as defined in § 7.1 Remote Playback State and Controls.

To terminate the remote playback, the controller may send a remote-playback-termination-request message with the following values:

remote-playback-id

The ID of the remote playback to terminate.

reason

The reason the remote playback is being terminated.

When a receiver receives a remote-playback-termination-request, it should send back a remote-playback-termination-response message to the controller.

If a receiver terminates a remote playback without a request from the controller to do so, it must send a remote-playback-termination-event message to the controller with the following values:

remote-playback-id

The ID of the remote playback that was terminated.

reason

The reason the remote playback was terminated.

As mentioned in Remote Playback API section 6.2.7, terminating the remote playback means the controller is no longer controlling the remote playback and does not necessarily stop media from rendering on the receiver. Whether or not the receiver stops rendering media depends upon the implementation of the receiver.

7.1. Remote Playback State and Controls

In order for the controller and receiver to stay in sync with regards to the state of the remote playback, the controller may send controls to modify the state (for example, via the remote-playback-modify-request message) and the receiver may send updates about state changes (for example, via the remote-playback-state-event message).

The controls sent by the controller include the following individual control values, each of which is optional. This allows the controller to change one control value or many control values at once without having to specify all control values every time. A non-present control value indicates no change. A present control value indicates the change defined below. These controls intentionally mirror settable attributes and methods of the HTMLMediaElement.

source

Change the media resource. See HTMLMediaElement.src for more details. Must not be used in the initial controls of the remote-playback-start-request message (which already contains a media resource).

preload

Set how aggressively to preload media. See HTMLMediaElement.preload for more details. Should only be used in the initial controls of the remote-playback-start-request message or when the source is changed. If not set in the initial controls, it is left to the receiver to decide. This is optional for the receiver to support and if not supported, the receiver will behave as though it were never set.

loop

Set whether or not to loop media. See HTMLMediaElement.loop for more details. Should only be used in the initial control of the remote-playback-start-request. If not set in the initial controls, it is assumed to be false.

paused

If true, pause; if false, resume. See HTMLMediaElement.pause() and HTMLMediaElement.play() for more details. If not set in the initial controls, it is left to the receiver to decide.

muted

If true, mute; if false, unmute. See HTMLMediaElement.muted for more details. If not set in the initial controls, it is left to the receiver to decide.

volume

Set the audio volume in the range from 0.0 to 1.0 inclusive. See HTMLMediaElement.volume for more details. If not set in the initial controls, it is left to the receiver to decide.

seek

Seek to a precise time. See HTMLMediaElement.currentTime for more details.

fast-seek

Seek to an approximate time as fast as possible. See HTMLMediaElement.fastSeek() for more details.

playback-rate

Set the rate a which the media plays. See HTMLMediaElement.playbackRate for more details. If not set in the initial controls, it is left to the receiver to decide. This is optional for the receiver to support and if not supported, the receiver will behave as though it were never set.

poster

Set the URL of an image to show when video data is not available. See poster frame for more details. If not set in the initial controls, no poster is used and the receiver can choose what to render when video data is unavailable. This is optional for the receiver to support and if not supported, the receiver will behave as though it were never set.

enabled-audio-track-ids

Enable included audio tracks by ID and disable all other audio tracks. See HTMLMediaElement.audioTracks for more details.

selected-video-track-id

Select the given video track by ID and unselect all other video tracks. See HTMLMediaElement.videoTracks for more details.

added-text-tracks

Add text tracks with the given kinds, labels, and languages. See HTMLMediaElement.addTextTrack() for more details. This is optional for the receiver to support and if not supported, the receiver will behave as though it were never set.

changed-text-tracks

Change text tracks by ID. All other text tracks are left unchanged. Set the mode, add cues, and remove cues by id. See HTMLMediaElement.textTracks for more details. Note that future specifications or extensions to this specifications are expected to add new fields to the text-track-cue (such as text size, alignment, position, etc). Adding and removing cues is optional for the receiver to support and if not supported, the receiver will behave as though no cues were added or removed (both adding and removing are indicated via the support for "added-cues"). As specified in HTMLMediaElement.textTracks, if a cue ID is invalid (removing an un-added ID or adding an ID twice, for example), the receiver may reject the text track change.

Field Default value for the initial controls Receiver support
source urls in remote-playback-start-request Required
preload Decided by the receiver Not required
loop False Required
paused Decided by the receiver Required
muted Decided by the receiver Required
volume Decided by the receiver Required
seek (None) Required
fast-seek (None) Required
playback-rate Decided by the receiver Not required
poster Decided by the receiver Not required
enabled-audio-track-ids (None) Required
selected-video-track-id (None) Required
added-text-tracks (None) Not required
changed-text-tracks (None) Not required

The states sent by the receiver include the following individual state values, each of which is optional. This allows the receiver to update the controller about more than one state value at once without having to specify all state values every time. A non-present state value indicates the state has not changed.

supports

The controls the receiver supports. These may differ according to the media resource and should not change unless the media resource also changes. The default is empty (support for nothing) for the initial state in the remote-playback-start-response message.

source

The current media resource. See HTMLMediaElement.currentSrc. Must be present in the initial state in the remote-playback-start-response message so the controller knows what media resource was selected for playback.

loading

The state of network activity for loading the media resource. See HTMLMediaElement.networkState. The default is empty (NETWORK_EMPTY) for the initial state in the remote-playback-start-response message.

loaded

The state of the loaded media (whether enough is loaded to play). See HTMLMediaElement.readyState. The default is nothing (HAVE_NOTHING) for the initial state in the remote-playback-start-response message.

error

A major error occurred which prevents the remote playback from continuing. See HTMLMediaElement.error and media error codes. The default is no error for the initial state in the remote-playback-start-response message.

epoch

The "zero time" of the media timeline, in milliseconds relative to the epoch. See timeline offset and HTMLMediaElement.getStartDate(). The default is an unknown epoch for the initial state in the remote-playback-start-response message, which is represented by null.

duration

The duration of the media timeline, in seconds. See HTMLMediaElement.duration. The default is an unknown duration for the initial state in the remote-playback-start-response message, which is represented by null.

buffered-time-ranges

The time ranges for which media has been buffered. See HTMLMediaElement.buffered. The default is an empty array for the initial state in the remote-playback-start-response message.

played-time-ranges

The time ranges reached by the playback position during normal playback. See HTMLMediaElement.played. The default is an empty array for the initial state in the remote-playback-start-response message.

seekable-time-ranges

The time ranges for which media is seekable by the controller or the receiver. See HTMLMediaElement.seekable. The default is an empty array for the initial state in the remote-playback-start-response message.

position

The playback position. See official playback position and HTMLMediaElement.currentTime. The default is 0 for the initial state in the remote-playback-start-response message.

playbackRate

The current rate of playback on a scale where 1.0 is "normal speed". See HTMLMediaElement.playbackRate. The default is 1.0 for the initial state in the remote-playback-start-response message.

paused

Whether media is paused or not. See HTMLMediaElement.paused. The default is false for the initial state in the remote-playback-start-response message.

seeking

Whether the receiver is seeking or not. See HTMLMediaElement.seeking. The default is false for the initial state in the remote-playback-start-response message.

stalled

If true, media is not playing because not enough media is loaded, and false otherwise. See the stalled event. The default is false for the initial state in the remote-playback-start-response message.

ended

Whether media has reached the end or not. See HTMLMediaElement.ended. The default is false for the initial state in the remote-playback-start-response message.

volume

The current volume of playback on a scale of 0.0 to 1.0. See HTMLMediaElement.volume. The default is 1.0 for the initial state in the remote-playback-start-response message.

muted

True if audio is muted (overriding the volume value) and false otherwise. See HTMLMediaElement.muted. The default is false for the initial state in the remote-playback-start-response message.

resolution

The "intrinsic width" and "intrinsic width" of the video. See HTMLVideoElement.videoWidth and HTMLVideoElement.videoHeight. The default is an unknown resolution for the initial state in the remote-playback-start-response message, which is represented by null.

audio-tracks

The available audio tracks, which can individually enabled or disabled. See HTMLMediaElement.audioTracks. The default is an empty array for the initial state in the remote-playback-start-response message.

video-tracks

The available video tracks. Only one may be selected. See HTMLMediaElement.videoTracks. The default is an empty array for the initial state in the remote-playback-start-response message.

text-tracks

The available text tracks, which can be individually shown, hidden, or disabled. See HTMLMediaElement.textTracks. The controller can also add cues to and remove cues from text tracks. The default is an empty array for the initial state in the remote-playback-start-response message.

Media positions, durations, and time ranges are defined in terms of the media timeline specified in HTML, which are fractional seconds between zero and the media duration.

NOTE: An Open Screen agent can convert between values on the media timeline and the media sync time sent with individual media frames using the steps in Appendix C: Media Time Conversions.

Field Default value for the initial state
supports Empty
source url in state in remote-playback-start-response (required field)
loading empty
loaded nothing
error No error
epoch null
duration null
buffered-time-ranges Empty array
played-time-ranges Empty array
seekable-time-ranges Empty array
position 0.0
playbackRate 1.0
paused False
seeking False
stalled False
ended False
volume 1.0
muted False
resolution null
audio-tracks Empty array
video-tracks Empty array
text-tracks Empty array

7.2. Remote Playback API

An Open Screen Protocol agent that implements the Remote Playback API must support the control-remote-playback capability. It may support the send-streaming capability so it can send HTMLMediaElement media data through media remoting.

An an OSP agent that is a remote playback device for the Remote Playback API must support the receive-remote-playback capability. It may support the receive-streaming capability so it can receive HTMLMediaElement data through media remoting.

The same OSP agent may implement both the Remote Playback API and be a remote playback device for that API.

This is how the Remote Playback API uses the messages defined in § 7 Remote Playback Protocol:

When section 5.2.1.2 says "This list contains remote playback devices and is populated based on an implementation specific discovery mechanism" and section 5.2.1.4 says "Retrieve available remote playback devices (using an implementation specific mechanism)", the user agent may use the mDNS, QUIC, agent-info-request, and remote-playback-availability-request messages defined previously in this spec to discover receivers. The remote-playback-availability-request URLs must contain the availability sources set.

When section 5.2.4 says "Request connection of remote to device. The implementation of this step is specific to the user agent." and "Synchronize the current media element state with the remote playback state", the controller may send the remote-playback-start-request message to the receiver to start remote playback. The remote-playback-start-request URLs must contain the remote playback source. The current Remote Playback API only allows a single source, but the protocol allows for several and future versions of Remote Playback API may allow for several.

When section 5.2.4 says "The mechanism that is used to connect the user agent with the remote playback device and play the remote playback source is an implementation choice of the user agent. The connection will likely have to provide a two-way messaging abstraction capable of carrying media commands to the remote playback device and receiving media playback state in order to keep the media element state and remote playback state in sync", the controller may send remote-playback-modify-request messages to the receiver to change the remote playback state based on changes to the local media element and receive remote-playback-modify-response and remote-playback-state-event messages to change the local media element based on changes to the remote playback state.

When section 5.2.7 says "Request disconnection of remote from the device. The implementation of this step is specific to the user agent," the controller may send the remote-playback-termination-request message to the receiver.

8. Streaming Protocol

This section defines the use of the Open Screen Protocol for streaming media from a media sender to a media receiver.

If an Open Screen Protocol agent is a media sender, it must advertise the send-streaming capability. If an OSP agent is a media receiver, it must advertise the receive-streaming capability. The same agent may be a media sender and a media receiver.

8.1. Streaming Protocol Capabilities

If the advertiser is already authenticated, the requester has the ability to request additional information by sending an streaming-capabilities-request message, and receive back a streaming-capabilities-response message with the following fields:
receive-audio (required)

A list of capabilities for receiving audio. For an explanation of fields, see below.

receive-video (required)

A list of capabilities for receiving video. For an explanation of fields, see below.

The format type is used as the basis for audio and video capabilities. Formats are composed of the following fields:

codec-name (required)

A fully qualified codec string listed in the [WEBCODECS-CODEC-REGISTRY] and further specified by the codec-specific registrations referenced in that registry.

For codec-name, Open Screen agents may also accept a single-codec codec parameter as described in [RFC6381] for codecs not listed in the [WEBCODECS-CODEC-REGISTRY].

Audio capabilities are composed of the above format type, with the following additional fields:

max-audio-channels (optional)

An optional field indicating the maximum amount of audio channels the media receiver is capable of supporting. Default value is "2," meaning a stereo speaker channel setup.

min-bit-rate (optional)

An optional field indicating the minimum audio bit rate that the media receiver can handle, in kilobits per second. Default is no minimum.

Video capabilities are similarly composed of the above format type, with the following additional fields:

max-resolution (optional)

An optional field indicating the maximum video-resolution (width, height) that the media receiver is capable of processing. Default is no maximum.

max-frames-per-second (optional)

An optional field indicating the maximum frames-per-second the media receiver is capable of processing. Default is no maximum.

max-pixels-per-second (optional)

An optional field indicating the maximum pixels-per-second the media receiver is capable of processing, in pixels per second. Default is no maximum.

min-video-bit-rate (optional)

An optional field indicating the minimum video bit rate the device is capable of processing, in kilobits per second. Default is no minimum.

aspect-ratio (optional)

An optional field indicating what its ideal aspect ratio is, e.g. a 16:10 display could return this value as 1.6 to indicate its preferred content scaling. Default is none.

color-gamut (optional)

An optional field indicating the widest color space that can be decoded and rendered by the media receiver. The media sender may use this value to determine how to encode video, and should assume all narrower color spaces are supported. Valid values correspond to ColorGamut in the Media Capabilities API. The default value is "srgb".

NOTE: Support for "p3" implies support for "srgb", and support for "rec2020" implies support for "p3" and "srgb".

hdr-formats (optional)

An optional field indicating what HDR transfer functions and metadata formats can be decoded and rendered by the media receiver. Each video-hdr-format consists of two fields, transfer-function and hdr-metadata.

The transfer-function field must be a valid TransferFunction and the hdr-metadata field must be a valid HdrMetadataType, both defined in the Media Capabilities API.

If a video-hdr-format is provided with a transfer-function but no hdr-metadata, then the media receiver can render the transfer-function without any associated metadata. (This is the case, for example, with the "hlg" transfer-function.)

The media receiver should ignore duplicate entries in hdr-formats. If no hdr-formats are listed, then the media reciever cannot decode any HDR formats.

native-resolutions (optional)

An optional field indicating what video-resolutions the media receiver supports and considers to be "native," meaning that scaling is not required. The default value is none.

supports-scaling (optional)

An optional boolean field indicating whether the media receiver can scale content provided in a video-resolution not listed in the native-resolutions list (if provided) or of a different aspect ratio. The default value is true.

supports-rotation (optional)

An optional boolean field indicating whether the media receiver can receive video frames with the rotation field set. The default value is true.

8.2. Sessions

To start a streaming session, a sender may send a streaming-session-start-request message with the following fields:

streaming-session-id

Identifies the streaming session. Must be unique for the (sender, receiver) pair. Can be used later to modify or terminate a streaming session. These IDs should be treated like other IDs with regards to the state-token as specified in § 9 Requests, Responses, and Watches.

desired-stats-interval

Indicates the frequency the receiver should send stats messages to the sender.

stream-offers

Indicates the streams that the receiver can request from the sender.

Each stream offer contains the following fields:

media-stream-id

Identifies the media stream being offered. Must be unique within the streaming session. Can be used by the receiver to request the media session. These IDs should be treated like other IDs with regards to the state-token as specified in § 9 Requests, Responses, and Watches.

display-name

An optional name intended to be shown to a user, such that the receiver may allow the user to choose which media streams to receive, or if they are received automatically by the receiver, give the user some information about what the media stream is.

audio

A list of audio encodings offered. An audio encoding is a series of encoded audio frames. Encodings define fields needed by the receiver to know how to decode the encoding, such as codec. They can differ by codec and related fields, but should be different encodings of the same audio.

video

A list of video encodings offered. A video encoding is a series of encoded video frames. Encodings define fields needed by the receiver to know how to decode the encoding, such as codec and default duration. They can differ by codec and potentially other fields, but should be different encodings of the same video.

data

A list of data encodings offered. A data encoding is a series of data frames. Encodings define fields needed by the receiver to know how to interpret the encoding, such as data type and default duration. They can differ by data type and potentially other fields, but should be different encodings of the same data. (For encodings of different data, use distinct media streams, not distinct encodints with the same media stream).

Each audio encoding offered defines the following fields:

encoding-id

Identifies the audio encoding being offered. Must be unique within the media stream. These IDs should be treated like request IDs with regards to the state-token as specified in § 9 Requests, Responses, and Watches.

codec-name

The name of the codec used by the encoding, following the same rules as codec-name in § 8.1 Streaming Protocol Capabilities.

time-scale

The time scale used by all audio frames. This allows senders to make audio-frame messages smaller by not including the time scale in each one.

default-duration:

The duration of an audio frame. This allows senders to make audio-frame messages smaller by not including the duration for audio-frame messages that have the default duration.

Each video encoding offered defines the following fields:

encoding-id

Identifies the video encoding being offered. Must be unique within the media stream. These IDs should be treated like request IDs with regards to the state-token as specified in § 9 Requests, Responses, and Watches.

codec-name

The name of the codec used by the encoding, following the same rules as codec-name in § 8.1 Streaming Protocol Capabilities.

time-scale

The time scale used by all video frames. This allows senders to make video-frame messages smaller by not including the time scale in each one.

default-duration:

The default duration of a video frame. This allows senders to make video-frame messages smaller by not including the duration for video-frame messages that have the default duration.

default-rotation:

The default rotation of a video frame. This allows senders to make video-frame messages smaller by not including the rotation for video-frame messages that have the default rotation.

Each data encoding offered defines the following fields:

encoding-id

Identifies the data encoding being offered. Must be unique within the media stream. These IDs should be treated like request IDs with regards to the state-token as specified in § 9 Requests, Responses, and Watches.

data-type-name

The name of the data type used by the encoding.

time-scale

The time scale used by all data frames. This allows senders to make data-frame messages smaller by not including the time scale in each one.

default-duration:

The duration of an data frame . This allows senders to make data-frame messages smaller by not including the duration for data-frame messages that have the default duration.

After receiving a streaming-session-start-request message, a receiver should send back a streaming-session-start-response message with the following fields:

desired-stats-interval

Indicates the frequency the sender should send stats messages to the receiver.

stream-requests

Indicates which media streams the receiver would like to receive from the sender.

Each stream request contains the following fields:

media-stream-id

The ID of the stream reqeusted.

audio (optional)

The requested audio encoding, by encoding ID

video (optional)

The requested video encoding, by encoding ID. It may include a target resolution and maximum frame rate. The sender should not exceed the maximum frame rate and should attempt to send at the target bitrate, possibly exceeding it by a small amount.

data (optional)

The requested data encoding, by encoding ID

During a streaming session, the receiver can modify the requests it made for encodings by sending a streaming-session-modify-request containing a modified list of stream-requests. When the sender receives a streaming-session-modify-request, it should send back a streaming-session-modify-response indicate whether or not the application of the new request from the streaming-session-modify-request was successful.

NOTE: If the sender wishes to send an encoding other than the one selected by the receiver in a streaming-session-start-response or streaming-session-modify-request, it must terminate the current session and start a new session.

Finally, the sender may terminate the streaming session by sending a streaming-session-terminate-request command. When the receiver receives the streaming-session-terminate-request, it should send back a streaming-session-terminate-response. The receiver can terminate at any point and notify the sender by sending a streaming-session-terminate-event message.

8.3. Audio

Media senders may send audio to media receivers by sending audio-frame messages (see Appendix A: Messages) with the following keys and values. An audio frame message contains a set of encoded audio samples for a range of time. A series of encoded audio frames that share a codec and a timeline form an audio encoding.

Unlike most Open Screen Protocol messages, this one uses an array-based grouping rather than a struct-based grouping. For required fields, this allows for a more efficient use of bytes on the wire, which is important for streaming audio because the payload is typically so small and every byte of overhead is relatively large. In order to accomodate optional values in the array-based grouping, one optional field in the array is used to hold all optional values in a struct-based grouping. This will hopefully provide a good balance of efficiency and flexibility.

Rewrite the following to not depend on QUIC specifics. [Issue #343]

To allow for audio frames to be sent out of order, they should be sent in separate QUIC streams.

encoding-id

Identifies the media encoding to which this audio frame belongs. This can be used to reference fields of the encoding (from the audio-encoding-offer message) such as the codec, codec properties, time scale, and default duration. Referencing fields of the encoding through the encoding id helps to avoid sending duplicate information in every frame.

start-time

Identifies the beginning of the time range of the audio frame. The end time can be inferred from the start time and duration. The time scale is equal to the value in the time-scale field of the audio-encoding-offer message referenced by the encoding-id.

duration

If present, the duration of the audio frame. If not present, the duration is equal to the default-duration field of the audio-encoding-offer message referenced by the encoding-id. The time scale is equal to the value in the time-scale field of the audio-encoding-offer message referenced by the encoding-id.

sync-time

If present, a time used to synchronize the start time of this audio frame (and thus, this encoding) with that of other media encodings on different timelines. It may be wall clock time, but it need not be; it can be any clock chosen by the media sender.

payload

The encoded audio. The codec is equal to the codec-name field of the audio-encoding-offer message referenced by the encoding-id.

8.4. Video

Media senders may send video to media receivers by sending video-frame messages (see Appendix A: Messages) with the following keys and values. A video frame message contains an encoded video frame (an encoded image) at a specific point in time or over a specfic time range (if the duration is known). A series of encoded video frames that share a codec and a timeline form a video encoding.

Rewrite the following to not depend on QUIC specifics. [Issue #343]

To allow for video frames to be sent out of order, they may be sent in separate QUIC streams. If the encoding is a long chain of encoded video frames dependent on the previous one back until an independent frame, it may make sense to send them in a single QUIC stream starting at the indepdendent frame and ending at the last dependent frame.

encoding-id

Identifies the media encoding to which this video frame belongs. This can be used to reference fields of the encoding such as the codec, codec properties, time scale, and default rotation. Referencing fields of the encoding through the encoding id helps to avoid sending duplicate information in every frame.

sequence-number

Identifies the frame and its order in the encoding. Within an encoding, larger sequence numbers mean later start times. Within an encoding, gaps in sequence numbers mean frames are missing.

depends-on

If present, the sequence numbers of the frames this frame depends on. If a sequence numbers is negative, it is treated as a relative sequence numbers and the sequence numbers is calculated by adding it to the sequence number of this frame. If empty, this is an independent frame (a key frame). If not present, the default value is [-1].

start-time

Identifies the beginning of the time range of the video frame. The end time can be inferred from the start time and duration. The time scale is equal to the value in the time-scale field of the video-encoding-offer message referenced by the encoding-id.

duration

If present, the duration of the video frame. If not present, that means duration is unknown. The time scale is equal to the value in the time-scale field of the video-encoding-offer message referenced by the encoding-id.

sync-time

If present, a time used to synchronize the start time of this frame (and thus, this encoding) with that of other media encodings on different timelines.

rotation

If present, indicates how the frame should be rotated after decoding but before rendering. Rotation is clockwise in increments of 90 degrees. The default is equal to the default-rotation field of the video-encoding-offer message referenced by the encoding-id.

payload

The encoded video frame (encoded image). The codec is equal to the codec-name field of the video-encoding-offer message referenced by the encoding-id.

8.5. Data

Media senders may send timed data to media receivers by sending data-frame messages (see Appendix A: Messages) with the following keys and values. A data frame message contains an arbitrary payload that can be synchronized with audio and video. A series of data frames that share a data type and timeline form a data encoding.

Rewrite the following to not depend on QUIC specifics. [Issue #343]

To allow for data frames to be sent out of order, they may be sent in separate QUIC streams, but more than one data frame may be sent in one QUIC stream if that makes sense for a specific type of data.

encoding-id

Identifies the data encoding to which this data frame belongs. This can be used to reference fields of the encoding such as the type of data and time scale. Referencing fields of the encoding through the encoding id helps to avoid sending duplicate information in every frame.

sequence-number

Identifies the frame and its order in the encoding. Within an encoding, larger sequence numbers mean later start times. Within an encoding, gaps in sequence numbers mean frames are missing.

start-time

Identifies the beginning of the time range of the data frame. The end time can be inferred from the start time and duration. The time scale is equal to the value in the time-scale field of the data-encoding-offer message referenced by the encoding-id.

duration

If present, the duration of the data frame. If not present, the duration is equal to the default-duration field of the data-encoding-offer message referenced by the encoding-id. The time scale is equal to the value in the time-scale field of the data-encoding-offer message referenced by the encoding-id.

sync-time

If present, a time used to synchronize the start time of this data frame (and thus, this encoding) with that of other media encodings on different timelines.

payload

The data. The data type is equal to the data-type-name field of the the data-encoding-offer message referenced by the encoding-id.

8.6. Feedback

The media receiver can send feedback to the media sender, such as key frame requests.

A video key frame is requested by sending a video-request message with the following keys and values.

Rewrite the following to not depend on QUIC specifics. [Issue #343]

To allow for video frames to be sent out of order, they may be sent in separate QUIC streams.

encoding-id

The encoding for which the media sender should send a new key frame.

sequence-number

Gives the order in the encoding. Within an encoding, larger sequence numbers invalidate previous ones. A media sender may ignore smaller sequence numbers after a larger one has been processed. This it to prevent out-of-order requests from generating more key frames than necessary.

highest-decoded-frame-sequence-number: uint

If set, the media sender may generate a video frame dependent on the last decoded frame. If not set, the media sender must generate an indepdendent (key) frame.

8.7. Stats

During a streaming session, the sender should send stats with the streaming-session-sender-stats-event at the interval the receiver requested. It should send all of the following stats for all of the media streams it is sending. The streaming-session-sender-stats-event message contains the following fields:

streaming-session-id

The ID of the streaming session these stats apply to.

system-time

The time when the stats were calculated, using a monotonic system clock.

audio

Stats specific to audio. Stats for multiple encodings can be sent at once, but encodings need not be included if the stats haven’t changed. See below.

video

Stats specific to video. Stats for multiple encodings can be sent at once, but encodings need not be included if the stats haven’t changed. See below.

Audio encoding sender stats include the following fields:

encoding-id

The ID of the encoding for which the stats apply.

cumulative-sent-frames

The total number of frames sent.

cumulative-encode-delay

The sum of the time spent encoding frames sent.

Video encoding sender stats include the following fields:

encoding-id

The ID of the encoding for which the stats apply.

cumulative-sent-duration

The sum of all of the durations of all of the audio frames sent.

cumulative-encode-delay

The sum of the time spent encoding frames sent.

cumulative-dropped frames

The total number of frames that were not sent due to network, CPU, or other contraints.

During a streaming session, the receiver should send stats with the streaming-session-receiver-stats-event at the interval the sender requested. It should send all of the following stats for all of the media streams it is receiving.

If the receiver is using a buffer to hold frames before playing them out, it should also send the status of that buffer using the remote-buffer-status field. It can have one of three values:

A sender that receives a status of insufficient-data should increase its send rate, or switch to a more efficient encoding for future frames. A sender that receives a status of too-much-data should decrease its send rate.

If the receiver is playing frames immediately without buffering, it should always report a buffering status of enough-data.

The streaming-session-receiver-stats-event message contains the following fields:

streaming-session-id

The ID of the streaming session these stats apply to.

system-time

The time when the stats were calculated, using a monotonic system clock.

audio

Stats specific to audio. Stats for multiple encodings can be sent at once, but encodings need not be included if the stats haven’t changed. See below.

video

Stats specific to video. Stats for multiple encodings can be sent at once, but encodings need not be included if the stats haven’t changed. See below.

Audio encoding receiver stats include the following fields. If not present, that indicates the value has not changed since the last value.

encoding-id

The ID of the encoding for which the stats apply.

cumulative-decoded-frames

The total number of audio frames received and decoded.

cumulative-received-duration

The sum of all of the durations of all of the audio frames received.

cumulative-lost-duration

The sum of all of the durations of all of the audio frames detected as lost.

cumulative-buffer-delay

The sum of the time frames spent buffered between receipt and playout.

cumulative-decode-delay

The sum of the time spent decoding frames received.

remote-buffer-status : streaming-buffer-status

The status of the remote buffer for this encoding.

Video encoding receiver stats include the following fields. If not present, that indicates the value has not changed since the last value.

encoding-id

The ID of the encoding for which the stats apply.

cumulative-decoded-frames

The total number of video frames received and decoded.

cumulative-lost-frames

The total number of video frames detected as lost.

cumulative-buffer-delay

The sum of the time frames spent buffered between receipt and render.

cumulative-decode-delay

The sum of the time spent decoding frames received.

remote-buffer-status : streaming-buffer-status

The status of the remote buffer for this encoding.

9. Requests, Responses, and Watches

Multiple sub-protocols in the Open Screen Protocol have messages that act as requests, responses, watches, and events. Most requests have a request-id, and the agent that receives the request must send exactly one reponse message in return with the same request-id. A watch request has a watch-id, and the agent that receives the request may send any number of event messages in response with the same watch-id, until the watch request expires.

request-id and watch-id values are unsigned integer IDs that are assigned from a counter kept by each agent that starts at 1 and increments by 1 for each ID. Whenever an agent changes its state-token, it must reset its counter to 1.

When an agent sees that another agent has reset its state (by virtue of advertising a new state-token), it should discard any requests, responses, watches and events for that agent.

Other IDs that must be unique and would cause confusion if one side loses state, such as streaming-session-id, media-session-id, and encoding-id should be treated the same.

Rewrite the following to not depend on QUIC specifics. [Issue #343]

Note: Request and watch IDs are not tied to any particular QUIC connection between agents. If a QUIC connection is closed, an agent should not discard requests, responses, watches, or events related to the other party. This allows agents to save power by closing unused connections.

Note: Request and watch IDs are not unique across agents. An agent can combine a request ID with a unique identifier for the agent that sent it (like its certificate fingerprint) to track requests across multiple agents.

10. Protocol Extensions

Open Screen Protocol agents may exchange extension messages that are not defined by this specification. This could be used for experimentation, customization or other purposes.

To add new extension messages, extension authors must register a capability ID with a range of message type keys in a public registry. Agents may then indicate that they accept an extension by including the corresponding capability ID in the capabilities field of its agent-info message.

Capability IDs 1-999 are reserved for use by the Open Screen Protocol. Capability IDs 1000 and above are available for extensions. See Appendix B: Message Type Key Ranges for legal ranges for extension message type keys.

Note: The purpose of the public registry is to prevent conflicts between multiple extension authors' capability IDs and message type keys.

Agents must not send extension messages to another agent that has not advertised the corresponding extension capability ID.

Note: See § 4 Message delivery using CBOR for how agents handle unknown message type keys.

It is recommended that extension messages are also encoded in CBOR, to simplify implementations and provide an easier path to standardization of extension protocols. However, this is not required; agents that support non-CBOR extensions must be able to decode QUIC streams that contain a mix of CBOR messages and non-CBOR extension messages.

10.1. Protocol Extension Fields

It is legal for an agent to add additional, extension fields to any map-valued CBOR message type defined by the Open Screen Protocol. Extension fields must be optional, and the Open Screen Protocol message must make sense both with and without the field set.

Agents must not add extended fields to the audio-frame message directly. Instead, they may add them to its nested optional value.

Extension fields should use string keys to avoid conflicts with integer keys in Open Screen Protocol messages. An agent should not send extension fields to another agent unless that agent advertises an extension capability ID in its agent-info that indicates that it understands the extension fields.

11. Security and Privacy

The Open Screen Application Protocol allows two OSP agents to exchange user and application data. As such, its security and privacy considerations should be closely examined. We first evaluate the protocol itself using the W3C Security and Privacy Questionnaire. We then examine whether the security and privacy guidelines recommended by the Presentation API and the Remote Playback API are met. Finally we discuss recommended mitigations that agents can use to meet these security and privacy requirements.

11.1. Threat Models

11.1.1. Same-Origin Policy Violations

The Presentation API allows cross-origin communication between controlling pages and presentations with the consent of each origin (through their use of the API). This is similar to cross-origin communication via postMessage() with a targetOrigin of *. However, the Presentation API does not convey source origin information with each message. Therefore, the Open Screen Protocol does not convey origin information between its agents.

The presentation identifier carries some protection against unrestricted cross-origin access; but, rigorous authentication of the parties connected by a PresentationConnection must be done at the application level.

11.2. Open Screen Application Protocol Security and Privacy Considerations

11.2.1. Personally Identifiable Information & High-Value Data

The following data exchanged by the protocol can be personally identifiable and/or high value data:

  1. Presentation URLs and availability results

  2. Presentation identifiers

  3. Presentation connection IDs

  4. Presentation connection messages

  5. Remote playback URLs

  6. Remote playback commands and status messages

Presentation identifiers are considered high value data because they can be used in conjunction with a Presentation URL to connect to a running presentation.

Data provided through agent-info cannot be reasonably made confidential and should be considered public:

This data, while not considered personally identifiable, is important to protect to prevent an attacker from changing it or substituting other values.

11.2.2. Cross Origin State Considerations

Access to origin state across browsing sessions is possible through the Presentation API by reconnecting to a presentation that was started by a previous session. This scenario is addressed in Presentation API § 7.2 Cross-origin access.

Receiver availability is available cross-origin depending on the user’s network context. Exposure of this data to the Web is also discussed in Presentation API § 7.1 Personally identifiable information and Remote Playback API § 6.1 Personally identifiable information.

11.2.3. Origin Access to Other Devices

By design, the Open Screen Protocol allows access to receivers from the Web. By implementing the protocol, these devices are knowingly making themselves available to the Web and should be designed accordingly.

Below, we discuss mitigation steps to prevent malicious use of these devices.

11.2.4. Private Browsing Mode

The Open Screen Protocol itself does not distinguish between the user agent’s normal browsing and private browsing modes.

Update to reflect generic transport requirements. [Issue #342]

However, it’s recommended that user agents use separate authentication contexts and QUIC connections for normal and private browsing from the same user agent instance. This makes it more difficult for OSP agents to match activities occurring in normal and private browsing by the same user.

11.2.5. Persistent State

An agent is likely to persist the identity of agents that it has previously connected to from the agent-info message or other application messages.

However, this data is not normally exposed to the Web, only through the native UI of the user agent during the display selection or display authentication process. It can be an implementation choice whether the user agent clears or retains this data when the user clears browsing data.

11.2.6. Other Considerations

The Open Screen Protocol does not grant to the Web additional access to the following:

11.3. Presentation API Considerations

Presentation API § 7 Security and privacy considerations place these requirements on the Open Screen Protocol:

  1. Presentation URLs and presentation identifiers should remain private among the parties that are allowed to connect to a presentation, per the cross-origin access guidelines.

  2. Controllers and receivers should be notified when connections representing multiple user agent profiles have been made to a presentation, per the user interface guidelines.

  3. Messaging between controllers and receivers should be authenticated and confidential, per the guidelines for messaging between presentation connections.

Update to reflect generic transport requirements. [Issue #342]

The Open Screen Protocol addresses these considerations by:

  1. Requiring mutual authentication and a TLS-secured QUIC connection before presentation URLs, IDs, or messages are exchanged.

  2. Adding explicit messages and connection IDs for individual PresentationConnections so that agents can track the number of active connections.

11.4. Remote Playback API Considerations

The Remote Playback API § 6 Security and privacy considerations also state that messaging between controllers and receivers should also be authenticated and confidential.

Update to reflect generic transport requirements. [Issue #342]

This consideration is handled by requiring mutual authentication and a TLS-secured QUIC connection before any remote playback related messages are exchanged.

11.5. Mitigation Strategies

11.5.1. Malicious input

OSP agents should be robust against malicious input that attempts to compromise the target device by exploiting parsing vulnerabilities.

CBOR is intended to be less vulnerable to such attacks relative to alternatives like JSON and XML. Still, agents should be thoroughly tested using approaches like fuzz testing.

Where possible, OSP agents (including the content rendering components) should use defense-in-depth techniques like sandboxing to prevent vulnerabilities from gaining access to user data or leading to persistent exploits.

11.6. User Interface Considerations

This specification does not make any specific requirements of the security relevant user interfaces of OSP agents. However, before an agent has authenticated another agent, the user interface should make it clear that any agent-info or other data from that agent has not been verified by authentication.

11.6.1. Instance and Display Names

Update to reflect generic discovery requirements. [Issue #342]

Because DNS-SD Instance Names are the primary information that the user sees prior to authentication, careful presentation of these names is necessary.

Agents must treat Instance Names as unverified information, and should check that the Instance Name is a prefix of the display name received through the agent-info message after a successful QUIC connection. Once an agent has done this check, it can show the name as a verified display name.

Agents should show only complete display names to the user, instead of truncated display names from DNS-SD. A truncated display name should be verified as above before being shown in full as a verified display name.

This means there are three categories of display names that agents should be capable of handling:
  1. Truncated and unverified DNS-SD Instance Names, which should not be shown to the user.
  2. Complete but unverified DNS-SD Instance Names, which can be shown as unverified prior to authentication.
  3. Verified display names.

Appendix A: Messages

The following messages are defined using the Concise Data Definition Language syntax. When integer keys are used, a comment is appended to the line to indicate the name of the field. Object definitions in this specification have this unusual syntax to reduce the number of bytes-on-the-wire, while maintaining a human-readable name for each key. Integer keys are used instead of object arrays to allow for easy indexing of optional fields.

Each root message (one that can be put into a QUIC stream without being enclosed by another message) has a comment indicating the message type key.

Smaller numbers should be reserved for message that will be sent more frequently or are very small or both and larger numbers should be reserved for messages that are infrequently sent or large or both because smaller type keys encode on the wire smaller.

; type key 10
agent-info-request = {
  request
}

; type key 11
agent-info-response = {
  response
  1: agent-info ; agent-info
}

; type key 120
agent-info-event = {
  0: agent-info ; agent-info
}

agent-capability = &(
  receive-audio: 1
  receive-video: 2
  receive-presentation: 3
  control-presentation: 4
  receive-remote-playback: 5
  control-remote-playback: 6
  receive-streaming: 7
  send-streaming: 8
)

agent-info = {
  0: text ; display-name
  1: text ; model-name
  2: [* agent-capability] ; capabilities
  3: text ; state-token
  4: [* text] ; locales
}

; type key 12
agent-status-request = {
  request
  ? 1: status ; status
}

; type key 13
agent-status-response = {
  response
  ? 1: status ; status
}

status = {
  0: text ; status
}

request = (
  0: request-id ; request-id
)

response = (
  0: request-id ; request-id
)

request-id = uint

microseconds = uint

epoch-time = int

media-timeline = float64

media-timeline-range = [
  start: media-timeline
  end: media-timeline
]

watch-id = uint

; type key 14
presentation-url-availability-request = {
  request
  1: [1* text] ; urls
  2: microseconds ; watch-duration
  3: watch-id ; watch-id
}

; type key 15
presentation-url-availability-response = {
  response
  1: [1* url-availability] ; url-availabilities
}

; type key 103
presentation-url-availability-event = {
  0: watch-id ; watch-id
  1: [1* url-availability] ; url-availabilities
}

; idea: use HTTP response codes?
url-availability = &(
  available: 0
  unavailable: 1
  invalid: 10
)

; type key 104
presentation-start-request = {
  request
  1: text ; presentation-id
  2: text ; url
  3: [* http-header] ; headers
}

http-header = [
  key: text
  value: text
]

; type key 105
presentation-start-response = {
  response
  1: &result ; result
  2: uint ; connection-id
  ? 3: uint ; http-response-code
}

presentation-termination-source = &(
  controller: 1
  receiver: 2
  unknown: 255
)

presentation-termination-reason = &(
  application-request: 1
  user-request: 2
  receiver-replaced-presentation: 20
  receiver-idle-too-long: 30
  receiver-attempted-to-navigate: 31
  receiver-powering-down: 100
  receiver-error: 101
  unknown: 255
)

; type key 106
presentation-termination-request = {
  request
  1: text ; presentation-id
  2: presentation-termination-reason ; reason
}

; type key 107
presentation-termination-response = {
  response
  1: &result ; result
}

; type key 108
presentation-termination-event = {
  0: text ; presentation-id
  1: presentation-termination-source ; source
  2: presentation-termination-reason ; reason
}

; type key 109
presentation-connection-open-request = {
  request
  1: text ; presentation-id
  2: text ; url
}

; type key 110
presentation-connection-open-response = {
  response
  1: &result ; result
  2: uint ; connection-id
  3: uint ; connection-count
}

; type key 113
presentation-connection-close-event = {
  0: uint ; connection-id
  1: &(
    close-method-called: 1
    connection-object-discarded: 10
    unrecoverable-error-while-sending-or-receiving-message: 100
  ) ; reason
  ? 2: text ; error-message
  3: uint ; connection-count
}

; type key 121
presentation-change-event = {
  0: text ; presentation-id
  1: uint ; connection-count
}

; type key 16
presentation-connection-message = {
  0: uint ; connection-id
  1: bytes / text ; message
}

result = (
  success: 1
  invalid-url: 10
  invalid-presentation-id: 11
  timeout: 100
  transient-error: 101
  permanent-error: 102
  terminating: 103
  unknown-error: 199
)

; type key 17
remote-playback-availability-request = {
  request
  1: [* remote-playback-source] ; sources
  2: microseconds ; watch-duration
  3: watch-id ; watch-id
}

; type key 18
remote-playback-availability-response = {
  response
  1: [* url-availability] ; url-availabilities
}

; type key 114
remote-playback-availability-event = {
  0: watch-id ; watch-id
  1: [* url-availability] ; url-availabilities
}

; type key 115
remote-playback-start-request = {
  request
  1: remote-playback-id ; remote-playback-id
  ? 2: [* remote-playback-source] ; sources
  ? 3: [* text] ; text-track-urls
  ? 4: [* http-header] ; headers
  ? 5: remote-playback-controls ; controls
  ? 6: {streaming-session-start-request-params} ; remoting
}

remote-playback-source = {
  0: text; url
  1: text; extended-mime-type
}

; type key 116
remote-playback-start-response = {
  response
  ? 1: remote-playback-state ; state
  ? 2: {streaming-session-start-response-params} ; remoting
}

; type key 117
remote-playback-termination-request = {
  request
  1: remote-playback-id ; remote-playback-id
  2: &(
    user-terminated-via-controller: 11
    unknown: 255
  ) ; reason
}

; type key 118
remote-playback-termination-response = {
  response
  1: &result ; result
}

; type key 119
remote-playback-termination-event = {
  0: remote-playback-id ; remote-playback-id
  1: &(
    receiver-called-terminate: 1
    user-terminated-via-receiver: 2
    receiver-idle-too-long: 30
    receiver-powering-down: 100
    receiver-crashed: 101
    unknown: 255
  ) ; reason
}

; type key 19
remote-playback-modify-request = {
  request
  1: remote-playback-id ; remote-playback-id
  2: remote-playback-controls ; controls
}

; type key 20
remote-playback-modify-response = {
  response
  1: &result ; result
  ? 2: remote-playback-state ; state
}

; type key 21
remote-playback-state-event = {
  0: remote-playback-id ; remote-playback-id
  1: remote-playback-state ; state
}

remote-playback-id = uint

remote-playback-controls = {
  ? 0: remote-playback-source ; source
  ? 1: &(
    none: 0
    metadata: 1
    auto: 2
  ) ; preload
  ? 2: bool ; loop
  ? 3: bool ; paused
  ? 4: bool ; muted
  ? 5: float64 ; volume
  ? 6: media-timeline ; seek
  ? 7: media-timeline ; fast-seek
  ? 8: float64 ; playback-rate
  ? 9: text ; poster
  ? 10: [* text] ; enabled-audio-track-ids
  ? 11: text ; selected-video-track-id
  ? 12: [* added-text-track] ; added-text-tracks
  ? 13: [* changed-text-track] ; changed-text-tracks
}

remote-playback-state = {
  ? 0: {
    0: bool ; rate
    1: bool ; preload
    2: bool ; poster
    3: bool ; added-text-track
    4: bool ; added-cues
  } ; supports
  ? 1: remote-playback-source ; source
  ? 2: &(
    empty: 0
    idle: 1
    loading: 2
    no-source: 3
  ) ; loading
  ? 3: &(
    nothing: 0
    metadata: 1
    current: 2
    future: 3
    enough: 4
  ) ; loaded
  ? 4: media-error ; error
  ? 5: epoch-time / null ; epoch
  ? 6: media-timeline / null ; duration
  ? 7: [* media-timeline-range] ; buffered-time-ranges
  ? 8: [* media-timeline-range] ; seekable-time-ranges
  ? 9: [* media-timeline-range] ; played-time-ranges
  ? 10: media-timeline ; position
  ? 11: float64 ; playbackRate
  ? 12: bool ; paused
  ? 13: bool ; seeking
  ? 14: bool ; stalled
  ? 15: bool ; ended
  ? 16: float64 ; volume
  ? 17: bool ; muted
  ? 18: video-resolution / null ; resolution
  ? 19: [* audio-track-state] ; audio-tracks
  ? 20: [* video-track-state] ; video-tracks
  ? 21: [* text-track-state] ; text-tracks
}

added-text-track = {
  0: &(
    subtitles: 1
    captions: 2
    descriptions: 3
    chapters: 4
    metadata: 5
  ) ; kind
  ? 1: text ; label
  ? 2: text ; language
}

changed-text-track = {
  0: text ; id
  1: text-track-mode ; mode
  ? 2: [* text-track-cue] ; added-cues
  ? 3: [* text] ; removed-cue-ids
}

text-track-mode = &(
  disabled: 1
  showing: 2
  hidden: 3
)

text-track-cue = {
  0: text ; id
  1: media-timeline-range ; range
  2: text ; text
}

media-sync-time = [
  value: uint
  scale: uint
]

media-error = [
  code: &(
    user-aborted: 1
    network-error: 2
    decode-error: 3
    source-not-supported: 4
    unknown-error: 5
  )
  message: text
]

track-state = (
  0: text ; id
  1: text ; label
  2: text ; language
)

audio-track-state = {
  track-state
  3: bool ; enabled
}

video-track-state = {
  track-state
  3: bool ; selected
}

text-track-state = {
  track-state
  3: text-track-mode ; mode
}

; type key 22
audio-frame = [
  encoding-id: uint
  start-time: uint
  payload: bytes
  ? optional: {
    ? 0: uint ; duration
    ? 1: media-sync-time ; sync-time
  }
]

; type key 23
video-frame = {
  0: uint ; encoding-id
  1: uint ; sequence-number
  ? 2: [* int] ; depends-on
  3: uint ; start-time
  ? 4: uint ; duration
  5: bytes ; payload
  ? 6: uint ; video-rotation
  ? 7: media-sync-time ; sync-time
}

; type key 24
data-frame = {
  0: uint ; encoding-id
  ? 1: uint ; sequence-number
  ? 2: uint ; start-time
  ? 3: uint ; duration
  4: bytes ; payload
  ? 5: media-sync-time ; sync-time
}

ratio = [
  antecedent: uint
  consequent: uint
]

; type key 122
streaming-capabilities-request = {
  request
}

; type key 123
streaming-capabilities-response = {
  response
  1: streaming-capabilities ; streaming-capabilities
}

streaming-capabilities = {
  0: [* receive-audio-capability] ; receive-audio
  1: [* receive-video-capability] ; receive-video
  2: [* receive-data-capability] ; receive-data
}

format = {
  0: text ; codec-name
}

receive-audio-capability = {
  0: format ; codec
  ? 1: uint ; max-audio-channels
  ? 2: uint ; min-bit-rate
}

video-resolution = {
  0: uint ; height
  1: uint ; width
}

video-hdr-format = {
  0: text; transfer-function
  ? 1: text; hdr-metadata
}

receive-video-capability = {
  0: format ; codec
  ? 1: video-resolution ; max-resolution
  ? 2: ratio ; max-frames-per-second
  ? 3: uint ; max-pixels-per-second
  ? 4: uint ; min-bit-rate
  ? 5: ratio ; aspect-ratio
  ? 6: text ; color-gamut
  ? 7: [* video-resolution] ; native-resolutions
  ? 8: bool ; supports-scaling
  ? 9: bool ; supports-rotation
  ? 10: [* video-hdr-format] ; hdr-formats
}

receive-data-capability = {
  0: format ; data-type
}

; type key 124
streaming-session-start-request = {
  request
  streaming-session-start-request-params
}

; type key 125
streaming-session-start-response = {
  response
  streaming-session-start-response-params
}

; A separate group so it can be used in remote-playback-start-request
streaming-session-start-request-params = (
  1: uint ; streaming-session-id
  2: [* media-stream-offer] ;  stream-offers
  3: microseconds ; desired-stats-interval
)

; type key 126
streaming-session-modify-request = {
  request
  streaming-session-modify-request-params
}

; A separate group so it can be used in remote-playback-start-response
streaming-session-start-response-params = (
  1: &result ; result
  2: [* media-stream-request] ; stream-requests
  3: microseconds ; desired-stats-interval  
)

streaming-session-modify-request-params = (
  1: uint ; streaming-session-id
  2: [* media-stream-request] ; stream-requests
)

; type key 127
streaming-session-modify-response = {
  response
  1: &result ; result
}

; type key 128
streaming-session-terminate-request = {
  request
  1: uint ; streaming-session-id
}

; type key 129
streaming-session-terminate-response = {
  response
}

; type key 130
streaming-session-terminate-event = {
  0: uint ; streaming-session-id
}

media-stream-offer = {
  0: uint ; media-stream-id
  ? 1: text ; display-name
  ? 2: [1* audio-encoding-offer] ; audio
  ? 3: [1* video-encoding-offer] ; video
  ? 4: [1* data-encoding-offer] ; data
}

media-stream-request = {
  0: uint ; media-stream-id
  ? 1: audio-encoding-request ; audio
  ? 2: video-encoding-request ; video
  ? 3: data-encoding-request ; data
}

audio-encoding-offer = {
  0: uint ; encoding-id
  1: text ; codec-name
  2: uint ; time-scale
  ? 3: uint ; default-duration
}

video-encoding-offer = {
  0: uint ; encoding-id
  1: text ; codec-name
  2: uint ; time-scale
  ? 3: uint ; default-duration
  ? 4: video-rotation ; default-rotation
}

data-encoding-offer = {
  0: uint ; encoding-id
  1: text ; data-type-name
  2: uint ; time-scale
  ? 3: uint ; default-duration
}

audio-encoding-request = {
  0: uint ; encoding-id
}

video-encoding-request = {
  0: uint ; encoding-id
  ? 1: video-resolution ; target-resolution
  ? 2: ratio ; max-frames-per-second
}

data-encoding-request = {
  0: uint ; encoding-id
}

video-rotation = &(
  ; Degrees clockwise
  video-rotation-0: 0
  video-rotation-90: 1
  video-rotation-180: 2
  video-rotation-270: 3
)

sender-stats-audio = {
  0: uint ; encoding-id
  ? 1: uint ; cumulative-sent-frames
  ? 2: microseconds ; cumulative-encode-delay
}

sender-stats-video = {
  0: uint ; encoding-id
  ? 1: microseconds ; cumulative-sent-duration
  ? 2: microseconds ; cumulative-encode-delay
  ? 3: uint ; cumulative-dropped-frames
}

; type key 131
streaming-session-sender-stats-event = {
  0: uint; streaming-session-id
  1: microseconds ; system-time
  ? 2: [1* sender-stats-audio] ; audio
  ? 3: [1* sender-stats-video] ; video
}

streaming-buffer-status = &(
  enough-data: 0
  insufficient-data: 1
  too-much-data: 2
)

receiver-stats-audio = {
  0: uint ; encoding-id
  ? 1: microseconds ; cumulative-received-duration
  ? 2: microseconds ; cumulative-lost-duration
  ? 3: microseconds ; cumulative-buffer-delay
  ? 4: microseconds ; cumulative-decode-delay
  ? 5: streaming-buffer-status ; remote-buffer-status
}

receiver-stats-video = {
  0: uint ; encoding-id
  ? 1: uint ; cumulative-decoded-frames
  ? 2: uint ; cumulative-lost-frames
  ? 3: microseconds ; cumulative-buffer-delay
  ? 4: microseconds ; cumulative-decode-delay
  ? 5: streaming-buffer-status ; remote-buffer-status
}

; type key 132
streaming-session-receiver-stats-event = {
  0: uint; streaming-session-id
  1: microseconds ; system-time
  ? 2: [1* receiver-stats-audio] ; audio
  ? 3: [1* receiver-stats-video] ; video
}

Appendix B: Message Type Key Ranges

The following appendix describes how the range of message type keys is divided. Legal values are 1 to 264.

Each type key is encoded as a variable-length integer on the wire of 1, 2, 4 or 8 bytes. For each wire byte size, 1/4 to 1/2 of the keys are available for extensions.

Bytes Range Purpose
1 1 - 48 Open Screen Protocol
1 49 - 63 Available for extensions
2 64 - 8,192 Open Screen Protocol
2 8,193 - 16,383 Available for extensions
4 16,384 - 229 Reserved for future use
4 229+1 - 230-1 Available for extensions
8 >= 230 Reserved for future use

Appendix C: Media Time Conversions

To convert between a media synchronization timestamp for a given audio or video frame and a media timeline value, the following formula can be used:

media-timeline-value = media-zero-time + (value / scale)

Where:

In the event of an overflow in media-timeline-value, the maximum representable value should be used.

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

Index

Terms defined by this specification

Terms defined by reference

References

Normative References

[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[MEDIA-CAPABILITIES]
Jean-Yves Avenard; Mark Foltz. Media Capabilities. URL: https://w3c.github.io/media-capabilities/
[PRESENTATION-API]
Mark Foltz. Presentation API. URL: https://w3c.github.io/presentation-api/
[REMOTE-PLAYBACK]
Mark Foltz. Remote Playback API. URL: https://w3c.github.io/remote-playback/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[RFC5646]
A. Phillips, Ed.; M. Davis, Ed.. Tags for Identifying Languages. September 2009. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc5646
[RFC6381]
R. Gellens; D. Singer; P. Frojdh. The 'Codecs' and 'Profiles' Parameters for "Bucket" Media Types. August 2011. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc6381
[RFC8610]
H. Birkholz; C. Vigano; C. Bormann. Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures. June 2019. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc8610
[RFC8949]
C. Bormann; P. Hoffman. Concise Binary Object Representation (CBOR). December 2020. Internet Standard. URL: https://www.rfc-editor.org/rfc/rfc8949
[RFC9000]
J. Iyengar, Ed.; M. Thomson, Ed.. QUIC: A UDP-Based Multiplexed and Secure Transport. May 2021. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc9000

Informative References

[IEEE-754]
IEEE Standard for Floating-Point Arithmetic. 22 July 2019. URL: https://ieeexplore.ieee.org/document/8766229
[ITU-R-BT.1359-1]
ITU-R BT.1359-1, Relative Timing of Sound and Vision for Broadcasting. Recommendation. URL: https://www.itu.int/dms_pubrec/itu-r/rec/bt/R-REC-BT.1359-1-199811-I!!PDF-E.pdf
[RFC4122]
P. Leach; M. Mealling; R. Salz. A Universally Unique IDentifier (UUID) URN Namespace. July 2005. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc4122
[SECURITY-PRIVACY-QUESTIONNAIRE]
Theresa O'Connor; Peter Snyder. Self-Review Questionnaire: Security and Privacy. URL: https://w3ctag.github.io/security-questionnaire/
[WEBCODECS-CODEC-REGISTRY]
Chris Cunningham; Paul Adenot; Bernard Aboba. WebCodecs Codec Registry. URL: https://w3c.github.io/webcodecs/codec_registry.html

Issues Index

Define Discovery Requirements [Issue #342]
Define Transport Layer Requirements [Issue #342]
Rewrite to not depend on QUIC, or move agent-status messges to the network spec [Issue #343]
Rewrite to not depend on QUIC [Issue #343]
Add power saving as a transport requirement and remove the following. [Issue #342]
Rewrite the following NOTE as a transport requirement for any message transport. [Issue #342]
Add power saving as a transport requirement and remove the following. [Issue #342]
Rewrite the following to not depend on QUIC specifics. [Issue #343]
Rewrite the following to not depend on QUIC specifics. [Issue #343]
Rewrite the following to not depend on QUIC specifics. [Issue #343]
Rewrite the following to not depend on QUIC specifics. [Issue #343]
Rewrite the following to not depend on QUIC specifics. [Issue #343]
Update to reflect generic transport requirements. [Issue #342]
Update to reflect generic transport requirements. [Issue #342]
Update to reflect generic transport requirements. [Issue #342]
Update to reflect generic discovery requirements. [Issue #342]