Open Screen Protocol

1. Introduction

The Open Screen Protocol connects browsers to devices capable of rendering Web content for a shared audience. Typically, these are devices like Internet-connected TVs, HDMI dongles, and smart speakers.

This spec defines a suite of network protocols that enable two user agents to implement the Presentation API and Remote Playback API in an interoperable fashion. This means that a Web developer can expect these APIs to work as intended when connecting two devices from independent implementations of the Open Screen Protocol.

The Open Screen Protocol is a specific implementation of these two APIs, meaning that it does not handle all possible ways that browsers and presentation displays could support these APIs. The Open Screen Protocol specifically supports browsers and displays that are connected via the same local area network. It allows a browser to present a URL, initiate remote playback of an HTML media element, and stream media data to another device.

The Open Screen Protocol is intended to be extensible, so that additional capabilities can be added over time. This may include additions to existing Web APIs or new Web APIs.

The accompanying explainer provides more background on the protocol.

1.1. Terminology

An Open Screen Protocol agent (or OSP agent) is any implementation of this protocol (browser, display, speaker, or other software).

Some OSP agents support the Presentation API. The API allows a controlling user agent to initiate presentation of Web content on another device. We call this agent a controller for short. A receiving user agent, which we call a receiver for short, is responsible for rendering the Web content. The Web content itself is called a presentation.

Some OSP agents also support the Remote Playback API. That API allows an agent to render a media element on a remote playback device. In this document, we refer to the remote playback device as a receiver because it is shorter and keeps terminology consistent between presentations and remote playbacks. Similarly, we use the term controller to refer to the agent that starts, terminates, and controls the remote playback.

For media streaming, we refer to an OSP agent that sends a media stream as a media sender and an agent that receives a media stream as a media receiver. Note that an agent can be both a media sender and a media receiver.

For additional terms and idioms specific to the Presentation API or Remote Playback API, please consult the respective specifications.

2. Requirements

2.1. General Requirements

An Open Screen Protocol agent must be able to discover the presence of another OSP agent connected to the same IPv4 or IPv6 subnet and reachable by IP multicast.
An OSP agent must be able to obtain the IPv4 or IPv6 address of the agent, a display name for the agent, and an IP port number for establishing a network transport to the agent.

2.2. Presentation API Requirements

A controller must be able to determine if a receiver is reasonably capable of rendering a specific presentation request URL.
A controller must be able to start a new presentation on a receiver given a presentation request URL and presentation identifier.
A controller must be able to create a new PresentationConnection to an existing presentation on the receiver, given its presentation request URL and presentation identifier.
It must be possible to close a PresentationConnection between a controller and a presentation, and signal both parties with the reason why the connection was closed.
Multiple controllers must be able to connect to a single presentation simultaneously.
Messages sent by the controller must be delivered to the presentation (or vice versa) in a reliable and in-order fashion.
If a message cannot be delivered, then the controller must be able to signal the receiver (or vice versa) that the connection should be closed with reason error.
The controller and presentation must be able to send and receive DOMString messages (represented as string type in ECMAScript).
The controller and presentation must be able to send and receive binary messages (represented as Blob objects in HTML5, or ArrayBuffer or ArrayBufferView types in ECMAScript).
The controller must be able to signal to the receiver to terminate a presentation, given its presentation request URL and presentation identifier.
The receiver must be able to signal all connected controllers when a presentation is terminated.

2.3. Remote Playback API Requirements

A controller must be able to find out whether there is at least one compatible receiver for a given HTMLMediaElement, both instantaneously and continuously.
A controller must be able to initiate remote playback of an HTMLMediaElement to a compatible receiver.
The controller must be able to send media sources as URLs and text tracks from an HTMLMediaElement to a compatible receiver.
The controller must be able to send media data from an HTMLMediaElement to a compatible receiver.
During remote playback, the controller and the remote playback device must be able to synchronize the media element state of the HTMLMediaElement.
During remote playback, either the controller or the receiver must be able to disconnect from the other party.
The controller should be able to pass locale and text direction information to the receiver to assist in rendering text during remote playback.

2.4. Non-Functional Requirements

It should be possible to implement an OSP agent using modest hardware requirements, similar to what is found in a low end smartphone, smart TV or streaming device. See the Device Specifications document for agent hardware specifications.
The discovery and connection protocols should minimize power consumption, especially on a listening agent which is likely to be battery powered.
The protocol should minimize the amount of information provided to a passive network observer about the identity of the user or activities on the agent, including presentations, remote playbacks, or the content of media streams.
The protocol should prevent active network attackers from impersonating a display and observing or altering data intended for the controller or receiver.
A listening agent should be able to discover quickly when an advertising agent becomes available or unavailable (i.e., when it connects or disconnects from the network).
Agents should present sensible information to the user when a protocol operation fails. For example, if a controller is unable to start a presentation, it should be possible to report in the controller interface if it was a network error, authentication error, or the presentation content failed to load.
Agents should be able to remember that a user authenticated another agent. This means it is not required for the user to intervene and re-authenticate each time an agent wants to connect to an agent that the user has already authenticated.
Message latency between agents should be minimized to permit interactive use. For example, it should be comfortable to type in a form in one agent and have the text appear in the presentation in real time. Real-time latency for gaming or mouse use is ideal, but not a requirement.
The controller initiating a presentation or remote playback should communicate its preferred locale to the receiver, so it can render the content in that locale.
It should be possible to extend the control protocol (above the discovery and transport levels) with optional features not defined explicitly by the specification, to facilitate experimentation and enhancement of the base APIs.

3. Discovery with mDNS

Open Screen Protocol agents discover one another by advertising and listening for information identifying themselves along with an IP service endpoint. Agent advertisement and discovery through DNS-SD and mDNS is defined by this specification and is mandatory to implement by all agents. However, agents are free to implement additional discovery mechanisms, such as querying for the same DNS-SD records via unicast DNS.

OSP agents must use the DNS-SD Service Name _openscreen._udp.

An advertising agent is one that responds to mDNS queries for _openscreen._udp.local. Such an agent should have a display name (a non-empty string) that is a human readable description of the presentation display, e.g. "Living Room TV."

A listening agent is one that sends mDNS queries for _openscreen._udp.local. Listening agents may have a display name.

Advertising agents must use a DNS-SD Instance Name that is a prefix of the agent’s display name. If the Instance Name is not the complete display name, it must be terminated by a null (\000) character, so that a listening agent knows it has been truncated.
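The truncation rule above can be sketched as follows. This is a minimal illustration, not a normative algorithm: the function names are hypothetical, and the 63-byte limit is the DNS-SD bound on a UTF-8 instance label, which this section does not restate.

```python
MAX_INSTANCE_NAME_BYTES = 63  # DNS-SD limit on a UTF-8 instance label (assumed here)


def make_instance_name(display_name: str, limit: int = MAX_INSTANCE_NAME_BYTES) -> str:
    """Return display_name, or a null-terminated prefix if it exceeds limit bytes."""
    encoded = display_name.encode("utf-8")
    if len(encoded) <= limit:
        return display_name
    # Reserve one byte for the terminating null, then drop any partial
    # multi-byte sequence left at the cut point.
    prefix = encoded[: limit - 1].decode("utf-8", errors="ignore")
    return prefix + "\x00"


def is_truncated(instance_name: str) -> bool:
    """A listening agent can detect truncation from the trailing null."""
    return instance_name.endswith("\x00")
```

A listening agent that sees a truncated Instance Name would typically obtain the full display name later through metadata discovery.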

Advertising agents must follow the mDNS conflict resolution procedure, to prevent multiple advertising agents from using the same DNS-SD Instance Name.

Agents should be careful when displaying Instance Names to users; see § 13.6.1 Instance and Display Names for guidelines on Instance Name display.

Advertising agents must include DNS TXT records with the following keys and values:

fp: The agent fingerprint of the advertising agent. The steps to compute the agent fingerprint are defined below.
mv: An unsigned integer value that indicates that metadata has changed. The advertising agent must update it to a greater value whenever its metadata changes. This signals to the listening agent that it should connect to the advertising agent to discover updated metadata. The value should be encoded as a variable-length integer.
at: An alphanumeric, unguessable token consisting of characters from the set [A-Za-z0-9+/].

Note: at prevents off-LAN parties from attempting authentication; see § 13.5.3 Remote active network attackers. at should have at least 32 bits of true entropy to make brute force attacks impractical.
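As a rough sketch of how an advertising agent might generate the at value: each character drawn uniformly from the 64-symbol set carries 6 bits, so 6 characters already exceed 32 bits. The function name and the default length of 8 are assumptions for illustration only.

```python
import secrets
import string

# The 64-symbol set [A-Za-z0-9+/]; each uniformly chosen symbol carries 6 bits.
AT_ALPHABET = string.ascii_letters + string.digits + "+/"


def generate_auth_token(length: int = 8) -> str:
    """Return a random token with at least 32 bits of entropy."""
    if length * 6 < 32:
        raise ValueError("token would carry fewer than 32 bits of entropy")
    return "".join(secrets.choice(AT_ALPHABET) for _ in range(length))
```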

NOTE: If an OSP agent suspends its network connectivity (e.g. for power saving reasons) it should attempt to retain cached and valid mDNS records so that discovery state is preserved when the network connection is resumed.

Future extensions to this QUIC-based protocol can use the same metadata discovery process to indicate support for those extensions, through a capabilities mechanism to be determined. If a future version of the Open Screen Protocol uses mDNS but breaks compatibility with the metadata discovery process, it should change the DNS-SD service name to a new value, indicating a new mechanism for metadata discovery.

3.1. Computing the Agent Fingerprint

The agent fingerprint of an agent is computed by following these steps:

Compute the SPKI Fingerprint of the agent certificate according to [RFC7469] using SHA-256 as the hash algorithm.
Base64-encode the result of Step 1 according to [RFC4648].

Note: The resulting string will be 44 bytes in length.
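The two steps can be sketched as below, assuming the caller has already extracted the DER-encoded SubjectPublicKeyInfo from the agent certificate (certificate parsing is out of scope here, and the function name is hypothetical).

```python
import base64
import hashlib


def agent_fingerprint(spki_der: bytes) -> str:
    """Base64 of the SHA-256 digest of a DER-encoded SubjectPublicKeyInfo."""
    digest = hashlib.sha256(spki_der).digest()  # Step 1: 32-byte SPKI hash
    return base64.b64encode(digest).decode("ascii")  # Step 2: 44 characters
```

The 32-byte digest always encodes to 44 base64 characters including padding, which matches the length stated in the note.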

4. Transport and metadata discovery with QUIC

If a listening agent wants to connect to or learn further metadata about an advertising agent, it initiates a QUIC connection to the IP and port from its SRV record. Prior to authentication, a message may be exchanged (such as further metadata), but such info should be treated as unverified (such as indicating to a user that a display name of an unauthenticated agent is unverified).

The connection IDs used by both agents should be zero length. If zero length connection IDs are chosen, agents are restricted from changing IP or port without establishing a new QUIC connection. In such cases, agents must establish a new QUIC connection in order to change IP or port.

4.1. TLS 1.3

When an OSP Agent makes a QUIC connection to another agent, it must use TLS 1.3 to secure the connection. TLS 1.3 should be used with the following application-specific parameters to indicate that the connection will be used to communicate with a specific OSP Agent using OSP. An OSP Agent may refuse incoming connections that lack these parameters.

The ALPN used must be "osp".
The server_name extension must be set to the following host_name: <fp>._openscreen._udp.
- <fp> must be substituted with the agent fingerprint as used in mDNS TXT.

An OSP Agent must not send TLS early data.

4.2. Agent Certificates

Each OSP Agent must generate an X.509 v3 agent certificate containing a public key to be used with the TLS 1.3 certificate exchange. Both advertising agents and listening agents must use the agent certificate in TLS 1.3 Certificate messages when making a QUIC connection.

The agent certificate must have the following characteristics:

256-bit ECDSA public key.
Self-signed.
Support the ecdsa_secp256r1_sha256 signature scheme as defined in TLS 1.3.
- The AlgorithmIdentifier values are as defined in [RFC5480] (for public keys) and [RFC5758] (for signature schemes).
- [X690] specifies the Distinguished Encoding Rules (DER) representation used to encode the identifiers.
Valid for signing.

Let the certificate serial number be the result of the following steps:

If the agent has never generated an agent certificate:
1. Let the certificate serial number base be a 32-bit pseudorandom integer value.
2. Let the certificate serial number counter be a 32-bit unsigned integer, initially set to 0.
Generate a 64-bit value as follows:
1. Increment the certificate serial number counter by one.
2. Assign the upper 32 bits to the certificate serial number base.
3. Assign the lower 32 bits to the certificate serial number counter.
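The serial number steps can be sketched as follows. The class name is hypothetical, and two details are assumptions the text does not specify: that the base and counter are persisted across restarts by the caller, and that counter exhaustion is reported as an error.

```python
import secrets


class SerialNumberGenerator:
    """Produces 64-bit serial numbers: upper 32 bits base, lower 32 bits counter."""

    def __init__(self) -> None:
        # Chosen once, the first time the agent generates a certificate.
        self.base = secrets.randbits(32)
        self.counter = 0

    def next_serial(self) -> int:
        if self.counter >= 0xFFFFFFFF:
            # Assumption: the text does not define wrap-around behavior.
            raise OverflowError("certificate serial number counter exhausted")
        self.counter += 1
        return (self.base << 32) | self.counter
```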

The following X.509 v3 fields are to be set as follows:

Field	Value
Version Number	3
Serial Number	The certificate serial number.
Public Key `AlgorithmIdentifier`	ECC OID: `1.2.840.10045.2.1` ECDSA 256 OID: `1.2.840.10045.3.1.7` DER representation: `301306072a8648ce3d020106082a8648ce3d030107`
Signature `AlgorithmIdentifier`	OID: `1.2.840.10045.4.3.2` DER representation: `300a06082a8648ce3d040302`
Issuer Name	CN = The `model-name` from the `agent-info` message. O = See note. L = See note. ST = See note. C = See note.
Subject Name	CN = `<fp>`._openscreen._udp O = See note.
Subject Public Key Algorithm	Elliptic Curve Public Key
Certificate Key usage	digitalSignature

Mandatory fields not mentioned above should be set according to [RFC5280].

The value <sn> above should be substituted with the certificate serial number.

Note: The OSP agent may use the implementer or device model name as the value for the O key for user interface and debugging purposes. It may use the agent implementer’s or device manufacturer’s location as the value for the location keys (L, ST, and C) for user interface and debugging purposes.

If an OSP agent sees an agent certificate it has not yet verified through § 6 Authentication, it must treat that agent as unverified and initiate authentication with that agent before allowing additional messages to be exchanged with that agent (apart from the messages described in § 4.3 Metadata Discovery).

If an OSP agent sees a valid agent certificate it has verified through authentication, it is not required to initiate authentication with that agent before sending further messages.

4.3. Metadata Discovery

To learn further metadata, an agent may send an agent-info-request message and receive back an agent-info-response message. Any agent may send this request at any time to learn about the state and capabilities of another device, which are described by the agent-info message in the agent-info-response.

If an agent changes any information in its agent-info message, it should send an agent-info-event message to all other connected agents with the new agent-info (without waiting for an agent-info-request).

The agent-info message contains the following fields:

display-name (required): The display name of the agent, intended to be displayed to a user by the requester. The requester should indicate through the UI if the responder is not authenticated or if the display name changes.
model-name (optional): If the agent is a hardware device, the model name of the device. This is used mainly for debugging purposes, but may be displayed to the user of the requesting agent.
capabilities (required): The control protocols, roles, and media types the agent supports. Presence indicates a capability and absence indicates lack of a capability. Capabilities should affect how an agent is presented to a user, such as drawing a different icon depending on whether it receives audio, video or both.
state-token (required): A random alphanumeric value consisting of 8 characters in the range [0-9A-Za-z]. This value is set before the agent makes its first connection and must be set to a new value when the agent is reset or otherwise loses all of its state related to this protocol.
locales (required): The agent’s preferred locales for display of localized content, in the order of user preference. Each entry is an RFC5646 language tag.

The various capabilities have the following meanings:

receive-audio: The agent can render audio via the other protocols it supports. Those other protocols may report more specific capabilities, such as support for certain audio codecs in the streaming protocol.
receive-video: The agent can render video via the other protocols it supports. Those other protocols may report more specific capabilities, such as support for certain video codecs in the streaming protocol.
receive-presentation: The agent can receive presentations using the presentation protocol.
control-presentation: The agent can control presentations using the presentation protocol.
receive-remote-playback: The agent can receive remote playback using the remote playback protocol.
control-remote-playback: The agent can control remote playback using the remote playback protocol.
receive-streaming: The agent can receive streaming using the streaming protocol.
send-streaming: The agent can send streaming using the streaming protocol.
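As an illustration of how capabilities might affect presentation in a user interface, the sketch below picks an icon from the receive-audio and receive-video capabilities. Capabilities are represented by their names as strings for readability, and the icon names are hypothetical.

```python
from typing import Iterable


def pick_icon(capabilities: Iterable[str]) -> str:
    """Choose a (hypothetical) icon name from an agent's capability names."""
    caps = set(capabilities)
    audio = "receive-audio" in caps
    video = "receive-video" in caps
    if audio and video:
        return "display-with-sound"
    if video:
        return "display"
    if audio:
        return "speaker"
    # Absence of a capability indicates the agent lacks it.
    return "generic-device"
```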

NOTE: See the Capabilities Registry for a list of all known capabilities (both defined by this specification, and through § 12 Protocol Extensions).

If a listening agent wishes to receive messages from an advertising agent or an advertising agent wishes to send messages to a listening agent, it may wish to keep the QUIC connection alive. Once neither side needs to keep the connection alive for the purposes of sending or receiving messages, the connection should be closed with an error code of 5139. In order to keep a QUIC connection alive, an agent may send an agent-status-request message, and any agent that receives an agent-status-request message should send an agent-status-response message. Such messages should be sent at intervals shorter than the QUIC idle_timeout transport parameter (see Transport Parameter Encoding in QUIC) and QUIC PING frames should not be used. An idle_timeout transport parameter of 25 seconds is recommended. The agent should behave as though a timer less than the idle_timeout were reset every time a message is sent on a QUIC stream. If the timer expires, an agent-status-request message should be sent.
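The keep-alive timer behavior can be sketched as below. The class and method names are hypothetical, the 5-second safety margin is an assumption (the text only requires the timer to be shorter than idle_timeout), and the caller supplies timestamps so the logic stays independent of any particular QUIC library.

```python
from typing import Optional


class KeepAlive:
    """Tracks when an agent-status-request is due on an otherwise idle connection."""

    def __init__(self, idle_timeout: float = 25.0, margin: float = 5.0) -> None:
        if not 0 < margin < idle_timeout:
            raise ValueError("margin must be positive and below idle_timeout")
        self.interval = idle_timeout - margin  # timer shorter than idle_timeout
        self.deadline: Optional[float] = None

    def message_sent(self, now: float) -> None:
        # Any message sent on a QUIC stream resets the timer.
        self.deadline = now + self.interval

    def status_request_due(self, now: float) -> bool:
        # True once the timer has expired without an intervening message.
        return self.deadline is not None and now >= self.deadline
```

After sending the agent-status-request, the agent would call message_sent again, which restarts the timer.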

If a listening agent wishes to send messages to an advertising agent, the listening agent can connect to the advertising agent "on demand"; it does not need to keep the connection alive.

If an OSP agent suspends its network connectivity (e.g. for power saving reasons), it should attempt to resume QUIC connections to the OSP agents to which it was previously connected once network connectivity is restored. Once reconnected, it should send agent-status-request messages to those agents.

The agent-info and agent-status-response messages may be extended to include additional information not defined in this spec, as described in § 12.1 Protocol Extension Fields.

5. Message delivery using CBOR and QUIC streams

Messages are serialized using CBOR. To send a group of messages in order, that group of messages must be sent in one QUIC stream. Independent groups of messages (with no ordering dependency across groups) should be sent in different QUIC streams. In order to put multiple CBOR-serialized messages into the same QUIC stream, the following is used.

NOTE: Open Screen Agents should configure QUIC stream limits (MAX_STREAMS) to not hinder application performance, keeping in mind the number of concurrent streams that may be necessary for audio, video, or data streaming use cases.

For each message, the OSP agent must write into a unidirectional QUIC stream the following:

A type key representing the type of the message, encoded as a variable-length integer (see Appendix A: Messages for type keys)
The message encoded as CBOR.

If an agent receives a message for which it does not recognize a type key, it must close the QUIC connection with an application error code of 404 and should include the unknown type key in the reason phrase of the CONNECTION_CLOSE frame.

Variable-length integers are encoded in the Variable-Length Integer Encoding used by QUIC.
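The framing described above can be sketched as follows. The variable-length integer routines follow the QUIC encoding (a 2-bit length prefix followed by a 6, 14, 30, or 62-bit value). CBOR serialization is assumed to be done elsewhere, so the message body is passed in as already-encoded bytes, and the type key used in the usage note is a made-up value rather than one from Appendix A.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode an integer using the QUIC variable-length integer encoding."""
    if value < 0:
        raise ValueError("varint must be non-negative")
    if value < 1 << 6:
        return value.to_bytes(1, "big")  # prefix 00
    if value < 1 << 14:
        return (value | 0x4000).to_bytes(2, "big")  # prefix 01
    if value < 1 << 30:
        return (value | 0x80000000).to_bytes(4, "big")  # prefix 10
    if value < 1 << 62:
        return (value | 0xC000000000000000).to_bytes(8, "big")  # prefix 11
    raise ValueError("varint must be below 2**62")


def decode_varint(data: bytes) -> Tuple[int, int]:
    """Return (value, bytes consumed) for the varint at the start of data."""
    if not data:
        raise ValueError("no data")
    length = 1 << (data[0] >> 6)  # 1, 2, 4, or 8 bytes
    if len(data) < length:
        raise ValueError("truncated varint")
    mask = (1 << (8 * length - 2)) - 1  # clear the 2-bit length prefix
    return int.from_bytes(data[:length], "big") & mask, length


def frame_message(type_key: int, cbor_body: bytes) -> bytes:
    """Prefix an already CBOR-encoded message with its type key."""
    return encode_varint(type_key) + cbor_body
```

For example, frame_message(10, b"\xa0") yields the type key byte 0x0a followed by an empty CBOR map. A receiver would call decode_varint first, look up the type key, and close the connection if the key is unknown.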

Many messages are requests and responses, so a common format is defined for those. A request and a response includes a request ID which is an unsigned integer chosen by the requester. Responses must include the request ID of the request they are associated with.

5.1. Type Key Backwards Compatibility

As messages are modified or extended over time, certain rules must be followed to maintain backwards compatibility with agents that understand older versions of messages.

If a required field is added to or removed from a message (either to/from the message directly or indirectly through the field of a field), a new type key must be assigned to the message. It is effectively a new message and must not be sent unless the receiving agent is known to understand the new type key.
If an optional field is added to a message (either to the message directly or indirectly through the field of a field), the type key may remain unchanged if the behavior of older receiving agents that do not understand the added field is compatible with newer sending agents that include the field. Otherwise, a new type key must be assigned.
If an optional field is removed from a message (either from the message directly or indirectly through the field of a field), the type key may remain unchanged if the behavior of newer receiving agents that do not understand the removed field is compatible with older sending agents that include the field. Otherwise, a new type key must be assigned.
Required fields may not be added to or removed from array-based messages, such as audio-frame.

6. Authentication

Each supported authentication method is implemented via authentication messages specific to that method. The authentication method is explicitly specified by the message itself. The authentication status message is common for all authentication methods. Any new authentication method added must define new authentication messages.

Open Screen Protocol agents must implement § 6.1 Authentication with SPAKE2 with pre-shared keys.

Prior to authentication, agents exchange auth-capabilities messages specifying pre-shared key (PSK) ease of input for the user and supported PSK input methods. The agent with the lowest PSK ease of input presents a PSK to the user when the agent either sends or receives an authentication request. In case both agents have the same PSK ease of input value, the server presents the PSK to the user. The same pre-shared key is used by both agents. The agent presenting the PSK to the user is the PSK presenter; the agent requiring the user to input the PSK is the PSK consumer.

PSK ease of input is an integer in the range from 0 to 100 inclusive, where 0 means it is not possible for the user to input PSK on this device and 100 means that it’s easy for the user to input PSK on the device. Supported PSK input methods are numeric and scanning a QR-code. Devices with non-zero PSK ease of input must support the numeric PSK input method.
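The selection of the PSK presenter can be sketched as a small function. The function name and the "client"/"server" return labels are illustrative assumptions; the inputs are the PSK ease of input values each side reports in auth-capabilities.

```python
def choose_psk_presenter(client_ease: int, server_ease: int) -> str:
    """Return which side presents the PSK: the one with the lower ease of input."""
    for ease in (client_ease, server_ease):
        if not 0 <= ease <= 100:
            raise ValueError("PSK ease of input must be in the range 0 to 100")
    # On a tie, the server presents the PSK.
    return "client" if client_ease < server_ease else "server"
```

For instance, a TV acting as server with an ease of input of 0 always presents the PSK, and a phone with an ease of input of 100 is then the PSK consumer.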

Any authentication method may require an auth-initiation-token before showing a PSK to the user or requesting PSK input from the user. For an advertising agent, the at field in its mDNS TXT record must be used as the auth-initiation-token in the first authentication message sent to or from that agent. Agents should discard any authentication message whose auth-initiation-token is set and does not match the at provided by the advertising agent.

In the psk-min-bits-of-entropy field of the auth-capabilities message, an agent may specify the minimum bits of entropy it requires for a PSK, in the range of 20 to 60 bits inclusive, with a default of 20. The PSK presenter must generate a PSK that has at least as many bits of entropy as it receives in this field, and at least as many bits of entropy as it sends in this field.
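The entropy rule amounts to taking the larger of the two advertised minimums. The sketch below shows that, plus a purely illustrative helper for how many decimal digits a uniformly random numeric PSK would need to reach a given entropy. The function names are hypothetical, and the digit calculation is plain arithmetic, not the encoding defined in Appendix C.

```python
from typing import Optional

DEFAULT_MIN_BITS = 20


def required_psk_entropy(sent: Optional[int] = None, received: Optional[int] = None) -> int:
    """Bits of entropy the PSK presenter must provide, given both sides' minimums."""
    values = []
    for bits in (sent, received):
        bits = DEFAULT_MIN_BITS if bits is None else bits
        if not 20 <= bits <= 60:
            raise ValueError("psk-min-bits-of-entropy must be between 20 and 60")
        values.append(bits)
    return max(values)


def decimal_digits_for(bits: int) -> int:
    """Smallest digit count d such that 10**d >= 2**bits (illustrative only)."""
    digits = 1
    while 10 ** digits < 2 ** bits:
        digits += 1
    return digits
```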

If an agent chooses to show a user a PSK in more than one way (such as both a QR-code and a numeric PSK), they should be for the same PSK. If they were different, the PSK presenter would not know which one the user chose to use, and that may lead to authentication failures.

Appendix C: PSK Encoding Schemes describes two encoding schemes for PSKs that agents may support to produce either a string or a QR code for display to the user.

6.1. Authentication with SPAKE2

[Meta] Track CFRG PAKE competition outcome [Issue #242]

For all messages and objects defined in this section, see Appendix A: Messages for the full CDDL definitions.

The default authentication method is [SPAKE2](https://tools.ietf.org/html/draft-irtf-cfrg-spake2-26) with the following cipher suite:

Elliptic curve is [edwards25519](https://tools.ietf.org/html/rfc7748#page-4).
Hash function is [SHA-256](https://tools.ietf.org/html/rfc6234).
Key derivation function is [HKDF](https://tools.ietf.org/html/rfc5869).
Message authentication code is [HMAC](https://tools.ietf.org/html/rfc2104).
Password hash function is [SHA-512](https://tools.ietf.org/html/rfc6234).

Open Screen Protocol does not use a memory-hard hash function to hash PSKs with SPAKE2 and uses SHA-512 instead, as the PSK is one-time use and is not stored in any form.

SPAKE2 provides explicit mutual authentication.

This authentication method assumes the agents share a low-entropy secret, such as a number or a short password that could be entered by a user on a phone, a keyboard or a TV remote control.

SPAKE2 is not symmetric and has two roles, Alice (A) and Bob (B).

The messages used in this authentication method are: auth-spake2-handshake, auth-spake2-confirmation and auth-status. [SPAKE2] describes in detail how auth-spake2-handshake and auth-spake2-confirmation are computed.

The values A and B used in SPAKE2 are the agent fingerprints of the client and server, respectively. pw is the PSK presented to the user.

The PSK presenter or the PSK consumer may initiate authentication (assuming the role of Alice in SPAKE2).

If the PSK presenter wants to initiate authentication, it starts the authentication process by presenting the PSK to the user and sending an auth-spake2-handshake message. The public-value field of the auth-spake2-handshake message must be set to the value of pA from SPAKE2 and the psk-status field must be set to psk-shown.

When the PSK consumer receives the auth-spake2-handshake message, the PSK consumer prompts the user for the PSK input if it has not done so yet. Once it receives the PSK, it sends an auth-spake2-handshake message with the public-value field set to the value of pB from SPAKE2 and the psk-status field set to psk-input.

If the PSK consumer wants to initiate authentication, the PSK consumer sends an auth-spake2-handshake message to the PSK presenter with the psk-status field set to psk-needs-presentation and the public-value field set to pA. The PSK presenter, on receiving this message, creates a PSK and presents it to the user. Once that is done, it sends an auth-spake2-handshake message to the PSK consumer with psk-status set to psk-input and the public-value field set to pB.

Once an agent knows both pA and pB from auth-spake2-handshake messages, it computes and sends an auth-spake2-confirmation with the confirmation-value field set to cA (for Alice) or cB (for Bob) to the other agent.

Once an agent receives an auth-spake2-confirmation message, it validates that message using the procedure in [SPAKE2] and then replies with an auth-status authenticated message to the other agent. Any value of result other than authenticated means that authentication failed, and the agent must immediately disconnect.

NOTE: The auth-status message is merely informative as each agent independently computes the outcome of SPAKE2 through key confirmation verification.

Appendix D: Entire Flow Chart shows the entire process when agents have not authenticated each other, including discovery, QUIC connection establishment, metadata exchange and authentication. When agents have completed authentication, the authentication phase can be omitted.

7. Presentation Protocol

This section defines the use of the Open Screen Protocol for starting, terminating, and controlling presentations as defined by Presentation API. § 7.1 Presentation API defines how APIs in Presentation API map to the protocol messages defined in this section.

To learn which receivers are available presentation displays for a particular presentation request URL or set of URLs, the controller may send a presentation-url-availability-request message with the following values:

urls: A list of presentation URLs. Must not be empty.
watch-duration: The period of time that the controller is interested in receiving updates about their URLs, should the availability change.
watch-id: An identifier the receiver must use when sending updates about URL availability so that the controller knows which URLs the receiver is referring to.

In response, the receiver should send one presentation-url-availability-response message with the following values:

url-availabilities: A list of URL availability states. Each state must correspond to the matching URL from the request by list index.

While the watch is valid (the watch-duration has not expired), the receiver should send presentation-url-availability-event messages when URL availabilities change. Such events contain the following values:

watch-id: The watch-id given in the presentation-url-availability-request, used to refer to the presentation URLs whose availability has changed.
url-availabilities: A list of URL availability states. Each state must correspond to the URLs from the request referred to by the watch-id.

Note that these messages are not broadcast to all controllers. They are sent individually to controllers that have requested availability for the URLs that have changed in availability state within the watch duration of the original availability request.
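On the controller side, matching availability states back to URLs by list index can be sketched as below. The class and method names are hypothetical, availability states are shown as plain strings standing in for the values defined in Appendix A, and expiry of the watch-duration is left to the caller.

```python
from typing import Dict, List, Sequence


class AvailabilityWatcher:
    """Tracks URL availability per watch-id on the controller (illustrative)."""

    def __init__(self) -> None:
        self._watches: Dict[int, List[str]] = {}
        self.availability: Dict[str, str] = {}

    def start_watch(self, watch_id: int, urls: Sequence[str]) -> None:
        if not urls:
            raise ValueError("urls must not be empty")
        self._watches[watch_id] = list(urls)

    def apply(self, watch_id: int, url_availabilities: Sequence[str]) -> bool:
        """Apply a response or event; returns False for an unknown watch-id."""
        urls = self._watches.get(watch_id)
        if urls is None:
            return False
        if len(urls) != len(url_availabilities):
            raise ValueError("availability list does not match requested URLs")
        # Each state corresponds to the URL at the same list index.
        self.availability.update(zip(urls, url_availabilities))
        return True
```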

To save power, the controller may disconnect the QUIC connection and later reconnect to send availability requests and receive availability responses and updates. The QUIC connection ID may or may not be the same when reconnecting.

To start a presentation, the controller may send a presentation-start-request message to the receiver with the following values:

presentation-id: The presentation identifier
url: The selected presentation URL
headers: headers that the receiver should use to fetch the presentation URL. For example, section 6.6.1 of the Presentation API says that the HTTP Accept-Language header should be provided.

The presentation identifier must follow the restrictions defined by section 6.1 of the Presentation API, in that it must consist of at least 16 ASCII characters.
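A minimal check of the length and character-set restriction stated above, together with one possible way to generate a conforming identifier. Both function names are hypothetical, and the choice of a URL-safe random string is an assumption rather than something this specification requires.

```python
import secrets


def is_valid_presentation_id(presentation_id: str) -> bool:
    """True if the identifier has at least 16 characters, all of them ASCII."""
    return len(presentation_id) >= 16 and presentation_id.isascii()


def new_presentation_id() -> str:
    """Generate a random ASCII identifier (22 characters from 16 random bytes)."""
    return secrets.token_urlsafe(16)
```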

When the receiver receives the presentation-start-request, it should send back a presentation-start-response message after either the presentation URL has been fetched and loaded, or the receiver has failed to do so. If it has failed, it must respond with the appropriate result (such as invalid-url or timeout). If it has succeeded, it must reply with a success result.

Additionally, the response must include the following:

connection-id: An ID that both agents can use to send connection messages to each other. It is chosen by the receiver for ease of implementation: if the message receiver chooses the connection-id, it may keep the ID unique across connections, thus making message demuxing/routing easier.

The response should include the following:

http-response-code: The numeric HTTP response code that was returned from fetching the presentation URL (after redirects).
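The following sketch shows one way a receiver might assemble the response fields described above. It assumes integer connection IDs drawn from a single per-receiver counter, and it uses a plain dict in place of the CBOR encoding; the helper name is hypothetical, and Appendix A defines the actual message layout.

```python
import itertools
from typing import Optional

# Assumption: one counter per receiver, so every connection-id is unique
# across all connections and incoming messages can be routed by ID alone.
_next_connection_id = itertools.count(1)


def build_start_response(result: str,
                         http_response_code: Optional[int] = None) -> dict:
    """Build a dict mirroring the presentation-start-response fields."""
    response = {
        "result": result,                          # e.g. "success" or "invalid-url"
        "connection-id": next(_next_connection_id),
    }
    # http-response-code is a "should", so it is only added when known.
    if http_response_code is not None:
        response["http-response-code"] = http_response_code
    return response
```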

To send a presentation message, the controller or receiver may send a presentation-connection-message with the following values:

connection-id: The ID from the presentation-start-response or presentation-connection-open-response messages.
message: The presentation message data.

NOTE: An OSP agent should minimize buffering and processing of messages sent or received via the QUIC connection beyond what is strictly necessary (i.e., CBOR serialization). Message payloads should be treated as real-time data, as they may be used to synchronize playback of media streams between agents or other low latency use cases. The synchronization thresholds recommended in [ITU-R-BT.1359-1] imply that the total agent-to-agent processing latency (including serialization, buffering, QUIC processing, and network latency) must be no greater than 45 ms to permit effective lip sync during media playback.

To terminate a presentation, the controller may send a presentation-termination-request message with the following values:

presentation-id: The ID of the presentation to terminate.
reason: Set to application-request if the application requested termination, or user-request if the user requested termination. (These are the only valid values for reason in a presentation-termination-request.)
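Because only two reason values are valid in a request, a controller can check the value before sending. This is a sketch with a hypothetical helper name, using a plain dict in place of the encoded message.

```python
# The only reasons permitted in a presentation-termination-request.
VALID_REQUEST_REASONS = {"application-request", "user-request"}


def build_termination_request(presentation_id: str, reason: str) -> dict:
    """Build the request fields, rejecting reasons that are not allowed here."""
    if reason not in VALID_REQUEST_REASONS:
        raise ValueError(f"invalid termination request reason: {reason!r}")
    return {"presentation-id": presentation_id, "reason": reason}
```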

When a receiver receives a presentation-termination-request, it should send back a presentation-termination-response message to the requesting controller.

It should also notify other controllers about the termination by sending a presentation-termination-event message. The receiver can also send this message when it terminates a presentation without a request from a controller. This message contains the following values:

presentation-id: The ID of the presentation that was terminated.
source: Set to controller when the termination was in response to a presentation-termination-request, or receiver otherwise.
reason: The detailed reason why the presentation was terminated.

To accept incoming connection requests from controllers, a receiver must receive and process the presentation-connection-open-request message, which contains the following values:

presentation-id: The ID of the presentation to connect to.
url: The URL of the presentation to connect to.

The receiver should, upon receipt of a presentation-connection-open-request message, send back a presentation-connection-open-response message which contains the following values:

result: a code indicating success or failure, and the reason for the failure
connection-id: An ID that both agents can use to send connection messages to each other. It is chosen by the receiver for ease of implementation (if the message receiver chooses the connection-id, it may keep the ID unique across connections, thus making message demuxing/routing easier).
connection-count: The new number of open connections to the presentation that received the incoming connection request.

If the presentation-connection-open-response message indicates success, the receiver should also send a presentation-change-event to all other endpoints that have an active presentation connection to that presentation with the values:

presentation-id: The ID of the presentation that just received a new presentation connection.
connection-count: The new total number of open connections to that presentation.

A controller may close a connection without terminating the presentation by sending a presentation-connection-close-event message to the receiver with the following values:

connection-id: The ID of the connection that was closed.
reason: Set to close-method-called or connection-object-discarded.

The receiver may also close a connection without terminating a presentation. If it does so, it should send a presentation-connection-close-event message to the controller with the following values:

connection-id: The ID of the connection that was closed.
reason: Set to close-method-called or connection-object-discarded.
connection-count: The number of open presentation connections that remain.

If a receiver closes a presentation connection (for any reason), it should send a presentation-change-event to all other controllers with an open connection to that presentation with the values:

presentation-id: The ID of the presentation that just closed a connection.
connection-count: The number of open presentation connections that remain.
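The connection-count bookkeeping described above can be sketched as follows. The class and method names are hypothetical, controllers are identified by opaque strings, and events are returned as dicts rather than sent over QUIC.

```python
from collections import defaultdict


class PresentationConnections:
    """Tracks open connections per presentation on the receiver side."""

    def __init__(self):
        # presentation-id -> {connection-id: controller}
        self._open = defaultdict(dict)

    def open(self, presentation_id, connection_id, controller):
        """Record a new connection; return change events for other controllers."""
        others = set(self._open[presentation_id].values()) - {controller}
        self._open[presentation_id][connection_id] = controller
        return self._change_events(presentation_id, others)

    def close(self, presentation_id, connection_id):
        """Remove a connection; return change events for remaining controllers."""
        controller = self._open[presentation_id].pop(connection_id)
        others = set(self._open[presentation_id].values()) - {controller}
        return self._change_events(presentation_id, others)

    def _change_events(self, presentation_id, recipients):
        event = {
            "presentation-id": presentation_id,
            "connection-count": len(self._open[presentation_id]),
        }
        # One presentation-change-event per controller that still has a connection.
        return {recipient: event for recipient in sorted(recipients)}
```

The controller that opened or closed the connection is excluded from the recipients, since it learns the new count from the open response or close event instead.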

Note: When an agent closes a presentation connection, it is always successful, so request and response messages are not needed. A request to terminate a presentation may succeed or fail, so a response message is required.

7.1. Presentation API

An Open Screen Protocol agent that is a controlling user agent for the Presentation API must support the control-presentation capability. An OSP agent that is a receiving user agent for the Presentation API must support the receive-presentation capability. The same OSP agent may be a controlling user agent and a receiving user agent.

Note: These roles are independent of which agent was the advertising agent or the listening agent during discovery and connection establishment.

This is how the Presentation API uses the § 7 Presentation Protocol:

When section 6.4.2 says "This list of presentation displays ... is populated based on an implementation specific discovery mechanism", the controller may use the mDNS, QUIC, agent-info-request, and presentation-url-availability-request messages defined previously in this spec to discover receivers.

When section 6.4.2 says "To further save power, ... implementation specific discovery of presentation displays can be resumed or suspended.", the agent may use the power saving mechanism defined in the previous section.

When section 6.3.4 says "Using an implementation specific mechanism, tell U to create a receiving browsing context with D, presentationUrl, and I as parameters.", U (the controller) may send a presentation-start-request message to D (the receiver), with I for the presentation identifier and presentationUrl for the selected presentation URL.

When section 6.3.5 says to "establish a presentation connection with newConnection," let U be the presentationURL of newConnection and I the presentation identifier of newConnection. The agent should send a presentation-connection-open-request message with U for the url and I for the presentation-id.

When section 6.5.2 says "Using an implementation specific mechanism, transmit the contents of messageOrData as the presentation message data and messageType as the presentation message type to the destination browsing context", the controller may send a presentation-connection-message with messageOrData for the presentation message data. Note that the messageType is embedded in the encoded CBOR type and does not need an additional value in the message.

When section 6.5.5 says "Start to signal to the destination browsing context the intention to close the corresponding PresentationConnection", the agent may send a presentation-connection-close-event message to the other agent with the destination browsing context and a presentation-change-event when required.

When section 6.5.6 says "Send a termination request for the presentation to its receiving user agent using an implementation specific mechanism", the controller may send a presentation-termination-request message to the receiver with a reason of application-request.

When section 6.7.1 says "it MUST listen to and accept incoming connection requests from a controlling browsing context using an implementation specific mechanism", the receiver must receive and process the presentation-connection-open-request.

When section 6.7.1 says "Establish the connection between the controlling and receiving browsing contexts using an implementation specific mechanism.", the receiver must send a presentation-connection-open-response message and presentation-change-event messages when required.
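The remark above that the messageType is embedded in the encoded CBOR type can be illustrated with a hand-rolled encoder. This sketch assumes that text messages are carried as CBOR text strings (major type 3) and binary messages as CBOR byte strings (major type 2); the function names are hypothetical and a real agent would normally use a CBOR library.

```python
import struct


def _cbor_head(major: int, length: int) -> bytes:
    """Encode a CBOR initial byte plus length argument for the given major type."""
    if length < 24:
        return bytes([(major << 5) | length])
    if length < 1 << 8:
        return bytes([(major << 5) | 24]) + struct.pack(">B", length)
    if length < 1 << 16:
        return bytes([(major << 5) | 25]) + struct.pack(">H", length)
    if length < 1 << 32:
        return bytes([(major << 5) | 26]) + struct.pack(">I", length)
    return bytes([(major << 5) | 27]) + struct.pack(">Q", length)


def encode_message_data(payload) -> bytes:
    """Encode str as a CBOR text string and bytes as a CBOR byte string."""
    if isinstance(payload, str):
        major, data = 3, payload.encode("utf-8")
    elif isinstance(payload, (bytes, bytearray)):
        major, data = 2, bytes(payload)
    else:
        raise TypeError("presentation message data must be str or bytes")
    return _cbor_head(major, len(data)) + data


def message_type(encoded: bytes) -> str:
    """Recover the presentation message type from the CBOR major type."""
    return {2: "binary", 3: "text"}[encoded[0] >> 5]
```

Since the major type occupies the top three bits of the first byte, the recipient can tell text from binary without a separate field.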

8. Representation Of Time

The § 9 Remote Playback Protocol and the § 10 Streaming Protocol represent points of time and durations in terms of a time scale. A time scale is a common denominator for time values that allows values to be expressed as rational numbers without loss of precision. The time scale is represented in hertz, such as 90000 for 90000 Hz, a common time scale for video.
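As an illustration of why a time scale avoids loss of precision, the sketch below keeps values as exact rationals using Python's fractions module. The function names are hypothetical and not part of the protocol.

```python
from fractions import Fraction


def to_seconds(value: int, time_scale: int) -> Fraction:
    """Convert a tick count on the given time scale (in Hz) to exact seconds."""
    return Fraction(value, time_scale)


def rescale(value: int, from_scale: int, to_scale: int) -> Fraction:
    """Express the same instant on a different time scale, without rounding."""
    return Fraction(value * to_scale, from_scale)
```

With a 90000 Hz time scale, a value of 3003 is exactly 1001/30000 of a second (one frame at 29.97 frames per second), which has no exact representation as a binary floating-point number.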

9. Remote Playback Protocol

This section defines the use of the Open Screen Protocol for starting, terminating, and controlling remote playback of media as defined by the Remote Playback API. § 9.2 Remote Playback API defines how APIs in Remote Playback API map to the protocol messages defined in this section.

For all messages defined in this section, see Appendix A: Messages for the full CDDL definitions.

To learn which receivers are compatible remote playback devices for a particular URL or set of URLs, the controller may send a remote-playback-availability-request message with the following values:

sources: A list of media resources, the same as specified in the remote-playback-start-request message. Must not be empty.
headers: Headers that the receiver should use to fetch the URLs. For example, section 6.2.4 of the Remote Playback API says that the Accept-Language header should be provided.
watch-duration: The period of time during which the controller is interested in receiving updates about these URLs, should their availability change.
watch-id: An identifier the receiver must use when sending updates about URL availability so that the controller knows which URLs the receiver is referring to.

In response, the receiver should send a remote-playback-availability-response message with the following values:

url-availabilities: A list of URL availability states. Each state must correspond to the matching URL from the request by list index.

The receiver should later (up to the current time plus the requested watch-duration) send remote-playback-availability-event messages if URL availabilities change. Such events contain the following values:

watch-id: The watch-id given in the remote-playback-availability-request, used to refer to the remote playback URLs whose availability has changed.
url-availabilities: A list of URL availability states. Each state must correspond to the URLs from the request referred to by the watch-id.
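A receiver needs to remember each watch long enough to send events for it. The sketch below shows one way to do so, assuming timestamps and watch-duration are in seconds and that availability states are simple strings; the class name, the units, and the state names are illustrative assumptions rather than protocol definitions.

```python
class AvailabilityWatch:
    """Receiver-side record of one remote-playback-availability-request."""

    def __init__(self, watch_id, sources, watch_duration, now):
        self.watch_id = watch_id
        self.sources = list(sources)
        # Events may be sent up to the request time plus the watch-duration.
        self.expires_at = now + watch_duration

    def event_for(self, availabilities, now):
        """Return event fields, or None if the watch has already expired."""
        if now > self.expires_at:
            return None
        if len(availabilities) != len(self.sources):
            raise ValueError("one availability state is needed per watched source")
        return {
            "watch-id": self.watch_id,
            "url-availabilities": list(availabilities),
        }
```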

To start remote playback, the controller may send a remote-playback-start-request message to the receiver with the following values:

remote-playback-id: An identifier for this remote playback. It should be universally unique among all remote playbacks.

Note: A version 4 (pseudorandom) UUID is recommended as it meets the requirements for a remote-playback-id.

sources (optional): The media resources that the controller has selected for playback on the receiver. Each source must include a source URL and should include an extended MIME type when available for the media resource. If sources is missing or empty, the remoting field must be populated, as the controller will use a streaming session to send encoded media.
text-track-urls: URLs of text tracks associated with the media resources.
controls: Initial controls for modifying the initial state of the remote playback, as defined in § 9.1 Remote Playback State and Controls. The controller may send controls that are optional for the receiver to support before it knows the receiver supports them. If the receiver does not support them, it will ignore them and the controller will learn that it does not support them from the remote-playback-start-response message.
remoting (optional): Parameters for starting a streaming session associated with this remote playback. If not included, no streaming session is started. Required when sources is missing or empty.
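The constraints on these values can be sketched as a small builder. The function name, the dict representation, and the shape of each source entry are assumptions made for illustration; Appendix A defines the actual message.

```python
import uuid


def build_remote_playback_start_request(sources=None, remoting=None,
                                        text_track_urls=(), controls=None):
    """Build request fields, enforcing that remoting accompanies empty sources."""
    if not sources and remoting is None:
        raise ValueError("remoting is required when sources is missing or empty")
    request = {
        # A version 4 UUID, as recommended for remote-playback-id.
        "remote-playback-id": str(uuid.uuid4()),
        "text-track-urls": list(text_track_urls),
        "controls": dict(controls or {}),
    }
    if sources:
        request["sources"] = list(sources)
    if remoting is not None:
        request["remoting"] = remoting
    return request
```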

When the receiver receives a remote-playback-start-request message, it should send back a remote-playback-start-response message. It should do so quickly, usually before the media resource has been loaded and instead give updates of the progress of loading with remote-playback-state-event messages, unless the receiver decides to not attempt to load the resource at all. If it chooses not to, it must respond with the appropriate failure result (such as timeout or invalid-url). Additionally, the response must include the following:

state: The initial state of the remote playback, as defined in § 9.1 Remote Playback State and Controls.
remoting (optional): A response to the started streaming session associated with this remote playback. If not included, no streaming session is started.

If a streaming session is started, streaming messages such as streaming-session-modify-request and video-frame can be used for the streaming session as if the streaming session had been started with streaming-session-start-request and streaming-session-start-response. The streaming session may be terminated before the remote playback is terminated, but if the remote playback is terminated first, the streaming session associated with it is automatically terminated.

Add a back pressure signal for media remoting [Issue #241]

If the controller wishes to modify the state of the remote playback (for example, to pause, resume, or skip), it may send a remote-playback-modify-request message with the following values:

remote-playback-id: The ID of the remote playback to be modified.
controls: Updated controls as defined in § 9.1 Remote Playback State and Controls.

When a receiver receives a remote-playback-modify-request it should send a remote-playback-modify-response message in reply with the following values:

state: The updated state of the remote playback as defined in § 9.1 Remote Playback State and Controls.

When the state of remote playback changes without a request for modification from the controller (such as when the media skips or pauses due to user interaction on the receiver), the receiver may send a remote-playback-state-event to the controller.

The receiver should send a remote-playback-state-event message whenever:

Any of the following methods are called:

Any of the following attributes observably change since the last sent remote-playback-state-event message:

The timeline offset associated with the playback changes since the last sent remote-playback-state-event message:

The stalled event needs to fire at the associated HTMLMediaElement instance.

More than 250ms pass since the last remote-playback-state-event message and any of the following attributes observably change since the last remote-playback-state-event message. Any new continuously changing attributes fall under this rule.

NOTE: A media element is required to fire a timeupdate event every 250ms or sooner.

The remote-playback-state-event message contains the following values:

remote-playback-id: The ID of the remote playback whose state has changed.
state: The updated state of the remote playback, as defined in § 9.1 Remote Playback State and Controls.

To terminate the remote playback, the controller may send a remote-playback-termination-request message with the following values:

remote-playback-id: The ID of the remote playback to terminate.
reason: The reason the remote playback is being terminated.

When a receiver receives a remote-playback-termination-request, it should send back a remote-playback-termination-response message to the controller.

If a receiver terminates a remote playback without a request from the controller to do so, it must send a remote-playback-termination-event message to the controller with the following values:

remote-playback-id: The ID of the remote playback that was terminated.
reason: The reason the remote playback was terminated.

As mentioned in Remote Playback API section 6.2.7, terminating the remote playback means the controller is no longer controlling the remote playback and does not necessarily stop media from rendering on the receiver. Whether or not the receiver stops rendering media depends upon the implementation of the receiver.

9.1. Remote Playback State and Controls

In order for the controller and receiver to stay in sync with regard to the state of the remote playback, the controller may send controls to modify the state (for example, via the remote-playback-modify-request message) and the receiver may send updates about state changes (for example, via the remote-playback-state-event message).

The controls sent by the controller include the following individual control values, each of which is optional. This allows the controller to change one control value or many control values at once without having to specify all control values every time. A non-present control value indicates no change. A present control value indicates the change defined below. These controls intentionally mirror settable attributes and methods of the HTMLMediaElement.

source: Change the media resource. See HTMLMediaElement.src for more details. Must not be used in the initial controls of the remote-playback-start-request message (which already contains a media resource).
preload: Set how aggressively to preload media. See HTMLMediaElement.preload for more details. Should only be used in the initial controls of the remote-playback-start-request message or when the source is changed. If not set in the initial controls, it is left to the receiver to decide. This is optional for the receiver to support and if not supported, the receiver will behave as though it were never set.
loop: Set whether or not to loop media. See HTMLMediaElement.loop for more details. Should only be used in the initial controls of the remote-playback-start-request. If not set in the initial controls, it is assumed to be false.
paused: If true, pause; if false, resume. See HTMLMediaElement.pause() and HTMLMediaElement.play() for more details. If not set in the initial controls, it is left to the receiver to decide.
muted: If true, mute; if false, unmute. See HTMLMediaElement.muted for more details. If not set in the initial controls, it is left to the receiver to decide.
volume: Set the audio volume in the range from 0.0 to 1.0 inclusive. See HTMLMediaElement.volume for more details. If not set in the initial controls, it is left to the receiver to decide.
seek: Seek to a precise time. See HTMLMediaElement.currentTime for more details.
fast-seek: Seek to an approximate time as fast as possible. See HTMLMediaElement.fastSeek() for more details.
playback-rate: Set the rate at which the media plays. See HTMLMediaElement.playbackRate for more details. If not set in the initial controls, it is left to the receiver to decide. This is optional for the receiver to support and if not supported, the receiver will behave as though it were never set.
poster: Set the URL of an image to show when video data is not available. See poster frame for more details. If not set in the initial controls, no poster is used and the receiver can choose what to render when video data is unavailable. This is optional for the receiver to support and if not supported, the receiver will behave as though it were never set.
enabled-audio-track-ids: Enable included audio tracks by ID and disable all other audio tracks. See HTMLMediaElement.audioTracks for more details.
selected-video-track-id: Select the given video track by ID and unselect all other video tracks. See HTMLMediaElement.videoTracks for more details.
added-text-tracks: Add text tracks with the given kinds, labels, and languages. See HTMLMediaElement.addTextTrack() for more details. This is optional for the receiver to support and if not supported, the receiver will behave as though it were never set.
changed-text-tracks: Change text tracks by ID. All other text tracks are left unchanged. Set the mode, add cues, and remove cues by ID. See HTMLMediaElement.textTracks for more details. Note that future specifications or extensions to this specification are expected to add new fields to the text-track-cue (such as text size, alignment, and position). Adding and removing cues is optional for the receiver to support and if not supported, the receiver will behave as though no cues were added or removed (both adding and removing are indicated via the support for "added-cues"). As specified in HTMLMediaElement.textTracks, if a cue ID is invalid (removing an un-added ID or adding an ID twice, for example), the receiver may reject the text track change.

Field	Default value for the initial controls	Receiver support
source	`urls` in remote-playback-start-request	Required
preload	Decided by the receiver	Not required
loop	False	Required
paused	Decided by the receiver	Required
muted	Decided by the receiver	Required
volume	Decided by the receiver	Required
seek	(None)	Required
fast-seek	(None)	Required
playback-rate	Decided by the receiver	Not required
poster	Decided by the receiver	Not required
enabled-audio-track-ids	(None)	Required
selected-video-track-id	(None)	Required
added-text-tracks	(None)	Not required
changed-text-tracks	(None)	Not required
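The "Receiver support" column above can be read as a filter on incoming controls. The sketch below, with hypothetical names and controls represented as a dict, shows a receiver keeping every required control and silently ignoring optional controls it does not support.

```python
# Controls marked "Not required" in the table above.
OPTIONAL_CONTROLS = {
    "preload",
    "playback-rate",
    "poster",
    "added-text-tracks",
    "changed-text-tracks",
}


def controls_to_apply(controls: dict, supported_optional: set) -> dict:
    """Keep required controls and those optional controls the receiver supports.

    A control that is absent from `controls` means "no change", so it is
    simply never visited here.
    """
    return {
        name: value
        for name, value in controls.items()
        if name not in OPTIONAL_CONTROLS or name in supported_optional
    }
```

A controller that sent an ignored control learns this from the supports value in the state that the receiver returns.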

The states sent by the receiver include the following individual state values, each of which is optional. This allows the receiver to update the controller about more than one state value at once without having to specify all state values every time. A non-present state value indicates the state has not changed.

supports: The controls the receiver supports. These may differ according to the media resource and should not change unless the media resource also changes. The default is empty (support for nothing) for the initial state in the remote-playback-start-response message.
source: The current media resource. See HTMLMediaElement.currentSrc. Must be present in the initial state in the remote-playback-start-response message so the controller knows what media resource was selected for playback.
loading: The state of network activity for loading the media resource. See HTMLMediaElement.networkState. The default is empty (NETWORK_EMPTY) for the initial state in the remote-playback-start-response message.
loaded: The state of the loaded media (whether enough is loaded to play). See HTMLMediaElement.readyState. The default is nothing (HAVE_NOTHING) for the initial state in the remote-playback-start-response message.
error: A major error occurred which prevents the remote playback from continuing. See HTMLMediaElement.error and media error codes. The default is no error for the initial state in the remote-playback-start-response message.
epoch: The "zero time" of the media timeline, in milliseconds relative to the epoch. See timeline offset and HTMLMediaElement.getStartDate(). The default is an unknown epoch for the initial state in the remote-playback-start-response message, which is represented by null.
duration: The duration of the media timeline, in seconds. See HTMLMediaElement.duration. The default is an unknown duration for the initial state in the remote-playback-start-response message, which is represented by null.
buffered-time-ranges: The time ranges for which media has been buffered. See HTMLMediaElement.buffered. The default is an empty array for the initial state in the remote-playback-start-response message.
played-time-ranges: The time ranges reached by the playback position during normal playback. See HTMLMediaElement.played. The default is an empty array for the initial state in the remote-playback-start-response message.
seekable-time-ranges: The time ranges for which media is seekable by the controller or the receiver. See HTMLMediaElement.seekable. The default is an empty array for the initial state in the remote-playback-start-response message.
position: The playback position. See official playback position and HTMLMediaElement.currentTime. The default is 0 for the initial state in the remote-playback-start-response message.
playbackRate: The current rate of playback on a scale where 1.0 is "normal speed". See HTMLMediaElement.playbackRate. The default is 1.0 for the initial state in the remote-playback-start-response message.
paused: Whether media is paused or not. See HTMLMediaElement.paused. The default is false for the initial state in the remote-playback-start-response message.
seeking: Whether the receiver is seeking or not. See HTMLMediaElement.seeking. The default is false for the initial state in the remote-playback-start-response message.
stalled: If true, media is not playing because not enough media is loaded, and false otherwise. See the stalled event. The default is false for the initial state in the remote-playback-start-response message.
ended: Whether media has reached the end or not. See HTMLMediaElement.ended. The default is false for the initial state in the remote-playback-start-response message.
volume: The current volume of playback on a scale of 0.0 to 1.0. See HTMLMediaElement.volume. The default is 1.0 for the initial state in the remote-playback-start-response message.
muted: True if audio is muted (overriding the volume value) and false otherwise. See HTMLMediaElement.muted. The default is false for the initial state in the remote-playback-start-response message.
resolution: The "intrinsic width" and "intrinsic height" of the video. See HTMLVideoElement.videoWidth and HTMLVideoElement.videoHeight. The default is an unknown resolution for the initial state in the remote-playback-start-response message, which is represented by null.
audio-tracks: The available audio tracks, which can be individually enabled or disabled. See HTMLMediaElement.audioTracks. The default is an empty array for the initial state in the remote-playback-start-response message.
video-tracks: The available video tracks. Only one may be selected. See HTMLMediaElement.videoTracks. The default is an empty array for the initial state in the remote-playback-start-response message.
text-tracks: The available text tracks, which can be individually shown, hidden, or disabled. See HTMLMediaElement.textTracks. The controller can also add cues to and remove cues from text tracks. The default is an empty array for the initial state in the remote-playback-start-response message.

Media positions, durations, and time ranges are defined in terms of the media timeline specified in HTML, and are expressed as fractional seconds between zero and the media duration.

NOTE: An Open Screen agent can convert between values on the media timeline and the media sync time sent with individual media frames using the steps in Appendix E: Media Time Conversions.

Field	Default value for the initial state
supports	Empty
source	`url` in `state` in remote-playback-start-response (required field)
loading	`empty`
loaded	`nothing`
error	No error
epoch	`null`
duration	`null`
buffered-time-ranges	Empty array
played-time-ranges	Empty array
seekable-time-ranges	Empty array
position	0.0
playbackRate	1.0
paused	False
seeking	False
stalled	False
ended	False
volume	1.0
muted	False
resolution	`null`
audio-tracks	Empty array
video-tracks	Empty array
text-tracks	Empty array
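On the controller side, the defaults in the table above and the rule that a non-present state value means "unchanged" can be sketched as a pair of merge functions. State is represented as a dict here, the function names are hypothetical, and representing "Empty" as an empty list and "No error" as None are assumptions made for illustration.

```python
import copy

# Defaults for the initial state, following the table above.
INITIAL_STATE_DEFAULTS = {
    "supports": [], "loading": "empty", "loaded": "nothing", "error": None,
    "epoch": None, "duration": None,
    "buffered-time-ranges": [], "played-time-ranges": [],
    "seekable-time-ranges": [],
    "position": 0.0, "playbackRate": 1.0,
    "paused": False, "seeking": False, "stalled": False, "ended": False,
    "volume": 1.0, "muted": False, "resolution": None,
    "audio-tracks": [], "video-tracks": [], "text-tracks": [],
}


def initial_state(response_state: dict) -> dict:
    """Fill in defaults for the state carried by remote-playback-start-response."""
    if "source" not in response_state:
        raise ValueError("source must be present in the initial state")
    state = copy.deepcopy(INITIAL_STATE_DEFAULTS)
    state.update(response_state)
    return state


def apply_state_update(state: dict, update: dict) -> dict:
    """Merge a later state; keys absent from `update` keep their old values."""
    merged = dict(state)
    merged.update(update)
    return merged
```

Distinguishing an absent key from a key whose value is None matters here, because null is itself a meaningful value for epoch, duration, and resolution.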

9.2. Remote Playback API

An Open Screen Protocol agent that implements the Remote Playback API must support the control-remote-playback capability. It may support the send-streaming capability so it can send HTMLMediaElement media data through media remoting.

An OSP agent that is a remote playback device for the Remote Playback API must support the receive-remote-playback capability. It may support the receive-streaming capability so it can receive HTMLMediaElement data through media remoting.

The same OSP agent may implement both the Remote Playback API and be a remote playback device for that API.

Note: These roles are independent of which agent was the advertising agent or the listening agent during discovery and connection establishment.

This is how the Remote Playback API uses the messages defined in § 9 Remote Playback Protocol:

When section 5.2.1.2 says "This list contains remote playback devices and is populated based on an implementation specific discovery mechanism" and section 5.2.1.4 says "Retrieve available remote playback devices (using an implementation specific mechanism)", the user agent may use the mDNS, QUIC, agent-info-request, and remote-playback-availability-request messages defined previously in this spec to discover receivers. The remote-playback-availability-request URLs must contain the availability sources set.

When section 5.2.4 says "Request connection of remote to device. The implementation of this step is specific to the user agent." and "Synchronize the current media element state with the remote playback state", the controller may send the remote-playback-start-request message to the receiver to start remote playback. The remote-playback-start-request URLs must contain the remote playback source. The current Remote Playback API only allows a single source, but the protocol allows for several and future versions of Remote Playback API may allow for several.

When section 5.2.4 says "The mechanism that is used to connect the user agent with the remote playback device and play the remote playback source is an implementation choice of the user agent. The connection will likely have to provide a two-way messaging abstraction capable of carrying media commands to the remote playback device and receiving media playback state in order to keep the media element state and remote playback state in sync", the controller may send remote-playback-modify-request messages to the receiver to change the remote playback state based on changes to the local media element and receive remote-playback-modify-response and remote-playback-state-event messages to change the local media element based on changes to the remote playback state.

When section 5.2.7 says "Request disconnection of remote from the device. The implementation of this step is specific to the user agent," the controller may send the remote-playback-termination-request message to the receiver.

10. Streaming Protocol

This section defines the use of the Open Screen Protocol for streaming media from a media sender to a media receiver.

If an Open Screen Protocol agent is a media sender, it must advertise the send-streaming capability. If an OSP agent is a media receiver, it must advertise the receive-streaming capability. The same agent may be a media sender and a media receiver.

Note: These roles are independent of which agent was the advertising agent or the listening agent during discovery and connection establishment.

10.1. Streaming Protocol Capabilities

If the advertiser is already authenticated, the requester can request additional information by sending a streaming-capabilities-request message, and receive back a streaming-capabilities-response message with the following fields:

receive-audio (required): A list of capabilities for receiving audio. For an explanation of fields, see below.
receive-video (required): A list of capabilities for receiving video. For an explanation of fields, see below.

The format type is used as the basis for audio and video capabilities. Formats are composed of the following fields:

codec-name (required): A fully qualified codec string listed in the [WEBCODECS-CODEC-REGISTRY] and further specified by the codec-specific registrations referenced in that registry.

For codec-name, Open Screen agents may also accept a single-codec codec parameter as described in [RFC6381] for codecs not listed in the [WEBCODECS-CODEC-REGISTRY].

Audio capabilities are composed of the above format type, with the following additional fields:

max-audio-channels (optional): An optional field indicating the maximum number of audio channels the media receiver is capable of supporting. The default value is 2, meaning a stereo speaker channel setup.
min-bit-rate (optional): An optional field indicating the minimum audio bit rate that the media receiver can handle, in kilobits per second. Default is no minimum.
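A media sender might use these audio capabilities to decide whether a proposed encoding is acceptable. The sketch below assumes that capabilities arrive as dicts keyed by the field names above and that the sender's bit rate must be at least min-bit-rate; the function name and this interpretation are illustrative assumptions.

```python
def receiver_accepts_audio(capabilities, codec_name, channels, bit_rate_kbps):
    """Return True if any advertised audio capability fits the proposed encoding."""
    for capability in capabilities:
        if capability["codec-name"] != codec_name:
            continue
        # Default of 2 channels (stereo) when max-audio-channels is absent.
        if channels > capability.get("max-audio-channels", 2):
            continue
        # No minimum bit rate when min-bit-rate is absent.
        if bit_rate_kbps < capability.get("min-bit-rate", 0):
            continue
        return True
    return False
```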

Video capabilities are similarly composed of the above format type, with the following additional fields:

max-resolution (optional): An optional field indicating the maximum video-resolution (width, height) that the media receiver is capable of processing. Default is no maximum.
max-frames-per-second (optional): An optional field indicating the maximum frames-per-second the media receiver is capable of processing. Default is no maximum.
max-pixels-per-second (optional): An optional field indicating the maximum number of pixels per second the media receiver is capable of processing. Default is no maximum.
min-video-bit-rate (optional): An optional field indicating the minimum video bit rate the device is capable of processing, in kilobits per second. Default is no minimum.
aspect-ratio (optional): An optional field indicating what its ideal aspect ratio is, e.g. a 16:10 display could return this value as 1.6 to indicate its preferred content scaling. Default is none.
color-gamut (optional): An optional field indicating the widest color space that can be decoded and rendered by the media receiver. The media sender may use this value to determine how to encode video, and should assume all narrower color spaces are supported. Valid values correspond to ColorGamut in the Media Capabilities API. The default value is "srgb".

NOTE: Support for "p3" implies support for "srgb", and support for "rec2020" implies support for "p3" and "srgb".

hdr-formats (optional)

An optional field indicating what HDR transfer functions and metadata formats can be decoded and rendered by the media receiver. Each video-hdr-format consists of two fields, transfer-function and hdr-metadata.

The transfer-function field must be a valid TransferFunction and the hdr-metadata field must be a valid HdrMetadataType, both defined in the Media Capabilities API.

If a video-hdr-format is provided with a transfer-function but no hdr-metadata, then the media receiver can render the transfer-function without any associated metadata. (This is the case, for example, with the "hlg" transfer-function.)

The media receiver should ignore duplicate entries in hdr-formats. If no hdr-formats are listed, then the media receiver cannot decode any HDR formats.

native-resolutions (optional)

An optional field indicating what video-resolutions the media receiver supports and considers to be "native," meaning that scaling is not required. The default value is none.

supports-scaling (optional)

An optional boolean field indicating whether the media receiver can scale content provided in a video-resolution not listed in the native-resolutions list (if provided) or of a different aspect ratio. The default value is true.

supports-rotation (optional)

An optional boolean field indicating whether the media receiver can receive video frames with the rotation field set. The default value is true.
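
As a rough illustration of how a media sender might use these fields, the following sketch checks a candidate video encoding against one advertised video capability. The VideoCapability class and the can_receive helper are hypothetical names introduced for this example. Matching codec-name by exact string comparison and computing pixels per second as width × height × frame rate are also assumptions, not requirements of the protocol.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Color gamuts from narrowest to widest. Support for a wider gamut implies
# support for every narrower one ("rec2020" implies "p3" and "srgb").
GAMUT_ORDER = ["srgb", "p3", "rec2020"]


@dataclass
class VideoCapability:
    """Hypothetical in-memory form of one receive-video capability."""
    codec_name: str
    max_resolution: Optional[Tuple[int, int]] = None  # (width, height); None = no maximum
    max_frames_per_second: Optional[float] = None     # None = no maximum
    max_pixels_per_second: Optional[float] = None     # None = no maximum
    color_gamut: str = "srgb"                         # default per the field description


def can_receive(cap, codec_name, width, height, fps, gamut="srgb"):
    """Return True if the described video fits within the capability."""
    if codec_name != cap.codec_name:  # assumption: exact codec string match
        return False
    if cap.max_resolution is not None:
        max_width, max_height = cap.max_resolution
        if width > max_width or height > max_height:
            return False
    if cap.max_frames_per_second is not None and fps > cap.max_frames_per_second:
        return False
    if (cap.max_pixels_per_second is not None
            and width * height * fps > cap.max_pixels_per_second):
        return False
    # The content gamut must be no wider than the widest gamut advertised.
    return GAMUT_ORDER.index(gamut) <= GAMUT_ORDER.index(cap.color_gamut)
```

A sender could run such a check over every entry in receive-video before deciding which encodings to offer.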

10.2. Sessions

To start a streaming session, a sender may send a streaming-session-start-request message with the following fields:

streaming-session-id: Identifies the streaming session. Must be unique for the (sender, receiver) pair. Can be used later to modify or terminate a streaming session. These IDs should be treated like other IDs with regards to the state-token as specified in § 11 Requests, Responses, and Watches.
desired-stats-interval: Indicates the frequency the receiver should send stats messages to the sender.
stream-offers: Indicates the streams that the receiver can request from the sender.

Each stream offer contains the following fields:

media-stream-id: Identifies the media stream being offered. Must be unique within the streaming session. Can be used by the receiver to request the media stream. These IDs should be treated like other IDs with regards to the state-token as specified in § 11 Requests, Responses, and Watches.
display-name: An optional name intended to be shown to a user, such that the receiver may allow the user to choose which media streams to receive, or if they are received automatically by the receiver, give the user some information about what the media stream is.
audio: A list of audio encodings offered. An audio encoding is a series of encoded audio frames. Encodings define fields needed by the receiver to know how to decode the encoding, such as codec. They can differ by codec and related fields, but should be different encodings of the same audio.
video: A list of video encodings offered. A video encoding is a series of encoded video frames. Encodings define fields needed by the receiver to know how to decode the encoding, such as codec and default duration. They can differ by codec and potentially other fields, but should be different encodings of the same video.
data: A list of data encodings offered. A data encoding is a series of data frames. Encodings define fields needed by the receiver to know how to interpret the encoding, such as data type and default duration. They can differ by data type and potentially other fields, but should be different encodings of the same data. (For encodings of different data, use distinct media streams, not distinct encodings with the same media stream.)

Each audio encoding offered defines the following fields:

encoding-id: Identifies the audio encoding being offered. Must be unique within the media stream. These IDs should be treated like request IDs with regards to the state-token as specified in § 11 Requests, Responses, and Watches.
codec-name: The name of the codec used by the encoding, following the same rules as codec-name in § 10.1 Streaming Protocol Capabilities.
time-scale: The time scale used by all audio frames. This allows senders to make audio-frame messages smaller by not including the time scale in each one.
default-duration: The duration of an audio frame. This allows senders to make audio-frame messages smaller by not including the duration for audio-frame messages that have the default duration.

Each video encoding offered defines the following fields:

encoding-id: Identifies the video encoding being offered. Must be unique within the media stream. These IDs should be treated like request IDs with regards to the state-token as specified in § 11 Requests, Responses, and Watches.
codec-name: The name of the codec used by the encoding, following the same rules as codec-name in § 10.1 Streaming Protocol Capabilities.
time-scale: The time scale used by all video frames. This allows senders to make video-frame messages smaller by not including the time scale in each one.
default-duration: The default duration of a video frame. This allows senders to make video-frame messages smaller by not including the duration for video-frame messages that have the default duration.
default-rotation: The default rotation of a video frame. This allows senders to make video-frame messages smaller by not including the rotation for video-frame messages that have the default rotation.

Each data encoding offered defines the following fields:

encoding-id: Identifies the data encoding being offered. Must be unique within the media stream. These IDs should be treated like request IDs with regards to the state-token as specified in § 11 Requests, Responses, and Watches.
data-type-name: The name of the data type used by the encoding.
time-scale: The time scale used by all data frames. This allows senders to make data-frame messages smaller by not including the time scale in each one.
default-duration: The duration of a data frame. This allows senders to make data-frame messages smaller by not including the duration for data-frame messages that have the default duration.

After receiving a streaming-session-start-request message, a receiver should send back a streaming-session-start-response message with the following fields:

desired-stats-interval: Indicates the frequency the sender should send stats messages to the receiver.
stream-requests: Indicates which media streams the receiver would like to receive from the sender.

Each stream request contains the following fields:

media-stream-id: The ID of the stream requested.
audio (optional): The requested audio encoding, by encoding ID.
video (optional): The requested video encoding, by encoding ID. It may include a target resolution and maximum frame rate. The sender should not exceed the maximum frame rate and should attempt to send at the target bitrate, possibly exceeding it by a small amount.
data (optional): The requested data encoding, by encoding ID.
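
The following is a minimal sketch of how a receiver might turn a list of stream offers into stream requests. It is illustrative only: the build_stream_requests helper is hypothetical, plain Python dictionaries with string keys stand in for the decoded CBOR messages, each request names an encoding only by its ID (omitting any target resolution or frame rate), and choosing the first offered encoding whose codec-name the receiver supports is one possible selection policy among many. Data encodings are omitted for brevity.

```python
def build_stream_requests(stream_offers, supported_codecs):
    """Pick, per offered media stream, the first audio and video encoding
    whose codec-name is in supported_codecs (a set of codec strings)."""
    requests = []
    for offer in stream_offers:
        request = {"media-stream-id": offer["media-stream-id"]}
        for kind in ("audio", "video"):
            for encoding in offer.get(kind, []):
                if encoding["codec-name"] in supported_codecs:
                    # Encodings are requested by their encoding ID.
                    request[kind] = encoding["encoding-id"]
                    break
        # Skip streams for which no offered encoding is usable.
        if len(request) > 1:
            requests.append(request)
    return requests
```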

During a streaming session, the receiver can modify the requests it made for encodings by sending a streaming-session-modify-request containing a modified list of stream-requests. When the sender receives a streaming-session-modify-request, it should send back a streaming-session-modify-response indicating whether or not the new request was applied successfully.

NOTE: If the sender wishes to send an encoding other than the one selected by the receiver in a streaming-session-start-response or streaming-session-modify-request, it must terminate the current session and start a new session.

Finally, the sender may terminate the streaming session by sending a streaming-session-terminate-request command. When the receiver receives the streaming-session-terminate-request, it should send back a streaming-session-terminate-response. The receiver can terminate at any point and notify the sender by sending a streaming-session-terminate-event message.

10.3. Audio

Media senders may send audio to media receivers by sending audio-frame messages (see Appendix A: Messages) with the following keys and values. An audio frame message contains a set of encoded audio samples for a range of time. A series of encoded audio frames that share a codec and a timeline form an audio encoding.

Unlike most Open Screen Protocol messages, this one uses an array-based grouping rather than a struct-based grouping. For required fields, this allows for a more efficient use of bytes on the wire, which is important for streaming audio because the payload is typically so small and every byte of overhead is relatively large. In order to accommodate optional values in the array-based grouping, one optional field in the array is used to hold all optional values in a struct-based grouping. This will hopefully provide a good balance of efficiency and flexibility.

To allow for audio frames to be sent out of order, they should be sent in separate QUIC streams.

encoding-id: Identifies the media encoding to which this audio frame belongs. This can be used to reference fields of the encoding (from the audio-encoding-offer message) such as the codec, codec properties, time scale, and default duration. Referencing fields of the encoding through the encoding id helps to avoid sending duplicate information in every frame.
start-time: Identifies the beginning of the time range of the audio frame. The end time can be inferred from the start time and duration. The time scale is equal to the value in the time-scale field of the audio-encoding-offer message referenced by the encoding-id.
duration: If present, the duration of the audio frame. If not present, the duration is equal to the default-duration field of the audio-encoding-offer message referenced by the encoding-id. The time scale is equal to the value in the time-scale field of the audio-encoding-offer message referenced by the encoding-id.
sync-time: If present, a time used to synchronize the start time of this audio frame (and thus, this encoding) with that of other media encodings on different timelines. It may be wall clock time, but it need not be; it can be any clock chosen by the media sender.
payload: The encoded audio. The codec is equal to the codec-name field of the audio-encoding-offer message referenced by the encoding-id.
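
To illustrate how the per-frame fields combine with the fields of the referenced encoding offer, the sketch below computes the time range of an audio frame. It assumes that time-scale expresses the number of time units per second, which is not stated in this section, and it represents the frame and the offer as plain dictionaries. The function name audio_frame_time_range is hypothetical.

```python
from fractions import Fraction


def audio_frame_time_range(frame, offer):
    """Return (start, end) of an audio frame in seconds as exact fractions.

    frame: dict with "start-time" and optionally "duration"
    offer: dict with "time-scale" and optionally "default-duration"
    Assumption: time-scale is the number of time units per second.
    """
    time_scale = offer["time-scale"]
    # Fall back to the encoding's default duration when the frame omits it.
    duration = frame.get("duration", offer.get("default-duration"))
    if duration is None:
        raise ValueError("frame has no duration and offer has no default-duration")
    start = Fraction(frame["start-time"], time_scale)
    end = start + Fraction(duration, time_scale)
    return start, end
```

Exact fractions avoid accumulating rounding error when many frame times are compared or summed.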

10.4. Video

Media senders may send video to media receivers by sending video-frame messages (see Appendix A: Messages) with the following keys and values. A video frame message contains an encoded video frame (an encoded image) at a specific point in time or over a specific time range (if the duration is known). A series of encoded video frames that share a codec and a timeline form a video encoding.

To allow for video frames to be sent out of order, they may be sent in separate QUIC streams. If the encoding is a long chain of encoded video frames, each dependent on the previous one back to an independent frame, it may make sense to send them in a single QUIC stream starting at the independent frame and ending at the last dependent frame.

encoding-id: Identifies the media encoding to which this video frame belongs. This can be used to reference fields of the encoding such as the codec, codec properties, time scale, and default rotation. Referencing fields of the encoding through the encoding id helps to avoid sending duplicate information in every frame.
sequence-number: Identifies the frame and its order in the encoding. Within an encoding, larger sequence numbers mean later start times. Within an encoding, gaps in sequence numbers mean frames are missing.
depends-on: If present, the sequence numbers of the frames this frame depends on. If a sequence number is negative, it is treated as a relative sequence number, and the actual sequence number is calculated by adding it to the sequence number of this frame. If empty, this is an independent frame (a key frame). If not present, the default value is [-1].
start-time: Identifies the beginning of the time range of the video frame. The end time can be inferred from the start time and duration. The time scale is equal to the value in the time-scale field of the video-encoding-offer message referenced by the encoding-id.
duration: If present, the duration of the video frame. If not present, that means duration is unknown. The time scale is equal to the value in the time-scale field of the video-encoding-offer message referenced by the encoding-id.
sync-time: If present, a time used to synchronize the start time of this frame (and thus, this encoding) with that of other media encodings on different timelines.
rotation: If present, indicates how the frame should be rotated after decoding but before rendering. Rotation is clockwise in increments of 90 degrees. The default is equal to the default-rotation field of the video-encoding-offer message referenced by the encoding-id.
payload: The encoded video frame (encoded image). The codec is equal to the codec-name field of the video-encoding-offer message referenced by the encoding-id.
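
The rules for depends-on can be sketched as follows. The function names are hypothetical, and a missing depends-on field is modeled as None.

```python
def resolve_dependencies(sequence_number, depends_on=None):
    """Return the absolute sequence numbers a video frame depends on.

    depends_on=None models an absent field, which defaults to [-1]
    (the frame depends on the immediately preceding frame).
    """
    if depends_on is None:
        depends_on = [-1]
    # Negative entries are relative to this frame's own sequence number.
    return [sequence_number + dep if dep < 0 else dep for dep in depends_on]


def is_key_frame(depends_on=None):
    """A frame is independent only if depends-on is present and empty."""
    return depends_on is not None and len(depends_on) == 0
```

For example, a frame with sequence number 10 and no depends-on field depends on frame 9, while the same frame with depends-on set to [-2, 5] depends on frames 8 and 5.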

10.5. Data

Media senders may send timed data to media receivers by sending data-frame messages (see Appendix A: Messages) with the following keys and values. A data frame message contains an arbitrary payload that can be synchronized with audio and video. A series of data frames that share a data type and timeline form a data encoding.

To allow for data frames to be sent out of order, they may be sent in separate QUIC streams, but more than one data frame may be sent in one QUIC stream if that makes sense for a specific type of data.

encoding-id: Identifies the data encoding to which this data frame belongs. This can be used to reference fields of the encoding such as the type of data and time scale. Referencing fields of the encoding through the encoding id helps to avoid sending duplicate information in every frame.
sequence-number: Identifies the frame and its order in the encoding. Within an encoding, larger sequence numbers mean later start times. Within an encoding, gaps in sequence numbers mean frames are missing.
start-time: Identifies the beginning of the time range of the data frame. The end time can be inferred from the start time and duration. The time scale is equal to the value in the time-scale field of the data-encoding-offer message referenced by the encoding-id.
duration: If present, the duration of the data frame. If not present, the duration is equal to the default-duration field of the data-encoding-offer message referenced by the encoding-id. The time scale is equal to the value in the time-scale field of the data-encoding-offer message referenced by the encoding-id.
sync-time: If present, a time used to synchronize the start time of this data frame (and thus, this encoding) with that of other media encodings on different timelines.
payload: The data. The data type is equal to the data-type-name field of the data-encoding-offer message referenced by the encoding-id.
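
Because gaps in sequence numbers indicate missing frames, a receiver can detect loss within an encoding from the sequence numbers alone. The sketch below is one way to do this for video or data frames. It assumes that consecutive frames carry consecutive sequence numbers, and the helper name is hypothetical.

```python
def missing_sequence_numbers(received):
    """Return the sequence numbers absent between the smallest and largest
    sequence number seen so far. Frames may arrive in any order."""
    seen = sorted(set(received))
    missing = []
    for previous, current in zip(seen, seen[1:]):
        # Every number strictly between two neighbors was never received.
        missing.extend(range(previous + 1, current))
    return missing
```

Since frames sent on separate QUIC streams can arrive out of order, a gap observed at one moment may be filled later. A receiver would typically wait for some interval before treating a frame as lost.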

10.6. Feedback

The media receiver can send feedback to the media sender, such as key frame requests.

A video key frame is requested by sending a video-request message with the following keys and values.

To allow for video frames to be sent out of order, they may be sent in separate QUIC streams.

encoding-id: The encoding for which the media sender should send a new key frame.
sequence-number: Gives the order in the encoding. Within an encoding, larger sequence numbers invalidate previous ones. A media sender may ignore smaller sequence numbers after a larger one has been processed. This is to prevent out-of-order requests from generating more key frames than necessary.
highest-decoded-frame-sequence-number: If set, the media sender may generate a video frame dependent on the last decoded frame. If not set, the media sender must generate an independent (key) frame.
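
A media sender's handling of video-request messages might look like the following sketch. The KeyFrameRequestTracker class and its string return values are hypothetical. Ignoring stale requests is optional behavior, shown here because it is what keeps out-of-order requests from producing extra key frames.

```python
class KeyFrameRequestTracker:
    """Tracks the largest video-request sequence number seen per encoding."""

    def __init__(self):
        self._latest = {}  # encoding-id -> largest sequence-number processed

    def handle(self, encoding_id, sequence_number, highest_decoded=None):
        """Return "ignore", "key", or "dependent-allowed" for a request."""
        latest = self._latest.get(encoding_id)
        if latest is not None and sequence_number < latest:
            # A newer request was already processed; this one is stale.
            return "ignore"
        self._latest[encoding_id] = sequence_number
        if highest_decoded is None:
            # Without a decoded reference, an independent frame is required.
            return "key"
        # The sender may (but need not) encode against the decoded frame.
        return "dependent-allowed"
```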

10.7. Stats

During a streaming session, the sender should send stats with the streaming-session-sender-stats-event at the interval the receiver requested. It should send all of the following stats for all of the media streams it is sending. The streaming-session-sender-stats-event message contains the following fields:

streaming-session-id: The ID of the streaming session these stats apply to.
system-time: The time when the stats were calculated, using a monotonic system clock.
audio: Stats specific to audio. Stats for multiple encodings can be sent at once, but encodings need not be included if the stats haven’t changed. See below.
video: Stats specific to video. Stats for multiple encodings can be sent at once, but encodings need not be included if the stats haven’t changed. See below.

Audio encoding sender stats include the following fields:

encoding-id: The ID of the encoding for which the stats apply.
cumulative-sent-frames: The total number of frames sent.
cumulative-encode-delay: The sum of the time spent encoding frames sent.

Video encoding sender stats include the following fields:

encoding-id: The ID of the encoding for which the stats apply.
cumulative-sent-duration: The sum of all of the durations of all of the video frames sent.
cumulative-encode-delay: The sum of the time spent encoding frames sent.
cumulative-dropped-frames: The total number of frames that were not sent due to network, CPU, or other constraints.

During a streaming session, the receiver should send stats with the streaming-session-receiver-stats-event at the interval the sender requested. It should send all of the following stats for all of the media streams it is receiving.

If the receiver is using a buffer to hold frames before playing them out, it should also send the status of that buffer using the remote-buffer-status field. It can have one of three values:

enough-data: The buffer has neither too much data nor insufficient data.
insufficient-data: The buffer will underrun and not have sufficient frame data at the time it is scheduled to be played out.
too-much-data: At the current send rate, the buffer will overrun and future frame data will be discarded before it can be played out.

A sender that receives a status of insufficient-data should increase its send rate, or switch to a more efficient encoding for future frames. A sender that receives a status of too-much-data should decrease its send rate.

If the receiver is playing frames immediately without buffering, it should always report a buffering status of enough-data.
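
One simple way for a sender to react to the reported buffer status is sketched below. The function name, the 10 percent step, and the rate bounds are all assumptions chosen for illustration. Switching to a more efficient encoding, which is also permitted in response to insufficient-data, is not shown.

```python
def next_send_rate_kbps(current_kbps, status, step_percent=10,
                        min_kbps=100, max_kbps=20000):
    """Return an adjusted send rate for a given remote-buffer-status value."""
    if status == "insufficient-data":
        # The receiver's buffer will underrun: send faster, up to the cap.
        return min(max_kbps, current_kbps * (100 + step_percent) // 100)
    if status == "too-much-data":
        # The receiver's buffer will overrun: send slower, down to the floor.
        return max(min_kbps, current_kbps * (100 - step_percent) // 100)
    if status == "enough-data":
        return current_kbps
    raise ValueError(f"unknown remote-buffer-status: {status!r}")
```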

The streaming-session-receiver-stats-event message contains the following fields:

streaming-session-id: The ID of the streaming session these stats apply to.
system-time: The time when the stats were calculated, using a monotonic system clock.
audio: Stats specific to audio. Stats for multiple encodings can be sent at once, but encodings need not be included if the stats haven’t changed. See below.
video: Stats specific to video. Stats for multiple encodings can be sent at once, but encodings need not be included if the stats haven’t changed. See below.

Audio encoding receiver stats include the following fields. If not present, that indicates the value has not changed since the last value.

encoding-id: The ID of the encoding for which the stats apply.
cumulative-decoded-frames: The total number of audio frames received and decoded.
cumulative-received-duration: The sum of all of the durations of all of the audio frames received.
cumulative-lost-duration: The sum of all of the durations of all of the audio frames detected as lost.
cumulative-buffer-delay: The sum of the time frames spent buffered between receipt and playout.
cumulative-decode-delay: The sum of the time spent decoding frames received.
remote-buffer-status: The status of the remote buffer for this encoding.

Video encoding receiver stats include the following fields. If not present, that indicates the value has not changed since the last value.

encoding-id: The ID of the encoding for which the stats apply.
cumulative-decoded-frames: The total number of video frames received and decoded.
cumulative-lost-frames: The total number of video frames detected as lost.
cumulative-buffer-delay: The sum of the time frames spent buffered between receipt and render.
cumulative-decode-delay: The sum of the time spent decoding frames received.
remote-buffer-status: The status of the remote buffer for this encoding.
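
Because the stats are cumulative and an omitted field means "unchanged", a sender that wants per-interval figures must first merge each report into the last known values and then take differences. The following sketch shows this for video receiver stats. The helper names are hypothetical, and reports are modeled as plain dictionaries.

```python
CUMULATIVE_FIELDS = (
    "cumulative-decoded-frames",
    "cumulative-lost-frames",
    "cumulative-buffer-delay",
    "cumulative-decode-delay",
)


def apply_report(last_known, report):
    """Merge a report into the last known stats for one encoding.
    Fields missing from the report keep their previous value."""
    updated = dict(last_known)
    updated.update(report)
    return updated


def interval_delta(before, after):
    """Return how much each cumulative counter grew between two snapshots."""
    return {
        field: after[field] - before[field]
        for field in CUMULATIVE_FIELDS
        if field in before and field in after
    }
```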

11. Requests, Responses, and Watches

Multiple sub-protocols in the Open Screen Protocol have messages that act as requests, responses, watches, and events. Most requests have a request-id, and the agent that receives the request must send exactly one response message in return with the same request-id. A watch request has a watch-id, and the agent that receives the request may send any number of event messages in response with the same watch-id, until the watch request expires.

request-id and watch-id values are unsigned integer IDs that are assigned from a counter kept by each agent that starts at 1 and increments by 1 for each ID. Whenever an agent changes its state-token, it must reset its counter to 1.

When an agent sees that another agent has reset its state (by virtue of advertising a new state-token), it should discard any requests, responses, watches and events for that agent.

Other IDs that must be unique and would cause confusion if one side loses state, such as streaming-session-id, media-session-id, and encoding-id, should be treated the same.

Note: Request and watch IDs are not tied to any particular QUIC connection between agents. If a QUIC connection is closed, an agent should not discard requests, responses, watches, or events related to the other party. This allows agents to save power by closing unused connections.

Note: Request and watch IDs are not unique across agents. An agent can combine a request ID with a unique identifier for the agent that sent it (like its certificate fingerprint) to track requests across multiple agents.
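
The ID rules above can be sketched as two small classes. IdAllocator and PendingRequests are hypothetical names, and the string used to identify a remote agent (for example, its certificate fingerprint) is whatever unique identifier the implementation chooses.

```python
class IdAllocator:
    """Per-agent counter for request-id and watch-id values."""

    def __init__(self):
        self._next_id = 1

    def allocate(self):
        allocated = self._next_id
        self._next_id += 1
        return allocated

    def reset(self):
        """Call whenever this agent changes its own state-token."""
        self._next_id = 1


class PendingRequests:
    """Outstanding requests keyed by (remote agent identifier, request-id),
    since request IDs alone are not unique across agents."""

    def __init__(self):
        self._pending = {}

    def add(self, agent_fingerprint, request_id, context):
        self._pending[(agent_fingerprint, request_id)] = context

    def on_remote_state_token_changed(self, agent_fingerprint):
        """Discard everything tracked for an agent that reset its state."""
        self._pending = {
            key: value for key, value in self._pending.items()
            if key[0] != agent_fingerprint
        }

    def __len__(self):
        return len(self._pending)
```

Note that nothing here is tied to a QUIC connection: closing a connection leaves the pending entries in place.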

12. Protocol Extensions

Open Screen Protocol agents may exchange extension messages that are not defined by this specification. This could be used for experimentation, customization or other purposes.

To add new extension messages, extension authors must register a capability ID with a range of message type keys in a public registry. Agents may then indicate that they accept an extension by including the corresponding capability ID in the capabilities field of their agent-info messages.

Capability IDs 1-999 are reserved for use by the Open Screen Protocol. Capability IDs 1000 and above are available for extensions. See Appendix B: Message Type Key Ranges for legal ranges for extension message type keys.

Note: The purpose of the public registry is to prevent conflicts between multiple extension authors' capability IDs and message type keys.

Agents must not send extension messages to another agent that has not advertised the corresponding extension capability ID.
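
A sending agent could enforce this rule with a check like the one below. The function names are hypothetical, and remote_capabilities stands for the set of capability IDs taken from the other agent's agent-info message.

```python
FIRST_EXTENSION_CAPABILITY_ID = 1000  # IDs 1-999 are reserved for the protocol


def is_extension_capability(capability_id):
    """Return True if the capability ID lies in the extension range."""
    return capability_id >= FIRST_EXTENSION_CAPABILITY_ID


def may_send_extension_message(capability_id, remote_capabilities):
    """Return True only if the remote agent advertised the extension."""
    return (is_extension_capability(capability_id)
            and capability_id in remote_capabilities)
```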

Note: See § 5 Messages delivery using CBOR and QUIC streams for how agents handle unknown message type keys.

It is recommended that extension messages are also encoded in CBOR, to simplify implementations and provide an easier path to standardization of extension protocols. However, this is not required; agents that support non-CBOR extensions must be able to decode QUIC streams that contain a mix of CBOR messages and non-CBOR extension messages.

12.1. Protocol Extension Fields

It is legal for an agent to add additional, extension fields to any map-valued CBOR message type defined by the Open Screen Protocol. Extension fields must be optional, and the Open Screen Protocol message must make sense both with and without the field set.

Agents must not add extended fields to the audio-frame message directly. Instead, they may add them to its nested optional value.

Extension fields should use string keys to avoid conflicts with integer keys in Open Screen Protocol messages. An agent should not send extension fields to another agent unless that agent advertises an extension capability ID in its agent-info that indicates that it understands the extension fields.
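
If extension fields follow the string-key recommendation, a receiving agent can separate them from protocol-defined fields by key type. The sketch below assumes a map-valued message that has already been decoded from CBOR into a Python dictionary whose protocol-defined keys are integers. The helper name is hypothetical.

```python
def split_extension_fields(message):
    """Split a decoded map-valued message into protocol-defined fields
    (integer keys) and extension fields (string keys)."""
    defined = {key: value for key, value in message.items()
               if isinstance(key, int)}
    extensions = {key: value for key, value in message.items()
                  if isinstance(key, str)}
    return defined, extensions
```

Since extension fields must be optional, an agent that does not understand them can process the first dictionary and disregard the second.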

13. Security and Privacy

The Open Screen Protocol allows two OSP agents to discover each other and exchange user and application data. As such, its security and privacy considerations should be closely examined. We first evaluate the protocol itself using the W3C Security and Privacy Questionnaire. We then examine whether the security and privacy guidelines recommended by the Presentation API and the Remote Playback API are met. Finally we discuss recommended mitigations that agents can use to meet these security and privacy requirements.

13.1. Threat Models

13.1.1. Passive Network Attackers

The Open Screen Protocol should assume that all parties that are connected to the same LAN are able to observe all data flowing between OSP agents.

These parties will be able to collect any data exposed through unencrypted messages, such as mDNS records and the QUIC handshakes.

These parties may attempt to learn cryptographic parameters by observing data flows on the QUIC connection, or by observing cryptographic timing.

13.1.2. Active Network Attackers

Active attackers, such as compromised routers, will be able to manipulate data exchanged between agents. They can inject traffic into existing QUIC connections and attempt to initiate new QUIC connections. These abilities can be used to attempt the following:

Impersonate an agent or one already authenticated by the user, in an attempt to convince the user to authenticate to it.
Connect to an agent and query its capabilities.
Connect to and control a presentation or remote playback, or extract data from the application state of the presentation or remote playback.

One particular attack of concern is misconfigured or compromised routers that expose local network devices (such as OSP agents) to the Internet. This vector of attack has been used by malicious parties to take control of printers and smart TVs by connecting to local network services that would normally be inaccessible from the Internet.

13.1.3. Denial of Service

Parties connected to the LAN may attempt to deny access to OSP agents. For example, an attacker may attempt to open a large number of QUIC connections to an agent in an attempt to block legitimate connections or exhaust the agent’s system resources. They may also multicast spurious DNS-SD records in an attempt to exhaust the cache capacity for mDNS listeners, or to get listeners to open a large number of bogus QUIC connections.

13.1.4. Same-Origin Policy Violations

The Presentation API allows cross-origin communication between controlling pages and presentations with the consent of each origin (through their use of the API). This is similar to cross-origin communication via postMessage() with a targetOrigin of *. However, the Presentation API does not convey source origin information with each message. Therefore, the Open Screen Protocol does not convey origin information between its agents.

The presentation identifier carries some protection against unrestricted cross-origin access, but rigorous authentication of the parties connected by a PresentationConnection must be done at the application level.

13.2. Open Screen Protocol Security and Privacy Considerations

13.2.1. Personally Identifiable Information & High-Value Data

The following data exchanged by the protocol can be personally identifiable and/or high value data:

Presentation URLs and availability results
Presentation identifiers
Presentation connection IDs
Presentation connection messages
Remote playback URLs
Remote playback commands and status messages

Presentation identifiers are considered high value data because they can be used in conjunction with a Presentation URL to connect to a running presentation.

Presentation display names, model names, and capabilities, while not considered personally identifiable, are important to protect to prevent an attacker from changing them or substituting other values during the discovery and authentication process.

The following data cannot be reasonably made confidential and should be considered public:

IP addresses and ports used by the Open Screen Protocol.
Data advertised through mDNS, including the display name prefix, the certificate fingerprint, and the metadata version.
Data provided by an agent through agent-info, including its display name, its device model name, its capabilities, and its preferred locales.

13.2.2. Cross Origin State Considerations

Access to origin state across browsing sessions is possible through the Presentation API by reconnecting to a presentation that was started by a previous session. This scenario is addressed in Presentation API § 7.2 Cross-origin access.

Receiver availability is available cross-origin depending on the user’s network context. Exposure of this data to the Web is also discussed in Presentation API § 7.1 Personally identifiable information and Remote Playback API § 6.1 Personally identifiable information.

13.2.3. Origin Access to Other Devices

By design, the Open Screen Protocol allows access to receivers from the Web. By implementing the protocol, these devices are knowingly making themselves available to the Web and should be designed accordingly.

Below, we discuss mitigation steps to prevent malicious use of these devices.

13.2.4. Private Browsing Mode

The Open Screen Protocol itself does not distinguish between the user agent’s normal browsing and private browsing modes.

However, it’s recommended that user agents use separate authentication contexts (see § 6 Authentication) and QUIC connections (see § 4 Transport and metadata discovery with QUIC) for normal and private browsing from the same user agent instance. This makes it more difficult for OSP agents to match activities occurring in normal and private browsing by the same user.

13.2.5. Persistent State

An agent is likely to persist the identity of agents that have successfully completed § 6 Authentication. This may include the public key fingerprints, metadata versions, and metadata for those parties.

However, this data is not normally exposed to the Web, only through the native UI of the user agent during the display selection or display authentication process. It can be an implementation choice whether the user agent clears or retains this data when the user clears browsing data.

Fate of metadata / authentication history when clearing browsing data. [Issue #132]

13.2.6. Other Considerations

The Open Screen Protocol does not grant to the Web additional access to the following:

New script loading mechanisms
Access to the user’s location
Access to device sensors
Access to the user’s local computing environment
Control over the user agent’s native UI
Security characteristics of the user agent

13.3. Presentation API Considerations

Presentation API § 7 Security and privacy considerations place these requirements on the Open Screen Protocol:

Presentation URLs and presentation identifiers should remain private among the parties that are allowed to connect to a presentation, per the cross-origin access guidelines.
Controllers and receivers should be notified when connections representing multiple user agent profiles have been made to a presentation, per the user interface guidelines.
Messaging between controllers and receivers should be authenticated and confidential, per the guidelines for messaging between presentation connections.

The Open Screen Protocol addresses these considerations by:

Requiring mutual authentication and a TLS-secured QUIC connection before presentation URLs, IDs, or messages are exchanged.
Adding explicit messages and connection IDs for individual PresentationConnections so that agents can track the number of active connections.

13.4. Remote Playback API Considerations

The Remote Playback API § 6 Security and privacy considerations also state that messaging between controllers and receivers should be authenticated and confidential.

This consideration is handled by requiring mutual authentication and a TLS-secured QUIC connection before any remote playback related messages are exchanged.

13.5. Mitigation Strategies

13.5.1. Local passive network attackers

Local passive attackers may attempt to harvest data about user activities and device capabilities using the Open Screen Protocol. The main strategy to address this is data minimization, by only exposing opaque public key fingerprints before user-mediated authentication takes place.

Passive attackers may also attempt timing attacks to learn the cryptographic parameters of the TLS 1.3 QUIC connection. The application profile for TLS 1.3 mandates constant-time ciphers and TLS 1.3 implementations should use elliptic curve signing operations that are resistant to side channel attacks.

13.5.2. Local active network attackers

Local active attackers may attempt to impersonate a presentation display the user would normally trust. The § 6 Authentication step of the Open Screen Protocol prevents a man-in-the-middle who does not know the shared secret from impersonating an agent. However, it is possible for an attacker to impersonate an existing, trusted agent or a newly discovered agent that is not yet authenticated and try to convince the user to authenticate to it. (Trust in this context means that a user has completed § 6 Authentication from their agent to another agent.)

This can be addressed through a combination of techniques. The first is detecting attempts at impersonation. Agents should detect the following situations and flag an agent that meets any of the criteria as a suspicious agent:

Agents with distinct IP endpoints whose public key fingerprints collide during concurrent advertisement.
Untrusted agents whose display name differs from the one previously advertised under a given public key fingerprint.
Untrusted agents that fail the authentication challenge a certain number of times.
Untrusted agents that advertise a display name that is similar to that from an already-trusted agent.
Already-trusted agents whose metadata provided through the agent-info message has changed.
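The first two criteria above can be sketched as follows. This is a minimal illustration, not part of the protocol: the Advertisement record, the find_suspicious function, and the known_names map are hypothetical names, and the sketch ignores trust status and the remaining criteria for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advertisement:
    """Hypothetical record of one mDNS advertisement seen on the network."""
    fingerprint: str   # public key fingerprint from the advertisement
    endpoint: str      # "ip:port" the advertisement came from
    display_name: str  # advertised (unverified) display name


def find_suspicious(ads, known_names):
    """Return the fingerprints that should be flagged as suspicious.

    known_names maps a fingerprint to the display name previously
    advertised under it (an assumed piece of agent-local state).
    """
    suspicious = set()
    endpoints_by_fp = {}
    for ad in ads:
        endpoints_by_fp.setdefault(ad.fingerprint, set()).add(ad.endpoint)
        # Display name differs from the one previously seen for this fingerprint.
        previous = known_names.get(ad.fingerprint)
        if previous is not None and previous != ad.display_name:
            suspicious.add(ad.fingerprint)
    # Same fingerprint advertised concurrently from distinct IP endpoints.
    for fingerprint, endpoints in endpoints_by_fp.items():
        if len(endpoints) > 1:
            suspicious.add(fingerprint)
    return suspicious
```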

The second is through management of the low-entropy secret during mutual authentication:

Rotate the low-entropy secret to prevent brute force attacks.
Use an increasing backoff to respond to authentication challenges, also to prevent brute force attacks.
Use a cryptographically sound source of entropy to generate the shared secret.
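As a rough sketch of the secret-management points above, an agent might generate each PSK from the operating system's CSPRNG and delay its responses after failed challenges. The function names, the 40-bit default, and the one-second base and 300-second cap are assumptions chosen for illustration; only the 20 to 80 bit range comes from Appendix C.

```python
import secrets


def new_psk(bits: int = 40) -> int:
    """Generate a fresh low-entropy secret from the OS CSPRNG.

    Calling this for every authentication attempt rotates the secret.
    """
    if not 20 <= bits <= 80:
        raise ValueError("PSK length must be between 20 and 80 bits")
    return secrets.randbits(bits)


def challenge_delay(failed_attempts: int, base: float = 1.0, cap: float = 300.0) -> float:
    """Seconds to wait before answering the next authentication challenge.

    The delay doubles with each failure (hypothetical policy) up to `cap`.
    """
    if failed_attempts <= 0:
        return 0.0
    exponent = min(failed_attempts - 1, 32)  # bound the exponent to keep values small
    return min(cap, base * 2 ** exponent)
```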

The active attacker may also attempt to disrupt data exchanged over the QUIC connection by injecting or modifying traffic. These attacks should be mitigated by a correct implementation of TLS 1.3. See Appendix E of [RFC8446] for a detailed security analysis of the TLS 1.3 protocol.

13.5.3. Remote active network attackers

Unfortunately, we cannot rely on network devices to fully protect OSP agents, because a misconfigured firewall or NAT could expose a LAN-connected agent to the broader Internet. OSP agents should be secure against attack from any Internet host.

Advertising agents must set the at field in their mDNS TXT record to protect themselves from off-LAN attempts to initiate § 6 Authentication, which result in user annoyance (display or input of PSK) and potential brute force attacks against the PSK.

13.5.4. Denial of service

It will be difficult to completely prevent denial of service attacks that originate on the user’s local area network. OSP agents can refuse new connections, close connections that receive too many messages, or limit the number of mDNS records cached from a specific responder in an attempt to allow existing activities to continue in spite of such an attack.
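One of these measures, limiting the number of mDNS records cached per responder, could look roughly like the sketch below. The MdnsRecordCache class and the limit of 32 are hypothetical; the specification does not prescribe a value or a data structure.

```python
from collections import defaultdict

MAX_RECORDS_PER_RESPONDER = 32  # hypothetical limit, not taken from the specification


class MdnsRecordCache:
    """Per-responder cache that refuses new records once a quota is reached."""

    def __init__(self, limit: int = MAX_RECORDS_PER_RESPONDER):
        self._limit = limit
        self._records = defaultdict(dict)  # responder -> {record name: data}

    def add(self, responder: str, name: str, data: bytes) -> bool:
        """Cache a record; return False if the responder is over its quota."""
        records = self._records[responder]
        if name not in records and len(records) >= self._limit:
            return False  # drop the new record, keep existing state intact
        records[name] = data  # new record, or refresh of an existing one
        return True

    def count(self, responder: str) -> int:
        """Number of records currently cached for a responder."""
        return len(self._records.get(responder, ()))
```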

13.5.5. Malicious input

OSP agents should be robust against malicious input that attempts to compromise the target device by exploiting parsing vulnerabilities.

CBOR is intended to be less vulnerable to such attacks relative to alternatives like JSON and XML. Still, agents should be thoroughly tested using approaches like fuzz testing.

Where possible, OSP agents (including the content rendering components) should use defense-in-depth techniques like sandboxing to prevent vulnerabilities from gaining access to user data or leading to persistent exploits.

13.6. User Interface Considerations

This specification does not make any specific requirements of the security-relevant user interfaces of OSP agents. However, there are important considerations when designing these user interfaces, as PSK-based authentication requires users to make informed decisions about which agents to trust.

Before an agent has authenticated another device, the agent should make it clear that any agent-info or other data from that device has not been verified by authentication. (See below for how this applies to DNS-SD Instance Names.)
A suspicious agent should be displayed differently from trusted agents that are not suspicious, or not displayed at all.
The user interface to present a PSK during authentication should be done in trusted UI and be difficult to spoof. It should be clear to the user which physical device is presenting the PSK.
The user interface to input a PSK during authentication should be done in trusted UI and be difficult to spoof.
The user should be required to take action to input the PSK, to prevent the user from blindly clicking through this step.
The user interfaces to render and input a PSK should meet accessibility guidelines.

13.6.1. Instance and Display Names

Because DNS-SD Instance Names are the primary information that the user sees prior to authentication, careful presentation of these names is necessary.

Agents must treat Instance Names as unverified information, and should check that the Instance Name is a prefix of the display name received through the agent-info message after a successful QUIC connection. Once an agent has done this check, it can show the name as a verified display name.

Agents should show only complete display names to the user, instead of truncated display names from DNS-SD. A truncated display name should be verified as above before being shown in full as a verified display name.

This means there are three categories of display names that agents should be capable of handling:

Truncated and unverified DNS-SD Instance Names, which should not be shown to the user.
Complete but unverified DNS-SD Instance Names, which can be shown as unverified prior to § 6 Authentication.
Verified display names.
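The three categories and the prefix check can be sketched as follows. The NameStatus enum, the classify_name function, and the is_truncated flag are hypothetical; how truncation is detected, and what to do when the prefix check fails, are left open here.

```python
from enum import Enum


class NameStatus(Enum):
    TRUNCATED_UNVERIFIED = 1  # should not be shown to the user
    COMPLETE_UNVERIFIED = 2   # may be shown, marked as unverified
    VERIFIED = 3              # confirmed against the agent-info message


def classify_name(instance_name, is_truncated, agent_info_display_name=None):
    """Return (status, name to show or None).

    agent_info_display_name is the display-name received through agent-info
    after a successful QUIC connection, or None if not yet received.
    """
    if (agent_info_display_name is not None
            and agent_info_display_name.startswith(instance_name)):
        # The Instance Name is a prefix of the display name: show the full name.
        return NameStatus.VERIFIED, agent_info_display_name
    if is_truncated:
        return NameStatus.TRUNCATED_UNVERIFIED, None
    # Assumption: a failed prefix check simply leaves the name unverified.
    return NameStatus.COMPLETE_UNVERIFIED, instance_name
```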

Appendix A: Messages

The following messages are defined using the Concise Data Definition Language syntax. When integer keys are used, a comment is appended to the line to indicate the name of the field. Object definitions in this specification have this unusual syntax to reduce the number of bytes-on-the-wire, while maintaining a human-readable name for each key. Integer keys are used instead of object arrays to allow for easy indexing of optional fields.

Each root message (one that can be put into a QUIC stream without being enclosed by another message) has a comment indicating the message type key.

Smaller type keys should be reserved for messages that are sent frequently, are very small, or both. Larger type keys should be reserved for messages that are sent infrequently, are large, or both. This is because smaller type keys take fewer bytes on the wire.
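To illustrate how integer keys keep messages small, the sketch below hand-encodes an agent-info-request (type key 10, whose only field is request-id under key 0) as CBOR. The helper names are hypothetical, the encoder handles only unsigned integers, and the one-byte type key prefix is an assumption that holds only for type keys below 64.

```python
def cbor_head(major: int, value: int) -> bytes:
    """Encode a CBOR initial byte plus argument (RFC 8949, Section 3)."""
    if value < 24:
        return bytes([(major << 5) | value])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < 1 << (8 * size):
            return bytes([(major << 5) | info]) + value.to_bytes(size, "big")
    raise ValueError("value too large for CBOR")


def cbor_uint_map(fields: dict) -> bytes:
    """Encode a map whose keys and values are all unsigned integers."""
    out = cbor_head(5, len(fields))  # major type 5: map
    for key, value in sorted(fields.items()):
        out += cbor_head(0, key) + cbor_head(0, value)  # major type 0: uint
    return out


# agent-info-request = { 0: request-id }, type key 10
REQUEST_ID_KEY = 0
body = cbor_uint_map({REQUEST_ID_KEY: 1})
message = bytes([10]) + body  # assumes a one-byte type key prefix (keys below 64)
print(message.hex())
```

The whole request fits in four bytes; a text key such as "request-id" would by itself add ten bytes plus a length prefix.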

; type key 10
agent-info-request = {
  request
}

; type key 11
agent-info-response = {
  response
  1: agent-info ; agent-info
}

; type key 120
agent-info-event = {
  0: agent-info ; agent-info
}

agent-capability = &(
  receive-audio: 1
  receive-video: 2
  receive-presentation: 3
  control-presentation: 4
  receive-remote-playback: 5
  control-remote-playback: 6
  receive-streaming: 7
  send-streaming: 8
)

agent-info = {
  0: text ; display-name
  1: text ; model-name
  2: [* agent-capability] ; capabilities
  3: text ; state-token
  4: [* text] ; locales
}

; type key 12
agent-status-request = {
  request
  ? 1: status ; status
}

; type key 13
agent-status-response = {
  response
  ? 1: status ; status
}

status = {
  0: text ; status
}

request = (
  0: request-id ; request-id
)

response = (
  0: request-id ; request-id
)

request-id = uint

microseconds = uint

epoch-time = int

media-timeline = float64

media-timeline-range = [
  start: media-timeline
  end: media-timeline
]

; type key 1001
auth-capabilities = {
  0: uint ; psk-ease-of-input
  1: [* psk-input-method] ; psk-input-methods
  2: uint ; psk-min-bits-of-entropy
}

psk-input-method = &(
  numeric: 0
  qr-code: 1
)

auth-initiation-token = {
  ? 0: text ; token
}

auth-spake2-psk-status = &(
  psk-needs-presentation: 0
  psk-shown: 1
  psk-input: 2
)

; type key 1003
auth-spake2-confirmation = {
  0: bytes .size 64 ; confirmation-value
}

auth-status-result = &(
  authenticated: 0
  unknown-error: 1
  timeout: 2
  secret-unknown: 3
  validation-took-too-long: 4
  proof-invalid: 5
)

; type key 1004
auth-status = {
  0: auth-status-result ; result
}

; type key 1005
auth-spake2-handshake = {
  0: auth-initiation-token ; initiation-token
  1: auth-spake2-psk-status ; psk-status
  2: bytes ; public-value
}

watch-id = uint

; type key 14
presentation-url-availability-request = {
  request
  1: [1* text] ; urls
  2: microseconds ; watch-duration
  3: watch-id ; watch-id
}

; type key 15
presentation-url-availability-response = {
  response
  1: [1* url-availability] ; url-availabilities
}

; type key 103
presentation-url-availability-event = {
  0: watch-id ; watch-id
  1: [1* url-availability] ; url-availabilities
}

; idea: use HTTP response codes?
url-availability = &(
  available: 0
  unavailable: 1
  invalid: 10
)

; type key 104
presentation-start-request = {
  request
  1: text ; presentation-id
  2: text ; url
  3: [* http-header] ; headers
}

http-header = [
  key: text
  value: text
]

; type key 105
presentation-start-response = {
  response
  1: &result ; result
  2: uint ; connection-id
  ? 3: uint ; http-response-code
}

presentation-termination-source = &(
  controller: 1
  receiver: 2
  unknown: 255
)

presentation-termination-reason = &(
  application-request: 1
  user-request: 2
  receiver-replaced-presentation: 20
  receiver-idle-too-long: 30
  receiver-attempted-to-navigate: 31
  receiver-powering-down: 100
  receiver-error: 101
  unknown: 255
)

; type key 106
presentation-termination-request = {
  request
  1: text ; presentation-id
  2: presentation-termination-reason ; reason
}

; type key 107
presentation-termination-response = {
  response
  1: &result ; result
}

; type key 108
presentation-termination-event = {
  0: text ; presentation-id
  1: presentation-termination-source ; source
  2: presentation-termination-reason ; reason
}

; type key 109
presentation-connection-open-request = {
  request
  1: text ; presentation-id
  2: text ; url
}

; type key 110
presentation-connection-open-response = {
  response
  1: &result ; result
  2: uint ; connection-id
  3: uint ; connection-count
}

; type key 113
presentation-connection-close-event = {
  0: uint ; connection-id
  1: &(
    close-method-called: 1
    connection-object-discarded: 10
    unrecoverable-error-while-sending-or-receiving-message: 100
  ) ; reason
  ? 2: text ; error-message
  3: uint ; connection-count
}

; type key 121
presentation-change-event = {
  0: text ; presentation-id
  1: uint ; connection-count
}

; type key 16
presentation-connection-message = {
  0: uint ; connection-id
  1: bytes / text ; message
}

result = (
  success: 1
  invalid-url: 10
  invalid-presentation-id: 11
  timeout: 100
  transient-error: 101
  permanent-error: 102
  terminating: 103
  unknown-error: 199
)

; type key 17
remote-playback-availability-request = {
  request
  1: [* remote-playback-source] ; sources
  2: microseconds ; watch-duration
  3: watch-id ; watch-id
}

; type key 18
remote-playback-availability-response = {
  response
  1: [* url-availability] ; url-availabilities
}

; type key 114
remote-playback-availability-event = {
  0: watch-id ; watch-id
  1: [* url-availability] ; url-availabilities
}

; type key 115
remote-playback-start-request = {
  request
  1: remote-playback-id ; remote-playback-id
  ? 2: [* remote-playback-source] ; sources
  ? 3: [* text] ; text-track-urls
  ? 4: [* http-header] ; headers
  ? 5: remote-playback-controls ; controls
  ? 6: {streaming-session-start-request-params} ; remoting
}

remote-playback-source = {
  0: text ; url
  1: text ; extended-mime-type
}

; type key 116
remote-playback-start-response = {
  response
  ? 1: remote-playback-state ; state
  ? 2: {streaming-session-start-response-params} ; remoting
}

; type key 117
remote-playback-termination-request = {
  request
  1: remote-playback-id ; remote-playback-id
  2: &(
    user-terminated-via-controller: 11
    unknown: 255
  ) ; reason
}

; type key 118
remote-playback-termination-response = {
  response
  1: &result ; result
}

; type key 119
remote-playback-termination-event = {
  0: remote-playback-id ; remote-playback-id
  1: &(
    receiver-called-terminate: 1
    user-terminated-via-receiver: 2
    receiver-idle-too-long: 30
    receiver-powering-down: 100
    receiver-crashed: 101
    unknown: 255
  ) ; reason
}

; type key 19
remote-playback-modify-request = {
  request
  1: remote-playback-id ; remote-playback-id
  2: remote-playback-controls ; controls
}

; type key 20
remote-playback-modify-response = {
  response
  1: &result ; result
  ? 2: remote-playback-state ; state
}

; type key 21
remote-playback-state-event = {
  0: remote-playback-id ; remote-playback-id
  1: remote-playback-state ; state
}

remote-playback-id = uint

remote-playback-controls = {
  ? 0: remote-playback-source ; source
  ? 1: &(
    none: 0
    metadata: 1
    auto: 2
  ) ; preload
  ? 2: bool ; loop
  ? 3: bool ; paused
  ? 4: bool ; muted
  ? 5: float64 ; volume
  ? 6: media-timeline ; seek
  ? 7: media-timeline ; fast-seek
  ? 8: float64 ; playback-rate
  ? 9: text ; poster
  ? 10: [* text] ; enabled-audio-track-ids
  ? 11: text ; selected-video-track-id
  ? 12: [* added-text-track] ; added-text-tracks
  ? 13: [* changed-text-track] ; changed-text-tracks
}

remote-playback-state = {
  ? 0: {
    0: bool ; rate
    1: bool ; preload
    2: bool ; poster
    3: bool ; added-text-track
    4: bool ; added-cues
  } ; supports
  ? 1: remote-playback-source ; source
  ? 2: &(
    empty: 0
    idle: 1
    loading: 2
    no-source: 3
  ) ; loading
  ? 3: &(
    nothing: 0
    metadata: 1
    current: 2
    future: 3
    enough: 4
  ) ; loaded
  ? 4: media-error ; error
  ? 5: epoch-time / null ; epoch
  ? 6: media-timeline / null ; duration
  ? 7: [* media-timeline-range] ; buffered-time-ranges
  ? 8: [* media-timeline-range] ; seekable-time-ranges
  ? 9: [* media-timeline-range] ; played-time-ranges
  ? 10: media-timeline ; position
  ? 11: float64 ; playback-rate
  ? 12: bool ; paused
  ? 13: bool ; seeking
  ? 14: bool ; stalled
  ? 15: bool ; ended
  ? 16: float64 ; volume
  ? 17: bool ; muted
  ? 18: video-resolution / null ; resolution
  ? 19: [* audio-track-state] ; audio-tracks
  ? 20: [* video-track-state] ; video-tracks
  ? 21: [* text-track-state] ; text-tracks
}

added-text-track = {
  0: &(
    subtitles: 1
    captions: 2
    descriptions: 3
    chapters: 4
    metadata: 5
  ) ; kind
  ? 1: text ; label
  ? 2: text ; language
}

changed-text-track = {
  0: text ; id
  1: text-track-mode ; mode
  ? 2: [* text-track-cue] ; added-cues
  ? 3: [* text] ; removed-cue-ids
}

text-track-mode = &(
  disabled: 1
  showing: 2
  hidden: 3
)

text-track-cue = {
  0: text ; id
  1: media-timeline-range ; range
  2: text ; text
}

media-sync-time = [
  value: uint
  scale: uint
]

media-error = [
  code: &(
    user-aborted: 1
    network-error: 2
    decode-error: 3
    source-not-supported: 4
    unknown-error: 5
  )
  message: text
]

track-state = (
  0: text ; id
  1: text ; label
  2: text ; language
)

audio-track-state = {
  track-state
  3: bool ; enabled
}

video-track-state = {
  track-state
  3: bool ; selected
}

text-track-state = {
  track-state
  3: text-track-mode ; mode
}

; type key 22
audio-frame = [
  encoding-id: uint
  start-time: uint
  payload: bytes
  ? optional: {
    ? 0: uint ; duration
    ? 1: media-sync-time ; sync-time
  }
]

; type key 23
video-frame = {
  0: uint ; encoding-id
  1: uint ; sequence-number
  ? 2: [* int] ; depends-on
  3: uint ; start-time
  ? 4: uint ; duration
  5: bytes ; payload
  ? 6: uint ; video-rotation
  ? 7: media-sync-time ; sync-time
}

; type key 24
data-frame = {
  0: uint ; encoding-id
  ? 1: uint ; sequence-number
  ? 2: uint ; start-time
  ? 3: uint ; duration
  4: bytes ; payload
  ? 5: media-sync-time ; sync-time
}

ratio = [
  antecedent: uint
  consequent: uint
]

; type key 122
streaming-capabilities-request = {
  request
}

; type key 123
streaming-capabilities-response = {
  response
  1: streaming-capabilities ; streaming-capabilities
}

streaming-capabilities = {
  0: [* receive-audio-capability] ; receive-audio
  1: [* receive-video-capability] ; receive-video
  2: [* receive-data-capability] ; receive-data
}

format = {
  0: text ; codec-name
}

receive-audio-capability = {
  0: format ; codec
  ? 1: uint ; max-audio-channels
  ? 2: uint ; min-bit-rate
}

video-resolution = {
  0: uint ; height
  1: uint ; width
}

video-hdr-format = {
  0: text ; transfer-function
  ? 1: text ; hdr-metadata
}

receive-video-capability = {
  0: format ; codec
  ? 1: video-resolution ; max-resolution
  ? 2: ratio ; max-frames-per-second
  ? 3: uint ; max-pixels-per-second
  ? 4: uint ; min-bit-rate
  ? 5: ratio ; aspect-ratio
  ? 6: text ; color-gamut
  ? 7: [* video-resolution] ; native-resolutions
  ? 8: bool ; supports-scaling
  ? 9: bool ; supports-rotation
  ? 10: [* video-hdr-format] ; hdr-formats
}

receive-data-capability = {
  0: format ; data-type
}

; type key 124
streaming-session-start-request = {
  request
  streaming-session-start-request-params
}

; type key 125
streaming-session-start-response = {
  response
  streaming-session-start-response-params
}

; A separate group so it can be used in remote-playback-start-request
streaming-session-start-request-params = (
  1: uint ; streaming-session-id
  2: [* media-stream-offer] ; stream-offers
  3: microseconds ; desired-stats-interval
)

; type key 126
streaming-session-modify-request = {
  request
  streaming-session-modify-request-params
}

; A separate group so it can be used in remote-playback-start-response
streaming-session-start-response-params = (
  1: &result ; result
  2: [* media-stream-request] ; stream-requests
  3: microseconds ; desired-stats-interval
)

streaming-session-modify-request-params = (
  1: uint ; streaming-session-id
  2: [* media-stream-request] ; stream-requests
)

; type key 127
streaming-session-modify-response = {
  response
  1: &result ; result
}

; type key 128
streaming-session-terminate-request = {
  request
  1: uint ; streaming-session-id
}

; type key 129
streaming-session-terminate-response = {
  response
}

; type key 130
streaming-session-terminate-event = {
  0: uint ; streaming-session-id
}

media-stream-offer = {
  0: uint ; media-stream-id
  ? 1: text ; display-name
  ? 2: [1* audio-encoding-offer] ; audio
  ? 3: [1* video-encoding-offer] ; video
  ? 4: [1* data-encoding-offer] ; data
}

media-stream-request = {
  0: uint ; media-stream-id
  ? 1: audio-encoding-request ; audio
  ? 2: video-encoding-request ; video
  ? 3: data-encoding-request ; data
}

audio-encoding-offer = {
  0: uint ; encoding-id
  1: text ; codec-name
  2: uint ; time-scale
  ? 3: uint ; default-duration
}

video-encoding-offer = {
  0: uint ; encoding-id
  1: text ; codec-name
  2: uint ; time-scale
  ? 3: uint ; default-duration
  ? 4: video-rotation ; default-rotation
}

data-encoding-offer = {
  0: uint ; encoding-id
  1: text ; data-type-name
  2: uint ; time-scale
  ? 3: uint ; default-duration
}

audio-encoding-request = {
  0: uint ; encoding-id
}

video-encoding-request = {
  0: uint ; encoding-id
  ? 1: video-resolution ; target-resolution
  ? 2: ratio ; max-frames-per-second
}

data-encoding-request = {
  0: uint ; encoding-id
}

video-rotation = &(
  ; Degrees clockwise
  video-rotation-0: 0
  video-rotation-90: 1
  video-rotation-180: 2
  video-rotation-270: 3
)

sender-stats-audio = {
  0: uint ; encoding-id
  ? 1: uint ; cumulative-sent-frames
  ? 2: microseconds ; cumulative-encode-delay
}

sender-stats-video = {
  0: uint ; encoding-id
  ? 1: microseconds ; cumulative-sent-duration
  ? 2: microseconds ; cumulative-encode-delay
  ? 3: uint ; cumulative-dropped-frames
}

; type key 131
streaming-session-sender-stats-event = {
  0: uint ; streaming-session-id
  1: microseconds ; system-time
  ? 2: [1* sender-stats-audio] ; audio
  ? 3: [1* sender-stats-video] ; video
}

streaming-buffer-status = &(
  enough-data: 0
  insufficient-data: 1
  too-much-data: 2
)

receiver-stats-audio = {
  0: uint ; encoding-id
  ? 1: microseconds ; cumulative-received-duration
  ? 2: microseconds ; cumulative-lost-duration
  ? 3: microseconds ; cumulative-buffer-delay
  ? 4: microseconds ; cumulative-decode-delay
  ? 5: streaming-buffer-status ; remote-buffer-status
}

receiver-stats-video = {
  0: uint ; encoding-id
  ? 1: uint ; cumulative-decoded-frames
  ? 2: uint ; cumulative-lost-frames
  ? 3: microseconds ; cumulative-buffer-delay
  ? 4: microseconds ; cumulative-decode-delay
  ? 5: streaming-buffer-status ; remote-buffer-status
}

; type key 132
streaming-session-receiver-stats-event = {
  0: uint ; streaming-session-id
  1: microseconds ; system-time
  ? 2: [1* receiver-stats-audio] ; audio
  ? 3: [1* receiver-stats-video] ; video
}

    

Appendix B: Message Type Key Ranges

The following appendix describes how the range of message type keys is divided. Legal values are 1 to 2⁶⁴.

Each type key is encoded as a variable-length integer on the wire of 1, 2, 4 or 8 bytes. For each wire byte size, 1/4 to 1/2 of the keys are available for extensions.

Bytes	Range	Purpose
1	1 - 48	Open Screen Protocol
1	49 - 63	Available for extensions
2	64 - 8,192	Open Screen Protocol
2	8,193 - 16,383	Available for extensions
4	16,384 - 2²⁹	Reserved for future use
4	2²⁹+1 - 2³⁰-1	Available for extensions
8	>= 2³⁰	Reserved for future use
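The byte sizes in the table match the QUIC variable-length integer encoding (RFC 9000, Section 16), in which the two high bits of the first byte give the length. Assuming that is the encoding used for type keys, the sketch below encodes a key and looks up its range; both function names are hypothetical. Note that this encoding carries at most 62 bits, so the sketch rejects larger keys.

```python
def encode_type_key(key: int) -> bytes:
    """Encode a type key as a QUIC-style variable-length integer."""
    if key < 1:
        raise ValueError("type keys start at 1")
    for prefix, size in ((0b00, 1), (0b01, 2), (0b10, 4), (0b11, 8)):
        usable_bits = 8 * size - 2  # two high bits carry the length prefix
        if key < 1 << usable_bits:
            return (key | (prefix << usable_bits)).to_bytes(size, "big")
    raise ValueError("type key does not fit in 8 bytes")


def key_purpose(key: int) -> str:
    """Look up the purpose of a type key using the ranges in the table above."""
    if 1 <= key <= 48 or 64 <= key <= 8192:
        return "Open Screen Protocol"
    if 49 <= key <= 63 or 8193 <= key <= 16383 or 2**29 < key < 2**30:
        return "Available for extensions"
    if key >= 16384:
        return "Reserved for future use"
    raise ValueError("type keys start at 1")
```

For example, agent-info-request (type key 10) takes one byte on the wire, while auth-capabilities (type key 1001) takes two.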

Appendix C: PSK Encoding Schemes

The following appendix describes two encoding schemes for PSKs that take a value P between 20 bits and 80 bits in length and produce either a string or a QR code for display to the user.

Agents should use these encoding schemes to maximize the interoperability of the authentication step, which typically requires displaying the PSK on one device and the user inputting it on another device.

Base-10 Numeric

To encode P into a numeric string, follow these steps:

Convert P to a base-10 integer N.
If N has fewer than 9 digits:
- Zero-pad N on the left with 3 - len(N) mod 3 digits.
- Output N in groups of three digits separated by dashes.
If N has more than 9 digits:
- Zero-pad N on the left with 4 - len(N) mod 4 digits.
- Output N in groups of four digits separated by dashes.

For PSK 61488548833, the steps would produce the string 0614-8854-8833.

To decode a string N into a PSK P, follow these steps:

Remove dashes and leading zeros from N.
Parse N as a base-10 decimal number to obtain P.
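The encoding and decoding steps above can be sketched as follows. The function names are hypothetical, and two readings are assumptions: padding is taken to mean "pad to the next multiple of the group size" (so no padding is added when the length is already a multiple), and a value with exactly 9 digits, which the steps above do not cover, is output in groups of three.

```python
def encode_psk_numeric(p: int) -> str:
    """Encode PSK value p as dash-separated groups of decimal digits."""
    digits = str(p)
    # Exactly 9 digits is assumed to use groups of three (not stated above).
    group = 3 if len(digits) <= 9 else 4
    # Zero-pad on the left to a multiple of the group size.
    digits = "0" * (-len(digits) % group) + digits
    return "-".join(digits[i:i + group] for i in range(0, len(digits), group))


def decode_psk_numeric(text: str) -> int:
    """Decode a dash-separated numeric string back into the PSK value."""
    digits = text.replace("-", "").lstrip("0")
    return int(digits) if digits else 0  # an all-zero string decodes to 0


print(encode_psk_numeric(61488548833))  # 0614-8854-8833
```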

Note: P values between approximately 2^30 and 2^40 will produce values between 10 and 12 digits in length. Values over 12 digits are inconvenient to input and have limited additional security value.

Note: We do not allow the use of hexadecimal encoding here, because it would be ambiguous with base-10 numeric encodings, and not all devices may support alphanumeric input.

QR Code

To encode a PSK into a QR code, follow these steps:

Set N to the value of P converted to an ASCII-encoded, hexadecimal string.
Construct a text QR code with the value of N.

For PSK 61488548833, the steps would produce the following QR code:

[Figure: QR code encoding PSK 61488548833]

To decode a PSK P given a QR code, follow these steps:

Obtain the string N by decoding the QR code.
Parse N as a hexadecimal number to obtain P.
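The text carried by the QR code can be computed as in the sketch below. The function names are hypothetical, lowercase hexadecimal without a prefix is an assumption (the steps above do not specify letter case), and rendering the text as an actual QR image is left to a QR code library.

```python
def psk_to_qr_text(p: int) -> str:
    """Convert PSK value p to the ASCII hexadecimal string placed in the QR code."""
    return format(p, "x")  # lowercase, no "0x" prefix (assumed convention)


def qr_text_to_psk(n: str) -> int:
    """Parse the string decoded from the QR code as a hexadecimal number."""
    return int(n, 16)  # accepts both uppercase and lowercase digits


print(psk_to_qr_text(61488548833))  # e5100cbe1
```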

Appendix D: Entire Flow Chart

This section is non-normative.

Appendix E: Media Time Conversions

To convert between a media synchronization timestamp for a given audio or video frame and a media timeline value, the following formula can be used:

media-timeline-value = media-zero-time + (value / scale)

Where:

media-zero-time is the origin of the media timeline as defined in HTML, converted to an IEEE-754 double precision floating point number [IEEE-754].
value and scale are the values passed in the sync-time field of the corresponding audio-frame or video-frame.
value / scale should be computed with double floating point precision.
media-timeline-value is an IEEE-754 double precision floating point number [IEEE-754].

In the event of an overflow in media-timeline-value, the maximum representable value should be used.
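A minimal sketch of this conversion, including the overflow rule, is shown below. The function name is hypothetical, and rejecting a scale of zero is an assumption, since that case is not addressed above.

```python
import math
import sys


def media_timeline_value(media_zero_time: float, value: int, scale: int) -> float:
    """Convert a sync-time (value, scale) pair to a media timeline value."""
    if scale == 0:
        raise ValueError("scale must be non-zero")  # assumed handling
    try:
        # Python floats are IEEE-754 doubles; int / int yields a double.
        result = media_zero_time + value / scale
    except OverflowError:
        # The quotient itself was too large to represent as a double.
        return sys.float_info.max
    if math.isinf(result) and result > 0:
        # The sum overflowed: use the maximum representable value.
        return sys.float_info.max
    return result


print(media_timeline_value(10.0, 90000, 90000))  # 11.0
```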
