WebDriver BiDi

Editor’s Draft,

This version:
https://w3c.github.io/webdriver-bidi/
Issue Tracking:
GitHub
Inline In Spec

Abstract

This document defines the BiDirectional WebDriver Protocol, a mechanism for remote control of user agents.

Status of this document

This is a public copy of the editors’ draft. It is provided for discussion only and may change at any moment. Its publication here does not imply endorsement of its contents by W3C. Don’t cite this document other than as work in progress.

GitHub Issues are preferred for discussion of this specification.

This document was produced by the Browser Testing and Tools Working Group.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 15 September 2020 W3C Process Document.

1. Introduction

This section is non-normative.

WebDriver defines a protocol for introspection and remote control of user agents. This specification extends WebDriver by introducing bidirectional communication. In place of the strict command/response format of WebDriver, this permits events to stream from the user agent to the controlling software, better matching the evented nature of the browser DOM.

2. Infrastructure

This specification depends on the Infra Standard. [INFRA]

Network protocol messages are defined using CDDL. [RFC8610]

3. Protocol

This section defines the basic concepts of the WebDriver BiDi protocol. These terms are distinct from their representation at the transport layer.

The protocol is defined using a CDDL definition. For the convenience of implementors, two separate CDDL definitions are defined: the remote end definition, which defines the format of messages produced on the local end and consumed on the remote end, and the local end definition, which defines the format of messages produced on the remote end and consumed on the local end.

3.1. Definition

Should this be an appendix?

This section gives the initial contents of the remote end definition and local end definition. These are augmented by the definition fragments defined in the remainder of the specification.

Remote end definition

Command = {
  id: uint,
  CommandData,
  *text => any,
}

CommandData = (
  SessionCommand //
  BrowsingContextCommand
)

EmptyParams = { *text }

Local end definition

Message = (
  CommandResponse //
  ErrorResponse //
  Event
)

CommandResponse = {
  id: uint,
  result: ResultData,
  *text => any
}

ErrorResponse = {
  id: uint / null,
  error: "unknown error" / "unknown command" / "invalid argument",
  message: text,
  ?stacktrace: text,
  *text => any
}

ResultData = (
  EmptyResult //
  SessionResult //
  BrowsingContextResult //
  ScriptResult
)

EmptyResult = {}

Event = {
  EventData,
  *text => any
}

EventData = (
  BrowsingContextEvent //
  ScriptEvent
)
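As an informal illustration of the local end definition above, the following Python sketch sorts a received message into one of the three Message choices. The function name classify_message is hypothetical, and the check that an Event carries "method" and "params" is taken from the assertion in the emit an event steps later in this document. It is a sketch of how a local end might dispatch, not a normative requirement.

```python
import json


def classify_message(raw: str) -> str:
    """Return "error", "response" or "event" for a message sent by the remote end."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("a message must be a JSON object")
    if "error" in message:
        # ErrorResponse: id may be null, error and message are required.
        return "error"
    if "id" in message and "result" in message:
        # CommandResponse: id of the originating command plus its result.
        return "response"
    if "method" in message and "params" in message:
        # Event: carries no id field.
        return "event"
    raise ValueError("message matches no production of the local end definition")
```

A local end would typically route "response" and "error" messages by their id field and hand "event" messages to subscribers.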

3.2. Session

WebDriver BiDi uses the same session concept as WebDriver.

3.3. Modules

The WebDriver BiDi protocol is organized into modules.

Each module represents a collection of related commands and events pertaining to a certain aspect of the user agent. For example, a module might contain functionality for inspecting and manipulating the DOM, or for script execution.

Each module has a module name which is a string. The command name and event name for commands and events defined in the module start with the module name followed by a period ".".

Modules which contain commands define remote end definition fragments. These provide choices in the CommandData group for the module’s commands, and can also define additional definition properties. They can also define local end definition fragments that provide additional choices in the ResultData group for the results of commands in the module.

Modules which contain events define local end definition fragments that are choices in the Event group for the module’s events.

An implementation may define extension modules. These must have a module name that contains a single colon ":" character. The part before the colon is the prefix; this is typically the same for all extension modules specific to a given implementation and should be unique for a given implementation. Such modules extend the local end definition and remote end definition providing additional groups as choices for the defined commands and events.

3.4. Commands

A command is an asynchronous operation, requested by the local end and run on the remote end, resulting in either a result or an error being returned to the local end. Multiple commands can run at the same time, and commands can potentially be long-running. As a consequence, commands can finish out-of-order.

Each command is defined by:

When commands are sent from the local end they have a command id. This is an identifier used by the local end to identify the response from a particular command. From the point of view of the remote end this identifier is opaque and cannot be used internally to identify the command.

Note: This is because the command id is entirely controlled by the local end and isn’t necessarily unique over the course of a session. For example a local end which ignores all responses could use the same command id for each command.
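Because commands can finish out of order, a local end that does care about responses needs some way to pair each response with the command that caused it. The sketch below shows one possible approach, assuming the local end chooses to allocate a fresh id per command. The class CommandTracker and the method names "example.first" and "example.second" are hypothetical.

```python
import itertools


class CommandTracker:
    """Pairs responses with pending commands using the command id."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._pending = {}

    def register(self, method, params):
        """Allocate an id and return the command as a JSON-ready map."""
        command_id = next(self._next_id)
        self._pending[command_id] = method
        return {"id": command_id, "method": method, "params": params}

    def resolve(self, response):
        """Return the method a response belongs to, or None if the id is unknown."""
        return self._pending.pop(response.get("id"), None)
```

Responses can then be resolved in whatever order they arrive; an error response whose id is null resolves to None and has to be handled separately.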

The set of all command names is a set containing all the defined command names, including any belonging to extension modules.

3.5. Events

An event is a notification, sent by the remote end to the local end, signaling that something of interest has occurred on the remote end.

A session has a global event set which is a set containing the event names for events that are enabled for all browsing contexts. This initially contains the event name for events that are in the default event set.

A session has a browsing context event map, which is a map with browsing context keys and values that are maps from an event name to a boolean indicating whether the specified event is enabled or disabled for a given browsing context.

To determine if an event is enabled given session, event name and browsing contexts:

Note: browsing contexts is a set because a shared worker can be associated with multiple contexts.

  1. For each browsing context in browsing contexts:

    1. While browsing context is not null:

      1. Let event map be the browsing context event map for session.

      2. If event map contains browsing context, let browsing context events be event map[browsing context]. Otherwise let browsing context events be null.

      3. If browsing context events is not null, and browsing context events contains event name, return browsing context events[event name].

      4. Let browsing context be the parent browsing context of browsing context, if it has one, or null otherwise.

  2. If the global event set for session contains event name return true.

  3. Return false.
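The steps above can be sketched in Python as follows. The BrowsingContext and Session classes are simplified, hypothetical stand-ins that model only the parent link, the global event set and the browsing context event map.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)  # identity-based hashing so contexts can be map keys
class BrowsingContext:
    name: str
    parent: Optional["BrowsingContext"] = None


@dataclass
class Session:
    global_event_set: set = field(default_factory=set)
    context_event_map: dict = field(default_factory=dict)


def event_is_enabled(session, event_name, contexts):
    for context in contexts:
        current = context
        # Walk from the context up through its ancestors.
        while current is not None:
            events = session.context_event_map.get(current)
            if events is not None and event_name in events:
                # An explicit per-context setting wins, whether true or false.
                return events[event_name]
            current = current.parent
    # No per-context setting was found: fall back to the global event set.
    return event_name in session.global_event_set
```

Note that a per-context entry of false overrides the global event set, so an event can be enabled globally and still be disabled for one subtree of contexts.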

To obtain a set of event names given a name:

  1. Let events be an empty set.

  2. If name contains a U+002E (period):

    1. If name is the event name for an event, append name to events and return success with data events.

    2. Return an error with error code Invalid Argument.

  3. Otherwise name is interpreted as representing all the events in a module. If name is not a module name return an error with error code Invalid Argument.

  4. Append the event name for each event in the module with name name to events.

  5. Return success with data events.
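A minimal Python sketch of these steps is given below. The MODULE_EVENTS registry and the "example" module are hypothetical, and an exception stands in for returning an error with error code invalid argument.

```python
class InvalidArgument(Exception):
    """Stands in for an error with error code invalid argument."""


# Hypothetical registry: module name -> event names defined by that module.
MODULE_EVENTS = {
    "example": {"example.opened", "example.closed"},
}


def obtain_event_names(name: str) -> set:
    if "." in name:
        # A name containing a period must be a single, known event name.
        all_events = set().union(*MODULE_EVENTS.values())
        if name in all_events:
            return {name}
        raise InvalidArgument(name)
    # Otherwise the name stands for every event in a module.
    if name not in MODULE_EVENTS:
        raise InvalidArgument(name)
    return set(MODULE_EVENTS[name])
```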

4. Transport

Message transport is provided using the WebSocket protocol. [RFC6455]

Note: In the terms of the WebSocket protocol, the local end is the client and the remote end is the server / remote host.

Note: The encoding of commands and events as messages is similar to JSON-RPC, but this specification does not normatively reference it. [JSON-RPC] The normative requirements on remote ends are instead given as a precise processing model, while no normative requirements are given for local ends.

A WebSocket listener is a network endpoint that is able to accept incoming WebSocket connections.

A WebSocket listener has a host, a port, a secure flag, and a list of WebSocket resources.

When a WebSocket listener listener is created, a remote end must start to listen for WebSocket connections on the host and port given by listener’s host and port. If listener’s secure flag is set, then connections established from listener must be TLS encrypted.

A remote end has a set of WebSocket listeners active listeners, which is initially empty.

A WebDriver session has a WebSocket connection which is a network connection that follows the requirements of the WebSocket protocol. This is initially null.

When a client establishes a WebSocket connection connection by connecting to one of the set of active listeners listener, the implementation must proceed according to the WebSocket server-side requirements, with the following steps run when deciding whether to accept the incoming connection:

  1. Let resource name be the resource name from reading the client’s opening handshake. If resource name is not in listener’s list of WebSocket resources, then stop running these steps and act as if the requested service is not available.

  2. Get a session ID for a WebSocket resource with resource name and let session id be that value. If session id is null then stop running these steps and act as if the requested service is not available.

  3. If there is a session in the list of active sessions with session id as its session ID then let session be that session. Otherwise stop running these steps and act as if the requested service is not available.

  4. Run any other implementation-defined steps to decide if the connection should be accepted, and if it is not stop running these steps and act as if the requested service is not available.

  5. Otherwise set session’s WebSocket connection to connection, and proceed with the WebSocket server-side requirements when a server chooses to accept an incoming connection.

Do we support > 1 connection for a single session?

When a WebSocket message has been received for a WebSocket connection connection with type type and data data, a remote end must handle an incoming message given connection, type and data.

When the WebSocket closing handshake is started or when the WebSocket connection is closed for a WebSocket connection connection, a remote end must handle a connection closing given connection.

Note: Both conditions are needed because it is possible for a WebSocket connection to be closed without a closing handshake.

To construct a WebSocket resource name given a session session:

  1. Return the result of concatenating the string "/session/" with session’s session ID.

To construct a WebSocket URL given a WebSocket listener listener and session session:

  1. Let resource name be the result of constructing a WebSocket resource name given session.

  2. Return a WebSocket URI constructed with host set to listener’s host, port set to listener’s port, path set to resource name, following the wss-URI construct if listener’s secure flag is set and the ws-URI construct otherwise.

To get a session ID for a WebSocket resource given resource name:

  1. If resource name doesn’t begin with the byte string "/session/", return null.

  2. Let session id be the bytes in resource name following the "/session/" prefix.

  3. If session id is not the string representation of a UUID, return null.

  4. Return session id.
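The three algorithms above (constructing a resource name, constructing a URL, and recovering the session ID) can be sketched together in Python. The function names are hypothetical, strings are used in place of byte strings, and accepting only the canonical hyphenated UUID spelling is an assumption of this sketch; the steps above only require "the string representation of a UUID".

```python
import uuid

PREFIX = "/session/"


def construct_resource_name(session_id: str) -> str:
    return PREFIX + session_id


def construct_websocket_url(host: str, port: int, secure: bool, session_id: str) -> str:
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}:{port}{construct_resource_name(session_id)}"


def get_session_id(resource_name: str):
    if not resource_name.startswith(PREFIX):
        return None
    session_id = resource_name[len(PREFIX):]
    try:
        parsed = uuid.UUID(session_id)
    except ValueError:
        return None
    # uuid.UUID also accepts braces and unhyphenated input; this sketch
    # assumes only the canonical form should be treated as a session ID.
    if str(parsed) != session_id.lower():
        return None
    return session_id
```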

To start listening for a WebSocket connection given a session session:
  1. If there is an existing WebSocket listener in the set of active listeners which the remote end would like to reuse, let listener be that listener. Otherwise let listener be a new WebSocket listener with implementation-defined host, port, secure flag, and an empty list of WebSocket resources.

  2. Let resource name be the result of constructing a WebSocket resource name given session.

  3. Append resource name to the list of WebSocket resources for listener.

  4. Append listener to the remote end's active listeners.

  5. Return listener.

Note: An intermediary node handling multiple sessions can use one or many WebSocket listeners. WebDriver defines that an endpoint node supports at most one session at a time, so it’s expected to only have a single listener.

Note: For an endpoint node the host in the above steps will typically be "localhost".

To handle an incoming message given a WebSocket connection connection, type type and data data:
  1. If type is not text, respond with an error given connection, null, and invalid argument, and finally return.

  2. Assert: data is a scalar value string, because the WebSocket protocol’s handling of errors in UTF-8-encoded data would otherwise already have failed the WebSocket connection.

    Nothing seems to define what status code is used for UTF-8 errors.

  3. Let parsed be the result of parsing JSON into Infra values given data. If this throws an exception, then respond with an error given connection, null, and invalid argument, and finally return.

  4. Match parsed against the remote end definition. If this results in a match:

    1. Let matched be the map representing the matched data.

    2. Assert: matched contains "id", "method", and "params".

    3. Let command id be matched["id"].

    4. Let method be matched["method"].

    5. Run the following steps in parallel:

      1. Let result be the result of running the remote end steps for the command with command name method given command parameters matched["params"].

      2. If result is an error, then respond with an error given connection, command id, and result’s error code, and finally return.

      3. Let value be result’s data.

      4. Assert: value matches the definition for the result type corresponding to the command with command name method.

      5. Let response be a new map matching the CommandResponse production in the local end definition with the id field set to command id and the result field set to value.

      6. Let serialized be the result of serialize an infra value to JSON bytes given response.

      7. Send a WebSocket message comprised of serialized over connection and return.

  5. Otherwise:

    1. Let command id be null.

    2. If parsed is a map and parsed["id"] exists and is an integer greater than or equal to zero, set command id to that integer.

    3. Let error code be invalid argument.

    4. If parsed is a map and parsed["method"] exists and is a string, but parsed["method"] is not in the set of all command names, set error code to unknown command.

    5. Respond with an error given connection, command id, and error code.
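The fallback branch (step 5) is easy to get subtly wrong, so here is a hedged Python sketch of how the command id and error code might be chosen when a message fails to match. The function name classify_unmatched and the command name "example.run" are hypothetical; parsed is assumed to be the output of a JSON parser.

```python
def classify_unmatched(parsed, all_command_names):
    """Return (command id, error code) for a message that did not match."""
    command_id = None
    if isinstance(parsed, dict):
        candidate = parsed.get("id")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(candidate, int) and not isinstance(candidate, bool) and candidate >= 0:
            command_id = candidate
    error_code = "invalid argument"
    if isinstance(parsed, dict):
        method = parsed.get("method")
        if isinstance(method, str) and method not in all_command_names:
            error_code = "unknown command"
    return command_id, error_code
```

A known method with malformed parameters therefore yields invalid argument, while an unrecognized method name yields unknown command, and the id is echoed back whenever it is usable.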

To emit an event given body and related contexts:
  1. Assert: body has size 2 and contains "method" and "params".

  2. If the current session is null, or the current session's WebSocket connection is null, then return.

  3. If event is enabled given current session, body["method"] and related contexts:

    1. Let connection be the current session's WebSocket connection.

    2. Let serialized be the result of serialize an infra value to JSON bytes given body.

    3. Send a WebSocket message comprised of serialized over connection.

To respond with an error given a WebSocket connection connection, command id, and error code:
  1. Let error data be a new map matching the ErrorResponse production in the local end definition, with the id field set to command id, the error field set to error code, the message field set to an implementation-defined string containing a human-readable definition of the error that occurred and the stacktrace field optionally set to an implementation-defined string containing a stack trace report of the active stack frames at the time when the error occurred.

  2. Let response be the result of serialize an infra value to JSON bytes given error data.

    Note: command id can be null, in which case the id field will also be set to null, not omitted from response.

  3. Send a WebSocket message comprised of response over connection.
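As a sketch of the first two steps, the following Python function builds the error data map and serializes it. The function name build_error_response is hypothetical, and the message text is whatever the implementation chooses. The point to notice is that a null command id is written out as "id": null rather than dropped.

```python
import json


def build_error_response(command_id, error_code, message, stacktrace=None):
    """Serialize an ErrorResponse to JSON bytes."""
    error_data = {
        "id": command_id,  # None is serialized as JSON null, not omitted
        "error": error_code,
        "message": message,
    }
    if stacktrace is not None:
        # stacktrace is the only optional field of ErrorResponse.
        error_data["stacktrace"] = stacktrace
    return json.dumps(error_data).encode("utf-8")
```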

To handle a connection closing given a WebSocket connection connection:
  1. If there is a WebDriver session with connection as its connection, set the connection on that session to null.

This should also reset any internal state

Note: This does not end any session.

Need to hook in to the session ending to allow the UA to close the listener if it wants.

4.1. Establishing a Connection

WebDriver clients opt in to a bidirectional connection by requesting a capability with the name "webSocketUrl" and value true.

This specification defines an additional webdriver capability with the capability name "webSocketUrl".

The additional capability deserialization algorithm for the "webSocketUrl" capability, with parameter value is:
  1. If value is not a boolean, return error with code invalid argument.

  2. Return success with data value.

The matched capability serialization algorithm for the "webSocketUrl" capability, with parameter value is:
  1. If value is false, return success with data null.

  2. Return success with data true.
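The two capability algorithms above are small enough to sketch directly in Python. The function names are hypothetical, and an exception stands in for returning an error with code invalid argument.

```python
class InvalidArgument(Exception):
    """Stands in for an error with code invalid argument."""


def deserialize_web_socket_url(value):
    # Only JSON booleans are accepted; the string "true" is rejected.
    if not isinstance(value, bool):
        raise InvalidArgument("webSocketUrl must be a boolean")
    return value


def serialize_matched_web_socket_url(value):
    # A value of false serializes to null; anything else serializes to true.
    return None if value is False else True
```

A client requesting "webSocketUrl": true therefore sees the capability matched, and the new session steps that follow replace the value with the actual URL.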

The WebDriver new session algorithm defined by this specification, with parameters session and capabilities is:
  1. Let webSocketUrl be the result of getting a property named "webSocketUrl" from capabilities.

  2. If webSocketUrl is undefined, return.

  3. Assert: webSocketUrl is true.

  4. Let listener be the result of start listening for a WebSocket connection given session.

  5. Set webSocketUrl to the result of constructing a WebSocket URL given listener and session.

  6. Set a property on capabilities named "webSocketUrl" to webSocketUrl.

5. Common Data Types

5.1. Remote Value

Values accessible from the ECMAScript runtime are represented by a mirror object, specified as RemoteValue. The value’s type is specified in the type property. In the case of JSON-representable primitive values, this contains the value in the value property; in the case of non-JSON-representable primitives, the value property contains a string representation of the value. For non-primitive objects, the objectId property contains a string id that provides a unique handle to the object, valid for its lifetime inside the engine. For some non-primitive types, the value property contains a representation of the data in the ECMAScript object; for container types this can contain further RemoteValue instances. The value property can be null if there is a duplicate object i.e. the object has already been serialized in the current RemoteValue, perhaps as part of a cycle, or otherwise when the maximum serialization depth is reached.

Nodes are also represented by RemoteValue instances. These have a partial serialization of the node in the value property.

Note: mirror objects do not keep the original object alive in the runtime. If an object is discarded in the runtime subsequent attempts to access it via the protocol will result in an error.

A session has an object id map. This is a weak map from objects to their corresponding id.

Should this be explicitly per realm?

To get the object id for an object given an object:
  1. If the object id map for the current session does not contain object run the following steps:

    1. Let object id be a new, unique, string identifier for object. If object is an element this must be the web element reference for object; if it’s a WindowProxy object, this must be the window handle for object.

    2. Set the value of object in the object id map to object id.

  2. Return the result of getting the value for object in object id map.
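A rough Python analogue of the object id map is sketched below using weakref.WeakKeyDictionary, so that the map does not keep objects alive. This is only an approximation: Python's weak dictionary compares keys by equality and requires hashable, weakly referenceable keys, whereas the map described here is keyed by object identity. The Handle class is a hypothetical stand-in for a runtime object, and the special cases for elements and WindowProxy objects are not modeled.

```python
import uuid
import weakref


class Handle:
    """Hypothetical stand-in for an object living in the runtime."""


class ObjectIdMap:
    def __init__(self):
        # Entries disappear once the key object is garbage collected.
        self._ids = weakref.WeakKeyDictionary()

    def get_object_id(self, obj) -> str:
        if obj not in self._ids:
            # A new, unique string identifier for obj.
            self._ids[obj] = str(uuid.uuid4())
        return self._ids[obj]
```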

remote end definition and local end definition

RemoteValue = {
  UndefinedValue //
  NullValue //
  StringValue //
  NumberValue //
  BooleanValue //
  BigIntValue //
  SymbolValue //
  ArrayValue //
  FunctionValue //
  ObjectValue //
  RegExpValue //
  DateValue //
  MapValue //
  SetValue //
  WeakMapValue //
  WeakSetValue //
  IteratorValue //
  GeneratorValue //
  ErrorValue //
  ProxyValue //
  PromiseValue //
  TypedArrayValue //
  ArrayBufferValue //
  NodeValue //
  WindowProxyValue
}

ObjectId = text;

ListValue = [*RemoteValue];

MappingValue = [*[(RemoteValue / text), RemoteValue]];

UndefinedValue = {
  type: "undefined",
}

NullValue = {
  type: "null",
}

StringValue = {
  type: "string",
  value: text,
}

SpecialNumber = "NaN" / "-0" / "+Infinity" / "-Infinity";

NumberValue = {
  type: "number",
  value: number / SpecialNumber,
}

BooleanValue = {
  type: "boolean",
  value: bool,
}

BigIntValue = {
  type: "bigint",
  value: text,
}

SymbolValue = {
  type: "symbol",
  objectId: ObjectId,
}

ArrayValue = {
  type: "array",
  objectId: ObjectId,
  ?value: ListValue,
}

ObjectValue = {
  type: "object",
  objectId: ObjectId,
  ?value: MappingValue,
}

FunctionValue = {
  type: "function",
  objectId: ObjectId,
}

RegExpValue = {
  type: "regexp",
  objectId: ObjectId,
  value: text
}

DateValue = {
  type: "date",
  objectId: ObjectId,
  value: text
}

MapValue = {
  type: "map",
  objectId: ObjectId,
  ?value: MappingValue,
}

SetValue = {
  type: "set",
  objectId: ObjectId,
  ?value: ListValue
}

WeakMapValue = {
  type: "weakmap",
  objectId: ObjectId,
}

WeakSetValue = {
  type: "weakset",
  objectId: ObjectId,
}

ErrorValue = {
  type: "error",
  objectId: ObjectId,
}

PromiseValue = {
  type: "promise",
  objectId: ObjectId,
}

TypedArrayValue = {
  type: "typedarray",
  objectId: ObjectId,
}

ArrayBufferValue = {
  type: "arraybuffer",
  objectId: ObjectId,
}

NodeValue = {
  type: "node",
  objectId: ObjectId,
  ?value: NodeProperties,
}

NodeProperties = {
  nodeType: uint,
  nodeValue: text,
  ?localName: text,
  ?namespaceURI: text,
  childNodeCount: uint,
  ?children: [*NodeValue],
  ?attributes: {*text => text},
  ?shadowRoot: NodeValue / null,
}

WindowProxyValue = {
  type: "window",
  objectId: ObjectId,
}

Add WASM types?

Should WindowProxy get attributes in a similar style to Node?

handle String / Number / etc. wrapper objects specially?

To serialize as a remote value given a value, a max depth, node details, and a set of known objects:

  1. In the following list of conditions and associated steps, run the first set of steps for which the associated condition is true:

    Type(value) is Undefined
    Let remote value be a map matching the UndefinedValue production in the local end definition.
    Type(value) is Null
    Let remote value be a map matching the NullValue production in the local end definition.
    Type(value) is String
    Let remote value be a map matching the StringValue production in the local end definition, with the value property set to value.

    This doesn’t handle lone surrogates

    Type(value) is Number
    1. Switch on the value of value:

      NaN
      Let serialized be "NaN"
      -0
      Let serialized be "-0"
      +Infinity
      Let serialized be "+Infinity"
      -Infinity
      Let serialized be "-Infinity"
      Otherwise:
      Let serialized be value
    2. Let remote value be a map matching the NumberValue production in the local end definition, with the value property set to serialized.

    Type(value) is Boolean
    Let remote value be a map matching the BooleanValue production in the local end definition, with the value property set to value.
    Type(value) is BigInt
    Let remote value be a map matching the BigIntValue production in the local end definition, with the value property set to the result of running the ToString operation on value.
    Type(value) is Symbol
    Let remote value be a map matching the SymbolValue production in the local end definition, with the objectId property set to the object id for an object value.
    IsArray(value)
    1. Let serialized be null.

    2. If value is not in the set of known objects, and max depth is not null and greater than 0, run the following steps:

      1. Append value to the set of known objects

      2. Let serialized be the result of serialize as a list given CreateArrayIterator(value, value), max depth, node details and set of known objects.

    3. Let remote value be a map matching the ArrayValue production in the local end definition, with the objectId property set to the object id for an object value, and the value field set to serialized if it’s not null, or omitted otherwise.

    IsRegExp(value)
    1. Let pattern be ToString(Get(value, "source")).

    2. Let flags be ToString(Get(value, "flags")).

    3. Let serialized be the string-concatenation of "/", pattern, "/", and flags.

    4. Let remote value be a map matching the RegExpValue production in the local end definition, with the objectId property set to the object id for an object value and the value set to serialized.

    value has a [[DateValue]] internal slot.
    1. Let serialized be ToDateString(thisTimeValue(value)).

    2. Let remote value be a map matching the DateValue production in the local end definition, with the objectId property set to the object id for an object value and the value set to serialized.

    value has a [[MapData]] internal slot
    1. Let serialized be null.

    2. If value is not in the set of known objects, and max depth is not null and greater than 0, run the following steps:

      1. Append value to the set of known objects

      2. Let serialized be the result of serialize as a mapping given CreateMapIterator(value, key+value), max depth, node details and set of known objects.

    3. Let remote value be a map matching the MapValue production in the local end definition, with the objectId property set to the object id for an object value, and the value field set to serialized if it’s not null, or omitted otherwise.

    value has a [[SetData]] internal slot
    1. Let serialized be null.

    2. If value is not in the set of known objects, and max depth is not null and greater than 0, run the following steps:

      1. Append value to the set of known objects

      2. Let serialized be the result of serialize as a list given CreateSetIterator(value, value), max depth, node details and set of known objects.

    3. Let remote value be a map matching the SetValue production in the local end definition, with the objectId property set to the object id for an object value, and the value field set to serialized if it’s not null, or omitted otherwise.

    value has a [[WeakMapData]] internal slot
    Let remote value be a map matching the WeakMapValue production in the local end definition, with the objectId property set to the object id for an object value.
    value has a [[WeakSetData]] internal slot
    Let remote value be a map matching the WeakSetValue production in the local end definition, with the objectId property set to the object id for an object value.
    value has an [[ErrorData]] internal slot
    Let remote value be a map matching the ErrorValue production in the local end definition, with the objectId property set to the object id for an object value.
    IsPromise(value)
    Let remote value be a map matching the PromiseValue production in the local end definition, with the objectId property set to the object id for an object value.
    value has a [[TypedArrayName]] internal slot
    Let remote value be a map matching the TypedArrayValue production in the local end definition, with the objectId property set to the object id for an object value.
    value has an [[ArrayBufferData]] internal slot
    Let remote value be a map matching the ArrayBufferValue production in the local end definition, with the objectId property set to the object id for an object value.
    value is a platform object that implements Node
    1. Let serialized be null.

    2. If node details is true, run the following steps:

      1. Let serialized be a map.

      2. Set serialized["nodeType"] to Get(value, "nodeType").

      3. Set serialized["nodeValue"] to Get(value, "nodeValue").

      4. If value is an Element or an Attribute:

        1. Set serialized["localName"] to Get(value, "localName").

        2. Set serialized["namespaceURI"] to Get(value, "namespaceURI").

      5. Let child node count be the size of value’s children.

      6. Set serialized["childNodeCount"] to child node count.

      7. If max depth is equal to 0 let children be null. Otherwise, let children be an empty list and, for each node child in the children of value:

        1. Let serialized child be the result of serialize as a remote value with child, max depth - 1, node details and set of known objects.

        2. Append serialized child to children.

      8. Set serialized["children"] to children.

      9. If value is an Element:

        1. Let attributes be a new map.

        2. For each attribute in value’s attribute list:

          1. Let name be attribute’s qualified name

          2. Let attribute value be attribute’s value.

          3. Set attributes[name] to attribute value.

        3. Set serialized["attributes"] to attributes.

        4. Let shadow root be value’s shadow root.

        5. If shadow root is null, let serialized shadow be null. Otherwise let serialized shadow be the result of serialize as a remote value with shadow root, max depth - 1, false and set of known objects.

          Note: this means the objectId for the shadow root will be serialized irrespective of whether the shadow is open or closed, but no properties of the node will be returned.

        6. Set serialized["shadowRoot"] to serialized shadow.

    3. Let remote value be a map matching the NodeValue production in the local end definition, with the objectId property set to the object id for an object value, and value set to serialized, if serialized is not null.

    value is a platform object that implements WindowProxy
    1. Let remote value be a map matching the WindowProxyValue production in the local end definition, with the objectId property set to the object id for an object value.
    value is a platform object
    1. Let remote value be a map matching the ObjectValue production in the local end definition, with the objectId property set to the object id for an object value.
    IsCallable(value)
    Let remote value be a map matching the FunctionValue production in the local end definition, with the objectId property set to the object id for an object value.
    Otherwise:
    1. Assert: Type(value) is Object.

    2. Let serialized be null.

    3. If value is not in the set of known objects, and max depth is not null and greater than 0, run the following steps:

      1. Append value to the set of known objects.

      2. Let serialized be the result of serialize as a mapping given EnumerableOwnPropertyNames(value, key+value), max depth, node details and set of known objects.

    4. Let remote value be a map matching the ObjectValue production in the local end definition, with the objectId property set to the object id for an object value, and the value field set to serialized if it’s not null, or omitted otherwise.

  2. Return remote value
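The Number case of the algorithm above is worth illustrating, because JSON cannot represent NaN, negative zero or the infinities as numbers. The Python sketch below mirrors that switch, using Python floats as a stand-in for ECMAScript Numbers; the function names are hypothetical.

```python
import math


def serialize_number(value: float):
    """Return the JSON-safe value for the NumberValue value property."""
    if math.isnan(value):
        return "NaN"
    # -0.0 == 0 is true, so the sign bit has to be inspected separately.
    if value == 0 and math.copysign(1.0, value) < 0:
        return "-0"
    if value == math.inf:
        return "+Infinity"
    if value == -math.inf:
        return "-Infinity"
    return value


def serialize_number_value(value: float) -> dict:
    return {"type": "number", "value": serialize_number(value)}
```

Ordinary numbers pass through unchanged, so a local end can tell the special cases apart by checking whether the value property is a string.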

Does it make sense to use the same depth parameter for nodes and objects in general?

To serialize as a list given iterable, max depth, node details and set of known objects:
  1. Let serialized be a new list.

  2. For each child value in iterable:

    1. Let serialized child be the result of serialize as a remote value with arguments child value, max depth - 1, node details and set of known objects.

    2. Append serialized child to serialized.

  3. Return serialized
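To show how the set of known objects and max depth interact, here is a reduced Python sketch that covers only strings and arrays, with Python lists standing in for ECMAScript arrays. The objectId format is hypothetical, and Python's id() is used to track identity because lists are not hashable. An array that was already serialized, or that lies beyond the depth limit, is emitted without a value field.

```python
def serialize(value, max_depth, known):
    """Serialize a str or (possibly cyclic) list; known holds ids of seen lists."""
    if isinstance(value, str):
        return {"type": "string", "value": value}
    if isinstance(value, list):
        serialized = None
        if id(value) not in known and max_depth is not None and max_depth > 0:
            known.add(id(value))
            # Serialize as a list: each child is serialized with max depth - 1.
            serialized = [serialize(child, max_depth - 1, known) for child in value]
        remote = {"type": "array", "objectId": f"hypothetical-{id(value)}"}
        if serialized is not None:
            remote["value"] = serialized
        return remote
    raise TypeError("type not covered by this sketch")
```

For a list that contains itself, the inner occurrence is reduced to its type and objectId, which is how the cycle is broken.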

this assumes for-in works on iterators

To serialize as a mapping given iterable, max depth, node details and set of known objects:
  1. Let serialized be a new list.

  2. For each item in iterable:

    1. Assert: IsArray(item)

    2. Let property be CreateListFromArrayLike(item)

    3. Assert: property is a list of size 2

    4. Let key be property[0] and let value be property[1]

    5. If Type(key) is String, let serialized key be key, otherwise let serialized key be the result of serialize as a remote value with arguments key, max depth - 1, node details and set of known objects.

    6. Let serialized value be the result of serialize as a remote value with arguments value, max depth - 1, node details and set of known objects.

    7. Let serialized child be («serialized key, serialized value»).

    8. Append serialized child to serialized.

  3. Return serialized
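
The two helper algorithms above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the normative algorithm: `serialize_remote_value` is a simplified stand-in that only handles a few Python types, the `type` strings are illustrative, object ids and node details are omitted, and the set of known objects is tracked by Python object identity.

```python
from typing import Any, Iterable, Tuple


def serialize_remote_value(value: Any, max_depth: int, known: set) -> dict:
    """Simplified stand-in for "serialize as a remote value" (assumption)."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return {"type": "boolean", "value": value}
    if isinstance(value, (int, float)):
        return {"type": "number", "value": value}
    if isinstance(value, str):
        return {"type": "string", "value": value}
    if isinstance(value, (list, dict)):
        serialized = None  # stays null if already known or depth is exhausted
        if id(value) not in known and max_depth > 0:
            known.add(id(value))
            if isinstance(value, list):
                serialized = serialize_as_list(value, max_depth, known)
            else:
                serialized = serialize_as_mapping(value.items(), max_depth, known)
        kind = "array" if isinstance(value, list) else "object"
        return {"type": kind, "value": serialized}
    raise TypeError(f"unsupported value: {value!r}")


def serialize_as_list(iterable: Iterable[Any], max_depth: int, known: set) -> list:
    """Each child is serialized with the depth budget reduced by one."""
    return [serialize_remote_value(child, max_depth - 1, known) for child in iterable]


def serialize_as_mapping(pairs: Iterable[Tuple[Any, Any]], max_depth: int, known: set) -> list:
    """String keys are kept as-is; any other key is serialized as a remote value."""
    serialized = []
    for key, value in pairs:
        if isinstance(key, str):
            serialized_key = key
        else:
            serialized_key = serialize_remote_value(key, max_depth - 1, known)
        serialized_value = serialize_remote_value(value, max_depth - 1, known)
        serialized.append((serialized_key, serialized_value))
    return serialized
```

Because a container is added to the known set before its children are visited, a self-referencing list terminates after one level instead of recursing without bound.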

6. Modules

6.1. The session Module

The session module contains commands and events for monitoring the status of the remote end.

6.1.1. Definition

remote end definition

SessionCommand = (SessionStatusCommand //
                  SessionSubscribeCommand //
                  SessionUnsubscribeCommand)

local end definition

SessionResult = (SessionStatusResult)

To update the event map, given session, list of event names, list of contexts, and enabled:

  1. Let global event set be the global event set for session.

  2. Let event map be the browsing context event map for session.

  3. Let event names be an empty set.

  4. For each entry name in the list of event names, let event names be the union of event names and the result of trying to obtain a set of event names with name.

  5. If the list of contexts is null:

    1. If enabled is true, for each event name in event names, append event name to global event set. Otherwise, for each event name in event names, if the global event set contains event name, remove event name from the global event set.

    2. Return success with data null.

  6. Let targets be an empty list.

  7. For each entry context id in the list of contexts:

    1. Let context be the result of trying to get a browsing context with context id.

    2. If the event map does not contain an entry for context, set the value of the entry for context to a new empty map.

    3. Get the entry from the event map for context and append it to targets.

  8. For each target in targets:

    1. For each event name in event names:

      1. Set target[event name] to enabled.

  9. Return success with data null.

    Note: Implementations that do additional work when an event is enabled, e.g. subscribing to the relevant engine-internal events, will likely perform those additional steps when updating the event map. This specification uses a model where hooks are always called and then the event map is used to filter only those that ought to be returned to the local end.
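
The update steps above can be sketched in Python as follows. This is a hedged illustration only: the `Session` class, the `KNOWN_EVENTS` table, the `known_contexts` parameter and the behaviour of `obtain_event_names` (a bare module name expands to all of that module's events) are assumptions made for the example, and errors are modelled as Python exceptions.

```python
from typing import Dict, Iterable, Optional, Set


class Session:
    """Hypothetical holder for the two pieces of per-session event state."""

    def __init__(self) -> None:
        self.global_event_set: Set[str] = set()
        self.context_event_map: Dict[str, Dict[str, bool]] = {}


# Illustrative event table, limited to the events defined in this document.
KNOWN_EVENTS = {
    "browsingContext": {"browsingContext.contextCreated", "browsingContext.contextDestroyed"},
    "script": {"script.realmCreated", "script.realmDestroyed"},
}


def obtain_event_names(name: str) -> Set[str]:
    """Assumed stand-in for "obtain a set of event names"."""
    module, _, event = name.partition(".")
    if module not in KNOWN_EVENTS:
        raise ValueError(f"unknown module: {module}")
    if not event:
        return set(KNOWN_EVENTS[module])
    if name not in KNOWN_EVENTS[module]:
        raise ValueError(f"unknown event: {name}")
    return {name}


def update_event_map(session: Session, names: Iterable[str],
                     contexts: Optional[Iterable[str]], enabled: bool,
                     known_contexts: Set[str]) -> None:
    event_names: Set[str] = set()
    for name in names:
        event_names |= obtain_event_names(name)

    if contexts is None:
        # Global subscription: add to or remove from the global event set.
        if enabled:
            session.global_event_set |= event_names
        else:
            session.global_event_set -= event_names
        return None

    targets = []
    for context_id in contexts:
        if context_id not in known_contexts:
            raise LookupError("no such frame")
        targets.append(session.context_event_map.setdefault(context_id, {}))

    for target in targets:
        for event_name in event_names:
            target[event_name] = enabled
    return None
```

All context ids are validated before any per-context entry is modified, so a single unknown id leaves the existing subscriptions untouched.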

6.1.2. Commands

6.1.2.1. The session.status Command

The session.status command returns information about whether a remote end is in a state in which it can create new sessions, but may additionally include arbitrary meta information that is specific to the implementation.

Command Type
SessionStatusCommand = {
  method: "session.status",
  params: EmptyParams,
}
Return Type
SessionStatusResult = {
  ready: bool,
  message: text,
}

The remote end steps are:

  1. Let body be a new map with the following properties:

    "ready"
    The remote end’s readiness state.
    "message"
    An implementation-defined string explaining the remote end’s readiness state.
  2. Return success with data body
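
As an illustration, the command and the body built by the steps above might look as follows when written as Python dictionaries. The `id` field and the helper name `session_status_result` are assumptions for the example; only `method`, `params`, `ready` and `message` come from the definitions in this section.

```python
import json


def session_status_result(ready: bool, message: str) -> dict:
    """Build a body matching the SessionStatusResult production."""
    return {"ready": ready, "message": message}


# The "id" field is assumed from the general command framing, not defined here.
command = {"id": 1, "method": "session.status", "params": {}}

print(json.dumps(command))
print(json.dumps(session_status_result(False, "a session is already active")))
```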

6.1.2.2. The session.subscribe Command

The session.subscribe command enables certain events either globally or for a set of browsing contexts.

This needs to be generalized to work with realms too

Command Type
SessionSubscribeCommand = {
  method: "session.subscribe",
  params: SessionSubscribeParameters
}

SessionSubscribeParameters = {
  events: [*text],
  ?contexts: [*BrowsingContext],
}
Return Type
EmptyResult

The remote end steps with command parameters are:

  1. Let the list of event names be the value of the events field of command parameters.

  2. Let the list of contexts be the value of the contexts field of command parameters if it is present or null if it isn’t.

  3. Return the result of updating the event map with current session, list of event names, list of contexts and enabled true.

6.1.2.3. The session.unsubscribe Command

The session.unsubscribe command disables events either globally or for a set of browsing contexts.

This needs to be generalised to work with realms too

Command Type
SessionUnsubscribeCommand = {
  method: "session.unsubscribe",
  params: SessionSubscribeParameters
}
Return Type
EmptyResult

The remote end steps with command parameters are:

  1. Let the list of event names be the value of the events field of command parameters.

  2. Let the list of contexts be the value of the contexts field of command parameters if it is present or null if it isn’t.

  3. Return the result of updating the event map with current session, list of event names, list of contexts and enabled false.
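
A local end could build the two commands with a small helper such as the one below. This is a sketch, assuming a hypothetical `make_subscription_command` function and an `id` field taken from the general command framing; the optional `contexts` field is left out entirely for a global subscription, matching the "present or null" handling in the steps above.

```python
from typing import Iterable, Optional


def make_subscription_command(command_id: int, enable: bool,
                              events: Iterable[str],
                              contexts: Optional[Iterable[str]] = None) -> dict:
    """Build a session.subscribe or session.unsubscribe command (illustrative)."""
    params = {"events": list(events)}
    if contexts is not None:
        # Only include the optional field when specific contexts are targeted.
        params["contexts"] = list(contexts)
    method = "session.subscribe" if enable else "session.unsubscribe"
    return {"id": command_id, "method": method, "params": params}


# Subscribe globally to one event, then unsubscribe for a single context.
print(make_subscription_command(1, True, ["script.realmCreated"]))
print(make_subscription_command(2, False, ["script.realmCreated"], ["ctx-1"]))
```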

6.2. The browsingContext Module

The browsingContext module contains commands and events relating to browsing contexts.

6.2.1. Definition

remote end definition

BrowsingContextCommand = (BrowsingContextGetTreeCommand)

local end definition

BrowsingContextResult = (BrowsingContextGetTreeResult)

BrowsingContextEvent = (
    BrowsingContextCreatedEvent //
    BrowsingContextDestroyedEvent
)

6.2.2. Types

6.2.2.1. The browsingContext.BrowsingContext Type

remote end definition and local end definition

BrowsingContext = text

Each browsing context has an associated browsing context id, which is a string uniquely identifying that browsing context. This is implicitly set when the context is created. For browsing contexts with an associated WebDriver window handle the browsing context id must be the same as the window handle.

To get a browsing context given context id:
  1. If context id is null, return success with data null.

  2. If there is no browsing context with browsing context id context id, return error with error code no such frame.

  3. Let context be the browsing context with id context id.

  4. Return success with data context
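
A minimal Python sketch of this lookup, assuming a hypothetical `contexts_by_id` dictionary as the registry of browsing contexts and modelling the error result as an exception:

```python
from typing import Any, Dict, Optional


class NoSuchFrameError(Exception):
    """Models an error result with error code "no such frame"."""


def get_browsing_context(contexts_by_id: Dict[str, Any],
                         context_id: Optional[str]) -> Optional[Any]:
    # A null id is not an error: it yields a null context.
    if context_id is None:
        return None
    if context_id not in contexts_by_id:
        raise NoSuchFrameError(context_id)
    return contexts_by_id[context_id]
```

Callers such as the descendant lookup rely on the null case to mean "start from the top-level browsing contexts".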

6.2.2.2. The browsingContext.BrowsingContextInfo Type

local end definition

BrowsingContextInfoList = [* BrowsingContextInfo]

BrowsingContextInfo = {
  context: BrowsingContext,
  ?parent: BrowsingContext / null,
  url: text,
  children: BrowsingContextInfoList / null
}

The BrowsingContextInfo type represents the properties of a browsing context.

To get the browsing context info given context, depth and max depth:
  1. Let context id be the browsing context id for context.

  2. If context has a parent browsing context let parent id be the browsing context id of that parent. Otherwise let parent id be null.

  3. Let document be context’s active document.

  4. Let url be the result of running the URL serializer, given document’s URL.

    Note: This includes the fragment component of the URL.

  5. Let child info be the result of get the descendent browsing contexts given context id, depth + 1, and max depth.

  6. Let context info be a map matching the BrowsingContextInfo production with the context field set to context id, the parent field set to parent id if depth is 0, or unset otherwise, the url field set to url, and the children field set to child info.

  7. Return context info.

To get the descendent browsing contexts given parent id, depth and max depth:
  1. If max depth is greater than zero, and depth is equal to max depth, return null.

  2. Let parent be the result of trying to get a browsing context given parent id.

  3. If parent is null, let child contexts be a list containing all top-level browsing contexts. Otherwise let child contexts be a list containing all browsing contexts which are child browsing contexts of parent.

  4. Let contexts info be a list.

  5. For each context of child contexts:

    1. Let info be the result of get the browsing context info given context, depth, and max depth.

    2. Append info to contexts info

  6. Return contexts info
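
The two mutually recursive algorithms above can be sketched in Python as follows. The `Context` dataclass and the `top_level` parameter are hypothetical, and contexts are passed as objects rather than looked up by id; the depth handling follows the steps as written, with a max depth of 0 meaning "no limit".

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Context:
    """Hypothetical stand-in for a browsing context."""
    id: str
    url: str
    parent: Optional["Context"] = None
    children: List["Context"] = field(default_factory=list)


def get_context_info(context: Context, depth: int, max_depth: int,
                     top_level: List[Context]) -> dict:
    child_info = get_descendants(context, depth + 1, max_depth, top_level)
    info = {"context": context.id, "url": context.url, "children": child_info}
    if depth == 0:
        # The parent field is only set on the roots of the returned tree.
        info["parent"] = context.parent.id if context.parent else None
    return info


def get_descendants(parent: Optional[Context], depth: int, max_depth: int,
                    top_level: List[Context]) -> Optional[list]:
    # A null result marks children that were cut off by the depth limit.
    if max_depth > 0 and depth == max_depth:
        return None
    child_contexts = top_level if parent is None else parent.children
    return [get_context_info(c, depth, max_depth, top_level) for c in child_contexts]
```

Note the difference between `children` being null (the depth limit was reached) and an empty list (the context has no child browsing contexts).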

6.2.3. Commands

6.2.3.1. The browsingContext.getTree Command

The browsingContext.getTree command returns a tree of all browsing contexts that are descendants of the given context, or all top-level contexts when no parent is provided.

Command Type
BrowsingContextGetTreeCommand = {
  method: "browsingContext.getTree",
  params: BrowsingContextGetTreeParameters
}

BrowsingContextGetTreeParameters = {
  ?maxDepth: uint,
  ?parent: BrowsingContext,
}
Return Type
BrowsingContextGetTreeResult = {
  contexts: BrowsingContextInfoList
}
The remote end steps with command parameters are:
  1. Let the parent id be the value of the parent field of command parameters if present, or null otherwise.

  2. Let max depth be the value of the maxDepth field of command parameters if present, or 0 otherwise.

  3. Let depth be 0.

  4. Let contexts be the result of get the descendent browsing contexts, given parent id, depth, and max depth.

  5. Let body be a map matching the BrowsingContextGetTreeResult production, with the contexts field set to contexts.

  6. Return success with data body.

6.2.4. Events

6.2.4.1. The browsingContext.contextCreated Event
Event Type
BrowsingContextCreatedEvent = {
  method: "browsingContext.contextCreated",
  params: BrowsingContextInfo
}
The remote end event trigger is:
When the create a new browsing context algorithm is invoked, after the active document of the browsing context is set, run the following steps:
  1. Let context be the newly created browsing context.

  2. Let related contexts be a set containing the parent browsing context of context, if that is not null, or an empty set otherwise.

  3. Let params be the result of get the browsing context info given context, 0, and 1.

  4. Let body be a map matching the BrowsingContextCreatedEvent production, with the params field set to params.

  5. Emit an event with body and related contexts.

6.2.4.2. The browsingContext.contextDestroyed Event
Event Type
BrowsingContextDestroyedEvent = {
  method: "browsingContext.contextDestroyed",
  params: BrowsingContextInfo
}
The remote end event trigger is:

Run the following browsing context tree discarded steps:

  1. If the current session is null, return.

  2. Let context be the browsing context being discarded.

  3. Let params be the result of get the browsing context info, given context, 0, and 0.

  4. Let body be a map matching the BrowsingContextDestroyedEvent production, with the params field set to params.

  5. Let related contexts be a set containing the parent browsing context of context, if that is not null, or an empty set otherwise.

  6. Emit an event with body and related contexts.

the way this hooks into HTML feels very fragile. See https://github.com/whatwg/html/issues/6194

It’s unclear if we ought to only fire this event for browsing contexts that have active documents; navigation can also cause contexts to become inaccessible but not yet get discarded because bfcache.

6.3. The script Module

The script module contains commands and events relating to script realms and execution.

6.3.1. Definition

remote end definition

ScriptCommand = (ScriptGetRealmsCommand)

local end definition

ScriptResult = (ScriptGetRealmsResult)

ScriptEvent = (
    ScriptRealmCreatedEvent //
    ScriptRealmDestroyedEvent
)

6.3.2. Types

6.3.2.1. The script.Realm Type

remote end definition and local end definition

Realm = text

Each realm has an associated realm id, which is a string uniquely identifying that realm. This is implicitly set when the realm is created.

6.3.2.2. The script.RealmInfo Type

local end definition

RealmInfo = {
  realm: Realm,
  type: RealmType,
  origin: text
}

RealmType = "window" / "dedicated-worker" / "shared-worker" / "service-worker" / "worker" / "paint-worklet" / "audio-worklet" / "worklet" / text

The RealmInfo type represents the properties of a realm.

To get the realm info given environment settings:
  1. Let realm be environment settings’s realm execution context’s Realm component.

  2. Let realm id be the realm id for realm.

  3. Run the steps under the first matching condition:

    The global object specified by environment settings is a Window object
    1. Let type be "window".

    The global object specified by environment settings is a DedicatedWorkerGlobalScope object
    1. Let type be "dedicated-worker".

    The global object specified by environment settings is a SharedWorkerGlobalScope object
    1. Let type be "shared-worker".

    The global object specified by environment settings is a ServiceWorkerGlobalScope object
    1. Let type be "service-worker".

    The global object specified by environment settings is a WorkerGlobalScope object
    1. Let type be "worker".

    The global object specified by environment settings is a PaintWorkletGlobalScope object
    1. Let type be "paint-worklet".

    The global object specified by environment settings is an AudioWorkletGlobalScope object
    1. Let type be "audio-worklet".

    The global object specified by environment settings is a WorkletGlobalScope object
    1. Let type be "worklet".

    Otherwise:
    1. Return null.

  4. Let origin be the serialization of an origin given environment settings’s origin.

  5. Let realm info be a map matching the RealmInfo production, with the realm field set to realm id, the type field set to type and the origin field set to origin.

  6. Return realm info
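
The "first matching condition" ordering matters because the specific worker and worklet interfaces inherit from the generic ones. The Python sketch below models that with placeholder classes; the class bodies and the `get_realm_info` signature (which takes the realm id, global object and serialized origin directly) are assumptions made for illustration.

```python
from typing import Optional


# Placeholder classes mirroring the inheritance of the real global scopes.
class Window: ...
class WorkerGlobalScope: ...
class DedicatedWorkerGlobalScope(WorkerGlobalScope): ...
class SharedWorkerGlobalScope(WorkerGlobalScope): ...
class ServiceWorkerGlobalScope(WorkerGlobalScope): ...
class WorkletGlobalScope: ...
class PaintWorkletGlobalScope(WorkletGlobalScope): ...
class AudioWorkletGlobalScope(WorkletGlobalScope): ...


# Ordered so that specific types are tested before their generic base type.
REALM_TYPES = [
    (Window, "window"),
    (DedicatedWorkerGlobalScope, "dedicated-worker"),
    (SharedWorkerGlobalScope, "shared-worker"),
    (ServiceWorkerGlobalScope, "service-worker"),
    (WorkerGlobalScope, "worker"),
    (PaintWorkletGlobalScope, "paint-worklet"),
    (AudioWorkletGlobalScope, "audio-worklet"),
    (WorkletGlobalScope, "worklet"),
]


def get_realm_info(realm_id: str, global_object: object, origin: str) -> Optional[dict]:
    for cls, type_name in REALM_TYPES:
        if isinstance(global_object, cls):
            return {"realm": realm_id, "type": type_name, "origin": origin}
    return None  # unknown global types are not reported
```

If the generic `WorkerGlobalScope` entry were listed first, every dedicated, shared and service worker would be reported as plain "worker".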

We currently don’t provide information about realms of unknown types. That might be a problem for e.g. extension-related realms.

Note: Future variations of this specification will retain the invariant that the last component of the type name after splitting on "-" will always be "worker" for globals implementing WorkerGlobalScope, and "worklet" for globals implementing WorkletGlobalScope.
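
A local end can rely on this invariant to group realm types it does not recognise. A minimal sketch, assuming a hypothetical helper named `realm_family`:

```python
def realm_family(realm_type: str) -> str:
    """Classify a RealmType by the last component after splitting on "-"."""
    last = realm_type.rsplit("-", 1)[-1]
    return last if last in ("worker", "worklet") else "other"
```

With this helper, a type such as "dedicated-worker" is classified as "worker" and "paint-worklet" as "worklet", while "window" falls into the catch-all "other" group.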

6.3.3. Commands

6.3.3.1. The script.getRealms Command

The script.getRealms command returns a list of all realms, optionally filtered to realms of a specific type, or to the realm associated with the document currently loaded in a specified browsing context.

Command Type
ScriptGetRealmsCommand = {
  method: "script.getRealms",
  params: GetRealmsParameters
}

GetRealmsParameters = {
  ?context: BrowsingContext,
  ?type: RealmType,
}
Return Type
RealmInfoList = [* RealmInfo]

ScriptGetRealmsResult = {
  realms: RealmInfoList
}
The remote end steps with command parameters are:
  1. Let environment settings be a list of all the environment settings objects that have their execution ready flag set.

  2. If command parameters contains context:

    1. Let context be the result of trying to get a browsing context with command parameters["context"].

    2. Let document be context’s active document.

    3. Let context environment settings be a list.

    4. For each settings of environment settings:

      1. If any of the following conditions hold:

        Append settings to context environment settings.

    5. Set environment settings to context environment settings.

  3. Let realms be a list.

  4. For each settings of environment settings:

    1. Let realm info be the result of get the realm info given settings.

    2. If realm info is null, continue.

    3. If command parameters contains type and realm info["type"] is not equal to command parameters["type"], continue.

    4. Append realm info to realms.

  5. Let body be a map matching the ScriptGetRealmsResult production, with the realms field set to realms.

  6. Return success with data body.
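
The filtering loop can be sketched in Python as follows. This is an illustration only: the hypothetical `get_realms` helper takes already-computed realm info values (or None for realms of unknown type), covers only the type filter, and checks for null before reading the type field. The context filter is omitted.

```python
from typing import Iterable, Optional


def get_realms(realm_infos: Iterable[Optional[dict]],
               type_filter: Optional[str] = None) -> dict:
    realms = []
    for info in realm_infos:
        if info is None:
            continue  # realms of unknown type are skipped
        if type_filter is not None and info["type"] != type_filter:
            continue  # literal match on the requested type
        realms.append(info)
    return {"realms": realms}
```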

Extend this to also allow realm parents e.g. for nested workers? Or get all ancestor workers.

We might want to have a more sophisticated filter system than just a literal match.

6.3.4. Events

6.3.4.1. The script.realmCreated Event
Event Type
ScriptRealmCreatedEvent = {
  method: "script.realmCreated",
  params: RealmInfo
}
The remote end event trigger is:

When any of the set up a window environment settings object, set up a worker environment settings object or set up a worklet environment settings object algorithms are invoked, immediately prior to returning the settings object:

  1. Let environment settings be the newly created environment settings object.

  2. Let realm info be the result of get the realm info given environment settings.

  3. If realm info is null, return.

  4. Let related contexts be an empty set.

  5. If the responsible document of environment settings is a Document, append the responsible document’s browsing context to related contexts.

    Otherwise, if the global object specified by environment settings is a WorkerGlobalScope, for each owner in the global object’s owner set, if owner is a Document, append owner’s browsing context to related contexts.

  6. Let body be a map matching the ScriptRealmCreatedEvent production, with the params field set to realm info.

  7. Emit an event with body and related contexts.

6.3.4.2. The script.realmDestroyed Event
Event Type
RealmDestroyedParameters = {
  realm: Realm
}

ScriptRealmDestroyedEvent = {
  method: "script.realmDestroyed",
  params: RealmDestroyedParameters
}
The remote end event trigger is:
Define the following unloading document cleanup steps with document:
  1. Let related contexts be an empty set.

  2. Append document’s browsing context to related contexts.

  3. For each worklet global scope in document’s worklet global scopes:

    1. Let realm be worklet global scope’s relevant Realm.

    2. Let realm id be the realm id for realm.

    3. Let params be a map matching the RealmDestroyedParameters production, with the realm field set to realm id.

    4. Let body be a map matching the ScriptRealmDestroyedEvent production, with the params field set to params.

    5. Emit an event with body and related contexts.

  4. Let environment settings be the environment settings object whose responsible document is document.

  5. Let realm be environment settings’s realm execution context’s Realm component.

  6. Let realm id be the realm id for realm.

  7. Let params be a map matching the RealmDestroyedParameters production, with the realm field set to realm id.

  8. Let body be a map matching the ScriptRealmDestroyedEvent production, with the params field set to params.

  9. Emit an event with body and related contexts.

Whenever a worker event loop event loop is destroyed, either because the worker comes to the end of its lifecycle, or prematurely via the terminate a worker algorithm:

  1. Let related contexts be an empty set.

  2. Let environment settings be the environment settings object for which event loop is the responsible event loop.

  3. If the global object specified by environment settings is a WorkerGlobalScope, for each owner in the global object's owner set, if owner is a Document, append owner’s browsing context to related contexts.

  4. Let realm be environment settings’s environment settings object’s Realm.

  5. Let realm id be the realm id for realm.

  6. Let params be a map matching the RealmDestroyedParameters production, with the realm field set to realm id.

  7. Let body be a map matching the ScriptRealmDestroyedEvent production, with the params field set to params.

  8. Emit an event with body and related contexts.

7. Patches to Other Specifications

This specification requires some changes to external specifications to provide the necessary integration points. It is assumed that these patches will be committed to the other specifications as part of the standards process.

7.1. HTML

The a browsing context is discarded algorithm is modified to read as follows:

To discard a browsing context browsingContext, run these steps:
  1. If this is not a recursive invocation of this algorithm, call any browsing context tree discarded steps defined in external specifications with browsingContext.

  2. Discard all Document objects for all the entries in browsingContext’s session history.

  3. If browsingContext is a top-level browsing context, then remove a browsing context browsingContext.

The actual patch might be better to split the algorithm into an outer algorithm that is called by external callers and an inner algorithm that’s used for recursive calls. That’s quite hard to express as a patch to the specification since it requires changing multiple parts.

Index

Terms defined by this specification

Terms defined by reference

References

Normative References

[CSS-PAINT-API-1]
Ian Kilpatrick; Dean Jackson. CSS Painting API Level 1. URL: https://drafts.css-houdini.org/css-paint-api-1/
[DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[ECMASCRIPT]
ECMAScript Language Specification. URL: https://tc39.es/ecma262/
[FETCH]
Anne van Kesteren. Fetch Standard. Living Standard. URL: https://fetch.spec.whatwg.org/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[RFC4122]
P. Leach; M. Mealling; R. Salz. A Universally Unique IDentifier (UUID) URN Namespace. July 2005. Proposed Standard. URL: https://tools.ietf.org/html/rfc4122
[RFC6455]
I. Fette; A. Melnikov. The WebSocket Protocol. December 2011. Proposed Standard. URL: https://tools.ietf.org/html/rfc6455
[RFC8610]
H. Birkholz; C. Vigano; C. Bormann. Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures. June 2019. Proposed Standard. URL: https://tools.ietf.org/html/rfc8610
[SERVICE-WORKERS-1]
Alex Russell; et al. Service Workers 1. URL: https://w3c.github.io/ServiceWorker/
[URL]
Anne van Kesteren. URL Standard. Living Standard. URL: https://url.spec.whatwg.org/
[WEBAUDIO]
Paul Adenot; Hongchan Choi. Web Audio API. URL: https://webaudio.github.io/web-audio-api/
[WEBDRIVER]
Simon Stewart; David Burns. WebDriver. URL: https://w3c.github.io/webdriver/
[WebIDL]
Boris Zbarsky. Web IDL. URL: https://heycam.github.io/webidl/
[WORKLETS-1]
Ian Kilpatrick. Worklets Level 1. URL: https://drafts.css-houdini.org/worklets/

Informative References

[JSON-RPC]
JSON-RPC Working Group. JSON-RPC 2.0 Specification. 4 January 2013. URL: https://www.jsonrpc.org/specification

Issues Index

Should this be an appendix?
Do we support > 1 connection for a single session?
Nothing seems to define what status code is used for UTF-8 errors.
This should also reset any internal state
Need to hook in to the session ending to allow the UA to close the listener if it wants.
Should this be explicitly per realm?
Add WASM types?
Should WindowProxy get attributes in a similar style to Node?
handle String / Number / etc. wrapper objects specially?
This doesn’t handle lone surrogates
Does it make sense to use the same depth parameter for nodes and objects in general?
this assumes for-in works on iterators
This needs to be generalized to work with realms too
This needs to be generalised to work with realms too
the way this hooks into HTML feels very fragile. See https://github.com/whatwg/html/issues/6194
It’s unclear if we ought to only fire this event for browsing contexts that have active documents; navigation can also cause contexts to become inaccessible but not yet get discarded because bfcache.
We currently don’t provide information about realms of unknown types. That might be a problem for e.g. extension-related realms.
Extend this to also allow realm parents e.g. for nested workers? Or get all ancestor workers.
We might want to have a more sophisticated filter system than just a literal match.
The actual patch might be better to split the algorithm into an outer algorithm that is called by external callers and an inner algorithm that’s used for recursive calls. That’s quite hard to express as a patch to the specification since it requires changing multiple parts.