WebDriver BiDi

Editor’s Draft,

This version:
https://w3c.github.io/webdriver-bidi/
Issue Tracking:
GitHub
Inline In Spec

Abstract

This document defines the BiDirectional WebDriver Protocol, a mechanism for remote control of user agents.

Status of this document

This is a public copy of the editors’ draft. It is provided for discussion only and may change at any moment. Its publication here does not imply endorsement of its contents by W3C. Don’t cite this document other than as work in progress.

GitHub Issues are preferred for discussion of this specification.

This document was produced by the Browser Testing and Tools Working Group.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 15 September 2020 W3C Process Document.

1. Introduction

This section is non-normative.

WebDriver defines a protocol for introspection and remote control of user agents. This specification extends WebDriver by introducing bidirectional communication. In place of the strict command/response format of WebDriver, this permits events to stream from the user agent to the controlling software, better matching the evented nature of the browser DOM.

2. Infrastructure

This specification depends on the Infra Standard. [INFRA]

Network protocol messages are defined using CDDL. [RFC8610]

This specification defines a wait queue which is a map.

Surely there’s a better mechanism for doing this "wait for an event" thing.

When an algorithm algorithm running in parallel awaits a set of events events, and resume id:

  1. Pause the execution of algorithm.

  2. Assert: wait queue does not contain resume id.

  3. Set wait queue[resume id] to (events, algorithm).

To resume given name, id and parameters:
  1. If wait queue does not contain id, return.

  2. Let (events, algorithm) be wait queue[id].

  3. For each event in events:

    1. If event equals name:

      1. Remove id from wait queue.

      2. Resume running the steps in algorithm from the point at which they were paused, passing name and parameters as the result of the await.

        Should we have something like microtasks to ensure this runs before any other tasks on the event loop?
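As a non-normative illustration, the wait queue can be modelled as a dictionary keyed by resume id. The sketch below assumes that a paused algorithm is represented by a continuation callback; the names WaitQueue, await_events and resume, and the boolean return value of resume, are hypothetical and not part of this specification.

```python
from typing import Any, Callable, Dict, Set, Tuple

# Hypothetical model: a "paused algorithm" is a continuation callback that
# receives (name, parameters) when it is resumed.
Continuation = Callable[[str, Any], None]


class WaitQueue:
    def __init__(self) -> None:
        # Maps resume id -> (events, continuation).
        self._entries: Dict[str, Tuple[Set[str], Continuation]] = {}

    def await_events(self, events: Set[str], resume_id: str,
                     continuation: Continuation) -> None:
        # Assert: the wait queue does not already contain resume id.
        assert resume_id not in self._entries
        self._entries[resume_id] = (set(events), continuation)

    def resume(self, name: str, resume_id: str, parameters: Any) -> bool:
        # If the wait queue does not contain the id, there is nothing to do.
        if resume_id not in self._entries:
            return False
        events, continuation = self._entries[resume_id]
        if name not in events:
            return False
        # Remove the entry first, then continue the paused steps.
        del self._entries[resume_id]
        continuation(name, parameters)
        return True


queue = WaitQueue()
received = []
queue.await_events({"load", "error"}, "nav-1",
                   lambda name, params: received.append((name, params)))
queue.resume("unrelated", "nav-1", None)                # not an awaited event
queue.resume("load", "nav-1", {"url": "about:blank"})   # resumes the algorithm
queue.resume("load", "nav-1", {"url": "about:blank"})   # entry already removed
```

Removing the entry before invoking the continuation means a second matching event cannot resume the same algorithm twice.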

3. Protocol

This section defines the basic concepts of the WebDriver BiDi protocol. These terms are distinct from their representation at the transport layer.

The protocol is defined using a CDDL definition. For the convenience of implementors, two separate CDDL definitions are defined: the remote end definition, which defines the format of messages produced on the local end and consumed on the remote end, and the local end definition, which defines the format of messages produced on the remote end and consumed on the local end.

3.1. Definition

Should this be an appendix?

This section gives the initial contents of the remote end definition and local end definition. These are augmented by the definition fragments defined in the remainder of the specification.

Remote end definition

Command = {
  id: uint,
  CommandData,
  *text => any,
}

CommandData = (
  SessionCommand //
  BrowsingContextCommand
)

EmptyParams = { *text }

Local end definition

Message = (
  CommandResponse //
  ErrorResponse //
  Event
)

CommandResponse = {
  id: uint,
  result: ResultData,
  *text => any
}

ErrorResponse = {
  id: uint / null,
  error: "unknown error" / "unknown command" / "invalid argument",
  message: text,
  ?stacktrace: text,
  *text => any
}

ResultData = (
  EmptyResult //
  SessionResult //
  BrowsingContextResult //
  ScriptResult
)

EmptyResult = {}

Event = {
  EventData,
  *text => any
}

EventData = (
  BrowsingContextEvent //
  ScriptEvent
)
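To make the message shapes above more concrete, the following non-normative sketch shows how a local end might tell the three kinds of Message apart. It assumes that events carry "method" and "params" fields (as the emit an event steps later in this document assert); the function name classify_message and the "example.*" names are hypothetical.

```python
import json


def classify_message(raw: str) -> str:
    """Return which Message production a JSON text most plausibly matches."""
    message = json.loads(raw)
    if not isinstance(message, dict):
        raise ValueError("expected a JSON object")
    # ErrorResponse: id (possibly null), error and message are all present.
    if "id" in message and "error" in message and "message" in message:
        return "ErrorResponse"
    # CommandResponse: id and result are present.
    if "id" in message and "result" in message:
        return "CommandResponse"
    # Event: no id; assumed here to carry method and params.
    if "method" in message and "params" in message:
        return "Event"
    raise ValueError("unrecognised message")


response = json.dumps({"id": 7, "result": {}})
error = json.dumps({"id": None, "error": "invalid argument", "message": "bad"})
event = json.dumps({"method": "example.happened", "params": {}})
kinds = [classify_message(m) for m in (response, error, event)]
```

Extra fields are tolerated, mirroring the `*text => any` entries in the definitions.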

3.2. Session

WebDriver BiDi uses the same session concept as WebDriver.

3.3. Modules

The WebDriver BiDi protocol is organized into modules.

Each module represents a collection of related commands and events pertaining to a certain aspect of the user agent. For example, a module might contain functionality for inspecting and manipulating the DOM, or for script execution.

Each module has a module name which is a string. The command name and event name for commands and events defined in the module start with the module name followed by a period ".".

Modules which contain commands define remote end definition fragments. These provide choices in the CommandData group for the module’s commands, and can also define additional definition properties. They can also define local end definition fragments that provide additional choices in the ResultData group for the results of commands in the module.

Modules which contain events define local end definition fragments that are choices in the Event group for the module’s events.

An implementation may define extension modules. These must have a module name that contains a single colon ":" character. The part before the colon is the prefix; this is typically the same for all extension modules specific to a given implementation and should be unique for a given implementation. Such modules extend the local end definition and remote end definition providing additional groups as choices for the defined commands and events.
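The naming rules above can be sketched as follows. This is a non-normative illustration which assumes that the module name is everything before the first period; the helper names and the "vendor:tracing" module are hypothetical.

```python
from typing import Optional


def module_name_of(name: str) -> str:
    """Return the module name part of a command or event name."""
    module, separator, rest = name.partition(".")
    if not separator or not module or not rest:
        raise ValueError(f"not a module-qualified name: {name!r}")
    return module


def extension_prefix(module_name: str) -> Optional[str]:
    """Return the prefix of an extension module name, or None otherwise."""
    # Extension module names contain exactly one colon.
    if module_name.count(":") != 1:
        return None
    prefix, _, _ = module_name.partition(":")
    return prefix


module = module_name_of("vendor:tracing.start")
prefix = extension_prefix(module)
```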

3.4. Commands

A command is an asynchronous operation, requested by the local end and run on the remote end, resulting in either a result or an error being returned to the local end. Multiple commands can run at the same time, and commands can potentially be long-running. As a consequence, commands can finish out-of-order.

Each command is defined by:

When commands are sent from the local end they have a command id. This is an identifier used by the local end to identify the response from a particular command. From the point of view of the remote end this identifier is opaque and cannot be used internally to identify the command.

Note: This is because the command id is entirely controlled by the local end and isn’t necessarily unique over the course of a session. For example a local end which ignores all responses could use the same command id for each command.
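Because commands can finish out of order, a local end typically pairs responses with commands by command id. The following non-normative sketch shows one way to do this; the class CommandTracker and the "example.*" command names are hypothetical, and a local end is free to allocate ids differently.

```python
from typing import Any, Dict


class CommandTracker:
    """Allocates command ids and matches responses back to commands."""

    def __init__(self) -> None:
        self._next_id = 0
        self._pending: Dict[int, str] = {}

    def create_command(self, method: str, params: Dict[str, Any]) -> Dict[str, Any]:
        command_id = self._next_id
        self._next_id += 1
        self._pending[command_id] = method
        return {"id": command_id, "method": method, "params": params}

    def match_response(self, response: Dict[str, Any]) -> str:
        # Look up (and forget) the command that this response belongs to.
        return self._pending.pop(response["id"])


tracker = CommandTracker()
first = tracker.create_command("example.slow", {})
second = tracker.create_command("example.fast", {})
# Responses may arrive in the opposite order to the commands.
order = [
    tracker.match_response({"id": second["id"], "result": {}}),
    tracker.match_response({"id": first["id"], "result": {}}),
]
```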

The set of all command names is a set containing all the defined command names, including any belonging to extension modules.

3.5. Events

An event is a notification, sent by the remote end to the local end, signaling that something of interest has occurred on the remote end.

A session has a global event set which is a set containing the event names for events that are enabled for all browsing contexts. This initially contains the event name for events that are in the default event set.

A session has a browsing context event map, which is a map with top-level browsing context keys and values that are a set of event names for events that are enabled in the given browsing context.

To obtain a list of event enabled browsing contexts given session and event name:

  1. Let contexts be an empty set.

  2. For each context → events of session’s browsing context event map:

    1. If events contains event name, append context to contexts.

  3. Return contexts.

To determine if an event is enabled given session, event name and browsing contexts:

Note: browsing contexts is a set because a shared worker can be associated with multiple contexts.

  1. Let top-level browsing contexts be an empty set.

  2. For each browsing context of browsing contexts, append browsing context’s top-level browsing context to top-level browsing contexts.

  3. Let event map be the browsing context event map for session.

  4. For each browsing context of top-level browsing contexts:

    1. If event map contains browsing context, let browsing context events be event map[browsing context]. Otherwise let browsing context events be null.

    2. If browsing context events is not null, and browsing context events contains event name, return true.

  5. If the global event set for session contains event name, return true.

  6. Return false.
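The two algorithms above can be sketched as follows. This is a non-normative illustration; the BrowsingContext and Session classes are simplified stand-ins, and the "example.*" event names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class BrowsingContext:
    id: str
    parent: Optional["BrowsingContext"] = None

    def top_level(self) -> "BrowsingContext":
        context = self
        while context.parent is not None:
            context = context.parent
        return context


@dataclass
class Session:
    global_event_set: Set[str] = field(default_factory=set)
    browsing_context_event_map: Dict[BrowsingContext, Set[str]] = field(
        default_factory=dict)


def event_enabled_browsing_contexts(session: Session, event_name: str):
    return {context
            for context, events in session.browsing_context_event_map.items()
            if event_name in events}


def event_is_enabled(session: Session, event_name: str,
                     browsing_contexts: Set[BrowsingContext]) -> bool:
    # Per-context settings are keyed by top-level browsing context.
    top_level_contexts = {c.top_level() for c in browsing_contexts}
    for context in top_level_contexts:
        events = session.browsing_context_event_map.get(context)
        if events is not None and event_name in events:
            return True
    # Fall back to the events enabled for all browsing contexts.
    return event_name in session.global_event_set


tab = BrowsingContext("tab")
frame = BrowsingContext("frame", parent=tab)
other = BrowsingContext("other")
session = Session(global_event_set={"example.global"},
                  browsing_context_event_map={tab: {"example.perTab"}})
```

An event enabled for a top-level browsing context is therefore also enabled for frames nested inside it.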

To obtain a set of event names given a name:

  1. Let events be an empty set.

  2. If name contains a U+002E (period):

    1. If name is the event name for an event, append name to events and return success with data events.

    2. Return an error with error code Invalid Argument.

  3. Otherwise name is interpreted as representing all the events in a module. If name is not a module name return an error with error code Invalid Argument.

  4. Append the event name for each event in the module with name name to events.

  5. Return success with data events.
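A non-normative sketch of these steps is shown below. The MODULE_EVENTS table, its "example" module and the InvalidArgument exception are hypothetical stand-ins for the events defined by modules and for the error return.

```python
from typing import Dict, Set

# Hypothetical registry: module name -> event names defined by that module.
MODULE_EVENTS: Dict[str, Set[str]] = {
    "example": {"example.created", "example.destroyed"},
}


class InvalidArgument(Exception):
    """Stands in for an error with error code Invalid Argument."""


def obtain_event_names(name: str) -> Set[str]:
    if "." in name:
        # A name containing a period must be a single, known event name.
        if any(name in events for events in MODULE_EVENTS.values()):
            return {name}
        raise InvalidArgument(name)
    # Otherwise the name stands for every event in a module.
    if name not in MODULE_EVENTS:
        raise InvalidArgument(name)
    return set(MODULE_EVENTS[name])


single = obtain_event_names("example.created")
whole_module = obtain_event_names("example")
```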

4. Transport

Message transport is provided using the WebSocket protocol. [RFC6455]

Note: In the terms of the WebSocket protocol, the local end is the client and the remote end is the server / remote host.

Note: The encoding of commands and events as messages is similar to JSON-RPC, but this specification does not normatively reference it. [JSON-RPC] The normative requirements on remote ends are instead given as a precise processing model, while no normative requirements are given for local ends.

A WebSocket listener is a network endpoint that is able to accept incoming WebSocket connections.

A WebSocket listener has a host, a port, a secure flag, and a list of WebSocket resources.

When a WebSocket listener listener is created, a remote end must start to listen for WebSocket connections on the host and port given by listener’s host and port. If listener’s secure flag is set, then connections established from listener must be TLS encrypted.

A remote end has a set of WebSocket listeners active listeners, which is initially empty.

A WebDriver session has a WebSocket connection which is a network connection that follows the requirements of the WebSocket protocol. This is initially null.

When a client establishes a WebSocket connection connection by connecting to one of the set of active listeners listener, the implementation must proceed according to the WebSocket server-side requirements, with the following steps run when deciding whether to accept the incoming connection:

  1. Let resource name be the resource name from reading the client’s opening handshake. If resource name is not in listener’s list of WebSocket resources, then stop running these steps and act as if the requested service is not available.

  2. Get a session ID for a WebSocket resource with resource name and let session id be that value. If session id is null then stop running these steps and act as if the requested service is not available.

  3. If there is a session in the list of active sessions with session id as its session ID then let session be that session. Otherwise stop running these steps and act as if the requested service is not available.

  4. Run any other implementation-defined steps to decide if the connection should be accepted, and if it is not stop running these steps and act as if the requested service is not available.

  5. Otherwise set session’s WebSocket connection to connection, and proceed with the WebSocket server-side requirements when a server chooses to accept an incoming connection.

Do we support > 1 connection for a single session?

When a WebSocket message has been received for a WebSocket connection connection with type type and data data, a remote end must handle an incoming message given connection, type and data.

When the WebSocket closing handshake is started or when the WebSocket connection is closed for a WebSocket connection connection, a remote end must handle a connection closing given connection.

Note: Both conditions are needed because it is possible for a WebSocket connection to be closed without a closing handshake.

To construct a WebSocket resource name given a session session:

  1. Return the result of concatenating the string "/session/" with session’s session ID.

To construct a WebSocket URL given a WebSocket listener listener and session session:

  1. Let resource name be the result of constructing a WebSocket resource name given session.

  2. Return a WebSocket URI constructed with host set to listener’s host, port set to listener’s port, and path set to resource name, following the wss-URI construct if listener’s secure flag is set and the ws-URI construct otherwise.

To get a session ID for a WebSocket resource given resource name:

  1. If resource name doesn’t begin with the byte string "/session/", return null.

  2. Let session id be the bytes in resource name following the "/session/" prefix.

  3. If session id is not the string representation of a UUID, return null.

  4. Return session id.
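The three algorithms above (constructing a resource name, constructing a URL, and recovering the session ID) can be sketched as follows. This is a non-normative illustration; the function names, host and port are hypothetical, and the sketch assumes that only the canonical hyphenated UUID form is accepted.

```python
import uuid
from typing import Optional

PREFIX = "/session/"


def construct_resource_name(session_id: str) -> str:
    return PREFIX + session_id


def construct_websocket_url(host: str, port: int, secure: bool,
                            session_id: str) -> str:
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}:{port}{construct_resource_name(session_id)}"


def get_session_id(resource_name: str) -> Optional[str]:
    if not resource_name.startswith(PREFIX):
        return None
    session_id = resource_name[len(PREFIX):]
    try:
        parsed = uuid.UUID(session_id)
    except ValueError:
        return None
    # uuid.UUID also accepts braces and unhyphenated input; this sketch
    # only accepts the canonical hyphenated representation.
    if str(parsed) != session_id.lower():
        return None
    return session_id


sid = "123e4567-e89b-12d3-a456-426614174000"
url = construct_websocket_url("localhost", 9222, False, sid)
```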

To start listening for a WebSocket connection given a session session:
  1. If there is an existing WebSocket listener in the set of active listeners which the remote end would like to reuse, let listener be that listener. Otherwise let listener be a new WebSocket listener with implementation-defined host, port, secure flag, and an empty list of WebSocket resources.

  2. Let resource name be the result of constructing a WebSocket resource name given session.

  3. Append resource name to the list of WebSocket resources for listener.

  4. Append listener to the remote end's active listeners.

  5. Return listener.

Note: An intermediary node handling multiple sessions can use one or many WebSocket listeners. WebDriver defines that an endpoint node supports at most one session at a time, so it’s expected to only have a single listener.

Note: For an endpoint node the host in the above steps will typically be "localhost".

To handle an incoming message given a WebSocket connection connection, type type and data data:
  1. If type is not text, respond with an error given connection, null, and invalid argument, and finally return.

  2. Assert: data is a scalar value string, because otherwise the WebSocket protocol’s handling of errors in UTF-8-encoded data would already have failed the WebSocket connection.

    Nothing seems to define what status code is used for UTF-8 errors.

  3. Let parsed be the result of parsing JSON into Infra values given data. If this throws an exception, then respond with an error given connection, null, and invalid argument, and finally return.

  4. Match parsed against the remote end definition. If this results in a match:

    1. Let matched be the map representing the matched data.

    2. Assert: matched contains "id", "method", and "params".

    3. Let command id be matched["id"].

    4. Let method be matched["method"].

    5. Run the following steps in parallel:

      1. Let result be the result of running the remote end steps for the command with command name method given command parameters matched["params"].

      2. If result is an error, then respond with an error given connection, command id, and result’s error code, and finally return.

      3. Let value be result’s data.

      4. Assert: value matches the definition for the result type corresponding to the command with command name method.

      5. Let response be a new map matching the CommandResponse production in the local end definition with the id field set to command id and the result field set to value.

      6. Let serialized be the result of serialize an infra value to JSON bytes given response.

      7. Send a WebSocket message comprised of serialized over connection and return.

  5. Otherwise:

    1. Let command id be null.

    2. If parsed is a map and parsed["id"] exists and is an integer greater than or equal to zero, set command id to that integer.

    3. Let error code be invalid argument.

    4. If parsed is a map and parsed["method"] exists and is a string, but parsed["method"] is not in the set of all command names, set error code to unknown command.

    5. Respond with an error given connection, command id, and error code.
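The dispatch logic above can be sketched as follows. This is a simplified, non-normative illustration: it runs commands synchronously rather than in parallel, omits the message type check and errors returned by remote end steps, and returns the response map instead of sending it. The COMMANDS table and the "example.echo" command are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Optional

# Hypothetical command table: command name -> remote end steps.
COMMANDS: Dict[str, Callable[[Dict[str, Any]], Any]] = {
    "example.echo": lambda params: {"echo": params},
}


def is_command_id(value: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def error_response(command_id: Optional[int], error_code: str,
                   message: str) -> Dict[str, Any]:
    return {"id": command_id, "error": error_code, "message": message}


def handle_incoming_message(data: str) -> Dict[str, Any]:
    try:
        parsed = json.loads(data)
    except json.JSONDecodeError:
        return error_response(None, "invalid argument", "message is not valid JSON")

    fields = parsed if isinstance(parsed, dict) else {}
    command_id = fields.get("id")
    method = fields.get("method")
    params = fields.get("params")

    # A match requires a valid id, a known method and a params map.
    if (is_command_id(command_id) and isinstance(method, str)
            and method in COMMANDS and isinstance(params, dict)):
        return {"id": command_id, "result": COMMANDS[method](params)}

    error_code = "invalid argument"
    if isinstance(method, str) and method not in COMMANDS:
        error_code = "unknown command"
    return error_response(command_id if is_command_id(command_id) else None,
                          error_code, "command could not be processed")


ok = handle_incoming_message('{"id": 1, "method": "example.echo", "params": {"a": 1}}')
unknown = handle_incoming_message('{"id": 2, "method": "example.missing", "params": {}}')
malformed = handle_incoming_message("not json")
```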

To get related browsing contexts given a settings object settings:

  1. Let related browsing contexts be an empty set.

  2. If the responsible document of settings is a Document, append the responsible document's browsing context to related browsing contexts.

    Otherwise if the global object specified by settings is a WorkerGlobalScope, for each owner in the global object's owner set, if owner is a Document, append owner’s browsing context to related browsing contexts.

  3. Return related browsing contexts.

To emit an event given body and related browsing contexts:
  1. Assert: body has size 2 and contains "method" and "params".

  2. If the current session is null, or the current session's WebSocket connection is null, then return false.

  3. If event is enabled given current session, body["method"] and related browsing contexts:

    1. Let connection be the current session's WebSocket connection.

    2. Let serialized be the result of serialize an infra value to JSON bytes given body.

    3. Send a WebSocket message comprised of serialized over connection.

    4. Return true.

  4. Return false.

To respond with an error given a WebSocket connection connection, command id, and error code:
  1. Let error data be a new map matching the ErrorResponse production in the local end definition, with the id field set to command id, the error field set to error code, the message field set to an implementation-defined string containing a human-readable description of the error that occurred, and the stacktrace field optionally set to an implementation-defined string containing a stack trace report of the active stack frames at the time when the error occurred.

  2. Let response be the result of serialize an infra value to JSON bytes given error data.

    Note: command id can be null, in which case the id field will also be set to null, not omitted from response.

  3. Send a WebSocket message comprised of response over connection.
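As a non-normative illustration of the steps above, the following sketch builds the serialized error payload. The function name build_error_response is hypothetical; the point of interest is that a null command id is serialized as JSON null rather than being omitted.

```python
import json
from typing import Optional


def build_error_response(command_id: Optional[int], error_code: str,
                         message: str,
                         stacktrace: Optional[str] = None) -> bytes:
    error_data = {"id": command_id, "error": error_code, "message": message}
    # The stacktrace field is optional and only included when available.
    if stacktrace is not None:
        error_data["stacktrace"] = stacktrace
    return json.dumps(error_data).encode("utf-8")


payload = build_error_response(None, "invalid argument", "could not parse message")
decoded = json.loads(payload)
```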

To handle a connection closing given a WebSocket connection connection:

  1. If there is a WebDriver session with connection as its connection, set the connection on that session to null.

This should also reset any internal state.

Note: This does not end any session.

Need to hook in to the session ending to allow the UA to close the listener if it wants.

4.1. Establishing a Connection

WebDriver clients opt in to a bidirectional connection by requesting a capability with the name "webSocketUrl" and value true.

This specification defines an additional webdriver capability with the capability name "webSocketUrl".

The additional capability deserialization algorithm for the "webSocketUrl" capability, with parameter value is:
  1. If value is not a boolean, return error with code invalid argument.

  2. Return success with data value.

The matched capability serialization algorithm for the "webSocketUrl" capability, with parameter value is:
  1. If value is false, return success with data null.

  2. Return success with data true.

The WebDriver new session algorithm defined by this specification, with parameters session and capabilities is:
  1. Let webSocketUrl be the result of getting a property named "webSocketUrl" from capabilities.

  2. If webSocketUrl is undefined, return.

  3. Assert: webSocketUrl is true.

  4. Let listener be the result of start listening for a WebSocket connection given session.

  5. Set webSocketUrl to the result of constructing a WebSocket URL given listener and session.

  6. Set a property on capabilities named "webSocketUrl" to webSocketUrl.
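The capability handling above can be sketched as follows. This is a non-normative illustration in which capabilities are modelled as a dictionary, an absent key stands in for an undefined property, and the URL is passed in rather than derived from a listener; all function names and the example URL are hypothetical.

```python
from typing import Any, Dict, Optional


def deserialize_web_socket_url(value: Any) -> bool:
    # Only booleans are accepted for the requested capability.
    if not isinstance(value, bool):
        raise ValueError("invalid argument")
    return value


def serialize_matched_web_socket_url(value: bool) -> Optional[bool]:
    return None if value is False else True


def new_session_steps(capabilities: Dict[str, Any], web_socket_url: str) -> None:
    # An absent key models "webSocketUrl is undefined".
    if "webSocketUrl" not in capabilities:
        return
    assert capabilities["webSocketUrl"] is True
    # Replace the boolean with the URL the client should connect to.
    capabilities["webSocketUrl"] = web_socket_url


caps = {"webSocketUrl": serialize_matched_web_socket_url(
    deserialize_web_socket_url(True))}
example_url = "ws://localhost:9222/session/123e4567-e89b-12d3-a456-426614174000"
new_session_steps(caps, example_url)
```

A client that requested the capability can then read the connection URL from the capabilities returned for the new session.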

5. Common Data Types

5.1. Remote Value

Values accessible from the ECMAScript runtime are represented by a mirror object, specified as RemoteValue. The value’s type is specified in the type property. In the case of JSON-representable primitive values, this contains the value in the value property; in the case of non-JSON-representable primitives, the value property contains a string representation of the value. For non-primitive objects, the objectId property contains a string id that provides a unique handle to the object, valid for its lifetime inside the engine. For some non-primitive types, the value property contains a representation of the data in the ECMAScript object; for container types this can contain further RemoteValue instances. The value property can be null if there is a duplicate object i.e. the object has already been serialized in the current RemoteValue, perhaps as part of a cycle, or otherwise when the maximum serialization depth is reached.

Nodes are also represented by RemoteValue instances. These have a partial serialization of the node in the value property.

Note: mirror objects do not keep the original object alive in the runtime. If an object is discarded in the runtime subsequent attempts to access it via the protocol will result in an error.

A session has an object id map. This is a weak map from objects to their corresponding id.

Should this be explicitly per realm?

To get the object id for an object given an object:
  1. If the object id map for the current session does not contain object, run the following steps:

    1. Let object id be a new, unique, string identifier for object. If object is an element this must be the web element reference for object; if it’s a WindowProxy object, this must be the window handle for object.

    2. Set the value of object in the object id map to object id.

  2. Return the result of getting the value for object in object id map.
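A non-normative sketch of the object id map is shown below. It uses Python's weakref.WeakKeyDictionary as an analogue of a weak map and a random UUID as the unique identifier; both choices, and the names ObjectIdMap and Thing, are assumptions for illustration. The special cases for elements and WindowProxy objects are omitted.

```python
import uuid
import weakref


class ObjectIdMap:
    """Weakly maps objects to stable string identifiers."""

    def __init__(self) -> None:
        # Entries disappear once the key object is no longer referenced.
        self._ids = weakref.WeakKeyDictionary()

    def get_object_id(self, obj) -> str:
        if obj not in self._ids:
            self._ids[obj] = str(uuid.uuid4())
        return self._ids[obj]


class Thing:
    """Placeholder for a runtime object; instances are weakly referenceable."""


object_ids = ObjectIdMap()
a, b = Thing(), Thing()
id_a_first = object_ids.get_object_id(a)
id_a_second = object_ids.get_object_id(a)
id_b = object_ids.get_object_id(b)
```

Because the map holds its keys weakly, handing out an id does not keep the object alive, which matches the note above about mirror objects.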

remote end definition and local end definition

RemoteValue = (
  UndefinedValue /
  NullValue /
  StringValue /
  NumberValue /
  BooleanValue /
  BigIntValue /
  SymbolValue /
  ArrayValue /
  ObjectValue /
  FunctionValue /
  RegExpValue /
  DateValue /
  MapValue /
  SetValue /
  WeakMapValue /
  WeakSetValue /
  IteratorValue /
  GeneratorValue /
  ErrorValue /
  ProxyValue /
  PromiseValue /
  TypedArrayValue /
  ArrayBufferValue /
  NodeValue /
  WindowProxyValue
)

ObjectId = text

ListValue = [*RemoteValue]

MappingValue = [*[(RemoteValue / text), RemoteValue]]

UndefinedValue = {
  type: "undefined",
}

NullValue = {
  type: "null",
}

StringValue = {
  type: "string",
  value: text,
}

SpecialNumber = "NaN" / "-0" / "+Infinity" / "-Infinity"

NumberValue = {
  type: "number",
  value: number / SpecialNumber,
}

BooleanValue = {
  type: "boolean",
  value: bool,
}

BigIntValue = {
  type: "bigint",
  value: text,
}

SymbolValue = {
  type: "symbol",
  objectId: ObjectId,
}

ArrayValue = {
  type: "array",
  objectId: ObjectId,
  ?value: ListValue,
}

ObjectValue = {
  type: "object",
  objectId: ObjectId,
  ?value: MappingValue,
}

FunctionValue = {
  type: "function",
  objectId: ObjectId,
}

RegExpValue = {
  type: "regexp",
  objectId: ObjectId,
  value: text
}

DateValue = {
  type: "date",
  objectId: ObjectId,
  value: text
}

MapValue = {
  type: "map",
  objectId: ObjectId,
  ?value: MappingValue,
}

SetValue = {
  type: "set",
  objectId: ObjectId,
  ?value: ListValue
}

WeakMapValue = {
  type: "weakmap",
  objectId: ObjectId,
}

WeakSetValue = {
  type: "weakset",
  objectId: ObjectId,
}

ErrorValue = {
  type: "error",
  objectId: ObjectId,
}

PromiseValue = {
  type: "promise",
  objectId: ObjectId,
}

TypedArrayValue = {
  type: "typedarray",
  objectId: ObjectId,
}

ArrayBufferValue = {
  type: "arraybuffer",
  objectId: ObjectId,
}

NodeValue = {
  type: "node",
  objectId: ObjectId,
  ?value: NodeProperties,
}

NodeProperties = {
  nodeType: uint,
  nodeValue: text,
  ?localName: text,
  ?namespaceURI: text,
  childNodeCount: uint,
  ?children: [*NodeValue],
  ?attributes: {*text => text},
  ?shadowRoot: NodeValue / null,
}

WindowProxyValue = {
  type: "window",
  objectId: ObjectId,
}

Add WASM types?

Should WindowProxy get attributes in a similar style to Node?

handle String / Number / etc. wrapper objects specially?

To serialize as a remote value given a value, a max depth, node details, and a set of known objects:

  1. In the following list of conditions and associated steps, run the first set of steps for which the associated condition is true:

    Type(value) is Undefined
    Let remote value be a map matching the UndefinedValue production in the local end definition.
    Type(value) is Null
    Let remote value be a map matching the NullValue production in the local end definition.
    Type(value) is String
    Let remote value be a map matching the StringValue production in the local end definition, with the value property set to value.

    This doesn’t handle lone surrogates

    Type(value) is Number
    1. Switch on the value of value:

      NaN
      Let serialized be "NaN"
      -0
      Let serialized be "-0"
      +Infinity
      Let serialized be "+Infinity"
      -Infinity
      Let serialized be "-Infinity"
      Otherwise:
      Let serialized be value
    2. Let remote value be a map matching the NumberValue production in the local end definition, with the value property set to serialized.

    Type(value) is Boolean
    Let remote value be a map matching the BooleanValue production in the local end definition, with the value property set to value.
    Type(value) is BigInt
    Let remote value be a map matching the BigIntValue production in the local end definition, with the value property set to the result of running the ToString operation on value.
    Type(value) is Symbol
    Let remote value be a map matching the SymbolValue production in the local end definition, with the objectId property set to the object id for an object value.
    IsArray(value)
    1. Let serialized be null.

    2. If value is not in the set of known objects, and max depth is not null and greater than 0, run the following steps:

      1. Append value to the set of known objects

      2. Let serialized be the result of serialize as a list given CreateArrayIterator(value, value), max depth, node details and set of known objects.

    3. Let remote value be a map matching the ArrayValue production in the local end definition, with the objectId property set to the object id for an object value, and the value field set to serialized if it’s not null, or omitted otherwise.

    IsRegExp(value)
    1. Let pattern be ToString(Get(value, "source")).

    2. Let flags be ToString(Get(value, "flags")).

    3. Let serialized be the string-concatenation of "/", pattern, "/", and flags.

    4. Let remote value be a map matching the RegExpValue production in the local end definition, with the objectId property set to the object id for an object value and the value set to serialized.

    value has a [[DateValue]] internal slot.
    1. Let serialized be ToDateString(thisTimeValue(value)).

    2. Let remote value be a map matching the DateValue production in the local end definition, with the objectId property set to the object id for an object value and the value set to serialized.

    value has a [[MapData]] internal slot
    1. Let serialized be null.

    2. If value is not in the set of known objects, and max depth is not null and greater than 0, run the following steps:

      1. Append value to the set of known objects

      2. Let serialized be the result of serialize as a mapping given CreateMapIterator(value, key+value), max depth, node details and set of known objects.

    3. Let remote value be a map matching the MapValue production in the local end definition, with the objectId property set to the object id for an object value, and the value field set to serialized if it’s not null, or omitted otherwise.

    value has a [[SetData]] internal slot
    1. Let serialized be null.

    2. If value is not in the set of known objects, and max depth is not null and greater than 0, run the following steps:

      1. Append value to the set of known objects

      2. Let serialized be the result of serialize as a list given CreateSetIterator(value, value), max depth, node details and set of known objects.

    3. Let remote value be a map matching the SetValue production in the local end definition, with the objectId property set to the object id for an object value, and the value field set to serialized if it’s not null, or omitted otherwise.

    value has a [[WeakMapData]] internal slot
    Let remote value be a map matching the WeakMapValue production in the local end definition, with the objectId property set to the object id for an object value.
    value has a [[WeakSetData]] internal slot
    Let remote value be a map matching the WeakSetValue production in the local end definition, with the objectId property set to the object id for an object value.
    value has an [[ErrorData]] internal slot
    Let remote value be a map matching the ErrorValue production in the local end definition, with the objectId property set to the object id for an object value.
    IsPromise(value)
    Let remote value be a map matching the PromiseValue production in the local end definition, with the objectId property set to the object id for an object value.
    value has a [[TypedArrayName]] internal slot
    Let remote value be a map matching the TypedArrayValue production in the local end definition, with the objectId property set to the object id for an object value.
    value has an [[ArrayBufferData]] internal slot
    Let remote value be a map matching the ArrayBufferValue production in the local end definition, with the objectId property set to the object id for an object value.
    value is a platform object that implements Node
    1. Let serialized be null.

    2. If node details is true, run the following steps:

      1. Let serialized be a map.

      2. Set serialized["nodeType"] to Get(value, "nodeType")

      3. Set serialized["nodeValue"] to Get(value, "nodeValue")

      4. If value is an Element or an Attribute:

        1. Set serialized["localName"] to Get(value, "localName")

        2. Set serialized["namespaceURI"] to Get(value, "namespaceURI")

      5. Let child node count be the size of value’s children.

      6. Set serialized["childNodeCount"] to child node count

      7. If max depth is equal to 0 let children be null. Otherwise, let children be an empty list and, for each node child in the children of value:

        1. Let child depth be max depth - 1 if max depth is not null, or null otherwise.

        2. Let serialized child be the result of serialize as a remote value with child, child depth, node details and set of known objects.

        3. Append serialized child to children.

      8. Set serialized["children"] to children.

      9. If value is an Element:

        1. Let attributes be a new map.

        2. For each attribute in value’s attribute list:

          1. Let name be attribute’s qualified name.

          2. Let attribute value be attribute’s value.

          3. Set attributes[name] to attribute value.

        3. Set serialized["attributes"] to attributes.

        4. Let shadow root be value’s shadow root.

        5. If shadow root is null, let serialized shadow be null. Otherwise run the following substeps:

          1. Let child depth be max depth - 1 if max depth is not null, or null otherwise.

          2. Let serialized shadow be the result of serialize as a remote value with shadow root, child depth, false and set of known objects.

            Note: this means the objectId for the shadow root will be serialized irrespective of whether the shadow is open or closed, but no properties of the node will be returned.

        6. Set serialized["shadowRoot"] to serialized shadow.

    3. Let remote value be a map matching the NodeValue production in the local end definition, with the objectId property set to the object id for an object value, and value set to serialized, if serialized is not null.

    value is a platform object that implements WindowProxy
    1. Let remote value be a map matching the WindowProxyValue production in the local end definition, with the objectId property set to the object id for an object value.
    value is a platform object
    1. Let remote value be a map matching the ObjectValue production in the local end definition, with the objectId property set to the object id for an object value.
    IsCallable(value)
    1. Let remote value be a map matching the FunctionValue production in the local end definition, with the objectId property set to the object id for an object value.
    Otherwise:
    1. Assert: Type(value) is Object.

    2. Let serialized be null.

    3. If value is not in the set of known objects, and max depth is greater than 0, run the following steps:

      1. Append value to the set of known objects.

      2. Let serialized be the result of serialize as a mapping given EnumerableOwnPropertyNames(value, key+value), max depth, node details and set of known objects.

    4. Let remote value be a map matching the ObjectValue production in the local end definition, with the objectId property set to the object id for an object value, and the value field set to serialized.

  2. Return remote value.

Does it make sense to use the same depth parameter for nodes and objects in general?

To serialize as a list given iterable, max depth, node details and set of known objects:
  1. Let serialized be a new list.

  2. For each child value in iterable:

    1. Let child depth be max depth - 1 if max depth is not null, or null otherwise.

    2. Let serialized child be the result of serialize as a remote value with arguments child value, child depth, node details and set of known objects.

    3. Append serialized child to serialized.

  3. Return serialized.

This assumes that for-in works on iterators.

To serialize as a mapping given iterable, max depth, node details and set of known objects:
  1. Let serialized be a new list.

  2. For each item in iterable:

    1. Assert: IsArray(item)

    2. Let property be CreateListFromArrayLike(item)

    3. Assert: property is a list of size 2

    4. Let key be property[0] and let value be property[1]

    5. Let child depth be max depth - 1 if max depth is not null, or null otherwise.

    6. If Type(key) is String, let serialized key be key, otherwise let serialized key be the result of serialize as a remote value with arguments key, child depth, node details and set of known objects.

    7. Let serialized value be the result of serialize as a remote value with arguments value, child depth, node details and set of known objects.

    8. Let serialized child be («serialized key, serialized value»).

    9. Append serialized child to serialized.

  3. Return serialized.
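
The two algorithms above can be sketched in Python as follows. This is an illustrative approximation only, not part of the protocol: serialize_remote_value is a hypothetical, heavily simplified stand-in for serialize as a remote value that handles only None, booleans, numbers, strings, lists and dicts; the "type" strings are placeholders; the node details argument is omitted; and a null max depth is assumed to mean "no limit".

```python
from typing import Any, Optional


def child_depth(max_depth: Optional[int]) -> Optional[int]:
    """max depth - 1 if max depth is not null, or null otherwise."""
    return max_depth - 1 if max_depth is not None else None


def serialize_remote_value(value: Any, max_depth: Optional[int], known: list) -> dict:
    """Hypothetical stand-in for "serialize as a remote value"."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # checked before int: bool is a subclass of int
        return {"type": "boolean", "value": value}
    if isinstance(value, (int, float)):
        return {"type": "number", "value": value}
    if isinstance(value, str):
        return {"type": "string", "value": value}
    serialized = None
    already_known = any(value is item for item in known)
    # Assumption: a null max depth places no limit on recursion.
    if not already_known and (max_depth is None or max_depth > 0):
        known.append(value)
        if isinstance(value, dict):
            serialized = serialize_as_mapping(value.items(), max_depth, known)
        else:
            serialized = serialize_as_list(value, max_depth, known)
    remote = {"type": "object" if isinstance(value, dict) else "array"}
    if serialized is not None:
        remote["value"] = serialized
    return remote


def serialize_as_list(iterable, max_depth: Optional[int], known: list) -> list:
    serialized = []
    for child in iterable:
        serialized.append(serialize_remote_value(child, child_depth(max_depth), known))
    return serialized


def serialize_as_mapping(entries, max_depth: Optional[int], known: list) -> list:
    serialized = []
    for key, value in entries:
        depth = child_depth(max_depth)
        # String keys are kept as-is; any other key is serialized like a value.
        if isinstance(key, str):
            serialized_key = key
        else:
            serialized_key = serialize_remote_value(key, depth, known)
        serialized_value = serialize_remote_value(value, depth, known)
        serialized.append((serialized_key, serialized_value))
    return serialized
```

In this sketch the set of known objects is what stops a self-referential structure from recursing forever: the second time an object is met it is emitted without a value.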

6. Modules

6.1. The session Module

The session module contains commands and events for monitoring the status of the remote end.

6.1.1. Definition

remote end definition

SessionCommand = (SessionStatusCommand //
                  SessionSubscribeCommand //
                  SessionUnsubscribeCommand)

local end definition

SessionResult = (SessionStatusResult)

To update the event map, given session, list of event names, browsing contexts, and enabled:

Note: The return value of this algorithm is a map between event names and contexts. When the events are being enabled globally, the contexts in the return value are those for which the event was already enabled. When the events are enabled for specific contexts, the contexts in the return value are those for which the event is now enabled but was not previously. When events are disabled, the return value is always empty.

  1. Let global event set be a clone of the global event set for session.

  2. Let event map be a new map.

  3. For each key → value of the browsing context event map for session:

    1. Set event map[key] to a clone of value.

  4. Let enabled events be a new map.

  5. Let event names be an empty set.

    1. For each entry name in list of event names, let event names be the union of event names and the result of trying to obtain a set of event names with name.

  6. If browsing contexts is null:

    1. If enabled is true:

      1. For each event name of event names:

        1. If global event set doesn’t contain event name:

          1. Let event enabled contexts be the event enabled browsing contexts given session and event name.

          2. Add event name to global event set.

          3. For each context of event enabled contexts, remove event name from event map[context].

          4. Set enabled events[event name] to event enabled contexts.

    2. If enabled is false:

      1. For each event name in event names:

        1. If global event set contains event name, remove event name from global event set. Otherwise return error with error code invalid argument.

  7. Otherwise, if browsing contexts is not null:

    1. Let targets be an empty map.

    2. For each context id in browsing contexts:

      1. Let context be the result of trying to get a browsing context with context id.

      2. Let top-level context be the top-level browsing context for context.

      3. If event map does not contain top-level context, set event map[top-level context] to a new set.

      4. Set targets[top-level context] to event map[top-level context].

    3. For each event name in event names:

      1. If enabled is true and global event set contains event name, continue.

      2. For each context → target in targets:

        1. If enabled is true and target does not contain event name:

          1. Add event name to target.

          2. If enabled events does not contain event name, set enabled events[event name] to a new set.

          3. Append context to enabled events[event name].

        2. If enabled is false:

          1. If target contains event name, remove event name from target. Otherwise return error with error code invalid argument.

  8. Set the global event set for session to global event set.

  9. Set the browsing context event map for session to event map.

  10. Return success with data enabled events.

Note: Implementations that do additional work when an event is enabled, e.g. subscribing to the relevant engine-internal events, will likely perform those additional steps when updating the event map. This specification uses a model where hooks are always called and then the event map is used to filter only those that ought to be returned to the local end.
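
The following Python sketch illustrates the update the event map algorithm under several simplifying assumptions: the Session class and InvalidArgument exception are hypothetical names, event names are assumed to be already expanded (the obtain a set of event names step is skipped), contexts are assumed to be already resolved to top-level contexts, and the event enabled browsing contexts are approximated as the contexts whose per-context set contains the event.

```python
from dataclasses import dataclass, field


class InvalidArgument(Exception):
    """Stands in for "error with error code invalid argument"."""


@dataclass
class Session:
    global_events: set = field(default_factory=set)
    # Maps a top-level context to the set of event names enabled for it.
    context_events: dict = field(default_factory=dict)


def update_event_map(session, event_names, contexts, enabled):
    # Work on clones so that an error leaves the session untouched.
    global_events = set(session.global_events)
    event_map = {ctx: set(names) for ctx, names in session.context_events.items()}
    enabled_events = {}
    if contexts is None:
        for name in event_names:
            if enabled:
                if name not in global_events:
                    # Approximation of "event enabled browsing contexts".
                    already = {c for c, names in event_map.items() if name in names}
                    global_events.add(name)
                    for ctx in already:
                        event_map[ctx].discard(name)
                    enabled_events[name] = already
            elif name in global_events:
                global_events.remove(name)
            else:
                raise InvalidArgument(name)
    else:
        targets = {ctx: event_map.setdefault(ctx, set()) for ctx in contexts}
        for name in event_names:
            if enabled and name in global_events:
                continue
            for ctx, target in targets.items():
                if enabled:
                    if name not in target:
                        target.add(name)
                        enabled_events.setdefault(name, set()).add(ctx)
                elif name in target:
                    target.remove(name)
                else:
                    raise InvalidArgument(name)
    session.global_events = global_events
    session.context_events = event_map
    return enabled_events
```

Because the session state is only replaced in the final two assignments, a failed unsubscribe in this sketch has no partial effect, which mirrors steps 8 and 9 of the algorithm.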

6.1.2. Commands

6.1.2.1. The session.status Command

The session.status command returns information about whether a remote end is in a state in which it can create new sessions, and may additionally include arbitrary meta information that is specific to the implementation.

Command Type
SessionStatusCommand = {
  method: "session.status",
  params: EmptyParams,
}
Return Type
SessionStatusResult = {
  ready: bool,
  message: text,
}

The remote end steps are:

  1. Let body be a new map with the following properties:

    "ready"
    The remote end’s readiness state.
    "message"
    An implementation-defined string explaining the remote end’s readiness state.
  2. Return success with data body.

6.1.2.2. The session.subscribe Command

The session.subscribe command enables certain events either globally or for a set of browsing contexts.

This needs to be generalized to work with realms too

Command Type
SessionSubscribeCommand = {
  method: "session.subscribe",
  params: SessionSubscribeParameters
}

SessionSubscribeParameters = {
  events: [*text],
  ?contexts: [*BrowsingContext],
}
Return Type
EmptyResult

The remote end steps with command parameters are:

  1. Let the list of event names be the value of the events field of command parameters.

  2. Let the list of contexts be the value of the contexts field of command parameters if it is present or null if it isn’t.

  3. Let enabled events be the result of trying to update the event map with current session, list of event names, list of contexts and enabled true.

  4. Let subscribe step events be a new map.

  5. For each event name → contexts in enabled events:

    1. If the event with event name event name defines remote end subscribe steps, set subscribe step events[event name] to contexts.

  6. Sort subscribe step events in ascending order, using the following less than algorithm given two entries with keys event name one and event name two:

    1. Let event one be the event with name event name one.

    2. Let event two be the event with name event name two.

    3. Return true if event one’s subscribe priority is less than event two’s subscribe priority, or false otherwise.

  7. For each event name → contexts in subscribe step events:

    1. If list of contexts is null, let include contexts be a list of all top-level browsing contexts that are not contained in contexts, and let include global be true.

      Otherwise let include contexts be contexts and let include global be false.

    2. Run the remote end subscribe steps for the event with event name event name given include contexts and include global.

  8. Return success with data null.
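
As a rough illustration of steps 4 to 7, the Python sketch below computes the order and arguments with which the remote end subscribe steps would run. The function name and the registry argument (a map from event name to subscribe priority, containing only events that define subscribe steps) are hypothetical; only the ordering and the include contexts / include global logic are taken from the steps above.

```python
def ordered_subscribe_calls(enabled_events, list_of_contexts, top_level_contexts, registry):
    """Return (event name, include contexts, include global) tuples in run order."""
    # Keep only events that define remote end subscribe steps.
    subscribe_step_events = {
        name: contexts for name, contexts in enabled_events.items() if name in registry
    }
    calls = []
    # Ascending subscribe priority: lower numbers run first.
    for name in sorted(subscribe_step_events, key=lambda n: registry[n]):
        contexts = subscribe_step_events[name]
        if list_of_contexts is None:
            # Global subscription: skip contexts where the event was already enabled.
            include_contexts = [c for c in top_level_contexts if c not in contexts]
            include_global = True
        else:
            include_contexts = sorted(contexts)
            include_global = False
        calls.append((name, include_contexts, include_global))
    return calls
```

With the priorities this specification assigns (1 for browsingContext.contextCreated, 2 for script.realmCreated), a global subscription to both would announce contexts before the realms that live in them.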

6.1.2.3. The session.unsubscribe Command

The session.unsubscribe command disables events either globally or for a set of browsing contexts.

This needs to be generalised to work with realms too

Command Type
SessionUnsubscribeCommand = {
  method: "session.unsubscribe",
  params: SessionSubscribeParameters
}
Return Type
EmptyResult

The remote end steps with command parameters are:

  1. Let the list of event names be the value of the events field of command parameters.

  2. Let the list of contexts be the value of the contexts field of command parameters if it is present or null if it isn’t.

  3. Try to update the event map with current session, list of event names, list of contexts and enabled false.

  4. Return success with data null.

6.2. The browsingContext Module

The browsingContext module contains commands and events relating to browsing contexts.

The progress of navigation is communicated using an immutable WebDriver navigation status struct, which has the following items:

id
The navigation id for the navigation, or null when the navigation is canceled before making progress.
status
A status code that is either "canceled", "pending", or "complete".
url
The URL which is being loaded in the navigation.

6.2.1. Definition

remote end definition

BrowsingContextCommand = (
    BrowsingContextGetTreeCommand //
    BrowsingContextNavigateCommand
)

local end definition

BrowsingContextResult = (
    BrowsingContextGetTreeResult //
    BrowsingContextNavigateResult
)

BrowsingContextEvent = (
    BrowsingContextCreatedEvent //
    BrowsingContextDestroyedEvent //
    BrowsingContextNavigationStartedEvent //
    BrowsingContextFragmentNavigatedEvent //
    BrowsingContextDomContentLoadedEvent //
    BrowsingContextLoadEvent //
    BrowsingContextDownloadWillBegin //
    BrowsingContextNavigationAbortedEvent //
    BrowsingContextNavigationFailedEvent
)

6.2.2. Types

6.2.2.1. The browsingContext.BrowsingContext Type

remote end definition and local end definition

BrowsingContext = text;

Each browsing context has an associated browsing context id, which is a string uniquely identifying that browsing context. This is implicitly set when the context is created. For browsing contexts with an associated WebDriver window handle the browsing context id must be the same as the window handle.

To get a browsing context given context id:
  1. If context id is null, return success with data null.

  2. If there is no browsing context with browsing context id context id, return error with error code no such frame.

  3. Let context be the browsing context with id context id.

  4. Return success with data context.

6.2.2.2. The browsingContext.BrowsingContextInfo Type

local end definition

BrowsingContextInfoList = [* BrowsingContextInfo]

BrowsingContextInfo = {
  context: BrowsingContext,
  ?parent: BrowsingContext / null,
  url: text,
  children: BrowsingContextInfoList / null
}

The BrowsingContextInfo type represents the properties of a browsing context.

To get the browsing context info given context, depth and max depth:
  1. Let context id be the browsing context id for context.

  2. If context has a parent browsing context let parent id be the browsing context id of that parent. Otherwise let parent id be null.

  3. Let document be context’s active document.

  4. Let url be the result of running the URL serializer, given document’s URL.

    Note: This includes the fragment component of the URL.

  5. Let child info be the result of get the descendent browsing contexts given context id, depth + 1, and max depth.

  6. Let context info be a map matching the BrowsingContextInfo production with the context field set to context id, the parent field set to parent id if depth is 0, or unset otherwise, the url field set to url, and the children field set to child info.

  7. Return context info.

To get the descendent browsing contexts given parent id, depth and max depth:
  1. If max depth is greater than zero, and depth is equal to max depth, return null.

  2. Let parent be the result of trying to get a browsing context given parent id.

  3. If parent is null, let child contexts be a list containing all top-level browsing contexts. Otherwise let child contexts be a list containing all browsing contexts which are child browsing contexts of parent.

  4. Let contexts info be a list.

  5. For each context of child contexts:

    1. Let info be the result of get the browsing context info given context, depth, and max depth.

    2. Append info to contexts info.

  6. Return contexts info.
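
The mutual recursion between the two algorithms above can be sketched in Python as follows. This is a simplified illustration: the Context class is a hypothetical model of a browsing context, contexts are passed as objects rather than looked up by id, and url is assumed to be an already serialized string.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Context:
    id: str
    url: str
    parent: Optional["Context"] = None
    children: list = field(default_factory=list)


def get_context_info(context, depth, max_depth, top_level):
    info = {"context": context.id, "url": context.url}
    # The parent field is only set on entries at depth 0.
    if depth == 0:
        info["parent"] = context.parent.id if context.parent else None
    info["children"] = get_descendants(context, depth + 1, max_depth, top_level)
    return info


def get_descendants(parent, depth, max_depth, top_level):
    # A max depth of 0 means "no limit"; otherwise stop once it is reached.
    if max_depth > 0 and depth == max_depth:
        return None
    child_contexts = top_level if parent is None else parent.children
    return [get_context_info(c, depth, max_depth, top_level) for c in child_contexts]
```

Note the distinction this produces: a children value of null means the tree was truncated at max depth, while an empty list means the context has no child browsing contexts.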

6.2.2.3. The browsingContext.Navigation Type

remote end definition and local end definition

Navigation = text;

The Navigation type is a unique string identifying an ongoing navigation.

TODO: Link to the definition in the HTML spec.

6.2.2.4. The browsingContext.NavigationInfo Type

local end definition:

NavigationInfo = {
  context: BrowsingContext,
  navigation: Navigation / null,
  url: text,
}

The NavigationInfo type provides details of an ongoing navigation.

To get the navigation info, given context and navigation status:
  1. Let context id be the browsing context id for context.

  2. Let navigation id be navigation status’s id.

  3. Let url be navigation status’s url.

  4. Return a map matching the NavigationInfo production, with the context field set to context id, the navigation field set to navigation id, and the url field set to the result of the URL serializer given url.

6.2.3. Commands

6.2.3.1. The browsingContext.getTree Command

The browsingContext.getTree command returns a tree of all browsing contexts that are descendents of the given context, or all top-level contexts when no parent is provided.

Command Type
BrowsingContextGetTreeCommand = {
  method: "browsingContext.getTree",
  params: BrowsingContextGetTreeParameters
}

BrowsingContextGetTreeParameters = {
  ?maxDepth: uint,
  ?parent: BrowsingContext,
}
Return Type
BrowsingContextGetTreeResult = {
  contexts: BrowsingContextInfoList
}
The remote end steps with command parameters are:
  1. Let the parent id be the value of the parent field of command parameters if present, or null otherwise.

  2. Let max depth be the value of the maxDepth field of command parameters if present, or 0 otherwise.

  3. Let depth be 0.

  4. Let contexts be the result of get the descendent browsing contexts, given parent id, depth, and max depth.

  5. Let body be a map matching the BrowsingContextGetTreeResult production, with the contexts field set to contexts.

  6. Return success with data body.

6.2.3.2. The browsingContext.navigate Command

The browsingContext.navigate command navigates a browsing context to the given URL.

Command Type
BrowsingContextNavigateCommand = {
  method: "browsingContext.navigate",
  params: BrowsingContextNavigateParameters
}

BrowsingContextNavigateParameters = {
  context: BrowsingContext,
  url: text,
  ?wait: ReadinessState,
}

ReadinessState = "none" / "interactive" / "complete"
Return Type
BrowsingContextNavigateResult = {
    navigation: Navigation / null,
    url: text,
}
The remote end steps with command parameters are:
  1. Let context id be the value of the context field of command parameters.

  2. Let context be the result of trying to get a browsing context with context id.

  3. Assert: context is not null.

  4. Let wait condition be the value of the wait field of command parameters if present, or "none" otherwise.

  5. Let url be the value of the url field of command parameters.

  6. Let document be context’s active document.

  7. Let base be document’s base URL.

  8. Let url record be the result of applying the URL parser to url, with base URL base.

  9. If url record is failure, return error with error code invalid argument.

  10. Let request be a new request whose URL is url record.

  11. Let navigation id be the string representation of a UUID based on truly random, or pseudo-random numbers.

  12. Navigate context with resource request, and using context as the source browsing context, and with navigation id navigation id.

  13. Let (event received, navigate status) be await given «"navigation started", "navigation failed", "fragment navigated"» and navigation id.

  14. Assert: navigate status’s id is navigation id.

  15. If navigate status’s status is "complete":

    1. Let body be a map matching the BrowsingContextNavigateResult production, with the navigation field set to navigation id, and the url field set to the result of the URL serializer given navigate status’s url.

    2. Return success with data body, and then run the following steps in parallel:

      1. Run the WebDriver-BiDi fragment navigated steps given context and navigate status.

    Note: this is the case if the navigation only caused the fragment to change. The parallel steps here ensure that we return the command result before emitting the event, so the navigation id is known.

  16. If navigate status’s status is "canceled", return error with error code unknown error.

    TODO: is this the right way to handle errors here?

  17. Assert: navigate status’s status is "pending" and navigation id is not null.

  18. If wait condition is "none":

    1. Let body be a map matching the BrowsingContextNavigateResult production, with the navigation field set to navigation id, and the url field set to the result of the URL serializer given navigate status’s url.

    2. Return success with data body, and then run the following steps in parallel:

      1. Run the WebDriver-BiDi navigation started steps given context and navigate status.

  19. Run the WebDriver-BiDi navigation started steps given context and navigate status.

    Note: this event was previously suppressed to ensure that it would come after the command response in the case that wait condition is "none".

    Replace this suppression mechanism with an event queue.

  20. If wait condition is "interactive", let event name be "domContentLoaded", otherwise let event name be "load".

  21. Let (event received, status) be await given «event name, "download started", "navigation aborted", "navigation failed"» and navigation id.

  22. If event received is "navigation failed" return error with error code unknown error.

    Are we surfacing enough information about what failed and why with an error here? What error code do we want? Is there going to be a problem where local ends parse the implementation-defined strings to figure out what actually went wrong?

  23. Let body be a map matching the BrowsingContextNavigateResult production, with the navigation field set to status’s id, and the url field set to the result of the URL serializer given status’s url.

  24. Return success with data body.
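
The branching in steps 15 to 21 can be summarized with the Python sketch below. It is a simplification under stated assumptions: the function name, the UnknownError exception and the ("return", ...) / ("await", ...) tuples are hypothetical, the navigation status is modeled as a plain dict, its url is assumed to be already serialized, and the event emission that runs in parallel is left out.

```python
class UnknownError(Exception):
    """Stands in for "error with error code unknown error"."""


def after_first_status(navigate_status, wait_condition="none"):
    """Decide what browsingContext.navigate does once the first status arrives."""
    body = {"navigation": navigate_status["id"], "url": navigate_status["url"]}
    if navigate_status["status"] == "complete":
        # Fragment-only navigation: nothing further to wait for.
        return ("return", body)
    if navigate_status["status"] == "canceled":
        raise UnknownError("navigation canceled")
    assert navigate_status["status"] == "pending" and navigate_status["id"] is not None
    if wait_condition == "none":
        return ("return", body)
    event_name = "domContentLoaded" if wait_condition == "interactive" else "load"
    return ("await", [event_name, "download started", "navigation aborted", "navigation failed"])
```

For example, a pending navigation with wait set to "interactive" would, in this model, go on to await "domContentLoaded" or one of the three terminating events.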

6.2.4. Events

6.2.4.1. The browsingContext.contextCreated Event
Event Type
 BrowsingContextCreatedEvent = {
  method: "browsingContext.contextCreated",
  params: BrowsingContextInfo
}

To recursively emit context created events given context:

  1. Emit a context created event with context.

  2. For each child browsing context, child, of context:

    1. Recursively emit context created events given child.

To emit a context created event given context:

  1. Let related contexts be a set containing context.

  2. Let params be the result of get the browsing context info given context, 0, and 1.

  3. Let body be a map matching the BrowsingContextCreatedEvent production, with the params field set to params.

  4. Emit an event with body and related contexts.

The remote end event trigger is:

When the create a new browsing context algorithm is invoked, after the active document of the browsing context is set, run the following steps:

  1. Let context be the newly created browsing context.

  2. Emit a context created event given context.

The remote end subscribe steps, with subscribe priority 1, given contexts and include global are:

  1. For each context in contexts:

    1. Recursively emit context created events given context.
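
The recursion above amounts to a pre-order walk of the browsing context tree, so a parent is always announced before its children. A minimal Python sketch, in which the Context class and the emit callback are hypothetical stand-ins:

```python
from dataclasses import dataclass, field


@dataclass
class Context:
    id: str
    children: list = field(default_factory=list)


def recursively_emit_context_created(context, emit):
    # Announce this context first, then each child subtree in order.
    emit(context)
    for child in context.children:
        recursively_emit_context_created(child, emit)
```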

6.2.4.2. The browsingContext.contextDestroyed Event
Event Type
 BrowsingContextDestroyedEvent = {
  method: "browsingContext.contextDestroyed",
  params: BrowsingContextInfo
}
The remote end event trigger is:

Define the following browsing context tree discarded steps:

  1. If the current session is null, return.

  2. Let context be the browsing context being discarded.

  3. Let params be the result of get the browsing context info, given context, 0, and 0.

  4. Let body be a map matching the BrowsingContextDestroyedEvent production, with the params field set to params.

  5. Let related browsing contexts be a set containing the parent browsing context of context, if that is not null, or an empty set otherwise.

  6. Emit an event with body and related browsing contexts.

The way this hooks into HTML feels very fragile. See https://github.com/whatwg/html/issues/6194

It’s unclear if we ought to only fire this event for browsing contexts that have active documents; navigation can also cause contexts to become inaccessible but not yet get discarded because of bfcache.

6.2.4.3. The browsingContext.navigationStarted Event
Event Type
 BrowsingContextNavigationStartedEvent = {
  method: "browsingContext.navigationStarted",
  params: NavigationInfo
}
The remote end event trigger is the WebDriver-BiDi navigation started steps given context and navigation status:
  1. If the current session is null, return.

  2. Let params be the result of get the navigation info given context and navigation status.

  3. Let body be a map matching the BrowsingContextNavigationStartedEvent production, with the params field set to params.

  4. Let navigation id be navigation status’s id.

  5. Let related browsing contexts be a set containing context.

  6. Resume with "navigation started", navigation id, and navigation status.

  7. Emit an event with body and related browsing contexts.

6.2.4.4. The browsingContext.fragmentNavigated Event
Event Type
 BrowsingContextFragmentNavigatedEvent = {
  method: "browsingContext.fragmentNavigated",
  params: NavigationInfo
}
The remote end event trigger is the WebDriver-BiDi fragment navigated steps given context and navigation status:
  1. If the current session is null, return.

  2. Let params be the result of get the navigation info given context and navigation status.

  3. Let body be a map matching the BrowsingContextFragmentNavigatedEvent production, with the params field set to params.

  4. Let navigation id be navigation status’s id.

  5. Let related browsing contexts be a set containing context.

  6. Resume with "fragment navigated", navigation id, and navigation status.

  7. Emit an event with body and related browsing contexts.

6.2.4.5. The browsingContext.domContentLoaded Event
Event Type
 BrowsingContextDomContentLoadedEvent = {
  method: "browsingContext.domContentLoaded",
  params: NavigationInfo
}
The remote end event trigger is the WebDriver-BiDi DOM content loaded steps given context and navigation status:
  1. If the current session is null, return.

  2. Let params be the result of get the navigation info given context and navigation status.

  3. Let body be a map matching the BrowsingContextDomContentLoadedEvent production, with the params field set to params.

  4. Let related browsing contexts be a set containing context.

  5. Let navigation id be navigation status’s id.

  6. Resume with "domContentLoaded", navigation id, and navigation status.

  7. Emit an event with body and related browsing contexts.

6.2.4.6. The browsingContext.load Event
Event Type
 BrowsingContextLoadEvent = {
  method: "browsingContext.load",
  params: NavigationInfo
}
The remote end event trigger is the WebDriver-BiDi load complete steps given context and navigation status:
  1. If the current session is null, return.

  2. Let params be the result of get the navigation info given context and navigation status.

  3. Let body be a map matching the BrowsingContextLoadEvent production, with the params field set to params.

  4. Let related browsing contexts be a set containing context.

  5. Let navigation id be navigation status’s id.

  6. Resume with "load", navigation id and navigation status.

  7. Emit an event with body and related browsing contexts.

6.2.4.7. The browsingContext.downloadWillBegin Event
Event Type
 BrowsingContextDownloadWillBegin = {
  method: "browsingContext.downloadWillBegin",
  params: NavigationInfo
}
The remote end event trigger is the WebDriver-BiDi download started steps given context and navigation status:
  1. If the current session is null, return.

  2. Let params be the result of get the navigation info given context and navigation status.

  3. Let body be a map matching the BrowsingContextDownloadWillBegin production, with the params field set to params.

  4. Let navigation id be navigation status’s id.

  5. Let related browsing contexts be a set containing context.

  6. Resume with "download started", navigation id, and navigation status.

  7. Emit an event with body and related browsing contexts.

6.2.4.8. The browsingContext.navigationAborted Event
Event Type
 BrowsingContextNavigationAbortedEvent = {
  method: "browsingContext.navigationAborted",
  params: NavigationInfo
}
The remote end event trigger is the WebDriver-BiDi navigation aborted steps given context and navigation status:
  1. If the current session is null, return.

  2. Let params be the result of get the navigation info given context and navigation status.

  3. Let body be a map matching the BrowsingContextNavigationAbortedEvent production, with the params field set to params.

  4. Let navigation id be navigation status’s id.

  5. Let related browsing contexts be a set containing context.

  6. Resume with "navigation aborted", navigation id, and navigation status.

  7. Emit an event with body and related browsing contexts.

6.2.4.9. The browsingContext.navigationFailed Event
Event Type
 BrowsingContextNavigationFailedEvent = {
  method: "browsingContext.navigationFailed",
  params: NavigationInfo
}
The remote end event trigger is the WebDriver-BiDi navigation failed steps given context and navigation status:
  1. If the current session is null, return.

  2. Let params be the result of get the navigation info given context and navigation status.

  3. Let body be a map matching the BrowsingContextNavigationFailedEvent production, with the params field set to params.

  4. Let navigation id be navigation status’s id.

  5. Let related browsing contexts be a set containing context.

  6. Resume with "navigation failed", navigation id, and navigation status.

  7. Emit an event with body and related browsing contexts.

6.3. The script Module

The script module contains commands and events relating to script realms and execution.

6.3.1. Definition

Remote end definition

ScriptCommand = (ScriptGetRealmsCommand)

local end definition

ScriptResult = (ScriptGetRealmsResult)

ScriptEvent = (
    ScriptRealmCreatedEvent //
    ScriptRealmDestroyedEvent
)

6.3.2. Types

6.3.2.1. The script.Realm type

Remote end definition and local end definition

Realm = text;

Each realm has an associated realm id, which is a string uniquely identifying that realm. This is implicitly set when the realm is created.

6.3.2.2. The script.RealmInfo type

Local end definition

RealmInfo = {
  realm: Realm,
  type: RealmType,
  origin: text
}

RealmType = "window" / "dedicated-worker" / "shared-worker" / "service-worker" / "worker" / "paint-worklet" / "audio-worklet" / "worklet" / text

The RealmInfo type represents the properties of a realm.

To get the realm info given environment settings:
  1. Let realm be environment settings’ realm execution context's Realm component.

  2. Let realm id be the realm id for realm.

  3. Run the steps under the first matching condition:

    The global object specified by environment settings is a Window object
    1. Let type be "window".

    The global object specified by environment settings is a DedicatedWorkerGlobalScope object
    1. Let type be "dedicated-worker".

    The global object specified by environment settings is a SharedWorkerGlobalScope object
    1. Let type be "shared-worker".

    The global object specified by environment settings is a ServiceWorkerGlobalScope object
    1. Let type be "service-worker".

    The global object specified by environment settings is a WorkerGlobalScope object
    1. Let type be "worker".

    The global object specified by environment settings is a PaintWorkletGlobalScope object
    1. Let type be "paint-worklet".

    The global object specified by environment settings is an AudioWorkletGlobalScope object
    1. Let type be "audio-worklet".

    The global object specified by environment settings is a WorkletGlobalScope object
    1. Let type be "worklet".

    Otherwise:
    1. Return null.

  4. Let origin be the serialization of an origin given environment settings’s origin.

  5. Let realm info be a map matching the RealmInfo production, with the realm field set to realm id, the type field set to type and the origin field set to origin.

  6. Return realm info.

We currently don’t provide information about realms of unknown types. That might be a problem for e.g. extension-related realms.

Note: Future variations of this specification will retain the invariant that the last component of the type name after splitting on "-" will always be "worker" for globals implementing WorkerGlobalScope, and "worklet" for globals implementing WorkletGlobalScope.
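
The "first matching condition" rule and the naming invariant in the note can be illustrated with the Python sketch below. The empty classes are hypothetical stand-ins for the corresponding global object interfaces, modeled with the same inheritance relationships; realm_type is an illustrative name.

```python
class Window: pass
class WorkerGlobalScope: pass
class DedicatedWorkerGlobalScope(WorkerGlobalScope): pass
class SharedWorkerGlobalScope(WorkerGlobalScope): pass
class ServiceWorkerGlobalScope(WorkerGlobalScope): pass
class WorkletGlobalScope: pass
class PaintWorkletGlobalScope(WorkletGlobalScope): pass
class AudioWorkletGlobalScope(WorkletGlobalScope): pass

# Order matters: specific interfaces are listed before their generic base,
# so the first match wins.
REALM_TYPES = [
    (Window, "window"),
    (DedicatedWorkerGlobalScope, "dedicated-worker"),
    (SharedWorkerGlobalScope, "shared-worker"),
    (ServiceWorkerGlobalScope, "service-worker"),
    (WorkerGlobalScope, "worker"),
    (PaintWorkletGlobalScope, "paint-worklet"),
    (AudioWorkletGlobalScope, "audio-worklet"),
    (WorkletGlobalScope, "worklet"),
]


def realm_type(global_object):
    for interface, name in REALM_TYPES:
        if isinstance(global_object, interface):
            return name
    return None  # unknown realm types are not reported


def is_worker_type(type_name):
    # Relies on the invariant that the last "-"-separated component is "worker".
    return type_name.split("-")[-1] == "worker"
```

A local end written this way could keep recognizing worker realms even if a future revision adds, say, a more specific worker type name.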

6.3.3. Commands

6.3.3.1. The script.getRealms Command

The script.getRealms command returns a list of all realms, optionally filtered to realms of a specific type, or to the realm associated with the document currently loaded in a specified browsing context.

Command Type
ScriptGetRealmsCommand = {
  method: "script.getRealms",
  params: GetRealmsParameters
}

GetRealmsParameters = {
  ?context: BrowsingContext,
  ?type: RealmType,
}
Return Type
RealmInfoList = [* RealmInfo]

ScriptGetRealmsResult = {
  realms: RealmInfoList
}
The remote end steps with command parameters are:
  1. Let environment settings be a list of all the environment settings objects that have their execution ready flag set.

  2. If command parameters contains context:

    1. Let context be the result of trying to get a browsing context with command parameters["context"].

    2. Let document be context’s active document.

    3. Let context environment settings be a list.

    4. For each settings of environment settings:

      1. If any of the following conditions hold:

        Append settings to context environment settings.

    5. Set environment settings to context environment settings.

  3. Let realms be a list.

  4. For each settings of environment settings:

    1. Let realm info be the result of get the realm info given settings.

    2. If realm info is null, continue.

    3. If command parameters contains type and realm info["type"] is not equal to command parameters["type"], continue.

    4. Append realm info to realms.

  5. Let body be a map matching the ScriptGetRealmsResult production, with the realms field set to realms.

  6. Return success with data body.
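
The following non-normative sketch illustrates the filtering performed by the steps above. It is written in Python purely for illustration. Realm info is modelled as a plain dict, and the helper name filter_realms and the "context" key on a realm info are assumptions made for this example. The document-ownership conditions of the context filter are reduced here to a simple context id match.

```python
from typing import Optional


def filter_realms(realm_infos: list,
                  context: Optional[str] = None,
                  realm_type: Optional[str] = None) -> dict:
    """Return a ScriptGetRealmsResult-shaped dict for the given filters."""
    realms = []
    for info in realm_infos:
        # Realms of unknown type have no realm info and are skipped.
        if info is None:
            continue
        # Stand-in for the browsing context check (assumed "context" key).
        if context is not None and info.get("context") != context:
            continue
        # Literal match on the realm type.
        if realm_type is not None and info["type"] != realm_type:
            continue
        realms.append(info)
    return {"realms": realms}
```

With no filters every realm that has realm info is returned; supplying both filters narrows the result to realms that satisfy both.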

Extend this to also allow realm parents e.g. for nested workers? Or get all ancestor workers.

We might want to have a more sophisticated filter system than just a literal match.

6.3.4. Events

6.3.4.1. The script.realmCreated Event
Event Type
ScriptRealmCreatedEvent = {
  method: "script.realmCreated",
  params: RealmInfo
}
The remote end event trigger is:

When any of the set up a window environment settings object, set up a worker environment settings object or set up a worklet environment settings object algorithms are invoked, immediately prior to returning the settings object:

  1. Let environment settings be the newly created environment settings object.

  2. Let realm info be the result of get the realm info given environment settings.

  3. If realm info is null, return.

  4. Let related browsing contexts be the result of get related browsing contexts given environment settings.

  5. Let body be a map matching the ScriptRealmCreatedEvent production, with the params field set to realm info.

  6. Emit an event with body and related browsing contexts.

The remote end subscribe steps with subscribe priority 2, given contexts and include global are:

  1. Let environment settings be a list of all the environment settings objects that have their execution ready flag set.

  2. For each settings of environment settings:

    1. If the responsible document of settings is a Document:

      1. Let context be settings’s responsible document's browsing context's top-level browsing context.

      2. If context is not in contexts, continue.

      3. Append context to related contexts.

      Otherwise, if include global is false, continue.

    2. Let realm info be the result of get the realm info given settings.

    3. Let body be a map matching the ScriptRealmCreatedEvent production, with the params field set to realm info.

    4. Emit an event with body and related contexts.
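
The following non-normative sketch shows one way the subscribe steps above could replay script.realmCreated for realms that already exist. It is a Python illustration only. The settings records, the "top_level_context" and "realm_info" keys, the emit callback and the function name are all hypothetical. The sketch also assumes that related contexts starts out empty for each settings object and that realms without realm info are skipped, neither of which is stated in the steps.

```python
def replay_realm_created(settings_list, contexts, include_global, emit):
    """Emit script.realmCreated for each already-existing realm of interest."""
    for settings in settings_list:
        related_contexts = []  # assumption: starts empty for each realm
        top_level = settings.get("top_level_context")
        if top_level is not None:
            # Document-backed realm: only replay for subscribed contexts.
            if top_level not in contexts:
                continue
            related_contexts.append(top_level)
        elif not include_global:
            # Realm with no document and no global subscription.
            continue
        realm_info = settings["realm_info"]
        if realm_info is None:  # assumption: nothing to report
            continue
        body = {"method": "script.realmCreated", "params": realm_info}
        emit(body, related_contexts)
```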

Should the order here be better defined?

6.3.4.2. The script.realmDestroyed Event
Event Type
RealmDestroyedParameters = {
  realm: Realm
}

ScriptRealmDestroyedEvent = {
  method: "script.realmDestroyed",
  params: RealmDestroyedParameters
}
The remote end event trigger is:
Define the following unloading document cleanup steps with document:
  1. Let related browsing contexts be an empty set.

  2. Append document’s browsing context to related browsing contexts.

  3. For each worklet global scope in document’s worklet global scopes:

    1. Let realm be worklet global scope’s relevant Realm.

    2. Let realm id be the realm id for realm.

    3. Let params be a map matching the RealmDestroyedParameters production, with the realm field set to realm id.

    4. Let body be a map matching the ScriptRealmDestroyedEvent production, with the params field set to params.

    5. Emit an event with body and related browsing contexts.

  4. Let environment settings be the environment settings object whose responsible document is document.

  5. Let realm be environment settings’s realm execution context's Realm component.

  6. Let realm id be the realm id for realm.

  7. Let params be a map matching the RealmDestroyedParameters production, with the realm field set to realm id.

  8. Let body be a map matching the ScriptRealmDestroyedEvent production, with the params field set to params.

  9. Emit an event with body and related browsing contexts.

Whenever a worker event loop event loop is destroyed, either because the worker comes to the end of its lifecycle, or prematurely via the terminate a worker algorithm:

  1. Let environment settings be the environment settings object for which event loop is the responsible event loop.

  2. Let related browsing contexts be the result of get related browsing contexts given environment settings.

  3. Let realm be environment settings’s environment settings object’s Realm.

  4. Let realm id be the realm id for realm.

  5. Let params be a map matching the RealmDestroyedParameters production, with the realm field set to realm id.

  6. Let body be a map matching the ScriptRealmDestroyedEvent production, with the params field set to params.

  7. Emit an event with body and related browsing contexts.
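
The following non-normative sketch shows the event bodies produced when a document is unloaded. It is a Python illustration only. The function names are hypothetical, realm ids are modelled as plain strings, and the method string is assumed to match the name in the heading of this section. Events for the document's worklet realms come first, followed by the event for the document's own realm.

```python
def realm_destroyed_event(realm_id: str) -> dict:
    """Build a ScriptRealmDestroyedEvent-shaped body for one realm."""
    params = {"realm": realm_id}
    return {"method": "script.realmDestroyed", "params": params}


def unloading_document_events(worklet_realm_ids, document_realm_id):
    """Return the event bodies for an unloading document, in emission order."""
    events = [realm_destroyed_event(realm_id) for realm_id in worklet_realm_ids]
    events.append(realm_destroyed_event(document_realm_id))
    return events
```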

6.4. Log

The log module contains functionality and events related to logging.

A session has a log event buffer which is a map from browsing context id to a list of log events for that context that have not been emitted. User agents may impose a maximum size on this buffer, subject to the condition that if events A and B happen in the same context with A occurring before B, and both are added to the buffer, the entry for B must not be removed before the entry for A.

To buffer a log event given contexts and event:

  1. Let buffer be the current session's log event buffer.

  2. Let context ids be a new list.

  3. For each context of contexts:

    1. Append the browsing context id for context to context ids.

  4. For each context id in context ids:

    1. Let other contexts be an empty list.

    2. For each other id in context ids:

      1. If other id is not equal to context id, append other id to other contexts.

    3. If buffer does not contain context id, let buffer[context id] be a new list.

    4. Append (event, other contexts) to buffer[context id].

Note: we store the other contexts here so that each event is only emitted once. In practice this is only relevant for workers that can be associated with multiple browsing contexts.
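
The following non-normative sketch illustrates the buffer a log event algorithm. It is a Python illustration only. The buffer is modelled as a dict keyed by browsing context id, context ids are plain strings, and the function name is chosen for this example.

```python
def buffer_log_event(buffer: dict, context_ids: list, event: dict) -> None:
    """Store event under every context id, remembering the sibling ids."""
    for context_id in context_ids:
        # The other contexts that hold a copy of the same event.
        other_contexts = [other for other in context_ids if other != context_id]
        # Create the per-context list on first use, then append the entry.
        buffer.setdefault(context_id, []).append((event, other_contexts))
```

Because every entry records the other context ids under which the same event is stored, those copies can be dropped once the event has been emitted for one context.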

Do we want to key this on browsing context or top-level browsing context? The difference is in what happens if an event occurs in a frame and that frame is then navigated before the local end subscribes to log events for the top level context.

6.4.1. Definition

remote end definition

LogEvent = (
  LogEntryAddedEvent
)

6.4.2. Types

6.4.2.1. log.LogEntry
LogLevel = "debug" / "info" / "warning" / "error"

LogEntry = (
  GenericLogEntry //
  ConsoleLogEntry //
  JavascriptLogEntry
)

BaseLogEntry = {
  level: LogLevel,
  text: text / null,
  timestamp: int,
  ?stackTrace: [*StackFrame],
}

GenericLogEntry = {
  BaseLogEntry,
  type: text,
}

ConsoleLogEntry = {
  BaseLogEntry,
  type: "console",
  method: text,
  realm: Realm,
  args: [*RemoteValue],
}

JavascriptLogEntry = {
  BaseLogEntry,
  type: "javascript",
}

Each log event is represented by a LogEntry object. This has a type property which represents the type of log entry added, a level property representing severity, a text property with the log message string itself, and a timestamp property corresponding to the time the log entry was generated. Specific variants of the LogEntry are used to represent logs from different sources, and provide additional fields specific to the entry type.
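
The following non-normative sketch shows what a ConsoleLogEntry might look like, along with a check of the fields shared by all LogEntry variants. It is a Python illustration only. The realm id, the timestamp value and the shape of the entry in args are made-up example values, and the helper name is hypothetical.

```python
LOG_LEVELS = {"debug", "info", "warning", "error"}


def is_base_log_entry(entry: dict) -> bool:
    """Check the fields shared by every LogEntry variant."""
    if entry.get("level") not in LOG_LEVELS:
        return False
    # text is required but may be null (None in Python).
    if "text" not in entry:
        return False
    if entry["text"] is not None and not isinstance(entry["text"], str):
        return False
    timestamp = entry.get("timestamp")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(timestamp, int) or isinstance(timestamp, bool):
        return False
    return isinstance(entry.get("type"), str)


# Example values only; the args entry assumes a string RemoteValue shape.
example_entry = {
    "type": "console",
    "level": "info",
    "text": "hello",
    "timestamp": 1600000000000,
    "method": "log",
    "realm": "realm-1",
    "args": [{"type": "string", "value": "hello"}],
}
```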

6.4.2.2. log.StackFrame
StackFrame = {
  url: text,
  functionName: text,
  lineNumber: int,
  columnNumber: int,
}

A frame in a stacktrace is represented by a StackFrame object. This has a url property, which represents the URL of the script, a functionName property which represents the name of the executing function, and lineNumber and columnNumber properties, which represent the line and column number of the executed code.

The current stack trace is a representation of the stack of the running execution context. The details of this are unspecified, and so the behaviour here is implementation defined, but the general process is as follows:

  1. Let stack trace be a new list.

  2. For each stack frame frame in the stack of the running execution context, starting from the most recently executed frame, run the following steps:

    1. Let url be the result of running the URL serializer, given the URL of frame’s associated script resource.

    2. Let functionName be the name of frame’s associated function.

    3. Let lineNumber and columnNumber be the one-based line and zero-based column numbers, respectively, of the location in frame’s associated script resource corresponding to frame.

    4. Let frame info be a new map matching the StackFrame production, with the url field set to url, the functionName field set to functionName, the lineNumber field set to lineNumber and the columnNumber field set to columnNumber.

    5. Append frame info to stack trace.

  3. Return stack trace.
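
The following non-normative sketch illustrates the general process above. It is a Python illustration only. Because the details of the stack are implementation-defined, the Frame record and its field names are hypothetical stand-ins for whatever an engine exposes, and the input is assumed to be ordered from the oldest frame to the most recently executed one.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """Hypothetical engine-side record of one stack frame."""
    url: str
    function_name: str
    line: int    # one-based
    column: int  # zero-based


def current_stack_trace(frames_oldest_first):
    """Return StackFrame-shaped dicts, most recently executed frame first."""
    stack_trace = []
    for frame in reversed(frames_oldest_first):
        frame_info = {
            "url": frame.url,
            "functionName": frame.function_name,
            "lineNumber": frame.line,
            "columnNumber": frame.column,
        }
        stack_trace.append(frame_info)
    return stack_trace
```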

6.4.3. Events

6.4.3.1. entryAdded
Event Type
LogEntryAddedEvent = {
  method: "log.entryAdded",
  params: LogEntry,
}

The remote end event trigger is:

Define the following console steps with method, args, and options:

  1. If method is "error" or "assert", let level be "error". If method is "debug" or "trace", let level be "debug". If method is "warn" or "warning", let level be "warning". Otherwise let level be "info".

  2. Let timestamp be a time value representing the current date and time in UTC.

  3. Let text be an empty string.

  4. If Type(args[0]) is String, and args[0] contains a formatting specifier, let formatted args be Formatter(args). Otherwise let formatted args be args.

    This is underdefined in the console spec, so it’s unclear if we can get interoperable behaviour here.

  5. For each arg in formatted args:

    1. If arg is not the first entry in formatted args, append a U+0020 SPACE to text.

    2. If arg is a primitive value, append ToString(arg) to text. Otherwise append an implementation-defined string to text.

  6. Let serialized args be a new list.

  7. For each arg of args, append the result of serialize as a remote value given arg, null, true, and an empty set to serialized args.

  8. Let realm be the realm id of the current Realm Record.

  9. Let stack be the current stack trace.

  10. Let entry be a map matching the ConsoleLogEntry production, with the level field set to level, the text field set to text, the timestamp field set to timestamp, the stackTrace field set to stack if stack is not null, or omitted otherwise, the method field set to method, the realm field set to realm and the args field set to serialized args.

  11. Let body be a map matching the LogEntryAddedEvent production, with the params field set to entry.

  12. Let settings be the current settings object.

  13. Let related browsing contexts be the result of get related browsing contexts given settings.

  14. Let emitted be the result of emit an event with body and related browsing contexts.

  15. If emitted is false, buffer a log event given related browsing contexts and body.
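
The following non-normative sketch illustrates how the console steps above derive level and text. It is a Python illustration only. The function names are hypothetical, to_string is a rough approximation of ECMAScript ToString covering only strings, booleans, null and simple numbers, and "[object]" stands in for the implementation-defined string used for non-primitive values.

```python
def console_level(method: str) -> str:
    """Map a console method name to a LogLevel."""
    if method in ("error", "assert"):
        return "error"
    if method in ("debug", "trace"):
        return "debug"
    if method in ("warn", "warning"):
        return "warning"
    return "info"


def to_string(value) -> str:
    """Rough approximation of ToString for a few primitive values."""
    if value is None:
        return "null"
    if value is True:
        return "true"
    if value is False:
        return "false"
    return str(value)


def console_text(formatted_args: list) -> str:
    """Join the arguments with single spaces, as in the text-building loop."""
    parts = []
    for arg in formatted_args:
        if arg is None or isinstance(arg, (str, int, float, bool)):
            parts.append(to_string(arg))
        else:
            parts.append("[object]")  # placeholder, implementation-defined
    return " ".join(parts)
```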

Define the following error reporting steps with arguments script, line number, column number, message and handled:

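  1. If handled is true, return.

  2. Let settings be script’s settings object.

  3. Let timestamp be a time value representing the current date and time in UTC.

  4. Let stack be the current stack trace for the exception.

  5. Let entry be a map matching the JavascriptLogEntry production, with the level field set to "error", the text field set to message, the timestamp field set to timestamp, and the stackTrace field set to stack if stack is not null, or omitted otherwise.

  6. Let body be a map matching the LogEntryAddedEvent production, with the params field set to entry.

  7. Let related browsing contexts be the result of get related browsing contexts given settings.

  8. Let emitted be the result of emit an event with body and related browsing contexts.

  9. If emitted is false, buffer a log event given related browsing contexts and body.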
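
The following non-normative sketch shows the event body that the error reporting steps could produce for an unhandled error. It is a Python illustration only. The function name is hypothetical, the timestamp is assumed to be milliseconds since the Unix epoch, and attaching the stack under stackTrace is an assumption based on the BaseLogEntry definition.

```python
import time


def error_report_body(message, handled, stack=None):
    """Return a log.entryAdded body for an unhandled error, else None."""
    if handled:
        return None  # handled errors are not logged
    entry = {
        "type": "javascript",
        "level": "error",
        "text": message,
        "timestamp": int(time.time() * 1000),  # assumed unit: milliseconds
    }
    if stack is not None:
        entry["stackTrace"] = stack  # assumption: the optional field is used
    return {"method": "log.entryAdded", "params": entry}
```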

Lots more things require logging. CDP has LogEntryAdded types xml, javascript, network, storage, appcache, rendering, security, deprecation, worker, violation, intervention, recommendation, other. These are in addition to the js exception and console API types that are represented by different methods.

Allow implementation-defined log types

The remote end subscribe steps, with subscribe priority 10, given contexts and include global are:

  1. For each context id → events in log event buffer:

    1. Let maybe context be the result of getting a browsing context given context id.

    2. If maybe context is an error, remove context id from log event buffer and continue.

    3. Let context be maybe context’s data.

    4. Let top level context be context’s top-level browsing context.

    5. Let related contexts be a new set containing context.

    6. If include global is true and top level context is not in contexts, or if include global is false and top level context is in contexts:

      1. For each (event, other contexts) in events:

        1. Emit an event with event and related contexts.

        2. For each other context id in other contexts:

          1. If log event buffer contains other context id, remove event from log event buffer[other context id].
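
The following non-normative sketch illustrates how the subscribe steps above replay buffered log events. It is a Python illustration only. The buffer is a dict keyed by browsing context id, top_level_of is a hypothetical lookup that returns the id of the top-level browsing context or None when the context no longer exists, and emit is a hypothetical callback. The condition on include global mirrors the steps as written.

```python
def flush_log_buffer(buffer, contexts, include_global, top_level_of, emit):
    """Replay buffered log events for the contexts selected by a subscription."""
    for context_id in list(buffer):  # snapshot: the buffer is mutated below
        events = buffer[context_id]
        top_level = top_level_of(context_id)
        if top_level is None:
            # The context has gone away; drop its buffered events.
            del buffer[context_id]
            continue
        selected = ((include_global and top_level not in contexts)
                    or (not include_global and top_level in contexts))
        if not selected:
            continue
        for event, other_contexts in list(events):
            emit(event, {context_id})
            # Drop the copies stored under sibling contexts so that the
            # event is only emitted once.
            for other_id in other_contexts:
                if other_id in buffer:
                    buffer[other_id] = [entry for entry in buffer[other_id]
                                        if entry[0] is not event]
```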

7. Patches to Other Specifications

This specification requires some changes to external specifications to provide the necessary integration points. It is assumed that these patches will be committed to the other specifications as part of the standards process.

7.1. HTML

The a browsing context is discarded algorithm is modified to read as follows:

To discard a browsing context browsingContext, run these steps:
  1. If this is not a recursive invocation of this algorithm, call any browsing context tree discarded steps defined in other applicable specifications with browsingContext.

  2. Discard all Document objects for all the entries in browsingContext’s session history.

  3. If browsingContext is a top-level browsing context, then remove a browsing context browsingContext.

The actual patch might be better to split the algorithm into an outer algorithm that is called by external callers and an inner algorithm that’s used for recursive calls. That’s quite hard to express as a patch to the specification since it requires changing multiple parts.

The report an error algorithm is modified with an additional step at the end:

  1. Call any error reporting steps defined in external specifications with script, line, col, message, and true if the error is handled, or false otherwise.

7.2. Console

Other specifications can define console steps. When any method of the console interface is called, with method name method and argument args:

  1. If that method does not call the Printer operation, call any console steps defined in external specifications with arguments method, args, and undefined.

    Otherwise, at the point when the Printer operation is called with arguments name, printerArgs and options (which is undefined if the argument is not provided), call any console steps defined in external specifications with arguments name, printerArgs, and options.

References

Normative References

[CONSOLE]
Dominic Farolino; Robert Kowalski; Terin Stock. Console Standard. Living Standard. URL: https://console.spec.whatwg.org/
[CSS-PAINT-API-1]
Ian Kilpatrick; Dean Jackson. CSS Painting API Level 1. URL: https://drafts.css-houdini.org/css-paint-api-1/
[DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[ECMASCRIPT]
ECMAScript Language Specification. URL: https://tc39.es/ecma262/multipage/
[FETCH]
Anne van Kesteren. Fetch Standard. Living Standard. URL: https://fetch.spec.whatwg.org/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[RFC4122]
P. Leach; M. Mealling; R. Salz. A Universally Unique IDentifier (UUID) URN Namespace. July 2005. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc4122
[RFC6455]
I. Fette; A. Melnikov. The WebSocket Protocol. December 2011. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc6455
[RFC8610]
H. Birkholz; C. Vigano; C. Bormann. Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures. June 2019. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc8610
[SERVICE-WORKERS-1]
Alex Russell; et al. Service Workers 1. URL: https://w3c.github.io/ServiceWorker/
[URL]
Anne van Kesteren. URL Standard. Living Standard. URL: https://url.spec.whatwg.org/
[WEBAUDIO]
Paul Adenot; Hongchan Choi. Web Audio API. URL: https://webaudio.github.io/web-audio-api/
[WEBDRIVER]
Simon Stewart; David Burns. WebDriver. URL: https://w3c.github.io/webdriver/
[WebIDL]
Boris Zbarsky. Web IDL. URL: https://heycam.github.io/webidl/

Informative References

[JSON-RPC]
JSON-RPC Working Group. JSON-RPC 2.0 Specification. 4 January 2013. URL: https://www.jsonrpc.org/specification

Issues Index

Surely there’s a better mechanism for doing this "wait for an event" thing.
Should we have something like microtasks to ensure this runs before any other tasks on the event loop?
Should this be an appendix?
Do we support > 1 connection for a single session?
Nothing seems to define what status code is used for UTF-8 errors.
This should also reset any internal state
Need to hook in to the session ending to allow the UA to close the listener if it wants.
Should this be explicitly per realm?
Add WASM types?
Should WindowProxy get attributes in a similar style to Node?
handle String / Number / etc. wrapper objects specially?
This doesn’t handle lone surrogates
Does it make sense to use the same depth parameter for nodes and objects in general?
this assumes for-in works on iterators
This needs to be generalized to work with realms too
This needs to be generalised to work with realms too
Replace this suppression mechanism with an event queue.
Are we surfacing enough information about what failed and why with an error here? What error code do we want? Is there going to be a problem where local ends parse the implementation-defined strings to figure out what actually went wrong?
the way this hooks into HTML feels very fragile. See https://github.com/whatwg/html/issues/6194
It’s unclear if we ought to only fire this event for browsing contexts that have active documents; navigation can also cause contexts to become inaccessible but not yet get discarded because bfcache.
We currently don’t provide information about realms of unknown types. That might be a problem for e.g. extension-related realms.
Extend this to also allow realm parents e.g. for nested workers? Or get all ancestor workers.
We might want to have a more sophisticated filter system than just a literal match.
Should the order here be better defined?
Do we want to key this on browsing context or top-level browsing context? The difference is in what happens if an event occurs in a frame and that frame is then navigated before the local end subscribes to log events for the top level context.
This is underdefined in the console spec, so it’s unclear if we can get interoperable behaviour here.
Lots more things require logging. CDP has LogEntryAdded types xml, javascript, network, storage, appcache, rendering, security, deprecation, worker, violation, intervention, recommendation, other. These are in addition to the js exception and console API types that are represented by different methods.
Allow implementation-defined log types
The actual patch might be better to split the algorithm into an outer algorithm that is called by external callers and an inner algorithm that’s used for recursive calls. That’s quite hard to express as a patch to the specification since it requires changing multiple parts.