1. Introduction
This section is non-normative.
WebDriver defines a protocol for introspection and remote control of user agents. This specification extends WebDriver by introducing bidirectional communication. In place of the strict command/response format of WebDriver, this permits events to stream from the user agent to the controlling software, better matching the evented nature of the browser DOM.
2. Infrastructure
This specification depends on the Infra Standard. [INFRA]
Network protocol messages are defined using CDDL. [RFC8610]
3. Protocol
This section defines the basic concepts of the WebDriver BiDi protocol. These terms are distinct from their representation at the transport layer.
The protocol is defined using a CDDL definition. For the convenience of implementors two separate CDDL definitions are defined: the remote end definition, which defines the format of messages produced on the local end and consumed on the remote end, and the local end definition, which defines the format of messages produced on the remote end and consumed on the local end.
3.1. Definition
This section gives the initial contents of the remote end definition and local end definition. These are augmented by the definition fragments defined in the remainder of the specification.
Command = {
  id: uint,
  CommandData,
  *text => any,
}

CommandData = (
  SessionCommand //
  BrowsingContextCommand
)

EmptyParams = { *text }
Message = (
  CommandResponse //
  ErrorResponse //
  Event
)

CommandResponse = {
  id: uint,
  result: ResultData,
  *text => any
}

ErrorResponse = {
  id: uint / null,
  error: "unknown error" / "unknown method" / "invalid argument",
  message: text,
  ?stacktrace: text,
  *text => any
}

ResultData = (
  EmptyResult //
  SessionResult //
  BrowsingContextResult //
  ScriptResult
)

EmptyResult = {}

Event = {
  EventData,
  *text => any
}

EventData = (
  BrowsingContextEvent //
  ScriptEvent
)
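As a non-normative illustration, the following Python sketch builds a map shaped like the Command production and classifies a decoded message by the fields it carries. The helper names make_command and classify_message are assumptions made for this example and are not defined by the protocol.

```python
import json


def make_command(command_id, method, params):
    """Build a map shaped like the Command production."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(command_id, bool) or not isinstance(command_id, int) or command_id < 0:
        raise ValueError("id must be an unsigned integer")
    return {"id": command_id, "method": method, "params": params}


def classify_message(message):
    """Name the local end production a decoded message most plausibly matches."""
    if "error" in message:
        return "ErrorResponse"
    if "id" in message and "result" in message:
        return "CommandResponse"
    return "Event"


# Serialized form of a command, ready to be sent as a text message.
wire = json.dumps(make_command(1, "session.status", {}))
```

The classification relies only on field presence, which is enough to tell the three Message choices apart; it does not validate a message against the full definition.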
3.2. Session
WebDriver BiDi uses the same session concept as WebDriver.
3.3. Modules
The WebDriver BiDi protocol is organized into modules.
Each module represents a collection of related commands and events pertaining to a certain aspect of the user agent. For example, a module might contain functionality for inspecting and manipulating the DOM, or for script execution.
Each module has a module name which is a string. The command name and event name for commands and events defined in the module start with the module name followed by a period ".".
Modules which contain commands define remote end definition fragments. These provide choices in the CommandData group for the module’s commands, and can also define additional definition properties. They can also define local end definition fragments that provide additional choices in the ResultData group for the results of commands in the module.
Modules which contain events define local end definition fragments that are choices in the Event group for the module’s events.
An implementation may define extension modules. These must have a module name that contains a single colon ":" character. The part before the colon is the prefix; this is typically the same for all extension modules specific to a given implementation and should be unique for a given implementation. Such modules extend the local end definition and remote end definition providing additional groups as choices for the defined commands and events.
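The naming rules above can be sketched in Python as follows. This is a non-normative illustration: the function names are assumptions, and "vendor:custom" in the usage note is a made-up extension module name rather than one defined anywhere.

```python
def split_name(name):
    """Split a command or event name into (module name, local name)."""
    module_name, separator, local_name = name.partition(".")
    if not separator or not module_name or not local_name:
        raise ValueError(f"not a qualified name: {name!r}")
    return module_name, local_name


def extension_prefix(module_name):
    """Return the prefix of an extension module name, or None otherwise."""
    # Extension module names contain exactly one colon.
    if module_name.count(":") != 1:
        return None
    return module_name.split(":")[0]
```

For example, split_name("vendor:custom.doThing") yields the module name "vendor:custom", whose prefix is "vendor".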
3.4. Commands
A command is an asynchronous operation, requested by the local end and run on the remote end, resulting in either a result or an error being returned to the local end. Multiple commands can run at the same time, and commands can potentially be long-running. As a consequence, commands can finish out-of-order.
Each command is defined by:
-
A command type which is defined by a remote end definition fragment containing a group. Each such group has two fields:
-
method, which is a string literal of the form [module name].[method name]. This is the command name.
-
params, which defines a mapping containing data to be passed into the command. The populated value of this map is the command parameters.
-
A result type, which is defined by a local end definition fragment.
-
A set of remote end steps which define the actions to take for a command given command parameters and return an instance of the command return type.
When commands are sent from the local end they have a command id. This is an identifier used by the local end to identify the response to a particular command. From the point of view of the remote end this identifier is opaque and cannot be used internally to identify the command.
Note: This is because the command id is entirely controlled by the local end and isn’t necessarily unique over the course of a session. For example a local end which ignores all responses could use the same command id for each command.
The set of all command names is a set containing all the defined command names, including any belonging to extension modules.
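Because commands can finish out of order, a local end typically keeps its own record of outstanding command ids. The following Python sketch shows one possible bookkeeping scheme; it is not required by this specification, and the class name PendingCommands and the method name "example.slow" are assumptions made for the example.

```python
import itertools


class PendingCommands:
    """Local end bookkeeping that matches responses to commands by id."""

    def __init__(self):
        self._next_id = itertools.count(1)
        self._pending = {}

    def send(self, method, params):
        """Allocate a fresh command id and return the command to transmit."""
        command_id = next(self._next_id)
        self._pending[command_id] = method
        return {"id": command_id, "method": method, "params": params}

    def receive(self, message):
        """Return (method, message) for a pending response, else None."""
        command_id = message.get("id")
        if command_id not in self._pending:
            # Events carry no id; unknown ids are ignored in this sketch.
            return None
        return self._pending.pop(command_id), message
```

Using unique ids is a choice of this sketch; as noted above, the remote end does not depend on it.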
3.5. Events
An event is a notification, sent by the remote end to the local end, signaling that something of interest has occurred on the remote end.
-
An event type is defined by a local end definition fragment containing a group. Each such group has two fields:
-
A remote end event trigger which defines when the event is triggered and steps to construct the event type data.
-
Optionally, a set of remote end subscribe steps, which define steps to take when a local end subscribes to an event. Where defined these steps have an associated subscribe priority which is an integer controlling the order in which the steps are run when multiple events are enabled at once, with lower integers indicating steps that run earlier.
A session has a global event set which is a set containing the event names for events that are enabled for all browsing contexts. This initially contains the event name for events that are in the default event set.
A session has a browsing context event map, which is a map with top-level browsing context keys and values that are a set of event names for events that are enabled in the given browsing context.
To obtain a list of event enabled browsing contexts given session and event name:
-
Let contexts be an empty set.
-
For each context → events of session’s browsing context event map:
-
If events contains event name, append context to contexts.
-
-
Return contexts.
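The steps above amount to a filter over the browsing context event map. A minimal Python sketch, assuming the map is modelled as a dict from a context identifier to a set of event names:

```python
def event_enabled_browsing_contexts(browsing_context_event_map, event_name):
    """Collect the contexts whose event set contains event_name."""
    contexts = set()
    for context, events in browsing_context_event_map.items():
        if event_name in events:
            contexts.add(context)
    return contexts
```

The context identifiers and event names used with this function are whatever the caller supplies; none are defined by the sketch itself.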
To determine if an event is enabled given session, event name and browsing contexts:
Note: browsing contexts is a set because a shared worker can be associated with multiple contexts.
-
Let top-level browsing contexts be an empty set.
-
For each browsing context of browsing contexts, append browsing context’s top-level browsing context to top-level browsing contexts.
-
Let event map be the browsing context event map for session.
-
For each browsing context of top-level browsing contexts:
-
If the global event set for session contains event name return true.
-
Return false.
To obtain a set of event names given a name:
-
Let events be an empty set.
-
If name contains a U+002E (period):
-
If name is the event name for an event, append name to events and return success with data events.
-
Return an error with error code Invalid Argument.
-
-
Otherwise name is interpreted as representing all the events in a module. If name is not a module name return an error with error code Invalid Argument.
-
Append the event name for each event in the module with name name to events.
-
Return success with data events.
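The following Python sketch mirrors the steps above for obtaining a set of event names. It is illustrative only: the MODULE_EVENTS registry, the "example" module and its events, and the InvalidArgument exception (standing in for an error with error code Invalid Argument) are all assumptions.

```python
# Hypothetical registry: module name -> event names defined by that module.
MODULE_EVENTS = {
    "example": {"example.started", "example.finished"},
}


class InvalidArgument(Exception):
    """Stands in for an error with error code Invalid Argument."""


def obtain_event_names(name, module_events=MODULE_EVENTS):
    """Expand an event name or a module name into a set of event names."""
    events = set()
    if "." in name:
        # A name containing a period has to be an individual event name.
        if any(name in known for known in module_events.values()):
            events.add(name)
            return events
        raise InvalidArgument(name)
    # Otherwise the name stands for every event in a module.
    if name not in module_events:
        raise InvalidArgument(name)
    events.update(module_events[name])
    return events
```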
4. Transport
Message transport is provided using the WebSocket protocol. [RFC6455]
Note: In the terms of the WebSocket protocol, the local end is the client and the remote end is the server / remote host.
Note: The encoding of commands and events as messages is similar to JSON-RPC, but this specification does not normatively reference it. [JSON-RPC] The normative requirements on remote ends are instead given as a precise processing model, while no normative requirements are given for local ends.
A WebSocket listener is a network endpoint that is able to accept incoming WebSocket connections.
A WebSocket listener has a host, a port, a secure flag, and a list of WebSocket resources.
When a WebSocket listener listener is created, a remote end must start to listen for WebSocket connections on the host and port given by listener’s host and port. If listener’s secure flag is set, then connections established from listener must be TLS encrypted.
A remote end has a set of WebSocket listeners active listeners, which is initially empty.
A WebDriver session has a WebSocket connection which is a network connection that follows the requirements of the WebSocket protocol. This is initially null.
When a client establishes a WebSocket connection connection by connecting to one of the set of active listeners listener, the implementation must proceed according to the WebSocket server-side requirements, with the following steps run when deciding whether to accept the incoming connection:
-
Let resource name be the resource name from reading the client’s opening handshake. If resource name is not in listener’s list of WebSocket resources, then stop running these steps and act as if the requested service is not available.
-
Get a session ID for a WebSocket resource with resource name and let session id be that value. If session id is null then stop running these steps and act as if the requested service is not available.
-
If there is a session in the list of active sessions with session id as its session ID then let session be that session. Otherwise stop running these steps and act as if the requested service is not available.
-
Run any other implementation-defined steps to decide if the connection should be accepted, and if it is not stop running these steps and act as if the requested service is not available.
-
Otherwise set session’s WebSocket connection to connection, and proceed with the WebSocket server-side requirements when a server chooses to accept an incoming connection.
When a WebSocket message has been received for a WebSocket connection connection with type type and data data, a remote end must handle an incoming message given connection, type and data.
When the WebSocket closing handshake is started or when the WebSocket connection is closed for a WebSocket connection connection, a remote end must handle a connection closing given connection.
Note: Both conditions are needed because it is possible for a WebSocket connection to be closed without a closing handshake.
To construct a WebSocket resource name given a session session:
-
Return the result of concatenating the string "/session/" with session’s session ID.
To construct a WebSocket URL given a WebSocket listener listener and session session:
-
Let resource name be the result of constructing a WebSocket resource name given session.
-
Return a WebSocket URI constructed with host set to listener’s host, port set to listener’s port, path set to resource name, following the wss-URI construct if listener’s secure flag is set and the ws-URI construct otherwise.
To get a session ID for a WebSocket resource given resource name:
-
If resource name doesn’t begin with the byte string "/session/", return null.
-
Let session id be the bytes in resource name following the "/session/" prefix.
-
If session id is not the string representation of a UUID, return null.
-
Return session id.
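The resource name and URL algorithms above can be sketched in Python as follows. This is a non-normative illustration: the function names are assumptions, and requiring the canonical hyphenated UUID spelling is a choice of this sketch, since Python's uuid.UUID also accepts other spellings.

```python
import uuid

PREFIX = "/session/"


def construct_resource_name(session_id):
    """Concatenate the "/session/" prefix with the session ID."""
    return PREFIX + session_id


def construct_websocket_url(host, port, secure, session_id):
    """Build a ws or wss URL for the given listener data and session ID."""
    scheme = "wss" if secure else "ws"
    return f"{scheme}://{host}:{port}{construct_resource_name(session_id)}"


def get_session_id(resource_name):
    """Return the session ID embedded in a resource name, or None."""
    if not resource_name.startswith(PREFIX):
        return None
    session_id = resource_name[len(PREFIX):]
    try:
        parsed = uuid.UUID(session_id)
    except ValueError:
        return None
    # Assumption: only the canonical hyphenated form counts as a UUID string.
    if str(parsed) != session_id.lower():
        return None
    return session_id
```

A round trip through construct_resource_name and get_session_id returns the original session ID, which is what allows the remote end to route an incoming connection to its session.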
To start listening for a WebSocket connection given a session session:
-
If there is an existing WebSocket listener in the set of active listeners which the remote end would like to reuse, let listener be that listener. Otherwise let listener be a new WebSocket listener with implementation-defined host, port, secure flag, and an empty list of WebSocket resources.
-
Let resource name be the result of constructing a WebSocket resource name given session.
-
Append resource name to the list of WebSocket resources for listener.
-
Append listener to the remote end's active listeners.
-
Return listener.
Note: An intermediary node handling multiple sessions can use one or many WebSocket listeners. WebDriver defines that an endpoint node supports at most one session at a time, so it’s expected to only have a single listener.
Note: For an endpoint node the host in the above steps will typically be "localhost".
To handle an incoming message given a WebSocket connection connection, type type and data data:
-
If type is not text, respond with an error given connection, null, and invalid argument, and finally return.
-
Assert: data is a scalar value string, because the WebSocket handling errors in UTF-8-encoded data would already have failed the WebSocket connection otherwise.
Nothing seems to define what status code is used for UTF-8 errors.
-
Let parsed be the result of parsing JSON into Infra values given data. If this throws an exception, then respond with an error given connection, null, and invalid argument, and finally return.
-
Match parsed against the remote end definition. If this results in a match:
-
Let matched be the map representing the matched data.
-
Assert: matched contains "id", "method", and "params".
-
Let command id be matched["id"].
-
Let method be matched["method"].
-
Run the following steps in parallel:
-
Let result be the result of running the remote end steps for the command with command name method given command parameters matched["params"].
-
If result is an error, then respond with an error given connection, command id, and result’s error code, and finally return.
-
Let value be result’s data.
-
Assert: value matches the definition for the result type corresponding to the command with command name method.
-
Let response be a new map matching the CommandResponse production in the local end definition with the id field set to command id and the result field set to value.
-
Let serialized be the result of serialize an infra value to JSON bytes given response.
-
Send a WebSocket message comprised of serialized over connection and return.
-
-
-
Otherwise:
-
Let command id be null.
-
If parsed is a map and parsed["id"] exists and is an integer greater than or equal to zero, set command id to that integer.
-
Let error code be invalid argument.
-
If parsed is a map and parsed["method"] exists and is a string, but parsed["method"] is not in the set of all command names, set error code to unknown command.
-
Respond with an error given connection, command id, and error code.
-
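The message handling steps above (parse, match, run, respond) can be sketched in Python as follows. This is a simplified, non-normative illustration: the COMMANDS table and the "example.echo" command are hypothetical, matching against the remote end definition is reduced to a few field checks, commands that fail are not modelled, and the command runs inline rather than in parallel. The response uses the result field named by the CommandResponse production.

```python
import json

# Hypothetical command table: command name -> function of the command parameters.
COMMANDS = {"example.echo": lambda params: params}


def error_response(command_id, error_code, message):
    # "id" is always present; a None id serializes as JSON null.
    return {"id": command_id, "error": error_code, "message": message}


def is_uint(value):
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def handle_incoming_message(data, commands=COMMANDS):
    """Return the JSON text of the response to one text message."""
    try:
        parsed = json.loads(data)
    except ValueError:
        return json.dumps(error_response(None, "invalid argument", "message is not JSON"))
    fields = parsed if isinstance(parsed, dict) else {}
    command_id = fields.get("id") if is_uint(fields.get("id")) else None
    method = fields.get("method")
    params = fields.get("params")
    known = isinstance(method, str) and method in commands
    if command_id is not None and known and isinstance(params, dict):
        # The steps above run the command in parallel; this sketch runs it inline.
        result = commands[method](params)
        return json.dumps({"id": command_id, "result": result})
    error_code = "invalid argument"
    if isinstance(method, str) and not known:
        # Error code name as used in the steps above.
        error_code = "unknown command"
    return json.dumps(error_response(command_id, error_code, "command could not be processed"))
```

Returning the serialized response instead of sending it keeps the sketch independent of any particular WebSocket library.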
To get related browsing contexts given a settings object settings:
-
Let related browsing contexts be an empty set.
-
If the responsible document of settings is a Document, append the responsible document's browsing context to related browsing contexts.
Otherwise if the global object specified by settings is a WorkerGlobalScope, for each owner in the global object's owner set, if owner is a Document, append owner’s browsing context to related browsing contexts.
-
Return related browsing contexts.
-
If the current session is null, or the current session's WebSocket connection is null, then return false.
-
If event is enabled given current session, body["method"] and related browsing contexts:
-
Let connection be the current session's WebSocket connection.
-
Let serialized be the result of serialize an infra value to JSON bytes given body.
-
Send a WebSocket message comprised of serialized over connection.
-
Return true
-
-
Return false
To respond with an error given a WebSocket connection connection, command id, and error code:
-
Let error data be a new map matching the ErrorResponse production in the local end definition, with the id field set to command id, the error field set to error code, the message field set to an implementation-defined string containing a human-readable description of the error that occurred and the stacktrace field optionally set to an implementation-defined string containing a stack trace report of the active stack frames at the time when the error occurred.
-
Let response be the result of serialize an infra value to JSON bytes given error data.
Note: command id can be null, in which case the id field will also be set to null, not omitted from response.
-
Send a WebSocket message comprised of response over connection.
To handle a connection closing given a WebSocket connection connection:
-
If there is a WebDriver session with connection as its connection, set the connection on that session to null.
Note: This does not end any session.
Need to hook in to the session ending to allow the UA to close the listener if it wants.
4.1. Establishing a Connection
WebDriver clients opt in to a bidirectional connection by requesting a capability with the name "webSocketUrl" and value true.
This specification defines an additional WebDriver capability with the capability name "webSocketUrl".
"webSocketUrl" capability, with parameter value is:
-
If value is not a boolean, return error with code invalid argument.
-
Return success with data value.
"webSocketUrl" capability, with parameter value is:
-
Let webSocketUrl be the result of getting a property named "webSocketUrl" from capabilities.
-
If webSocketUrl is undefined, return.
-
Assert: webSocketUrl is true.
-
Let listener be the result of start listening for a WebSocket connection given session.
-
Set webSocketUrl to the result of constructing a WebSocket URL given listener and session.
-
Set a property on capabilities named "webSocketUrl" to webSocketUrl.
5. Common Data Types
5.1. Remote Value
Values accessible from the ECMAScript runtime are represented by a mirror object, specified as RemoteValue. The value’s type is specified in the type property. In the case of JSON-representable primitive values, the mirror object contains the value in the value property; in the case of non-JSON-representable primitives, the value property contains a string representation of the value. For non-primitive objects, the objectId property contains a string id that provides a unique handle to the object, valid for its lifetime inside the engine. For some non-primitive types, the value property contains a representation of the data in the ECMAScript object; for container types this can contain further RemoteValue instances. The value property can be null if there is a duplicate object, i.e. the object has already been serialized in the current RemoteValue, perhaps as part of a cycle, or otherwise when the maximum serialization depth is reached.
Nodes are also represented by RemoteValue instances. These have a partial serialization of the node in the value property.
Note: mirror objects do not keep the original object alive in the runtime. If an object is discarded in the runtime subsequent attempts to access it via the protocol will result in an error.
A session has an object id map. This is a weak map from objects to their corresponding id.
Should this be explicitly per realm?
To get the object id for an object given object:
-
If the object id map for the current session does not contain object, run the following steps:
-
Let object id be a new, unique, string identifier for object. If object is an element this must be the web element reference for object; if it’s a WindowProxy object, this must be the window handle for object.
-
Set the value of object in the object id map to object id.
-
-
Return the result of getting the value for object in object id map.
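A weak map from objects to ids can be approximated in Python with weakref.WeakKeyDictionary, as in the following non-normative sketch. The class name ObjectIdMap and the use of random UUID strings as identifiers are assumptions; the special cases for elements and WindowProxy objects are not modelled, and keys must be hashable and weakly referenceable.

```python
import uuid
import weakref


class ObjectIdMap:
    """Weak map from objects to stable string ids."""

    def __init__(self):
        # Entries disappear once the key object is no longer referenced.
        self._ids = weakref.WeakKeyDictionary()

    def object_id_for(self, obj):
        """Return the id for obj, allocating one on first use."""
        if obj not in self._ids:
            self._ids[obj] = str(uuid.uuid4())
        return self._ids[obj]
```

Holding the keys weakly matches the note above that mirror objects do not keep the original object alive.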
remote end definition and local end definition
RemoteValue = {
  UndefinedValue //
  NullValue //
  StringValue //
  NumberValue //
  BooleanValue //
  BigIntValue //
  SymbolValue //
  ArrayValue //
  ObjectValue //
  FunctionValue //
  RegExpValue //
  DateValue //
  MapValue //
  SetValue //
  WeakMapValue //
  WeakSetValue //
  IteratorValue //
  GeneratorValue //
  ErrorValue //
  ProxyValue //
  PromiseValue //
  TypedArrayValue //
  ArrayBufferValue //
  NodeValue //
  WindowProxyValue
}

ObjectId = text
ListValue = [*RemoteValue]
MappingValue = [*[(RemoteValue / text), RemoteValue]]

UndefinedValue = { type: "undefined" }
NullValue = { type: "null" }
StringValue = { type: "string", value: text }
SpecialNumber = "NaN" / "-0" / "+Infinity" / "-Infinity"
NumberValue = { type: "number", value: number / SpecialNumber }
BooleanValue = { type: "boolean", value: bool }
BigIntValue = { type: "bigint", value: text }
SymbolValue = { type: "symbol", objectId: ObjectId }
ArrayValue = { type: "array", objectId: ObjectId, ?value: ListValue }
ObjectValue = { type: "object", objectId: ObjectId, ?value: MappingValue }
FunctionValue = { type: "function", objectId: ObjectId }
RegExpValue = { type: "regexp", objectId: ObjectId, value: text }
DateValue = { type: "date", objectId: ObjectId, value: text }
MapValue = { type: "map", objectId: ObjectId, ?value: MappingValue }
SetValue = { type: "set", objectId: ObjectId, ?value: ListValue }
WeakMapValue = { type: "weakmap", objectId: ObjectId }
WeakSetValue = { type: "weakset", objectId: ObjectId }
ErrorValue = { type: "error", objectId: ObjectId }
PromiseValue = { type: "promise", objectId: ObjectId }
TypedArrayValue = { type: "typedarray", objectId: ObjectId }
ArrayBufferValue = { type: "arraybuffer", objectId: ObjectId }
NodeValue = { type: "node", objectId: ObjectId, ?value: NodeProperties }

NodeProperties = {
  nodeType: uint,
  nodeValue: text,
  ?localName: text,
  ?namespaceURI: text,
  childNodeCount: uint,
  ?children: [*NodeValue],
  ?attributes: {*text => text},
  ?shadowRoot: NodeValue / null,
}

WindowProxyValue = { type: "window", objectId: ObjectId }
Should WindowProxy get attributes in a similar style to Node?
handle String / Number / etc. wrapper objects specially?
To serialize as a remote value given a value, a max depth, node details, and a set of known objects:
-
In the following list of conditions and associated steps, run the first set of steps for which the associated condition is true:
- Type(value) is Undefined
- Let remote value be a map matching the UndefinedValue production in the local end definition.
- Type(value) is Null
- Let remote value be a map matching the NullValue production in the local end definition.
- Type(value) is String
- Let remote value be a map matching the StringValue production in the local end definition, with the value property set to value.
- Type(value) is Number
-
Switch on the value of value:
- NaN
- Let serialized be "NaN"
- -0
- Let serialized be "-0"
- +Infinity
- Let serialized be "+Infinity"
- -Infinity
- Let serialized be "-Infinity"
- Otherwise:
- Let serialized be value
-
Let remote value be a map matching the NumberValue production in the local end definition, with the value property set to serialized.
- Type(value) is Boolean
- Let remote value be a map matching the BooleanValue production in the local end definition, with the value property set to value.
- Type(value) is BigInt
- Let remote value be a map matching the BigIntValue production in the local end definition, with the value property set to the result of running the ToString operation on value.
- Type(value) is Symbol
- Let remote value be a map matching the SymbolValue production in the local end definition, with the objectId property set to the object id for an object value.
- IsArray(value)
-
-
Let serialized be null.
-
If value is not in the set of known objects, and max depth is not null and greater than 0, run the following steps:
-
Append value to the set of known objects
-
Let serialized be the result of serialize as a list given CreateArrayIterator(value, value), max depth, node details and set of known objects.
-
-
Let remote value be a map matching the ArrayValue production in the local end definition, with the objectId property set to the object id for an object value, and the value field set to serialized if it’s not null, or omitted otherwise.
-
- IsRegExp(value)
-
-
Let serialized be the string-concatenation of "/", pattern, "/", and flags.
-
Let remote value be a map matching the RegExpValue production in the local end definition, with the objectId property set to the object id for an object value and the value set to serialized.
- value has a [[DateValue]] internal slot.
-
-
Let serialized be ToDateString(thisTimeValue(value)).
-
Let remote value be a map matching the DateValue production in the local end definition, with the objectId property set to the object id for an object value and the value set to serialized.
-
- value has a [[MapData]] internal slot
-
-
Let serialized be null.
-
If value is not in the set of known objects, and max depth is not null and greater than 0, run the following steps:
-
Append value to the set of known objects
-
Let serialized be the result of serialize as a mapping given CreateMapIterator(value, key+value), max depth, node details and set of known objects.
-
-
Let remote value be a map matching the MapValue production in the local end definition, with the objectId property set to the object id for an object value, and the value field set to serialized if it’s not null, or omitted otherwise.
-
- value has a [[SetData]] internal slot
-
-
Let serialized be null.
-
If value is not in the set of known objects, and max depth is not null and greater than 0, run the following steps:
-
Append value to the set of known objects
-
Let serialized be the result of serialize as a list given CreateSetIterator(value, value), max depth, node details and set of known objects.
-
-
Let remote value be a map matching the SetValue production in the local end definition, with the objectId property set to the object id for an object value, and the value field set to serialized if it’s not null, or omitted otherwise.
-
- value has a [[WeakMapData]] internal slot
- Let remote value be a map matching the WeakMapValue production in the local end definition, with the objectId property set to the object id for an object value.
- value has a [[WeakSetData]] internal slot
- Let remote value be a map matching the WeakSetValue production in the local end definition, with the objectId property set to the object id for an object value.
- value has an [[ErrorData]] internal slot
- Let remote value be a map matching the ErrorValue production in the local end definition, with the objectId property set to the object id for an object value.
- IsPromise(value)
- Let remote value be a map matching the PromiseValue production in the local end definition, with the objectId property set to the object id for an object value.
- value has a [[TypedArrayName]] internal slot
- Let remote value be a map matching the TypedArrayValue production in the local end definition, with the objectId property set to the object id for an object value.
- value has an [[ArrayBufferData]] internal slot
- Let remote value be a map matching the ArrayBufferValue production in the local end definition, with the objectId property set to the object id for an object value.
- value is a platform object that implements Node
-
-
Let serialized be null.
-
If node details is true, run the following steps:
-
Let serialized be a map.
-
Set serialized["nodeType"] to Get(value, "nodeType").
-
Set serialized["nodeValue"] to Get(value, "nodeValue").
-
Set serialized["childNodeCount"] to child node count.
-
If max depth is equal to 0 let children be null. Otherwise, let children be an empty list and, for each node child in the children of value:
-
Let child depth be max depth - 1 if max depth is not null, or null otherwise.
-
Let serialized child be the result of serialize as a remote value with child, child depth, node details and set of known objects.
-
Append serialized child to children.
-
Set serialized["children"] to children.
-
If value is an Element:
-
Let attributes be a new map.
-
For each attribute in value’s attribute list:
-
Let name be attribute’s qualified name.
-
Let attribute value be attribute’s value.
-
Set attributes[name] to attribute value.
-
Set serialized["attributes"] to attributes.
-
Let shadow root be value’s shadow root.
-
If shadow root is null, let serialized shadow be null. Otherwise run the following substeps:
-
Let child depth be max depth - 1 if max depth is not null, or null otherwise.
-
Let serialized shadow be the result of serialize as a remote value with shadow root, child depth, false and set of known objects.
Note: this means the objectId for the shadow root will be serialized irrespective of whether the shadow is open or closed, but no properties of the node will be returned.
-
Set serialized["shadowRoot"] to serialized shadow.
-
-
-
Let remote value be a map matching the NodeValue production in the local end definition, with the objectId property set to the object id for an object value, and value set to serialized, if serialized is not null.
- value is a platform object that implements WindowProxy
- Let remote value be a map matching the WindowProxyValue production in the local end definition, with the objectId property set to the object id for an object value.
- value is a platform object
- Let remote value be a map matching the ObjectValue production in the local end definition, with the objectId property set to the object id for an object value.
- IsCallable(value)
- Let remote value be a map matching the FunctionValue production in the local end definition, with the objectId property set to the object id for an object value.
- Otherwise:
-
-
Let serialized be null.
-
If value is not in the set of known objects, and max depth is greater than 0, run the following steps:
-
Append value to the set of known objects
-
Let serialized be the result of serialize as a mapping given EnumerableOwnPropertyNames(value, key+value), max depth, node details and set of known objects.
-
Let remote value be a map matching the ObjectValue production in the local end definition, with the objectId property set to the object id for an object value, and the value field set to serialized.
-
Return remote value
Does it make sense to use the same depth parameter for nodes and objects in general?
To serialize as a list given iterable, max depth, node details and a set of known objects:
-
Let serialized be a new list.
-
For each child value in iterable:
-
Let child depth be max depth - 1 if max depth is not null, or null otherwise.
-
Let serialized child be the result of serialize as a remote value with arguments child value, child depth, node details and set of known objects.
-
Append serialized child to serialized.
-
-
Return serialized
this assumes for-in works on iterators
To serialize as a mapping given iterable, max depth, node details and a set of known objects:
-
Let serialized be a new list.
-
For each item in iterable:
-
Assert: IsArray(item)
-
Let property be CreateListFromArrayLike(item)
-
Assert: property is a list of size 2
-
Let key be property[0] and let value be property[1]
-
Let child depth be max depth - 1 if max depth is not null, or null otherwise.
-
If Type(key) is String, let serialized key be key, otherwise let serialized key be the result of serialize as a remote value with arguments key, child depth, node details and set of known objects.
-
Let serialized value be the result of serialize as a remote value with arguments value, child depth, node details and set of known objects.
-
Let serialized child be («serialized key, serialized value»).
-
Append serialized child to serialized.
-
-
Return serialized
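The serialization algorithms above are defined in terms of ECMAScript values. The following Python sketch is only an analogy covering a small subset: None, bool, int, float, str, list and dict stand in for Null, Boolean, Number, String, arrays and plain objects. The object_id callback is an assumed stand-in for the object id for an object, known objects are tracked by Python identity, and nodes and all other types are not modelled.

```python
import math


def serialize_number(value):
    """Map a Python number to the NumberValue value field."""
    if isinstance(value, int):
        return value
    if math.isnan(value):
        return "NaN"
    if math.isinf(value):
        return "+Infinity" if value > 0 else "-Infinity"
    if value == 0 and math.copysign(1.0, value) < 0:
        return "-0"
    return value


def serialize(value, max_depth, known_objects, object_id):
    """Serialize None, bool, int, float, str, list and dict values."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):
        return {"type": "boolean", "value": value}
    if isinstance(value, (int, float)):
        return {"type": "number", "value": serialize_number(value)}
    if isinstance(value, str):
        return {"type": "string", "value": value}
    if not isinstance(value, (list, dict)):
        raise TypeError(f"unsupported type: {type(value).__name__}")
    is_list = isinstance(value, list)
    remote = {"type": "array" if is_list else "object", "objectId": object_id(value)}
    # Children are serialized only for objects not seen before and while depth remains.
    if id(value) not in known_objects and max_depth is not None and max_depth > 0:
        known_objects.add(id(value))
        child_depth = max_depth - 1

        def child(item):
            return serialize(item, child_depth, known_objects, object_id)

        if is_list:
            remote["value"] = [child(item) for item in value]
        else:
            # String keys are kept as plain text; other keys become remote values.
            remote["value"] = [
                [key if isinstance(key, str) else child(key), child(item)]
                for key, item in value.items()
            ]
    return remote
```

The set of known objects is what terminates cycles: a list that contains itself is serialized once with its children, and the inner reference carries only type and objectId.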
6. Modules
6.1. The session Module
The session module contains commands and events for monitoring the status of the remote end.
6.1.1. Definition
SessionCommand = (SessionStatusCommand // SessionSubscribeCommand)
SessionResult = (StatusResult)
To update the event map, given session, list of event names, browsing contexts, and enabled:
Note: The return value of this algorithm is a map between event names and contexts. When the events are being enabled globally, the contexts in the return value are those for which the event was already enabled. When the events are enabled for specific contexts, the contexts in the return value are those for which the event is now enabled but was not previously. When events are disabled, the return value is always empty.
-
Let global event set be a clone of the global event set for session.
-
Let event map be a new map.
-
For each key → value of the browsing context event map for session:
-
Set event map[key] to a clone of value.
-
-
Let enabled events be a new map.
-
Let event names be an empty set.
-
For each entry name in list of event names, let event names be the union of event names and the result of trying to obtain a set of event names with name.
-
-
If browsing contexts is null:
-
If enabled is true:
-
For each event name of event names:
-
If global event set doesn’t contain event name:
-
Let event enabled contexts be the event enabled browsing contexts given session and event name.
-
Add event name to global event set.
-
For each context of event enabled contexts, remove event name from event map[context].
-
Set enabled events[event name] to event enabled contexts.
-
-
-
-
If enabled is false:
-
For each event name in event names:
-
If global event set contains event name, remove event name from global event set. Otherwise return error with error code invalid argument.
-
-
-
-
Otherwise, if browsing contexts is not null:
-
Let targets be an empty map.
-
For each context id in browsing contexts:
-
Let context be the result of trying to get a browsing context with context id.
-
Let top-level context be the top-level browsing context for context.
-
If event map does not contain top-level context, set event map[top-level context] to a new set.
-
Set targets[top-level context] to event map[top-level context].
-
-
For each event name in event names:
-
If enabled is true and global event set contains event name, continue.
-
For each context → target in targets:
-
If enabled is true and target does not contain event name:
-
Add event name to target.
-
If enabled events does not contain event name, set enabled events[event name] to a new set.
-
Append context to enabled events[event name].
-
-
If enabled is false:
-
If target contains event name, remove event name from target. Otherwise return error with error code invalid argument.
-
-
-
-
-
Set the global event set for session to global event set.
-
Set the browsing context event map for session to event map.
-
Return success with data enabled events.
Note: Implementations that do additional work when an event is enabled, e.g. subscribing to the relevant engine-internal events, will likely perform those additional steps when updating the event map. This specification uses a model where hooks are always called and then the event map is used to filter only those that ought to be returned to the local end.
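The following is a minimal, non-normative sketch of the update the event map algorithm, assuming browsing contexts are represented by plain strings that already identify top-level browsing contexts. The names `update_event_map` and `InvalidArgument` are hypothetical and exist only for illustration.

```python
class InvalidArgument(Exception):
    """Stands in for an error with error code 'invalid argument'."""


def update_event_map(global_events, event_map, event_names, contexts, enabled):
    # Work on clones so that a failure leaves the session state untouched.
    global_events = set(global_events)
    event_map = {ctx: set(names) for ctx, names in event_map.items()}
    enabled_events = {}

    if contexts is None:
        for name in event_names:
            if enabled:
                if name not in global_events:
                    # Contexts that already had the event enabled individually.
                    already = {c for c, names in event_map.items() if name in names}
                    global_events.add(name)
                    for ctx in already:
                        event_map[ctx].discard(name)
                    enabled_events[name] = already
            elif name in global_events:
                global_events.remove(name)
            else:
                raise InvalidArgument(name)
    else:
        targets = {ctx: event_map.setdefault(ctx, set()) for ctx in contexts}
        for name in event_names:
            if enabled and name in global_events:
                continue
            for ctx, target in targets.items():
                if enabled:
                    if name not in target:
                        target.add(name)
                        enabled_events.setdefault(name, set()).add(ctx)
                elif name in target:
                    target.remove(name)
                else:
                    raise InvalidArgument(name)

    return global_events, event_map, enabled_events
```

The caller is expected to store the returned global event set and event map back on the session only when no error was raised, mirroring the final steps of the algorithm.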
6.1.2. Commands
6.1.2.1. The session.status Command
The session.status command returns information about whether a remote end is in a state in which it can create new sessions, but may additionally include arbitrary meta information that is specific to the implementation.
- Command Type
-
SessionStatusCommand = { method: "session.status", params: EmptyParams, }
- Return Type
-
SessionStatusResult = { ready: bool, message: text, }
The remote end steps are:
-
Let body be a new map with the following properties:
- "ready"
- The remote end’s readiness state.
- "message"
- An implementation-defined string explaining the remote end’s readiness state.
-
Return success with data body
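As an illustration only, a local end might build the command and check the shape of the result roughly as follows. The helper names are hypothetical, and the sketch assumes messages are represented as JSON-compatible dictionaries.

```python
def make_status_command(command_id):
    # Matches the SessionStatusCommand production; params is EmptyParams.
    return {"id": command_id, "method": "session.status", "params": {}}


def is_status_result(result):
    # Checks the two fields required by the SessionStatusResult production.
    return (
        isinstance(result, dict)
        and isinstance(result.get("ready"), bool)
        and isinstance(result.get("message"), str)
    )
```

Implementation-specific extra fields in the result are deliberately not rejected by this check.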
6.1.2.2. The session.subscribe Command
The session.subscribe command enables certain events either globally or for a set of browsing contexts.
This needs to be generalized to work with realms too.
- Command Type
-
SessionSubscribeCommand = { method: "session.subscribe", params: SessionSubscribeParameters } SessionSubscribeParameters = { events: [*text], ?contexts: [*BrowsingContext], }
- Return Type
-
EmptyResult
The remote end steps with command parameters are:
-
Let the list of event names be the value of the events field of command parameters.
-
Let the list of contexts be the value of the contexts field of command parameters if it is present or null if it isn’t.
-
Let enabled events be the result of trying to update the event map with current session, list of event names, list of contexts and enabled true.
-
Let subscribe step events be a new map.
-
For each event name → contexts in enabled events:
-
If the event with event name event name defines remote end subscribe steps, set subscribe step events[event name] to contexts.
-
-
Sort in ascending order subscribe step events using the following less than algorithm given two entries with keys event name one and event name two:
-
Let event one be the event with name event name one
-
Let event two be the event with name event name two
-
Return true if event one’s subscribe priority is less than event two’s subscribe priority, or false otherwise.
-
-
For each event name → contexts in subscribe step events:
-
If list of contexts is null, let include contexts be a list of all top-level browsing contexts that are not contained in contexts, and let include global be true.
Otherwise let include contexts be contexts and let include global be false.
-
Run the remote end subscribe steps for the event with event name event name given include contexts and include global.
-
-
Return success with data null.
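The ordering and context-selection steps above can be sketched as follows. This is a non-normative illustration: `priorities` is a hypothetical map from event name to subscribe priority that contains only events defining remote end subscribe steps, and contexts are represented by strings.

```python
def order_subscribe_steps(enabled_events, priorities):
    # Keep only events that define remote end subscribe steps, then sort
    # them in ascending order of subscribe priority.
    with_steps = {n: c for n, c in enabled_events.items() if n in priorities}
    return sorted(with_steps.items(), key=lambda item: priorities[item[0]])


def contexts_to_include(requested_contexts, contexts, all_top_level):
    # Global subscription: every top-level context not in `contexts`.
    if requested_contexts is None:
        return [c for c in all_top_level if c not in contexts], True
    # Per-context subscription: exactly the contexts from the event map update.
    return list(contexts), False
```

With the priorities defined in this specification (1 for browsingContext.contextCreated, 2 for script.realmCreated and 10 for log.entryAdded), context creation is replayed before realm creation, and buffered log entries come last.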
6.1.2.3. The session.unsubscribe Command
The session.unsubscribe command disables events either globally or for a set of browsing contexts.
This needs to be generalized to work with realms too.
- Command Type
-
SessionUnsubscribeCommand = { method: "session.unsubscribe", params: SessionSubscribeParameters }
- Return Type
-
EmptyResult
The remote end steps with command parameters are:
-
Let the list of event names be the value of the events field of command parameters.
-
Let the list of contexts be the value of the contexts field of command parameters if it is present or null if it isn’t.
-
Try to update the event map with current session, list of event names, list of contexts and enabled false.
-
Return success with data null.
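For illustration, a local end could construct subscribe and unsubscribe commands along these lines. The function name and the dictionary representation are assumptions of this sketch, not part of the protocol definition.

```python
def make_subscription_command(command_id, events, contexts=None, subscribe=True):
    # `contexts` is omitted entirely for a global (un)subscription, because
    # the field is optional in the parameters production.
    params = {"events": list(events)}
    if contexts is not None:
        params["contexts"] = list(contexts)
    method = "session.subscribe" if subscribe else "session.unsubscribe"
    return {"id": command_id, "method": method, "params": params}
```

Note that unsubscribing from an event that is not currently enabled produces an invalid argument error, so a local end would normally track what it has subscribed to.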
6.2. The browsingContext Module
The browsingContext module contains commands and events relating to browsing contexts.
6.2.1. Definition
BrowsingContextCommand = (BrowsingContextGetTreeCommand)
BrowsingContextResult = (BrowsingContextGetTreeResult) BrowsingContextEvent = ( BrowsingContextCreatedEvent // BrowsingContextDestroyedEvent )
6.2.2. Types
6.2.2.1. The browsingContext.BrowsingContext Type
remote end definition and local end definition
BrowsingContext = text;
Each browsing context has an associated browsing context id, which is a string uniquely identifying that browsing context. This is implicitly set when the context is created. For browsing contexts with an associated WebDriver window handle the browsing context id must be the same as the window handle.
To get a browsing context given context id:
-
If context id is null, return success with data null.
-
If there is no browsing context with browsing context id context id, return error with error code no such frame.
-
Let context be the browsing context with id context id.
-
Return success with data context
6.2.2.2. The browsingContext.BrowsingContextInfo Type
BrowsingContextInfoList = [* BrowsingContextInfo] BrowsingContextInfo = { context: BrowsingContext, ?parent: BrowsingContext / null, url: text, children: BrowsingContextInfoList / null }
The BrowsingContextInfo type represents the properties of a browsing context.
To get the browsing context info given context, depth, and max depth:
-
Let context id be the browsing context id for context.
-
If context has a parent browsing context let parent id be the browsing context id of that parent. Otherwise let parent id be null.
-
Let document be context’s active document.
-
Let url be the result of running the URL serializer, given document’s URL.
Note: This includes the fragment component of the URL.
-
Let child info be the result of get the descendent browsing contexts given context id, depth + 1, and max depth.
-
Let context info be a map matching the BrowsingContextInfo production with the context field set to context id, the parent field set to parent id if depth is 0, or unset otherwise, the url field set to url, and the children field set to child info.
-
Return context info.
To get the descendent browsing contexts given parent id, depth, and max depth:
-
If max depth is greater than zero, and depth is equal to max depth, return null.
-
Let parent be the result of trying to get a browsing context given parent id.
-
If parent is null, let child contexts be a list containing all top-level browsing contexts. Otherwise let child contexts be a list containing all browsing contexts which are child browsing contexts of parent.
-
Let contexts info be a list.
-
For each context of child contexts:
-
Let info be the result of get the browsing context info given context, depth, and max depth.
-
Append info to contexts info
-
-
Return contexts info
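The two mutually recursive algorithms above can be sketched as follows. This is a non-normative simplification: the `Context` class is hypothetical, and child contexts are passed directly instead of being looked up from a parent id.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Context:
    id: str
    url: str
    parent: Optional["Context"] = None
    children: list = field(default_factory=list)


def context_info(context, depth, max_depth):
    info = {
        "context": context.id,
        "url": context.url,
        "children": descendant_infos(context.children, depth + 1, max_depth),
    }
    # The parent field is only present on the root of the returned tree.
    if depth == 0:
        info["parent"] = context.parent.id if context.parent else None
    return info


def descendant_infos(child_contexts, depth, max_depth):
    # A max depth of 0 means "no limit"; otherwise stop once it is reached.
    if max_depth > 0 and depth == max_depth:
        return None
    return [context_info(c, depth, max_depth) for c in child_contexts]
```

A children value of null therefore signals that the tree was truncated, whereas an empty list signals that the context has no children.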
6.2.3. Commands
6.2.3.1. The browsingContext.getTree Command
The browsingContext.getTree command returns a tree of all browsing contexts that are descendants of the given context, or all top-level contexts when no parent is provided.
- Command Type
-
BrowsingContextGetTreeCommand = { method: "browsingContext.getTree", params: BrowsingContextGetTreeParameters } BrowsingContextGetTreeParameters = { ?maxDepth: uint, ?parent: BrowsingContext, }
- Return Type
-
BrowsingContextGetTreeResult = { contexts: BrowsingContextInfoList }
The remote end steps with command parameters are:
-
Let the parent id be the value of the parent field of command parameters if present, or null otherwise.
-
Let max depth be the value of the maxDepth field of command parameters if present, or 0 otherwise.
-
Let depth be 0.
-
Let contexts be the result of get the descendent browsing contexts, given parent id, depth, and max depth.
-
Let body be a map matching the BrowsingContextGetTreeResult production, with the contexts field set to contexts.
-
Return success with data body.
6.2.4. Events
6.2.4.1. The browsingContext.contextCreated Event
- Event Type
-
BrowsingContextCreatedEvent = { method: "browsingContext.contextCreated", params: BrowsingContextInfo }
To recursively emit context created events given context:
-
Emit a context created event with context.
-
For each child browsing context, child, of context:
-
Recursively emit context created events given child.
-
To emit a context created event given context:
-
Let related contexts be a set containing context.
-
Let params be the result of get the browsing context info given context, 0, and 1.
-
Let body be a map matching the BrowsingContextCreatedEvent production, with the params field set to params.
-
Emit an event with body and related contexts.
The remote end event trigger is:
When the create a new browsing context algorithm is invoked, after the active document of the browsing context is set, run the following steps:
-
Let context be the newly created browsing context.
-
Emit a context created event given context.
The remote end subscribe steps, with subscribe priority 1, given contexts and include global are:
-
For each context in contexts:
-
Recursively emit context created events given context.
-
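The recursive emission above amounts to a pre-order walk of the browsing context tree. The following non-normative sketch assumes a hypothetical dictionary representation of contexts and an `emit` callback standing in for the emit an event algorithm.

```python
def recursively_emit_context_created(context, emit):
    # Parents are announced before their children (pre-order traversal).
    emit({
        "method": "browsingContext.contextCreated",
        "params": {
            "context": context["id"],
            "parent": context["parent"],
            "url": context["url"],
            # Info is computed with depth 0 and max depth 1, so the
            # children field is always null in this event.
            "children": None,
        },
    })
    for child in context["children"]:
        recursively_emit_context_created(child, emit)
```

Emitting parents first means a local end always learns about a context before it receives events for any of that context's descendants.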
6.2.4.2. The browsingContext.contextDestroyed Event
- Event Type
-
BrowsingContextDestroyedEvent = { method: "browsingContext.contextDestroyed", params: BrowsingContextInfo }
Run the following browsing context tree discarded steps:
-
If the current session is null, return.
-
Let context be the browsing context being discarded.
-
Let params be the result of get the browsing context info, given context, 0, and 0.
-
Let body be a map matching the BrowsingContextDestroyedEvent production, with the params field set to params.
-
Let related browsing contexts be a set containing the parent browsing context of context, if that is not null, or an empty set otherwise.
-
Emit an event with body and related browsing contexts.
The way this hooks into HTML feels very fragile. See https://github.com/whatwg/html/issues/6194
It’s unclear if we ought to only fire this event for browsing contexts that have active documents; navigation can also cause contexts to become inaccessible but not yet get discarded because of bfcache.
6.3. The script Module
The script module contains commands and events relating to script realms and execution.
6.3.1. Definition
ScriptCommand = (ScriptGetRealmsCommand)
ScriptResult = (ScriptGetRealmsResult) ScriptEvent = ( ScriptRealmCreatedEvent // ScriptRealmDestroyedEvent )
6.3.2. Types
6.3.2.1. The script.Realm type
Remote end definition and local end definition
Realm = text;
Each realm has an associated realm id, which is a string uniquely identifying that realm. This is implicitly set when the realm is created.
6.3.2.2. The script.RealmInfo type
RealmInfo = { realm: Realm, type: RealmType, origin: text } RealmType = "window" / "dedicated-worker" / "shared-worker" / "service-worker" / "worker" / "paint-worklet" / "audio-worklet" / "worklet" / text
The RealmInfo type represents the properties of a realm.
To get the realm info given environment settings:
-
Let realm be environment settings’ realm execution context's Realm component.
-
Let realm id be the realm id for realm.
-
Run the steps under the first matching condition:
- The global object specified by environment settings is a Window object
-
Let type be "window".
- The global object specified by environment settings is a DedicatedWorkerGlobalScope object
-
Let type be "dedicated-worker".
- The global object specified by environment settings is a SharedWorkerGlobalScope object
-
Let type be "shared-worker".
- The global object specified by environment settings is a ServiceWorkerGlobalScope object
-
Let type be "service-worker".
- The global object specified by environment settings is a WorkerGlobalScope object
-
Let type be "worker".
- The global object specified by environment settings is a PaintWorkletGlobalScope object
-
Let type be "paint-worklet".
- The global object specified by environment settings is an AudioWorkletGlobalScope object
-
Let type be "audio-worklet".
- The global object specified by environment settings is a WorkletGlobalScope object
-
Let type be "worklet".
- Otherwise:
-
Return null.
-
-
Let origin be the serialization of an origin given environment settings’s origin.
-
Let realm info be a map matching the RealmInfo production, with the realm field set to realm id, the type field set to type and the origin field set to origin.
-
Return realm info.
We currently don’t provide information about realms of unknown types. That might be a problem for e.g. extension-related realms.
Note: Future variations of this specification will retain the invariant that the last component of the type name after splitting on "-" will always be "worker" for globals implementing WorkerGlobalScope, and "worklet" for globals implementing WorkletGlobalScope.
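A local end can rely on this invariant to classify realm types it does not recognise. The helper below is a hypothetical, non-normative sketch of that check.

```python
def realm_family(realm_type):
    # The last "-"-separated component identifies workers and worklets,
    # including types added by future versions of the specification.
    last = realm_type.split("-")[-1]
    return last if last in ("worker", "worklet") else None
```

For example, a type such as "dedicated-worker" is classified as a worker without the local end needing a hard-coded list of every worker variant.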
6.3.3. Commands
6.3.3.1. The script.getRealms Command
The script.getRealms command returns a list of all realms, optionally filtered to realms of a specific type, or to the realm associated with the document currently loaded in a specified browsing context.
- Command Type
-
ScriptGetRealmsCommand = { method: "script.getRealms", params: GetRealmsParameters } GetRealmsParameters = { ?context: BrowsingContext, ?type: RealmType, }
- Return Type
-
RealmInfoList = [* RealmInfo] ScriptGetRealmsResult = { realms: RealmInfoList }
The remote end steps with command parameters are:
-
Let environment settings be a list of all the environment settings objects that have their execution ready flag set.
-
If command parameters contains context:
-
Let context be the result of trying to get a browsing context with command parameters["context"].
-
Let document be context’s active document.
-
Let context environment settings be a list.
-
For each settings of environment settings:
-
If any of the following conditions hold, append settings to context environment settings:
-
The responsible document of settings is document.
-
The global object specified by settings is a WorkerGlobalScope with document in its owner set.
-
-
-
Set environment settings to context environment settings.
-
-
Let realms be a list.
-
For each settings of environment settings:
-
Let realm info be the result of get the realm info given settings
-
If realm info is null, continue.
-
If command parameters contains type and realm info["type"] is not equal to command parameters["type"], continue.
-
Append realm info to realms.
-
-
Let body be a map matching the ScriptGetRealmsResult production, with the realms field set to realms.
-
Return success with data body.
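The type filtering part of these steps can be sketched as below. This is a non-normative illustration that takes already-computed realm info values (dictionaries, or `None` for realms of unknown type) and omits the browsing context filter; checking for `None` before reading the type is an assumption of this sketch.

```python
def filter_realms(realm_infos, params):
    realms = []
    for info in realm_infos:
        # Realms of unknown type produce no info and are skipped.
        if info is None:
            continue
        # The filter is a literal match on the type string.
        if "type" in params and info["type"] != params["type"]:
            continue
        realms.append(info)
    return {"realms": realms}
```

When no type parameter is supplied, every realm with a known type is returned.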
Extend this to also allow realm parents e.g. for nested workers? Or get all ancestor workers.
We might want to have a more sophisticated filter system than just a literal match.
6.3.4. Events
6.3.4.1. The script.realmCreated Event
- Event Type
-
ScriptRealmCreatedEvent = { method: "script.realmCreated", params: RealmInfo }
When any of the set up a window environment settings object, set up a worker environment settings object or set up a worklet environment settings object algorithms are invoked, immediately prior to returning the settings object:
-
Let environment settings be the newly created environment settings object.
-
Let realm info be the result of get the realm info given environment settings.
-
If realm info is null, return.
-
Let related browsing contexts be the result of get related browsing contexts given environment settings.
-
Let body be a map matching the ScriptRealmCreatedEvent production, with the params field set to realm info.
-
Emit an event with body and related browsing contexts.
The remote end subscribe steps with subscribe priority 2, given contexts and include global are:
-
Let environment settings be a list of all the environment settings objects that have their execution ready flag set.
-
For each settings of environment settings:
-
Let related contexts be a new set.
-
If the responsible document of settings is a Document:
-
Let context be settings’s responsible document's browsing context's top-level browsing context.
-
If context is not in contexts, continue.
-
Append context to related contexts.
Otherwise, if include global is false, continue.
-
-
Let realm info be the result of get the realm info given settings
-
Let body be a map matching the ScriptRealmCreatedEvent production, with the params field set to realm info.
-
Emit an event with body and related contexts.
-
6.3.4.2. The script.realmDestroyed Event
- Event Type
-
RealmDestroyedParameters = { realm: Realm } ScriptRealmDestroyedEvent = { method: "script.realmDestroyed", params: RealmDestroyedParameters }
-
Let related browsing contexts be an empty set.
-
Append document’s browsing context to related browsing contexts.
-
For each worklet global scope in document’s worklet global scopes:
-
Let realm be worklet global scope’s relevant Realm.
-
Let realm id be the realm id for realm.
-
Let params be a map matching the RealmDestroyedParameters production, with the realm field set to realm id.
-
Let body be a map matching the ScriptRealmDestroyedEvent production, with the params field set to params.
-
Emit an event with body and related browsing contexts.
-
-
Let environment settings be the environment settings object whose responsible document is document.
-
Let realm be environment settings’ realm execution context's Realm component.
-
Let realm id be the realm id for realm.
-
Let params be a map matching the RealmDestroyedParameters production, with the realm field set to realm id.
-
Let body be a map matching the ScriptRealmDestroyedEvent production, with the params field set to params.
-
Emit an event with body and related browsing contexts.
Whenever a worker event loop event loop is destroyed, either because the worker comes to the end of its lifecycle, or prematurely via the terminate a worker algorithm:
-
Let environment settings be the environment settings object for which event loop is the responsible event loop.
-
Let related browsing contexts be the result of get related browsing contexts given environment settings.
-
Let realm be environment settings’s environment settings object’s Realm.
-
Let realm id be the realm id for realm.
-
Let params be a map matching the RealmDestroyedParameters production, with the realm field set to realm id.
-
Let body be a map matching the ScriptRealmDestroyedEvent production, with the params field set to params.
-
Emit an event with body and related browsing contexts.
6.4. The log Module
The log module contains functionality and events related to logging.
A session has a log event buffer which is a map from browsing context id to a list of log events for that context that have not been emitted. User agents may impose a maximum size on this buffer, subject to the condition that if events A and B happen in the same context with A occurring before B, and both are added to the buffer, the entry for B must not be removed before the entry for A.
To buffer a log event given contexts and event:
-
Let buffer be the current session's log event buffer.
-
Let context ids be a new list.
-
For each context of contexts:
-
Append the browsing context id for context to context ids.
-
-
For each context id in context ids:
-
Let other contexts be an empty list
-
For each other id in context ids:
-
If other id is not equal to context id, append other id to other contexts.
-
If buffer does not contain context id, let buffer[context id] be a new list.
-
Append (event, other contexts) to buffer[context id].
-
Note: we store the other contexts here so that each event is only emitted once. In practice this is only relevant for workers that can be associated with multiple browsing contexts.
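A minimal, non-normative sketch of this buffering step follows, assuming the buffer is a dictionary keyed by browsing context id and that context ids have already been collected. The function name is hypothetical.

```python
def buffer_log_event(buffer, context_ids, event):
    for context_id in context_ids:
        # Remember the other contexts that hold the same event so that it
        # can be removed from their lists once it has been emitted.
        others = [other for other in context_ids if other != context_id]
        buffer.setdefault(context_id, []).append((event, others))
```

Each context's list preserves insertion order, which keeps the ordering condition on buffered events easy to satisfy when entries are later discarded from the front.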
Do we want to key this on browsing context or top-level browsing context? The difference is in what happens if an event occurs in a frame and that frame is then navigated before the local end subscribes to log events for the top level context.
6.4.1. Definition
LogEvent = ( LogEntryAddedEvent )
6.4.2. Types
6.4.2.1. log.LogEntry
LogLevel = "debug" / "info" / "warning" / "error" LogEntry = ( GenericLogEntry // ConsoleLogEntry // JavascriptLogEntry ) BaseLogEntry = { level: LogLevel, text: text / null, timestamp: int, ?stackTrace: [*StackFrame], } GenericLogEntry = { BaseLogEntry, type: text, } ConsoleLogEntry = { BaseLogEntry, type: "console", method: text, realm: Realm, args: [*RemoteValue], } JavascriptLogEntry = { BaseLogEntry, type: "javascript", }
Each log event is represented by a LogEntry object. This has a type property which represents the type of log entry added, a level property representing severity, a text property with the log message string itself, and a timestamp property corresponding to the time the log entry was generated. Specific variants of the LogEntry are used to represent logs from different sources, and provide additional fields specific to the entry type.
6.4.2.2. log.StackFrame
StackFrame = { url: text, functionName: text, lineNumber: int, columnNumber: int, }
A frame in a stack trace is represented by a StackFrame object. This has a url property, which represents the URL of the script, a functionName property which represents the name of the executing function, and lineNumber and columnNumber properties, which represent the line and column number of the executed code.
The current stack trace is a representation of the stack of the running execution context. The details of this are unspecified, and so the behaviour here is implementation defined, but the general process is as follows:
-
Let stack trace be a new list.
-
For each stack frame frame in the stack of the running execution context, starting from the most recently executed frame, run the following steps:
-
Let url be the result of running the URL serializer, given the URL of frame’s associated script resource.
-
Let functionName be the name of frame’s associated function.
-
Let lineNumber and columnNumber be the one-based line and zero-based column numbers, respectively, of the location in frame’s associated script resource corresponding to frame.
-
Let frame info be a new map matching the StackFrame production, with the url field set to url, the functionName field set to functionName, the lineNumber field set to lineNumber and the columnNumber field set to columnNumber.
-
-
Append frame info to stack trace.
-
Return stack trace
6.4.3. Events
6.4.3.1. entryAdded
- Event Type
-
LogEntryAddedEvent = { method: "log.entryAdded", params: LogEntry, }
The remote end event trigger is:
Define the following console steps with method, args, and options:
-
If method is "error" or "assert", let level be "error". Otherwise, if method is "debug" or "trace", let level be "debug". Otherwise, if method is "warn" or "warning", let level be "warning". Otherwise let level be "info".
-
Let timestamp be a time value representing the current date and time in UTC.
-
Let text be an empty string.
-
If Type(args[0]) is String, and args[0] contains a formatting specifier, let formatted args be Formatter(args). Otherwise let formatted args be args.
This is underdefined in the console spec, so it’s unclear if we can get interoperable behaviour here.
-
For each arg in formatted args:
-
If arg is not the first entry in args, append a U+0020 SPACE to text.
-
If arg is a primitive value, append ToString(arg) to text. Otherwise append an implementation-defined string to text.
-
-
Let serialized args be a new list.
-
For each arg of args, append the result of serialize as a remote value given arg, null, true, and an empty set to serialized args.
-
Let realm be the realm id of the current Realm Record.
-
Let stack be the current stack trace.
-
Let entry be a map matching the ConsoleLogEntry production, with the level field set to level, the text field set to text, the timestamp field set to timestamp, the stackTrace field set to stack if stack is not null, or omitted otherwise, the method field set to method, the realm field set to realm and the args field set to serialized args.
-
Let body be a map matching the LogEntryAddedEvent production, with the params field set to entry.
-
Let settings be the current settings object
-
Let related browsing contexts be the result of get related browsing contexts given settings.
-
Let emitted be the result of emit an event with body and related browsing contexts.
-
If emitted is false, buffer a log event given related browsing contexts and body.
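The mapping from console method to log level in the first step can be sketched as follows. The function name is hypothetical and the sketch is non-normative.

```python
def console_level(method):
    if method in ("error", "assert"):
        return "error"
    if method in ("debug", "trace"):
        return "debug"
    if method in ("warn", "warning"):
        return "warning"
    # Every other console method, e.g. "log" or "info", maps to "info".
    return "info"
```

The returned string is always one of the values permitted by the LogLevel production.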
Define the following error reporting steps with arguments script, line number, column number, message and handled:
-
If handled is true return.
-
Let settings be script’s settings object.
-
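Let stack be the current stack trace for the exception.
-
Let timestamp be a time value representing the current date and time in UTC.
-
Let entry be a map matching the JavascriptLogEntry production, with level set to "error", text set to message, the timestamp field set to timestamp, and the stackTrace field set to stack if stack is not null, or omitted otherwise.
-
Let body be a map matching the LogEntryAddedEvent production, with the params field set to entry.
-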
Let related browsing contexts be the result of get related browsing contexts given settings.
-
Let emitted be the result of emit an event with body and related browsing contexts.
-
If emitted is false, buffer a log event given related browsing contexts and body.
Lots more things require logging. CDP has LogEntryAdded types xml, javascript, network, storage, appcache, rendering, security, deprecation, worker, violation, intervention, recommendation, other. These are in addition to the js exception and console API types that are represented by different methods.
The remote end subscribe steps, with subscribe priority 10, given contexts and include global are:
-
For each context id → events in log event buffer:
-
Let maybe context be the result of getting a browsing context given context id.
-
If maybe context is an error, remove context id from log event buffer and continue.
-
Let context be maybe context’s data
-
Let top level context be context’s top-level browsing context.
-
Let related contexts be a new set containing context.
-
If include global is true and top level context is not in contexts, or if include global is false and top level context is in contexts:
-
For each (event, other contexts) in events:
-
Emit an event with event and related contexts.
-
For each other context id in other contexts:
-
If log event buffer contains other context id, remove event from log event buffer[other context id].
-
-
-
-
7. Patches to Other Specifications
This specification requires some changes to external specifications to provide the necessary integration points. It is assumed that these patches will be committed to the other specifications as part of the standards process.
7.1. HTML
The a browsing context is discarded algorithm is modified to read as follows:
-
If this is not a recursive invocation of this algorithm, call any browsing context tree discarded steps defined in external specifications with browsingContext.
-
Discard all Document objects for all the entries in browsingContext’s session history.
-
If browsingContext is a top-level browsing context, then remove a browsing context browsingContext.
The actual patch might be better to split the algorithm into an outer algorithm that is called by external callers and an inner algorithm that’s used for recursive calls. That’s quite hard to express as a patch to the specification since it requires changing multiple parts.
The report an error algorithm is modified with an additional step at the end:
-
Call any error reporting steps defined in external specifications with script, line, col, message, and true if the error is handled, or false otherwise.
7.2. Console
Other specifications can define console steps. When any method of the console interface is called, with method name method and argument args:
-
If that method does not call the Printer operation, call any console steps defined in external specifications with arguments method, args, and undefined.
Otherwise, at the point when the Printer operation is called with arguments name, printerArgs and options (which is undefined if the argument is not provided), call any console steps defined in external specification with arguments name, printerArgs, and options.