AT Driver

Editor’s Draft,

This version:
https://w3c.github.io/at-driver/
Issue Tracking:
GitHub
Inline In Spec
Editor:
(Bocoup)
Former Editor:
(Bocoup)

Abstract

A protocol for introspection and remote control of assistive technology software.

Status of this document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

GitHub Issues are preferred for discussion of this specification. Alternatively, you can send comments to the Browser Testing and Tools Working Group’s mailing list, public-browser-tools-testing@w3.org (archives).

This document was published by the Browser Testing and Tools Working Group as an Editor’s Draft.

Publication as an Editor’s Draft does not imply endorsement by W3C and its Members.

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 03 November 2023 W3C Process Document.

1. Introduction

AT Driver defines a protocol for introspection and remote control of assistive technology software, using a bidirectional communication channel.

2. Explainer

Specify a protocol using WebSocket that maximally reuses concepts and conventions from WebDriver BiDi.

A connection has two endpoints: remote and local. The remote end can control and read from the screen reader; it can be implemented either as a standalone application or as part of the AT software. The local end is what the test interfaces with, usually in the form of language-specific libraries providing an API.

There should only be the WebSocket form of communication -- as in BiDi-only sessions for WebDriver BiDi.

A connection can have 0 or more sessions. Each session corresponds to an instance of an AT. We may limit the maximum number of sessions per AT to 1 initially.

When a remote end supports multiple sessions, it does not necessarily mean that there will be multiple ATs running at the same time in the same instance of an OS. Some ATs might not be able to function properly if there are other ATs running at the same time. The AT Driver session concept can still be used by running the remote end in a separate environment, with each AT running in its own OS instance (for example, in a virtual machine) and the remote end proxying messages in some fashion.

Commands are grouped into modules. The modules could be: Sessions, Settings, Actions.

Message transport is provided using the WebSocket protocol.

The protocol is defined using a Concise Data Definition Language (CDDL) definition. The serialization is JSON.

2.1. Example

First, the local end would establish a WebSocket connection.

The local end then creates a session by sending

{"method":"session.new","params":{...}}

The local end can then send commands to change settings or send key press actions for that session. The local end assigns a command id (which is included in the message). The remote end sends a message back with the result and the command id, so the local end knows which command the message applies to.

When the screen reader speaks, the remote end will send a message to the local end with the spoken text. This could be in the form of an event, which is not tied to any particular command.
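The id-based correlation described above can be sketched from the local end's point of view. This example is non-normative: the class name `PendingCommands`, its methods, and the `unmatched` list are hypothetical, and the WebSocket transport is left out so that only the bookkeeping is shown.

```python
import json


class PendingCommands:
    """Hypothetical local-end helper: assigns command ids and routes replies."""

    def __init__(self):
        self._next_id = 0
        self._pending = {}   # command id -> method name awaiting a reply
        self.unmatched = []  # events, or errors whose id is null

    def build(self, method, params):
        """Return the JSON text for a command and record its id as pending."""
        command_id = self._next_id
        self._next_id += 1
        self._pending[command_id] = method
        return json.dumps({"id": command_id, "method": method, "params": params})

    def route(self, text):
        """Return (method, message) for a reply, or (None, message) otherwise."""
        message = json.loads(text)
        command_id = message.get("id")
        if command_id in self._pending:
            return self._pending.pop(command_id), message
        self.unmatched.append(message)
        return None, message
```

Because replies can arrive out of order, the helper looks up each reply by its id rather than assuming it belongs to the most recently sent command.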

3. Infrastructure

This specification depends on the Infra Standard. [INFRA]

Network protocol messages are defined using CDDL. [RFC8610]

A Universally Unique Identifier (UUID) is a 128-bit URN that requires no central registration process. Generating a UUID means creating a UUID Version 4 value and converting it to the string representation. [RFC9562]

Where algorithms that return values are fallible, they are written in terms of returning either success or error. A success value has an associated data field which encapsulates the value returned, whereas an error value has an associated error code.

When calling a fallible algorithm, the construct "Let result be the result of trying to call algorithm" is equivalent to:

  1. Let temp be the result of calling algorithm.

  2. If temp is an error, then return temp. Otherwise, let result be temp’s data field.

Note: This means that errors are propagated upwards when using "trying".
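The expansion of "trying" can be illustrated with a small sketch. This example is non-normative; the `Success` and `Error` classes and the two functions are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Success:
    data: Any


@dataclass
class Error:
    error_code: str


def parse_port(text):
    """A fallible algorithm: returns Success or Error instead of raising."""
    if not text.isdigit():
        return Error("invalid argument")
    return Success(int(text))


def describe_listener(host, port_text):
    # "Let port be the result of trying to parse_port(port_text)" expands to:
    temp = parse_port(port_text)
    if isinstance(temp, Error):
        return temp   # the error propagates to the caller unchanged
    port = temp.data  # otherwise unwrap the success value's data field
    return Success(f"{host}:{port}")
```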

4. Nodes

The AT Driver protocol consists of communication between:

local end

The local end represents the client side of the protocol, which is usually in the form of language-specific libraries providing an API on top of the AT Driver protocol. This specification does not place any restrictions on the details of those libraries above the level of the wire protocol.

remote end

The remote end hosts the server side of the protocol. The remote end is responsible for driving and listening to the assistive technology and sending information to the local end as defined in this specification.

5. Protocol

This section defines the basic concepts of the AT Driver protocol. These terms are distinct from their representation at the transport layer.

The protocol is defined using a CDDL definition. For the convenience of implementors two separate CDDL definitions are defined; the remote end definition which defines the format of messages produced on the local end and consumed on the remote end, and the local end definition which defines the format of messages produced on the remote end and consumed on the local end.

5.1. Definition

This section gives the initial contents of the remote end definition and local end definition. These are augmented by the definition fragments defined in the remainder of the specification.

Remote end definition:

Command = {
  id: uint,
  CommandData,
  Extensible,
}

CommandData = (
  SessionCommand //
  SettingsCommand //
  InteractionCommand
)

EmptyParams = { Extensible }

Local end definition:

Message = (
  CommandResponse //
  ErrorResponse //
  Event
)

CommandResponse = {
  id: uint,
  result: ResultData,
  Extensible,
}

ErrorResponse = {
  id: uint / null,
  error: "unknown error" / "unknown command" / "invalid argument" / "invalid session id" / "session not created" / "cannot simulate keyboard interaction" / "invalid OS focus state",
  message: text,
  ?stacktrace: text,
  Extensible,
}

ResultData = (
  EmptyResult /
  SessionResult /
  SettingsResult
)

EmptyResult = {}

Event = {
  EventData,
  Extensible,
}

EventData = (
  InteractionEvent
)

Remote end definition and local end definition:

Extensible = (
  *text => any
)

5.2. Capabilities

Capabilities are used to communicate the features supported by a given implementation. The local end may use capabilities to define which features it requires the remote end to satisfy when creating a new session. Likewise, the remote end uses capabilities to describe the full feature set for a session.

The following table of standard capabilities enumerates the capabilities each implementation must support.

Standard capabilities

Capability   Key             Value type   Description
AT name      "atName"        string       Identifies the assistive technology.
AT version   "atVersion"     string       Identifies the version of the assistive technology.
Platform     "platformName"  string       Identifies the operating system of the remote end.

Remote ends may introduce extension capabilities, which are extra capabilities used to provide configuration or fulfill other vendor-specific needs. The key of an extension capability must contain a ":" (colon) character, denoting an implementation-specific namespace. The value can be of any JSON type.

To process capabilities with argument parameters, the remote end must:

  1. If parameters["capabilities"] exists and parameters["capabilities"]["alwaysMatch"] exists:

    1. Let required capabilities be parameters["capabilities"]["alwaysMatch"].

  2. Otherwise:

    1. Let required capabilities be a new map.

  3. Return the result of match capabilities given required capabilities.

To match capabilities given requested capabilities, the remote end must:

  1. Let matched capabilities be a map with the following entries:

    "atName"

    ASCII lowercase name of the assistive technology as a string.

    "atVersion"

    The assistive technology version, as a string.

    "platformName"

    ASCII lowercase name of the current platform as a string.

  2. Optionally add extension capabilities as entries to matched capabilities.

  3. For each key → value of requested capabilities:

    1. Let match value be value.

    2. Switch on key:

      "atName"

      If value is not equal to matched capabilities["atName"], then return success with data null.

      "atVersion"

      Compare value to matched capabilities["atVersion"] using an implementation-defined comparison algorithm. The comparison is to accept a value that places constraints on the version using the "<", "<=", ">", and ">=" operators. If the two values do not match, return success with data null.

      "platformName"

      If value is not equal to matched capabilities["platformName"], then return success with data null.

      Otherwise

      If key is the key of an extension capability, set match value to the result of trying implementation-specific steps to match on key with value. If the match is not successful, return success with data null.

    3. Set matched capabilities[key] to match value.

  4. Return success with data matched capabilities.
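The matching steps above can be sketched as follows. This example is non-normative. The `ACTUAL` map, the value "examplereader", and the `version_matches` helper are assumptions made for illustration; in particular, the version comparison is implementation-defined, and the dotted-number scheme used here is only one possible choice. Returning `None` stands in for "success with data null".

```python
import re

# Assumption: what this hypothetical remote end reports about itself.
ACTUAL = {"atName": "examplereader", "atVersion": "2024.1", "platformName": "windows"}


def version_tuple(version):
    """Turn "2024.1" into (2024, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def version_matches(constraint, actual):
    """Hypothetical comparison: optional <, <=, >, >= prefix, then a version."""
    found = re.fullmatch(r"\s*(<=|>=|<|>)?\s*([0-9]+(?:\.[0-9]+)*)\s*", constraint)
    if found is None:
        return False
    operator, wanted = found.group(1), version_tuple(found.group(2))
    have = version_tuple(actual)
    checks = {
        None: have == wanted,
        "<": have < wanted,
        "<=": have <= wanted,
        ">": have > wanted,
        ">=": have >= wanted,
    }
    return checks[operator]


def match_capabilities(requested, actual=ACTUAL):
    """Return the matched capabilities map, or None when a request is not met."""
    matched = dict(actual)
    for key, value in requested.items():
        if key in ("atName", "platformName"):
            if value != matched[key]:
                return None
        elif key == "atVersion":
            if not version_matches(value, matched["atVersion"]):
                return None
        elif ":" in key:
            # Extension capability: matching is implementation-specific.
            # This sketch accepts any value.
            pass
        # As in the steps above, the requested value is stored in the result.
        matched[key] = value
    return matched
```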

5.3. Session

A session represents the connection between a local end and a specific remote end.

A remote end has an associated list of active sessions, which is a list of all sessions that are currently started. A remote end has at most one active session at a given time.

A session has an associated session ID (a string representation of a UUID) used to uniquely identify this session. Unless stated otherwise it is null.

5.4. Modules

The AT Driver protocol is organized into modules.

Each module represents a collection of related commands and events pertaining to a certain aspect of the assistive technology.

Each module has a module name which is a string. The command name and event name for commands and events defined in the module start with the module name followed by a period ".".

Modules which contain commands define remote end definition fragments.

An implementation may define extension modules. These must have a module name that contains a single colon ":" character. The part before the colon is the prefix; this is typically the same for all extension modules specific to a given implementation and should be unique for a given implementation. Such modules extend the local end definition and remote end definition providing additional groups as choices for the defined commands and events.
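The naming rules above can be sketched as two small helpers. This example is non-normative; the function names and the "vendor:speech.setVoice" command name are hypothetical.

```python
def split_name(name):
    """Split "module.rest" at the first period into (module name, rest)."""
    module, separator, rest = name.partition(".")
    if not separator or not module or not rest:
        raise ValueError(f"not a qualified command or event name: {name!r}")
    return module, rest


def is_extension_module(module):
    """Extension module names contain a single colon character."""
    return module.count(":") == 1
```

For example, `split_name("settings.getSettings")` yields the module name "settings", which `is_extension_module` reports as a standard module because it contains no colon.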

5.5. Commands

A command is an asynchronous operation, requested by the local end and run on the remote end, resulting in either a success or an error being returned to the local end. Multiple commands can run at the same time, and commands can potentially be long-running. As a consequence, commands can finish out-of-order.

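Each command is defined by a command type (given as a remote end definition fragment), a result type (given as a local end definition fragment), and a set of remote end steps, which define the actions to take for the command given a session and command parameters and which return either a success or an error.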

A command that can run without an active session is a static command. Commands are not static commands unless stated in their definition.

When commands are sent from the local end they have a command id. This is an identifier used by the local end to identify the response from a particular command. From the point of view of the remote end this identifier is opaque and cannot be used internally to identify the command.

The set of all command names is a set containing all the defined command names, including any belonging to extension modules.

5.6. Events

An event is a notification, sent by the remote end to the local end, signaling that something of interest has occurred on the remote end.

5.7. Errors

The following table lists each error code, its associated JSON error code, and a non-normative description of the error.

Error codes

Error code                            JSON error code                       Description
invalid argument                      invalid argument                      The arguments passed to a command are either invalid or malformed.
invalid session id                    invalid session id                    The session either does not exist or it’s not active.
unknown command                       unknown command                       A command could not be executed because the remote end is not aware of it.
session not created                   session not created                   A new session could not be created.
cannot simulate keyboard interaction  cannot simulate keyboard interaction  The remote end cannot simulate keyboard interaction.
invalid OS focus state                invalid OS focus state                The application that currently has OS focus is not one of the expected applications.

5.8. Security checks

In order to mitigate security risks when using this API, there are some security checks for certain commands.

To check that keyboard interaction can be simulated:

  1. If the remote end cannot simulate keyboard interaction for any implementation-defined reason, then return an error with error code cannot simulate keyboard interaction.

  2. Return success with data null.

To check that one of the expected applications has focus:

  1. If the application that currently has OS focus (and so could act on simulated key presses from this API) is not one of the expected applications, then return an error with error code invalid OS focus state. Which applications are expected is implementation-defined.

    Is the "OS focus" check a viable security restriction for "send keys"? [Issue #77]

  2. Return success with data null.

To determine whether a string text should be withheld:

  1. If the remote end determines that it is unsafe to expose text to an external process for any implementation-defined reason:

    1. Return true.

  2. Return false.

6. Transport

Message transport is provided using the WebSocket protocol. [RFC6455]

A WebSocket listener is a network endpoint that is able to accept incoming WebSocket connections.

A WebSocket listener has a host, a port, and a secure flag.

When a WebSocket listener listener is created, a remote end must start to listen for WebSocket connections on the host and port given by listener’s host and port. If listener’s secure flag is set, then connections established from listener must be TLS encrypted.

A remote end has a set of WebSocket listeners active listeners, which is initially empty.

A remote end has a set of WebSocket connections not associated with a session, which is initially empty.

A WebSocket connection is a network connection that follows the requirements of the WebSocket protocol. [RFC6455]

A session has a set of session WebSocket connections whose elements are WebSocket connections. This is initially empty.

A session session is associated with connection connection if session’s session WebSocket connections contains connection.

Note: Each WebSocket connection is associated with at most one session.

When a client establishes a WebSocket connection connection by connecting to one of the set of active listeners listener, the implementation must proceed according to the WebSocket server-side requirements, with the following steps run when deciding whether to accept the incoming connection:

  1. Let resource name be the resource name from reading the client’s opening handshake. If resource name is not "/session", then stop running these steps and act as if the requested service is not available.

  2. Run any other implementation-defined steps to decide if the connection should be accepted, and if it is not stop running these steps and act as if the requested service is not available.

  3. Add the connection to the set of WebSocket connections not associated with a session.

When a WebSocket message has been received for a WebSocket connection connection with type type and data data, a remote end must handle an incoming message given connection, type and data.

When the WebSocket closing handshake is started or when the WebSocket connection is closed for a WebSocket connection connection, a remote end must handle a connection closing given connection.

Note: Both conditions are needed because it is possible for a WebSocket connection to be closed without a closing handshake.

To start listening for a WebSocket connection:

  1. Let listener be a new WebSocket listener with implementation-defined host, port, and secure flag.

  2. Append listener to the remote end's active listeners.

  3. Return listener.

Note: a future iteration of this specification may allow multiple connections, to support intermediary nodes like in WebDriver.

To handle an incoming message given a WebSocket connection connection, type type and data data:

  1. If type is not text:

    1. Send an error response given connection, null, and invalid argument.

    2. Return.

  2. Assert: data is a scalar value string, because the WebSocket handling errors in UTF-8-encoded data would already have failed the WebSocket connection otherwise.

  3. If there is a session associated with connection connection, let session be that session. Otherwise if connection is in the set of WebSocket connections not associated with a session, let session be null. Otherwise, return.

  4. Let parsed be the result of parsing JSON into Infra values given data. If this throws an exception, then send an error response given connection, null, and invalid argument, and finally return.

  5. Match parsed against the remote end definition. If this results in a match:

    1. Let matched be the map representing the matched data.

    2. Assert: matched contains "id", "method", and "params".

    3. Let command id be matched["id"].

    4. Let method be matched["method"].

    5. Let command be the command with command name method.

    6. If session is null and command is not a static command:

      1. Send an error response given connection, command id, and invalid session id.

      2. Return.

    7. Run the following steps in parallel:

      1. Let result be the result of running the remote end steps for command given session and command parameters matched["params"].

      2. If result is an error:

        1. Send an error response given connection, command id, and result’s error code.

        2. Return.

      3. Let value be result’s data.

      4. Assert: value matches the definition for the result type corresponding to the command with command name method.

      5. If method is "session.new":

        1. Let session be the entry in the list of active sessions whose session ID is equal to the "sessionId" property of value.

        2. Append connection to session’s session WebSocket connections.

        3. Remove connection from the set of WebSocket connections not associated with a session.

      6. Let response be a new map matching the CommandResponse production in the local end definition with the id field set to command id and the result field set to value.

      7. Let serialized be the result of serialize an infra value to JSON bytes given response.

      8. Send a WebSocket message comprised of serialized over connection.

  6. Otherwise:

    1. Let command id be null.

    2. If parsed is a map and parsed["id"] exists and is an integer greater than or equal to zero, set command id to that integer.

    3. Let error code be invalid argument.

    4. If parsed is a map and parsed["method"] exists and is a string, but parsed["method"] is not in the set of all command names, set error code to unknown command.

    5. Send an error response given connection, command id, and error code.
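The decision logic of the algorithm above (parse, match, then choose between dispatching and an error response) can be sketched as follows. This example is non-normative and simplified: the function names are hypothetical, the check of the WebSocket message type is omitted, and matching against the remote end definition is reduced to checking the types of "id", "method", and "params".

```python
import json

STATIC_COMMANDS = {"session.new"}
ALL_COMMAND_NAMES = {
    "session.new",
    "settings.setSettings",
    "settings.getSettings",
    "settings.getSupportedSettings",
    "interaction.pressKeys",
}


def error_response(command_id, error_code, message=""):
    """Build an ErrorResponse map; a null id is kept, not omitted."""
    return {"id": command_id, "error": error_code, "message": message}


def is_uint(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def triage(data, has_session):
    """Return ("run", parsed) for a dispatchable command, else ("error", map)."""
    try:
        parsed = json.loads(data)
    except json.JSONDecodeError:
        return "error", error_response(None, "invalid argument")

    is_map = isinstance(parsed, dict)
    method = parsed.get("method") if is_map else None
    matches = (
        is_map
        and is_uint(parsed.get("id"))
        and isinstance(method, str)
        and method in ALL_COMMAND_NAMES
        and isinstance(parsed.get("params"), dict)
    )
    if matches:
        if not has_session and method not in STATIC_COMMANDS:
            return "error", error_response(parsed["id"], "invalid session id")
        return "run", parsed

    # No match: recover the id if possible and pick the error code.
    command_id = parsed["id"] if is_map and is_uint(parsed.get("id")) else None
    error_code = "invalid argument"
    if isinstance(method, str) and method not in ALL_COMMAND_NAMES:
        error_code = "unknown command"
    return "error", error_response(command_id, error_code)
```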

To emit an event given session, and body:

  1. Assert: body has size 2 and contains "method" and "params".

  2. Let serialized be the result of serialize an infra value to JSON bytes given body.

  3. For each connection in session’s session WebSocket connections:

    1. Send a WebSocket message comprised of serialized over connection.

To send an error response given a WebSocket connection connection, command id, and error code:

  1. Let error data be a new map matching the ErrorResponse production in the local end definition, with the id field set to command id, the error field set to error code, the message field set to an implementation-defined string containing a human-readable definition of the error that occurred and the stacktrace field optionally set to an implementation-defined string containing a stack trace report of the active stack frames at the time when the error occurred.

  2. Let response be the result of serialize an infra value to JSON bytes given error data.

    Note: command id can be null, in which case the id field will also be set to null, not omitted from response.

  3. Send a WebSocket message comprised of response over connection.

To handle a connection closing given a WebSocket connection connection:

  1. If there is a session associated with connection connection:

    1. Let session be the session associated with connection connection.

    2. Remove connection from session’s session WebSocket connections.

    3. If session’s session WebSocket connections is empty:

      1. Remove session from active sessions.

  2. Otherwise, if the set of WebSocket connections not associated with a session contains connection, remove connection from that set.

6.1. Establishing a Connection

The URL to the WebSocket server is communicated out-of-band. When an implementation is ready to accept requests to start an AT Driver session, it must:

  1. Start listening for a WebSocket connection.

7. Modules

7.1. The session Module

7.1.1. Definition

Remote end definition:

SessionCommand = (SessionNewCommand)

Local end definition:

SessionResult = (SessionNewResult)

7.1.2. Types

7.1.2.1. The session.CapabilitiesRequest Type

Remote end definition and local end definition:

CapabilitiesRequest = {
  ?atName: text,
  ?atVersion: text,
  ?platformName: text,
  Extensible,
}

The CapabilitiesRequest type represents capabilities requested for a session.

7.1.3. Commands

7.1.3.1. The session.new Command

The session.new command allows creating a new session. This is a static command.

Command Type
SessionNewCommand = {
  method: "session.new",
  params: {capabilities: CapabilitiesRequestParameters},
}

CapabilitiesRequestParameters = {
  ?alwaysMatch: CapabilitiesRequest,
}

Note: firstMatch is not included currently to reduce complexity.

Result Type
SessionNewResult = {
  sessionId: text,
  capabilities: {
    atName: text,
    atVersion: text,
    platformName: text,
    Extensible,
  }
}

The remote end steps given session and command parameters are:

  1. If session is not null, return an error with error code session not created.

  2. If the list of active sessions is not empty, then return an error with error code session not created.

  3. If the implementation is unable to start a new session for any reason, return an error with error code session not created.

  4. Let capabilities be the result of trying to process capabilities with command parameters.

  5. If capabilities is null, return an error with error code session not created.

  6. Let session id be the result of generating a UUID.

  7. Let session be a new session with the session ID of session id.

  8. Append session to active sessions.

  9. Start an instance of the appropriate assistive technology, given capabilities.

  10. Let body be a new map matching the SessionNewResult production, with the sessionId field set to session’s session ID, and the capabilities field set to capabilities.

  11. Return success with data body.
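The remote end steps above can be sketched as follows. This example is non-normative. The `RemoteEnd` class is hypothetical, capability matching is reduced to exact equality, and starting the assistive technology (step 9) is left out.

```python
import uuid


class RemoteEnd:
    """Hypothetical remote end that tracks its list of active sessions."""

    def __init__(self, capabilities):
        self.capabilities = capabilities  # what this implementation offers
        self.active_sessions = {}         # session ID -> capabilities

    def new_session(self, current_session, params):
        """Return ("success", body) or ("error", error code)."""
        # Steps 1-2: refuse when the connection already has a session or
        # when any session is active.
        if current_session is not None or self.active_sessions:
            return "error", "session not created"
        # Steps 4-5, simplified: require equality for each requested key.
        requested = params.get("capabilities", {}).get("alwaysMatch", {})
        for key, value in requested.items():
            if self.capabilities.get(key) != value:
                return "error", "session not created"
        # Steps 6-8: generate a UUID Version 4 session ID and record the session.
        session_id = str(uuid.uuid4())
        self.active_sessions[session_id] = dict(self.capabilities)
        # Steps 10-11: build the SessionNewResult body.
        return "success", {
            "sessionId": session_id,
            "capabilities": dict(self.capabilities),
        }
```

A second call to `new_session` on the same `RemoteEnd` fails with session not created, reflecting the limit of one active session at a time.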

7.2. The settings Module

Currently, there are no standardized settings. Implementations are strongly encouraged to review the security implications of each setting they offer to end users, and only expose the settings that they deem safe. This specification does not define what constitutes a setting, but the settings module is designed to control user preferences such as the default voice, or the default rate of speech.

A remote end has an associated set of supported settings, which is either null or a set of strings which contains the name of every setting that may be referenced by this module.

To validate setting name given string name:

  1. If supported settings is null:

    1. Return "unknown".

  2. If supported settings contains name:

    1. Return "valid".

  3. Return "invalid".

To get settings given a list of strings names:

  1. Let items be a new list.

  2. For each name of names:

    1. If validate setting name given name is "invalid":

      1. Return an error with error code invalid argument.

    2. Let value be the value of the setting named name.

    3. Let item be a new map matching the SettingsGetSettingsResultItem production in the local end definition with the name field set to name and the value field set to value.

    4. Append item to items.

  3. Let body be a new map matching the SettingsGetSettingsResult production, with the settings field set to items.

  4. Return success with data body.

To modify setting given string name and value:

  1. If validate setting name given name is "invalid":

    1. Return an error with error code invalid argument.

  2. Take any implementation-defined steps to change the remote end setting named name to the value value.

  3. If there is any implementation-defined indication that the setting named name does not hold the value value:

    1. Return an error with error code invalid argument.

  4. Return success with data null.

Note: Today’s implementations may not be able to detect invalid setting names or values, limiting their ability to report when operations do not model authentic interactions with the internal state. The algorithms in this module are designed to reflect this.
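The three algorithms above can be sketched over an in-memory store. This example is non-normative; the `SettingsStore` class and the setting names used with it are hypothetical, and a real implementation would read from and write to the assistive technology instead of a dictionary.

```python
class SettingsStore:
    """Hypothetical settings backend; `supported` may be None (unknown)."""

    def __init__(self, values, supported=None):
        self.values = dict(values)
        self.supported = supported

    def validate_name(self, name):
        if self.supported is None:
            return "unknown"
        return "valid" if name in self.supported else "invalid"

    def get_settings(self, names):
        """Return ("success", body) or ("error", "invalid argument")."""
        items = []
        for name in names:
            if self.validate_name(name) == "invalid":
                return "error", "invalid argument"
            # Assumption: a setting the store has never seen reads as None.
            items.append({"name": name, "value": self.values.get(name)})
        return "success", {"settings": items}

    def modify_setting(self, name, value):
        if self.validate_name(name) == "invalid":
            return "error", "invalid argument"
        self.values[name] = value
        # A real backend would read the value back here; with a dictionary
        # this check always passes.
        if self.values[name] != value:
            return "error", "invalid argument"
        return "success", None
```

When `supported` is `None`, every name validates as "unknown" and is let through, which mirrors the Note above: such a remote end cannot tell a misspelled setting name from a real one.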

Issue: Require implementations to maintain a static list of supported settings.

7.2.1. Definition

Remote end definition:

SettingsCommand = (
  SettingsSetSettingsCommand //
  SettingsGetSettingsCommand //
  SettingsGetSupportedSettingsCommand
)

Local end definition:

SettingsResult = (
  SettingsGetSettingsResult
)

7.2.2. Types

7.2.2.1. The SettingsGetSettingsResult type

Local end definition:

SettingsGetSettingsResult = {
  settings: [1* SettingsGetSettingsResultItem ],
}

SettingsGetSettingsResultItem = {
  name: text,
  value: any,
  Extensible,
}

The SettingsGetSettingsResult type contains a list of settings and their values.

7.2.3. Commands

7.2.3.1. The settings.setSettings Command

The settings.setSettings command sets the values of one or more settings.

Note: Today’s implementations may not be able to detect failed modification operations. settings.setSettings is designed to reflect that reality. Clients should therefore interpret successes with some skepticism as such results do not necessarily indicate that the referenced setting has the desired value.

Issue: Require implementations to report failures in settings modification operations.

Command Type
SettingsSetSettingsCommand = {
  method: "settings.setSettings",
  params: SettingsSetSettingsParameters
}

SettingsSetSettingsParameters = {
  settings: [1* SettingsSetSettingsParametersItem ],
}

SettingsSetSettingsParametersItem = {
  name: text,
  value: any,
  Extensible,
}

Result Type
EmptyResult

The remote end steps given session and command parameters are:

  1. Let settings be the value of the settings field of command parameters.

  2. For each setting of settings:

    1. Let name be the value of the name field of setting.

    2. Let value be the value of the value field of setting.

    3. Try to modify setting with name and value.

  3. Let body be a new map.

  4. Return success with data body.

7.2.3.2. The settings.getSettings Command

The settings.getSettings command returns a list of the requested settings and their values.

Command Type
SettingsGetSettingsCommand = {
  method: "settings.getSettings",
  params: SettingsGetSettingsParameters
}

SettingsGetSettingsParameters = {
  settings: [1* SettingsGetSettingsParametersItem ],
}

SettingsGetSettingsParametersItem = {
  name: text,
  Extensible,
}

Result Type
SettingsGetSettingsResult

The remote end steps given session and command parameters are:

  1. Let names be the value of the settings field of command parameters.

  2. Return the result of get settings with names.

7.2.3.3. The settings.getSupportedSettings Command

The settings.getSupportedSettings command returns a list of all settings that the remote end supports, and their values.

Command Type
SettingsGetSupportedSettingsCommand = {
  method: "settings.getSupportedSettings",
  params: EmptyParams
}

Result Type
SettingsGetSettingsResult

The remote end steps given session and command parameters are:

  1. If supported settings is null:

    1. Let names be a new list.

  2. Otherwise:

    1. Let names be supported settings.

  3. Let result be the result of get settings with names.

  4. Assert: result is a success value.

  5. Return result.

7.2.4. Events

Issue: Do we need a "setting changed" event?

7.3. The interaction Module

7.3.1. Definition

Remote end definition:

InteractionCommand = (InteractionPressKeysCommand)

Local end definition:

InteractionEvent = (InteractionCapturedOutputEvent)

7.3.2. Types

InteractionCapturedOutputParameters = {
  data: text,
  Extensible,
}

7.3.3. Commands

7.3.3.1. The interaction.pressKeys Command

The interaction.pressKeys command simulates pressing a key combination on a keyboard.

This command does not yet have a means for indicating a screen-reader specific modifier key (or keys). [Issue #34]

This algorithm only supports one specific kind of press/release sequence, and it is not clear if that is sufficient to express all keyboard commands in all implementations. [Issue #51]

Command Type
InteractionPressKeysCommand = {
  method: "interaction.pressKeys",
  params: InteractionPressKeysParameters
}

InteractionPressKeysParameters = {
  "keys" => KeyCombination,
  Extensible,
}

KeyCombination = [
  1* text
]

Result Type
EmptyResult

Note: Each string in KeyCombination represents a "raw key" consisting of a single code point with the same meaning as in WebDriver’s keyboard actions. For example, ["\uE008", "a"] means holding the left shift key and pressing "a", and then releasing the left shift key. [WEBDRIVER]

The remote end steps given session and command parameters are:

  1. Try to check that keyboard interaction can be simulated.

  2. Try to check that one of the expected applications has focus.

  3. Let keys be the value of the keys field of command parameters.

  4. For each key of keys:

    1. Run implementation-defined steps to simulate depressing key.

  5. For each key of keys in reverse list order:

    1. Run implementation-defined steps to simulate releasing key.

  6. Let body be a new map.

  7. Return success with data body.
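The press and release ordering in the steps above can be sketched as follows. This example is non-normative: the `press_keys` function and its `log` argument are hypothetical, the two security checks are omitted, and recording tuples in a list stands in for the implementation-defined key simulation.

```python
LEFT_SHIFT = "\uE008"  # WebDriver's raw key for the left shift key


def press_keys(keys, log):
    """Record ("down", key) for each key in order, then ("up", key) in reverse."""
    for key in keys:
        log.append(("down", key))
    for key in reversed(keys):
        log.append(("up", key))
    return {}  # corresponds to EmptyResult
```

For the key combination `[LEFT_SHIFT, "a"]`, the shift key is pressed first and released last, so it is held for the whole time "a" is down.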

7.3.4. Events

7.3.4.1. The interaction.capturedOutput Event

Event Type
InteractionCapturedOutputEvent = {
  method: "interaction.capturedOutput",
  params: InteractionCapturedOutputParameters
}

The remote end event trigger is:

When the assistive technology would send some text data (a string, without speech-specific markup or annotations) to the Text-To-Speech system, or equivalent for non-speech assistive technology software, run these steps:

  1. If data should be withheld:

    1. Return.

  2. Let params be a map matching the InteractionCapturedOutputParameters production with the data field set to data.

  3. Let body be a map matching the InteractionCapturedOutputEvent production with the params field set to params.

  4. For each session of active sessions:

    1. Emit an event with session and body.
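The event trigger above can be sketched as a function that decides what to send and to whom. This example is non-normative; the function name, the shape of the `sessions` argument (a map from session ID to a list of connections), and the `should_withhold` callback are assumptions made for illustration.

```python
import json


def captured_output_messages(data, sessions, should_withhold):
    """Return (connection, JSON text) pairs to send for one piece of output."""
    if should_withhold(data):
        return []  # unsafe text is never serialized or sent
    body = {"method": "interaction.capturedOutput", "params": {"data": data}}
    serialized = json.dumps(body)
    return [
        (connection, serialized)
        for connections in sessions.values()
        for connection in connections
    ]
```

The body is serialized once and the same text is sent over every connection of every active session.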

8. Privacy

It is advisable that remote ends create a new profile when creating a new session. This prevents potentially sensitive session data from being accessible to new sessions, ensuring both privacy and preventing state from bleeding through to the next session.

9. Security

An assistive technology can rely on a command-line flag or a configuration option to determine whether to enable AT Driver. Alternatively, in case the assistive technology does not directly implement the WebSocket endpoints, it can initiate or confirm the connection through a privileged content document or control widget.

It is strongly suggested that assistive technology require users to take explicit action to enable AT Driver, and that AT Driver remains disabled in publicly consumed versions of the assistive technology.

To prevent arbitrary machines on the network from connecting and creating sessions, it is suggested that only connections from loopback devices are allowed by default.

The remote end can include a configuration option to limit the accepted IP range allowed to connect and make requests. The default setting for this might be to limit connections to the IPv4 localhost CIDR range 127.0.0.0/8 and the IPv6 localhost address ::1. [RFC4632]
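
For example, a remote end might apply the default restriction described above with a check like the following non-normative Python sketch, which uses the standard ipaddress module. The names DEFAULT_ALLOWED and is_allowed are hypothetical, and how the peer address is obtained from the connection is left to the implementation.

```python
import ipaddress

# Default allow list: the IPv4 loopback range and the IPv6 loopback address.
DEFAULT_ALLOWED = (
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("::1/128"),
)


def is_allowed(remote_address: str, allowed=DEFAULT_ALLOWED) -> bool:
    """Return True if remote_address falls inside one of the allowed ranges."""
    try:
        address = ipaddress.ip_address(remote_address)
    except ValueError:
        return False  # not a valid IP address, so reject it
    return any(
        address.version == network.version and address in network
        for network in allowed
    )
```

With the defaults, is_allowed("127.0.0.1") and is_allowed("::1") hold, while is_allowed("192.168.1.5") does not. An IPv4-mapped IPv6 address such as ::ffff:127.0.0.1 is also rejected by this sketch; whether to accept it is a design choice left to the implementation.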

It is also suggested that assistive technologies make an effort to indicate that a session is under the control of AT Driver. The indication should also be accessible to non-visual users. For example, this can be done through an OS-level notification or alert dialog.

TODO sandbox (limit availability to information that apps usually can’t access, e.g. login screen).

TODO no HID level simulated keypresses.

TODO exclude access to any security-sensitive settings.

TODO exclude access to any security-sensitive commands.

Appendix A: Schemas

The remote end definition and local end definition are available as non-normative CDDL and JSON Schema schemas:

The JSON Schema files are not yet generated from the CDDL and so might be out of date. [Issue #23]

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

References

Normative References

[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[RFC6455]
I. Fette; A. Melnikov. The WebSocket Protocol. December 2011. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc6455
[RFC8610]
H. Birkholz; C. Vigano; C. Bormann. Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures. June 2019. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc8610
[RFC9562]
K. Davis; B. Peabody; P. Leach. Universally Unique IDentifiers (UUIDs). May 2024. Proposed Standard. URL: https://www.rfc-editor.org/rfc/rfc9562

Informative References

[RFC4632]
V. Fuller; T. Li. Classless Inter-domain Routing (CIDR): The Internet Address Assignment and Aggregation Plan. August 2006. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc4632
[WEBDRIVER]
Simon Stewart; David Burns. WebDriver. URL: https://w3c.github.io/webdriver/

Issues Index

Is the "OS focus" check a viable security restriction for "send keys"? [Issue #77]
Require implementations to maintain a static list of supported settings.
Require implementations to report failures in settings modification operations.
Do we need a "setting changed" event?
This command does not yet have a means for indicating a screen-reader specific modifier key (or keys). [Issue #34]
This algorithm only supports one specific kind of press/release sequence, and it is not clear if that is sufficient to express all keyboard commands in all implementations. [Issue #51]
TODO sandbox (limit availability to information that apps usually can’t access, e.g. login screen).
TODO no HID level simulated keypresses.
TODO exclude access to any security-sensitive settings.
TODO exclude access to any security-sensitive commands.
The JSON Schema files are not yet generated from the CDDL and so might be out of date. [Issue #23]