Copyright © 2021 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and permissive document license rules apply.
This document outlines accessibility-related user needs, requirements, and scenarios for natural language interfaces. These user needs should influence accessibility requirements in related specifications, as well as the design of applications that include natural language interfaces, including speech and voice input. The concept of a natural language interface is first clarified. User needs and associated requirements are then described.
This document is explicitly not a collection of baseline requirements. It is also important to note that some of the requirements may be implemented at a system or platform level, and some may be authoring requirements for the development of applications.
This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This document was published by the Accessible Platform Architectures Working Group as an Editor's Draft.
Publication as an Editor's Draft does not imply endorsement by W3C and its Members.
This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 2 November 2021 W3C Process Document.
A natural language interface is a user interface in which the user and the system communicate via a natural (human) language. The user provides input in the form of sentences, via speech or another input mechanism, and the system generates responses in the form of sentences delivered by speech, text, or another suitable modality.
Often, systems that provide natural language interfaces support spoken interaction. In this scenario, speech recognition is used to process the user's input, and speech synthesis is used to generate the system's spoken responses. However, the use of speech is not essential to a natural language interface.
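By way of illustration only, the following sketch shows one way a web application might pair speech recognition for input with speech synthesis for output, using the Web Speech API where the browser provides it. The `webkitSpeechRecognition` fallback and the single-result handling below are assumptions about the runtime, not requirements of this document.

```typescript
// Minimal sketch: spoken input and spoken output via the Web Speech API.
// Availability and prefixes vary by browser; feature-detect before use.
const SpeechRecognitionImpl =
  (window as any).SpeechRecognition ?? (window as any).webkitSpeechRecognition;

function listenOnce(onUtterance: (text: string) => void): void {
  if (!SpeechRecognitionImpl) {
    // Speech is not essential to a natural language interface:
    // an application would fall back to keyboard or text input here.
    return;
  }
  const recognizer = new SpeechRecognitionImpl();
  recognizer.lang = "en-US";
  recognizer.onresult = (event: any) => {
    // Take the top transcript of the first recognition result.
    onUtterance(event.results[0][0].transcript);
  };
  recognizer.start();
}

function speak(sentence: string): void {
  // Generate the system's spoken response.
  window.speechSynthesis.speak(new SpeechSynthesisUtterance(sentence));
}

listenOnce((text) => speak(`You said: ${text}`));
```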
Typical examples of natural language interfaces include:
- chat bots, such as those offered in web applications to answer users' questions; and
- "intelligent" personal assistants, such as those available via smart speakers and mobile devices.
These examples are indicative of applications that use natural language interfaces. They are not definitive. Variations of the examples and applications that do not fit these patterns at all are possible.
The accessibility of natural language interfaces to users with disabilities can be supported by a variety of features at the platform and application levels, including assistive technologies. Multiple modes of input and output should be provided to enable interaction with the system by users who have a variety of physical and sensory capabilities. For example, whereas speech input may be needed by some users with physical disabilities, keyboard input, switch input, or an eye tracking system may be needed by other users.
Similarly, natural language output may be spoken, or it may be provided as visually displayed text. These and other requirements are elaborated further below. In some cases, these requirements may best be satisfied by an assistive technology. For example, a chat bot that does not provide a spoken interface directly may nevertheless satisfy a user's need for speech input via a dictation function provided as part of the user's browser or operating system. There may also be 'service' aspects, operating at a different layer from any input or output modality, that are specifically needed to better support accessibility.
The design of the application should support the cognitive needs of users, including those who have learning or cognitive disabilities. Discoverability, simplicity, and affordances, for example, are important considerations in the design of natural language interaction.
Voice user interfaces (VUIs), such as those found on a range of commercially available devices for home and mobile use, form part of the stack that makes up natural language interfaces. This document aims to identify accessibility-related user needs and requirements for VUIs, and to indicate further areas of work and research concerning how they relate to new standards such as WCAG 3 and other emerging technologies.
Natural language interfaces frequently occur as components of larger user interfaces and systems. For example, a chat bot may be included in a web application. A natural language interface may be an essential part of a multi-modal application that uses a combination of language and gestural inputs. An example would be an interactive navigation tool that allows the user to issue spoken commands and to interact with a graphical map with a pointing device.
The scope of this document is largely confined to the accessibility of the natural language aspect of the overall user interface. It is concerned with the accessibility of natural language interactions to users with disabilities.
The scope of this work is currently under active discussion in the Research Questions Task Force.
Behind these interfaces are services that provide core processing, evaluation, and content. This document aims to look at these services and determine to what degree they can and should support the needs of people with disabilities, what the system requirements are, and where further research is needed.
Ideally, by satisfying the system requirements, developers of platforms and applications offering natural language interfaces can meet the corresponding user needs. This document currently takes no stance on which needs are best satisfied at the platform level, by an assistive technology, or in the development of applications, but this will change as the document develops. These architectural considerations are left to system designers, who therefore need to be aware of the requirements of accessible system design. Often, such decisions also depend on the services provided by the underlying operating system or by the web platform.
If natural language interaction is provided as part of a system that also offers other styles of interaction, this document should be read in combination with guidance provided elsewhere that is relevant to the other interface and service aspects, notably the Web Content Accessibility Guidelines (WCAG) 2.1 [WCAG21].
As a general principle, the entire interface of a system or application needs to be accessible to users with disabilities. If only the natural language interaction component is accessible, some users will be unable to complete tasks successfully. For example, a smart agent that answers a user's questions by searching the web for information and then displaying it on screen is only accessible as a whole if both the interaction and the presentation of the information satisfy the user's access needs. If the on-screen information is not accessible, then the user cannot complete the task of acquiring and understanding the information requested.
User needs relate to what a user needs from a particular application or platform to complete a task or to achieve a particular goal. User needs are dependent on the context in which an application is used, including the user's capabilities and the environmental conditions in which interaction with the interface takes place. For example, a spoken interaction would be inaccessible to a person who is deaf, or to a hearing person situated in a noisy environment. Although disability-related needs are the focus of this document, the user needs described here are not limited to people with specific types of disability. The capabilities of users vary greatly. They include a variety of physical, sensory, learning and cognitive abilities that should be taken into account in the design of platforms and applications.
The user needs and associated requirements are actively being reviewed by the Research Questions Task Force (RQTF) and by the Accessible Platform Architectures (APA) Working Group.
This section outlines a variety of user needs and system requirements that can satisfy them.
To achieve adequate security, voice identification may need to be combined with other factors of authentication.
In some cases, this requirement can be met simply by using authentication mechanisms provided by the underlying operating system or browser environment.
This requirement can often be met by supporting the input methods available from the underlying platform, including assistive technologies.
If software that incorporates a natural language interface supports multiple input mechanisms, support for any specific mechanism may be available only on particular hardware devices or in particular environments. For example, a smart speaker may support only speech input, whereas the same smart agent running on a mobile system such as a phone or tablet may support text input via a keyboard or any device capable of emulating a keyboard.
See the requirement to support a keyboard interface specified in WCAG 2.1 [WCAG21], success criterion 2.1.1.
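As a sketch of keyboard operability in a chat interface, the following routes text typed into a field (submitted with the Enter key) to the same dialogue entry point that speech input would use, so that all functionality remains operable from a keyboard. The `handleUtterance` function and the element id are hypothetical names, not part of any specified API.

```typescript
// Minimal sketch: the same dialogue handler reachable from the keyboard,
// so speech is an alternative rather than the only input mechanism.
declare function handleUtterance(text: string): void; // hypothetical entry point

const input = document.querySelector<HTMLInputElement>("#utterance-input")!;
input.addEventListener("keydown", (event: KeyboardEvent) => {
  if (event.key === "Enter" && input.value.trim() !== "") {
    handleUtterance(input.value); // Same entry point used for speech input.
    input.value = "";
  }
});
```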
This requirement can often be met by supporting the output methods available from the underlying platform, including assistive technologies.
If software that incorporates a natural language interface supports multiple output mechanisms, support for any specific mechanism may be available only on particular hardware devices or in particular environments. For example, a smart speaker may support only speech output, whereas the same smart agent running on a mobile system such as a phone or tablet may support a visual display as well, and may be compatible with braille devices.
Support for braille displays is assumed to be provided by a screen reader running under the device's operating system. Therefore, support for keyboard input and textual output is the stated requirement for the natural language interface itself, leaving interaction with the braille hardware to the operating system on which the user interface is run.
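One common technique for meeting the textual-output requirement on the web, sketched below under the assumption that a screen reader handles speech and braille rendering, is to append each system response as plain text to an ARIA live region.

```typescript
// Minimal sketch: append each system response as text to a live region,
// leaving speech and braille rendering to the user's assistive technology.
const log = document.createElement("div");
log.setAttribute("role", "log");
log.setAttribute("aria-live", "polite"); // Announce new responses politely.
document.body.appendChild(log);

function presentResponse(sentence: string): void {
  const line = document.createElement("p");
  line.textContent = sentence; // Plain text, exposed to screen readers.
  log.appendChild(line);
}
```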
At present, it is generally infeasible to implement REQ 9a and REQ 9b with sufficient reliability and accuracy to be useful. Sign language processing (including automatic recognition, translation, and production of sign languages) involves challenging research problems. See [Bragg-et-al] for details. These two requirements are nevertheless stated here to encourage further research and development efforts.
Sign languages vary by country and region. Therefore, multiple sign languages may need to be supported, depending on the intended audience of the system.
To ensure this user interface is accessible, it should satisfy relevant accessibility requirements drawn from this document or elsewhere. For example, a system could provide spoken commands, and a settings dialogue in a graphical user interface, as alternative mechanisms for configuring speech properties.
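For example, the following sketch (illustrative only) shows speech properties such as rate, pitch, and volume, as configured through such a settings dialogue, being applied to spoken output via the Web Speech API. The `SpeechSettings` interface is an assumption about how an application might store the user's preferences.

```typescript
// Minimal sketch: user-adjustable speech settings applied to each response.
interface SpeechSettings {
  rate: number;   // 0.1 to 10; default 1
  pitch: number;  // 0 to 2; default 1
  volume: number; // 0 to 1; default 1
}

function speakWith(settings: SpeechSettings, sentence: string): void {
  const utterance = new SpeechSynthesisUtterance(sentence);
  utterance.rate = settings.rate;
  utterance.pitch = settings.pitch;
  utterance.volume = settings.volume;
  window.speechSynthesis.speak(utterance);
}

speakWith({ rate: 0.8, pitch: 1, volume: 1 }, "Your appointment is confirmed.");
```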
In some cases, this requirement can be met by capabilities of the operating system or browsing environment.
See the text spacing requirement specified in WCAG 2.1 [WCAG21], success criterion 1.4.12.
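As an illustrative sketch, the minimum spacing values of success criterion 1.4.12 could be applied to the visually displayed transcript of a natural language interface when the user enables increased spacing; the function and element names below are assumptions.

```typescript
// Minimal sketch: apply the WCAG 2.1 SC 1.4.12 minimum spacing values to a
// transcript element when the user enables increased text spacing.
function applyTextSpacing(transcript: HTMLElement): void {
  transcript.style.lineHeight = "1.5";       // line height >= 1.5 x font size
  transcript.style.letterSpacing = "0.12em"; // letter spacing >= 0.12 x font size
  transcript.style.wordSpacing = "0.16em";   // word spacing >= 0.16 x font size
  for (const p of transcript.querySelectorAll<HTMLElement>("p")) {
    p.style.marginBottom = "2em";            // paragraph spacing >= 2 x font size
  }
}
```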
This need is particularly applicable to systems which can serve a wide range of requests, such as personal assistants. Although all users need to know how to interact with a system to start using it, those with learning or cognitive disabilities are likely to be differentially and adversely affected by designs that do not make it obvious what the system can achieve and how to set about achieving it.
Ensuring this need is met typically involves including people with disabilities in data collection and testing procedures that enable software developers to improve the variety of linguistic inputs to which the system can appropriately respond.
Commands for performing a variety of functions typically supported by speech interfaces used for telephony and multimedia applications are standardized in [ETSI-ES-202-076].
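The sketch below illustrates the general technique of a small, consistent command vocabulary that is checked before full language understanding is attempted; the command words and helper functions shown are illustrative assumptions and are not quoted from [ETSI-ES-202-076].

```typescript
// Minimal sketch: a consistent, always-available command vocabulary.
declare function speak(sentence: string): void; // hypothetical output helper
declare function abandonCurrentTask(): void;    // hypothetical task control
let lastResponse = "";

const commands = new Map<string, () => void>([
  ["help",   () => speak("You can say: help, repeat, or cancel.")],
  ["repeat", () => speak(lastResponse)],
  ["cancel", () => abandonCurrentTask()],
]);

function dispatch(utterance: string): boolean {
  const handler = commands.get(utterance.trim().toLowerCase());
  if (handler) handler();
  return handler !== undefined; // false: pass on to full language understanding
}
```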
See WCAG 2.1 [WCAG21], success criterion 3.3.4.
The mode of operation described in requirement 19d may be distracting or anxiety-provoking for some users. Therefore, it should be optional.
See WCAG 2.1 [WCAG21], success criteria 2.2.1, 2.2.3, and 2.2.6.
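In the spirit of success criterion 2.2.1, the following is a sketch of a response time limit that the user can extend (here by at least ten times) or disable entirely; all names and the extension policy shown are assumptions for illustration.

```typescript
// Minimal sketch: a user-adjustable time limit for awaiting the user's reply.
interface TimingPreferences {
  timeLimitMs: number | null; // null: the user has turned the time limit off
}

function awaitReply(prefs: TimingPreferences, onTimeout: () => void): () => void {
  if (prefs.timeLimitMs === null) return () => {}; // no limit; nothing to extend
  let timer = setTimeout(onTimeout, prefs.timeLimitMs);
  // The returned function lets the user extend the limit before it expires.
  return () => {
    clearTimeout(timer);
    timer = setTimeout(onTimeout, prefs.timeLimitMs! * 10); // >= 10x extension
  };
}
```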
See WCAG 2.1 [WCAG21], success criteria 3.1.3, 3.1.4, and 3.1.5.
See WCAG 2.1 [WCAG21], success criteria 3.3.1, 3.3.3, 3.3.4, and 3.3.6.
See WCAG 2.1 [WCAG21], success criterion 3.3.6.
The purpose of this multimodal presentation of text is to enhance comprehension of the material, especially by people with learning disabilities that affect reading.
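As an illustration, under the assumption that the platform's speech synthesis reports word boundary events, the following sketch highlights each word visually as it is spoken, keeping the aural and visual presentations synchronized. Boundary-event support varies across speech synthesis implementations.

```typescript
// Minimal sketch: highlight each word as it is spoken, so the same text is
// presented aurally and visually in synchrony. Assumes plain text content.
function speakAndHighlight(element: HTMLElement): void {
  const text = element.textContent ?? "";
  const utterance = new SpeechSynthesisUtterance(text);
  utterance.onboundary = (event: SpeechSynthesisEvent) => {
    if (event.name !== "word") return;
    const start = event.charIndex;
    const end = text.indexOf(" ", start);
    const stop = end === -1 ? text.length : end;
    // Wrap the word being spoken in a <mark> element for visual emphasis.
    element.innerHTML =
      text.slice(0, start) +
      "<mark>" + text.slice(start, stop) + "</mark>" +
      text.slice(stop);
  };
  utterance.onend = () => { element.textContent = text; }; // clear highlight
  window.speechSynthesis.speak(utterance);
}
```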
Information presented graphically must also be available as text. See '§ 2.2 Means of Input and Output' above.
This work is supported by the EC-funded WAI-Guide Project.