Natural Language Interface Accessibility User Requirements

W3C Editor's Draft

This version:
https://w3c.github.io/apa/
Latest published version:
https://www.w3.org/TR/naur/
Latest editor's draft:
https://w3c.github.io/apa/
History:
https://www.w3.org/standards/history/naur
Commit history
Editors:
(Educational Testing Service)
(W3C)
Feedback:
GitHub w3c/apa (pull requests, new issue, open issues)

Abstract

This document outlines accessibility-related user needs, requirements and scenarios for natural language interfaces. These user needs should influence accessibility requirements in related specifications, as well as the design of applications that provide natural language interfaces, including those that accept speech and voice input. The concept of a natural language interface is first clarified. User needs and associated requirements are then described.

This document is explicitly not a collection of baseline requirements. It is also important to note that some of the requirements may be implemented at a system or platform level, and some may be authoring requirements for the development of applications.

Status of This Document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

This document was published by the Accessible Platform Architectures Working Group as an Editor's Draft.

Publication as an Editor's Draft does not imply endorsement by W3C and its Members.

This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 2 November 2021 W3C Process Document.

1. Introduction

1.1 What is a Natural Language Interface?

A natural language interface is a user interface in which the user and the system communicate via a natural (human) language. The user provides input as sentences via speech or some other input, and the system generates responses in the form of sentences delivered by speech, text or another suitable modality.

Often, systems that provide natural language interfaces support spoken interaction. In this scenario, speech recognition is used to process the user's input, and speech synthesis is used to generate the system's spoken responses. However, the use of speech is not essential to a natural language interface.

Typical examples of natural language interfaces include:

These examples are indicative of applications that use natural language interfaces. They are not definitive. Variations of the examples and applications that do not fit these patterns at all are possible.

1.2 Natural Language Interfaces and Accessibility

The accessibility of natural language interfaces to users with disabilities can be supported by a variety of features at the platform and application levels, including assistive technologies. Multiple modes of input and output should be provided to enable interaction with the system by users who have a variety of physical and sensory capabilities. For example, whereas speech input may be needed by some users with physical disabilities, keyboard input, switch input, or an eye tracking system may be needed by other users.

Similarly, natural language output may be spoken, or it may be provided as visually displayed text. These and other requirements are elaborated further below. In some cases, these requirements may best be satisfied by an assistive technology. For example, a chat bot that does not provide a spoken interface directly may nevertheless satisfy a user's need for speech input via a dictation function provided as part of the user's browser or operating system. There may also be other 'service' aspects needed to better support accessibility that operate at a different layer from any input or output modality considerations.

The design of the application should support the cognitive needs of users, including those who have learning or cognitive disabilities. Discoverability, simplicity, and affordances, for example, are important considerations in the design of the natural language interaction.

1.3 Voice User Interfaces

Voice user interfaces (VUIs) that use speech, such as those found on a range of commercially available devices for home and mobile use, represent a part of the stack that makes up natural language interfaces. This document aims to identify accessibility-related user needs and requirements for VUIs, and to indicate further areas of work and research in terms of how they relate to new standards such as WCAG 3 and to other emerging technologies.

1.4 Scope

Natural language interfaces frequently occur as components of larger user interfaces and systems. For example, a chat bot may be included in a web application. A natural language interface may be an essential part of a multi-modal application that uses a combination of language and gestural inputs. An example would be an interactive navigation tool that allows the user to issue spoken commands and to interact with a graphical map with a pointing device.

The scope of this document is largely confined to the accessibility of the natural language aspect of the overall user interface. It is concerned with the accessibility of natural language interactions to users with disabilities.

Editor's note

The scope of this work is currently under active discussion in the Research Questions Task Force.

1.5 Services and Agents

Behind these interfaces there are services that provide core processing, evaluation and content. This document aims to look at these services and determine to what degree they can and should support the needs of people with disabilities, what the system requirements are, and where further research is needed.

Ideally, by satisfying system requirements, developers of platforms and applications offering natural language interfaces can meet corresponding user needs. Currently, no stance is taken in this document regarding which needs are best satisfied at the platform level, by an assistive technology, or in the development of applications, but this will change as the document develops. These architectural considerations are left to system designers, who therefore need to be aware of the relevant requirements for accessible system design. Often, these decisions also depend on the services provided by the underlying operating system or by the web platform.

If natural language interaction is provided as part of a system that also offers other styles of interaction, this document should be read in combination with guidance provided elsewhere which is relevant to the other interface and service aspects. Notably,

As a general principle, the entire interface of a system or application needs to be accessible to users with disabilities. If only the natural language interaction component is accessible, some users will be unable to complete tasks successfully. For example, a smart agent that answers a user's questions by searching the web for information and then displaying it on screen is only accessible as a whole if both the interaction and the presentation of the information satisfy the user's access needs. If the on-screen information is not accessible, then the user cannot complete the task of acquiring and understanding the information requested.

1.6 User need definition

User needs relate to what a user needs from a particular application or platform to complete a task or to achieve a particular goal. User needs are dependent on the context in which an application is used, including the user's capabilities and the environmental conditions in which interaction with the interface takes place. For example, a spoken interaction would be inaccessible to a person who is deaf, or to a hearing person situated in a noisy environment. Although disability-related needs are the focus of this document, the user needs described here are not limited to people with specific types of disability. The capabilities of users vary greatly. They include a variety of physical, sensory, learning and cognitive abilities that should be taken into account in the design of platforms and applications.

The user needs and associated requirements are actively being reviewed by the Research Questions Task Force (RQTF) and by the Accessible Platform Architectures (APA) Working Group.

2. User needs and requirements

This section outlines a variety of user needs and system requirements that can satisfy them.

2.1 User Identification and Authentication

Note

To achieve adequate security, voice identification may need to be combined with other factors of authentication.

Note

In some cases, this requirement can be met simply by using authentication mechanisms provided by the underlying operating system or browser environment.

2.2 Means of Input and Output

Note

This requirement can often be met by supporting the input methods available from the underlying platform, including assistive technologies.

Note

If software that incorporates a natural language interface supports multiple input mechanisms, support for any specific mechanism may be available only on particular hardware devices or in particular environments. For example, a smart speaker may support only speech input, whereas the same smart agent running on a mobile system such as a phone or tablet may support text input via a keyboard or any device capable of emulating a keyboard.

Note

See the requirement to support a keyboard interface specified in WCAG 2.1 [WCAG21], success criterion 2.1.1.

Note

This requirement can often be met by supporting the output methods available from the underlying platform, including assistive technologies.

Note

If software that incorporates a natural language interface supports multiple output mechanisms, support for any specific mechanism may be available only on particular hardware devices or in particular environments. For example, a smart speaker supports only speech output, whereas the same smart agent running on a mobile system such as a phone or tablet may support a visual display as well, and may be compatible with braille devices.

Note

Support for braille displays is assumed to be provided by a screen reader running under the device's operating system. Therefore, support for keyboard input and textual output is the stated requirement for the natural language interface itself, leaving interaction with the braille hardware to the operating system on which the user interface is run.

2.3 Communicating in a Language that the User Needs

Note

At present, it is generally infeasible to implement REQ 9a and REQ 9b with sufficient reliability and accuracy to be useful. Sign language processing (including automatic recognition, translation, and production of sign languages) involves challenging research problems. See [Bragg-et-al] for details. These two requirements are nevertheless stated here to encourage further research and development efforts.

Note

Sign languages vary by country and region. Therefore, multiple sign languages may need to be supported, depending on the intended audience of the system.

2.4 Speech Recognition and Speech Production

Note

To ensure this user interface is accessible, it should satisfy relevant accessibility requirements drawn from this document or elsewhere. For example, a system could provide spoken commands, and a settings dialogue in a graphical user interface, as alternative mechanisms for configuring speech properties.

2.5 Visually displayed text

Note

In some cases, this requirement can be met by capabilities of the operating system or browsing environment.

Note

See the text spacing requirement specified in WCAG 2.1 [WCAG21], success criterion 1.4.12.

2.6 Designing for Understanding and Effective Use

2.6.1 Understanding How to Interact with the Interface

  • User Need 16: A user who is unfamiliar with the system or who has a learning or cognitive disability needs to know what the system can do and how to ask the system to do it.
  • REQ 16a: Provide commands with which the user can request help or instructions that give an overview of what the system can do and what requests or commands can be used to achieve it.
  • REQ 16b: Provide documentation that explains and gives examples of how to use the system, in a form that satisfies accessibility guidelines.
Note

This need is particularly applicable to systems which can serve a wide range of requests, such as personal assistants. Although all users need to know how to interact with a system to start using it, those with learning or cognitive disabilities are likely to be differentially and adversely affected by designs that do not make it obvious what the system can achieve and how to set about achieving it.

  • User Need 17: A user who is unfamiliar with the system or who has a learning or cognitive disability needs to know how to interact with it to achieve a particular goal.
  • REQ 17a: Provide prompts or menus of options that inform the user of what choices are available and what information is requested at each step of a dialogue with the system.
  • REQ 17b: Provide commands or menu options for requesting explanations and instructions that help the user to complete tasks successfully.
  • User Need 18: A user who is unfamiliar with the system or who has a learning or cognitive disability needs to use it without having to learn specific commands, requests, phrases or vocabulary.
  • REQ 18a: Design the system to respond appropriately to a variety of alternative words, phrases and sentences that may be used to ask the same question, to give the same command, or to supply the same information.
  • REQ 18b: Design the system to respond appropriately to words and phrases that are likely to be familiar to users of other systems with similar features.
  • REQ 18c: If the user's input is ambiguous or cannot be processed, prompt for clarification or additional information, or present a menu of relevant choices.
Note

Ensuring this need is met typically involves including people with disabilities in data collection and testing procedures that enable software developers to improve the variety of linguistic inputs to which the system can appropriately respond.

Note

Commands for performing a variety of functions typically supported by speech interfaces used for telephony and multimedia applications are standardized in [ETSI-ES-202-076].
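
As a rough illustration of REQ 18a to 18c, the following sketch matches several alternative phrasings to one intent and falls back to a clarification prompt when the input is ambiguous or unrecognized. The intent names, the phrase table, and the substring matching are assumptions made for this example only; a real system would more likely rely on a trained language understanding component.

```python
import re
from typing import List

# Hypothetical intents, each with several phrasings a user might choose (REQ 18a, 18b).
INTENT_PHRASES = {
    "set_alarm": ["set an alarm", "set alarm", "wake me"],
    "set_timer": ["set a timer", "start a timer", "count down"],
    "play_music": ["play music", "play a song", "put on some music"],
}


def normalize(text: str) -> str:
    """Lower-case the input and drop punctuation and extra whitespace."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def match_intents(utterance: str) -> List[str]:
    """Return every intent that has at least one phrasing contained in the utterance."""
    cleaned = normalize(utterance)
    return [
        intent
        for intent, phrases in INTENT_PHRASES.items()
        if any(phrase in cleaned for phrase in phrases)
    ]


def respond(utterance: str) -> str:
    """Act on a single match; otherwise ask the user to choose (REQ 18c)."""
    matches = match_intents(utterance)
    if len(matches) == 1:
        return "OK: " + matches[0]
    # Ambiguous input lists the competing intents; unrecognized input lists all of them.
    options = matches or sorted(INTENT_PHRASES)
    return "Which of these did you mean? " + ", ".join(options)
```

For instance, `respond("Please wake me at seven.")` selects the alarm intent even though the word "alarm" is never used, while an utterance that mentions both a timer and music produces a short menu instead of a guess.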

  • User Need 19: A user with a learning or cognitive disability needs to review information, prompts or questions before deciding how to respond.
  • REQ 19a: Design the system to comply with a user's requests for its natural language output (e.g., spoken utterances) to be repeated.
  • REQ 19b: Summarize or present information that has been supplied by the user, then ask the user for confirmation, before performing irreversible actions such as financial transactions.
  • REQ 19c: If the text of the dialogue between the user and the system is presented in writing (e.g., on screen or via a braille device), ensure that the user can review the entire history of the conversation (scrolling the display, if necessary).
Note

See WCAG 2.1 [WCAG21], success criterion 3.3.4.
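
The requirements for User Need 19 could be sketched as a small dialogue manager that keeps the full transcript, repeats its last output on request, and asks for confirmation before an irreversible action. The class name, the `repeat` and `yes` commands, and the `pay` request are hypothetical and serve only to make the flow concrete.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dialogue:
    history: List[str] = field(default_factory=list)  # full transcript for review (REQ 19c)
    last_output: str = ""  # most recent system utterance, kept for "repeat" (REQ 19a)
    pending: str = ""      # irreversible request awaiting confirmation (REQ 19b)

    def _say(self, text: str) -> str:
        """Record a system utterance and return it."""
        self.last_output = text
        self.history.append("system: " + text)
        return text

    def handle(self, user_text: str) -> str:
        """Process one user turn and return the system's reply."""
        self.history.append("user: " + user_text)
        command = user_text.strip().lower()
        if command == "repeat":
            # Repeating does not clear a pending confirmation.
            return self._say(self.last_output or "Nothing to repeat yet.")
        if self.pending:
            request, self.pending = self.pending, ""
            verdict = "Done" if command == "yes" else "Cancelled"
            return self._say(verdict + ": " + request)
        if command.startswith("pay "):
            # Summarize the request and wait for confirmation before acting.
            self.pending = user_text.strip()
            return self._say("You asked to: " + self.pending + ". Say yes to confirm.")
        return self._say("Sorry, I did not understand.")
```

In this sketch any answer other than "yes" cancels the pending request, which errs on the side of not performing the irreversible action.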

2.6.2 Giving Users Enough Time to Interact

  • User Need 20: A user with a learning or cognitive disability needs ample time to decide how to respond during a dialogue with the system.
  • REQ 20a: Unless there are compelling reasons to the contrary, do not limit the amount of time available for the user to respond.
  • REQ 20b: If a time limit is unavoidable, allow the time limit to be adjusted or eliminated, or prompt the user to extend it before it expires.
  • REQ 20c: Warn users of time limits before any period of time that is subject to a limit begins.
  • REQ 20d: Provide a mode of operation in which the system reminds the user periodically that it is waiting for input, and of any time limit that has been imposed.
Note

The mode of operation described in REQ 20d may be distracting or anxiety-provoking for some users. Therefore, it should be optional.

Note

See WCAG 2.1 [WCAG21], success criteria 2.2.1, 2.2.3, and 2.2.6.
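
One possible way to model the timing requirements above is a timer whose limit is optional, announced in advance, and extendable before it expires. This is only a sketch: the class, its field names, and the ten-second warning window are assumptions, and the caller is expected to advance the clock by calling `tick`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResponseTimer:
    limit_seconds: Optional[float] = None  # None means no time limit (REQ 20a)
    warn_before: float = 10.0              # hypothetical lead time for offering an extension
    elapsed: float = 0.0

    def announce(self) -> str:
        """Message given before the timed period begins (REQ 20c)."""
        if self.limit_seconds is None:
            return "Take as long as you need."
        return "You have {:g} seconds to respond.".format(self.limit_seconds)

    def tick(self, seconds: float) -> str:
        """Advance the clock and report 'waiting', 'offer_extension' or 'expired'."""
        self.elapsed += seconds
        if self.limit_seconds is None:
            return "waiting"
        remaining = self.limit_seconds - self.elapsed
        if remaining <= 0:
            return "expired"
        if remaining <= self.warn_before:
            return "offer_extension"  # prompt the user to extend the limit (REQ 20b)
        return "waiting"

    def extend(self, extra_seconds: float) -> None:
        """Lengthen an existing limit; an unlimited timer stays unlimited."""
        if self.limit_seconds is not None:
            self.limit_seconds += extra_seconds
```

A periodic reminder mode as in REQ 20d could be layered on top by speaking a reminder whenever `tick` returns `waiting`, provided the user has opted in.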

2.6.3 Communicating in Language that is Clear, Simple, and Appropriate to the Audience

  • User Need 21: Users, especially those who have learning or cognitive disabilities, need the system to use language that is clear and comprehensible to them.
  • REQ 21a: Use language (including vocabulary and syntax) that is no more complex than is necessary for clear communication.
  • REQ 21b: Use vocabulary (including terminology) that is reasonably predicted to be familiar to the intended users of the system, including users who may have learning or cognitive disabilities.
  • REQ 21c: Provide a mode of operation in which simpler language than the default can be requested.
  • REQ 21d: Provide definitions or explanations of terms that are likely to be unfamiliar to intended users of the system, including users who may have learning or cognitive disabilities.
Note

See WCAG 2.1 [WCAG21], success criteria 3.1.3, 3.1.4, and 3.1.5.

  • User Need 22: Users, especially those who have learning or cognitive disabilities, need the system to use language that is appropriate to their social and cultural context in order to be clear and understandable.
  • REQ 22a: Provide a mode of operation in which the use of language, including terminology, currency, units of measure, and date and time formats, is localized according to the user's preferences.
  • REQ 22b: By default, localize the use of language, including terminology, currency, units of measure, and date and time formats, to the user's country and region.
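
The relationship between REQ 22a and REQ 22b can be illustrated with a formatter that defaults to the conventions of the user's region and lets an explicit preference override them. The two-entry format table and the function name are invented for this sketch; a production system would more likely draw on a full locale database.

```python
from datetime import date
from typing import Optional

# Hypothetical per-region conventions; real systems need far more complete locale data.
REGION_FORMATS = {
    "US": {"date": "%m/%d/%Y", "money": "${:,.2f}"},
    "GB": {"date": "%d/%m/%Y", "money": "£{:,.2f}"},
}


def describe_payment(amount: float, due: date, region: str,
                     preference: Optional[str] = None) -> str:
    """Format a reply using the user's preference if set, else the regional default."""
    formats = REGION_FORMATS[preference or region]
    return "{} is due on {}.".format(
        formats["money"].format(amount),
        due.strftime(formats["date"]),
    )
```

With these assumptions, the same payment due on 5 March 2024 is reported as `03/05/2024` to a user in the US region by default, and as `05/03/2024` if that user has stated a preference for the GB conventions.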

2.6.4 Pronunciation

  • User Need 23: Users, especially those who have learning or cognitive disabilities, need spoken language to be pronounced correctly in order to be understood.
  • REQ 23a: Provide a mode of operation in which the pronunciation (e.g., accent) of spoken language is localized according to the user's preferences.
  • REQ 23b: By default, localize the pronunciation of spoken language according to the user's country and region.
  • REQ 23c: Ensure that spoken text is pronounced correctly, including names, rarely occurring words, and words that have different pronunciations depending on context.
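
Where speech output is driven by SSML, one way to approach REQ 23c is to wrap terms that are often mispronounced in `sub` elements that carry their spoken form, and to set `xml:lang` according to the user's language or regional preference. The sketch below assumes a synthesizer that accepts SSML 1.1; the lexicon entries and the function name are illustrative only.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical lexicon that maps written forms to the form that should be spoken.
SPOKEN_FORMS = {"Dr.": "Doctor", "W3C": "W 3 C"}


def to_ssml(text: str, lang: str = "en-GB") -> str:
    """Return an SSML document in which known terms carry a spoken alias."""
    # Try longer terms first so that overlapping entries resolve predictably.
    terms = sorted(SPOKEN_FORMS, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in terms))
    parts = []
    position = 0
    for match in pattern.finditer(text):
        parts.append(escape(text[position:match.start()]))
        alias = quoteattr(SPOKEN_FORMS[match.group()])
        parts.append("<sub alias={}>{}</sub>".format(alias, escape(match.group())))
        position = match.end()
    parts.append(escape(text[position:]))
    return (
        '<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" '
        "xml:lang={}>".format(quoteattr(lang))
        + "".join(parts)
        + "</speak>"
    )
```

Words whose pronunciation depends on context cannot be handled by a fixed table like this one; they would need linguistic analysis or author-supplied markup.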

2.6.5 Avoiding and Recovering from Input Errors

  • User Need 24: Users, especially those who have learning or cognitive disabilities, need opportunities to correct data entry errors.
  • REQ 24a: Check information provided by the user for errors.
  • REQ 24b: If errors are detected that can be automatically corrected with high reliability, make the correction and then prompt the user to confirm the information provided.
  • REQ 24c: For errors that cannot be reliably and automatically corrected, provide an explanation to the user and request valid information.
  • REQ 24d: Provide suggestions for correcting the error, if there is a known and relatively short list of alternative, valid responses.
Note

See WCAG 2.1 [WCAG21], success criteria 3.3.1, 3.3.3, 3.3.4, and 3.3.6.
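
The error-handling steps in REQ 24a to 24d might be sketched as follows, using approximate string matching from the Python standard library. The list of valid answers, the two similarity thresholds, and the status labels are assumptions chosen for illustration and would need tuning with real users.

```python
import difflib
from typing import List, Tuple

VALID_CITIES = ["Boston", "Austin", "Houston", "Dallas"]  # hypothetical valid answers


def check_city(user_value: str) -> Tuple[str, List[str]]:
    """Classify input as 'ok', 'confirm' (auto-corrected), 'suggest' or 'invalid'."""
    by_lower = {city.lower(): city for city in VALID_CITIES}
    key = user_value.strip().lower()
    if key in by_lower:  # REQ 24a: the value passes the check
        return ("ok", [by_lower[key]])
    # REQ 24b: a single very close match is corrected, then confirmed with the user.
    strong = difflib.get_close_matches(key, list(by_lower), n=2, cutoff=0.85)
    if len(strong) == 1:
        return ("confirm", [by_lower[strong[0]]])
    # REQ 24d: otherwise offer a short list of plausible alternatives, best first.
    weak = difflib.get_close_matches(key, list(by_lower), n=3, cutoff=0.6)
    if weak:
        return ("suggest", [by_lower[name] for name in weak])
    # REQ 24c: nothing plausible, so the system should explain and ask again.
    return ("invalid", [])
```

A `confirm` result is meant to be followed by a question such as "Did you mean Boston?" rather than being applied silently.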

  • User Need 25: Users, especially those with learning or cognitive disabilities, need opportunities to avoid making errors that are irrevocable.
  • REQ 25a: Provide means of reversing actions that can be made reversible.
Note

See WCAG 2.1 [WCAG21], success criterion 3.3.6.
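
For actions that can be made reversible, REQ 25a could be met by recording how to reverse each action as it is performed. The shopping list below is a hypothetical example of this pattern, with an `undo` method that a spoken or typed "undo" command might call.

```python
from typing import Callable, List


class ShoppingList:
    def __init__(self) -> None:
        self.items: List[str] = []
        # Stack of callables, each of which reverses one earlier action.
        self._undo_stack: List[Callable[[], None]] = []

    def add(self, item: str) -> None:
        self.items.append(item)
        # By the time this runs, later actions are already undone, so the item is last.
        self._undo_stack.append(lambda: self.items.pop())

    def remove(self, item: str) -> None:
        index = self.items.index(item)
        self.items.pop(index)
        # Put the item back at the position it was removed from.
        self._undo_stack.append(lambda: self.items.insert(index, item))

    def undo(self) -> bool:
        """Reverse the most recent action; return False if there is nothing to undo."""
        if not self._undo_stack:
            return False
        self._undo_stack.pop()()
        return True
```

Undoing in last-in, first-out order is what keeps the stored positions valid.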

2.6.6 Using Multimodal Interfaces to Enhance Understanding

  • User Need 26: Some users with learning disabilities need textual information to be spoken and presented in written form simultaneously.
  • REQ 26: Provide a mode of operation in which textual information is spoken and presented on screen concurrently, with synchronized visual highlighting of the text as it is spoken.
Note

The purpose of this multimodal presentation of text is to enhance comprehension of the material, especially by people with learning disabilities that affect reading.
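
The synchronized highlighting described in REQ 26 can be sketched as a lookup from playback time to the word being spoken. The sketch assumes that the speech synthesizer reports a start time for each word; the square brackets stand in for whatever visual highlighting the display provides.

```python
import bisect
from typing import List


def highlight_at(words: List[str], start_times: List[float], t: float) -> str:
    """Return the text with the word being spoken at time t marked in brackets."""
    # Index of the last word whose start time is at or before t (-1 if none yet).
    current = bisect.bisect_right(start_times, t) - 1
    return " ".join(
        "[" + word + "]" if index == current else word
        for index, word in enumerate(words)
    )
```

Calling this function each time the playback position changes keeps the highlighted word in step with the audio.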

  • User Need 27: Some users with learning or cognitive disabilities need graphical content that complements and reinforces the meaning of textual information.
  • REQ 27: If appropriate graphical conventions exist for presenting information that is provided to the user, then display a graphical presentation in addition to any textual (e.g., spoken) output.
Note

Information presented graphically must also be available as text. See '§ 2.2 Means of Input and Output' above.

3. Enabling Funders

This work is supported by the EC-funded WAI-Guide Project.

A. References

A.1 Informative references

[Bragg-et-al]
Sign language recognition, generation, and translation: An interdisciplinary perspective. Danielle Bragg; Oscar Koller; Mary Bellard; Larwan Berke; Patrick Boudreault; Annelies Braffort; Naomi Caselli; Matt Huenerfauth; Hernisa Kacorri; Tessa Verhoef; Christian Vogler; Meredith Ringel Morris. The 21st International ACM SIGACCESS Conference on Computers and Accessibility. October 2019.
[ETSI-ES-202-076]
ETSI ES 202 076 V2.1.1: Human Factors (HF); User Interfaces; Generic spoken command vocabulary for ICT devices and services. ETSI. URL: https://www.etsi.org/deliver/etsi_es/202000_202099/202076/02.01.01_50/es_202076v020101m.pdf
[personal-assistant-architecture]
Intelligent Personal Assistant Architecture and Potential for Standardization Version 1.2. Voice Interaction Community Group. 19 July 2021. URL: https://w3c.github.io/voiceinteraction/voice%20interaction%20drafts/paArchitecture-1-2.htm
[raur]
RTC Accessibility User Requirements. Joshue O'Connor; Janina Sajka; Jason White; Michael Cooper. W3C. 25 May 2021. W3C Working Group Note. URL: https://www.w3.org/TR/raur/
[uaag]
User Agent Accessibility Guidelines (UAAG) 2.0. W3C. 15 December 2015. URL: https://www.w3.org/TR/UAAG20/
[WCAG21]
Web Content Accessibility Guidelines (WCAG) 2.1. Andrew Kirkpatrick; Joshue O'Connor; Alastair Campbell; Michael Cooper. W3C. 5 June 2018. W3C Recommendation. URL: https://www.w3.org/TR/WCAG21/
[xaur]
XR Accessibility User Requirements. W3C. 16 September 2020. URL: https://www.w3.org/TR/xaur/