In a voice system, the user interacts by listening to spoken prompts from an automated system.
The user responds by pressing keys on a telephone keypad, by speaking, or both.
Voice systems are widespread in telephone self-service applications for customer support.
Many crucial services depend on this technology, such as emergency notification,
healthcare appointment reminders, and prescription refilling, among others.
Full accessibility therefore needs to be supported.
Voice systems are often implemented with the W3C VoiceXML standard and supporting standards from the
Voice Browser Working Group.
See [[voicexml20]] and [[voicexml21]].
However, issues of cognitive accessibility for voice systems apply
regardless of whether a voice system is implemented using the W3C voice
standards or with a proprietary technology. It is impossible
for a user to tell what technologies are used in the underlying voice platform, but the usability
principles will be the same whatever the underlying technology is.
An example use case may be as follows:
- The user may be asked "For sports press 1, For weather press 2,
For Stargazer astrophysics press 3." The system then waits for a response.
- Accessibility is discussed for the hard of hearing, and
WCAG and WAI specifications are cited as being relevant.
Beyond that, no examples or concerns are identified for cognitive accessibility.
Challenges for People with Cognitive Disabilities
Voice technology can be very problematic for people with cognitive disabilities, due to its heavy demands on
memory and on the ability to understand and produce speech in real time.
Effect of memory impairments on users' ability to understand and respond to prompts
A good working memory is essential for using menu-based systems that present several choices
to the user and ask them to select one choice, whether by speaking or through a key press.
The user needs to hold multiple pieces of transitory information in mind,
such as the number that is being presented as an option, whilst processing the terms that follow.
A good short term memory (several seconds) is essential so that the user can remember the
number or the term.
Without these functions the user is likely to select the wrong number.
Users need to be able to decide when to act on a menu choice. While a menu is being presented,
should they wait to hear more
options or should they select a choice that seems correct before hearing all the options?
Limitations of executive function may also cause problems
when the system response is too slow. The user may not know whether their input has
registered with the system, and consequently may press the key or speak again.
Effect of impaired reasoning
The user may need to compare similar options
such as "billing", "accounts", "sales" and
decide which is the service that is best suited to solve the issue at hand.
Without strong reasoning skills the user is likely to select the wrong menu option.
Advertisements and additional, unrequested information also increase the amount of processing required.
Effect of attention related limitations
The user needs to focus on the different options and select the correct one.
A person with impaired attention may have difficulties maintaining the necessary
focus for a long or multi-level menu. Advertising and additional,
unrequested information also make it harder to retain attention.
Effect of impaired language and auditory perception related functions
The user needs to interpret the correct terms and match them to their needs within a certain time limit.
This involves speech perception and language understanding: sounds of language are heard,
interpreted and understood,
within a given time.
Effect of impaired speech and language production functions (for speech-recognition systems)
The user needs to be able to formulate a spoken response to the prompt before the system "times out" and generates
another prompt. In the most common type of speech-recognition system (directed dialog) the user only
needs to be able to speak a word or short phrase. However, some systems ("natural language systems") allow the
user to describe their issue in detail. While this feature is an advantage for some users because it
does not require them to remember menu options, it can be problematic for users with disorders like
aphasia who have difficulty speaking.
Effect of reduced knowledge
The user needs to be familiar with the terms used in the menu,
even if they are not relevant to the service options required.
- For users who are unable to use the automated system, it must be possible to reach a human, either a call
center agent or another operator, through an easy transfer process (that is, not by being directed
to call another phone number).
- There should be a reserved digit for requesting a human operator.
The most common digit used for this
purpose is "0"; however, if another digit is already in widespread use in a particular country, then
that digit should always be available to get to a human agent. In particular, systems should not
make it difficult for users to reach an agent by requiring complex digit combinations.
This could be enforced by prohibiting implementations from assigning the reserved digit
any meaning other than transfer to an operator.
- Other digits similarly could be used for specific reserved functions, keeping in mind that too many
reserved digits will be confusing and difficult to learn. Remembering more than one or two reserved digits may be problematic for some users, but repeated verbal recitals of the reserved digits will also be distracting.
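The reserved-digit rule above could be checked mechanically when a menu is configured. The following is a minimal sketch, not a definitive implementation: the names `RESERVED_OPERATOR_DIGIT` and `validate_menu`, and the representation of a menu as a digit-to-destination mapping, are assumptions made for illustration.

```python
from typing import Dict

# Assumption: "0" is the established operator digit in the target country.
RESERVED_OPERATOR_DIGIT = "0"


def validate_menu(menu: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of `menu` in which the reserved digit reaches the operator.

    `menu` maps a keypad digit to a destination name (hypothetical format).
    Raises ValueError if the reserved digit was given any other meaning.
    """
    destination = menu.get(RESERVED_OPERATOR_DIGIT, "operator")
    if destination != "operator":
        raise ValueError(
            f"digit {RESERVED_OPERATOR_DIGIT!r} is reserved for the operator, "
            f"not {destination!r}"
        )
    checked = dict(menu)
    # Make the operator reachable even if the menu author forgot to list it.
    checked[RESERVED_OPERATOR_DIGIT] = "operator"
    return checked
```

Rejecting a conflicting assignment at configuration time, rather than silently overriding it, keeps the meaning of the reserved digit consistent across every menu in the system.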
User-specific settings can be used to customize the voice user interface, keeping in mind that
the available mechanisms for
invoking user-specific settings are minimal in a voice interface (speech or DTMF tones). If it is difficult to set user
preferences, they won't be used. Setting preferences by natural language is the most natural ("slow down!") but is not currently very common.
- Extra time should be a user setting: users should be able to specify both
a slower speech rate and more time for input.
- Timed text should be adjustable (as with all accessible media).
- The user should be able to extend or disable the timeout as a system default on their device.
- Error recovery should be simple, and take the user to a human operator. Error responses should not throw the user off the line or send them to a more complex menu. Preferably they should use a reserved digit.
- Advertisements and other extraneous information should not be read, as they can confuse the user and make it harder to retain attention.
- Terms used should be as simple as possible.
- Examples and advice should be given on how to build a prompt that reduces the cognitive load
- Example 1: Reducing cognitive load: The prompt "press 1 for the secretary," requires the user to remember the digit 1 while interpreting the term secretary. It is less good than the prompt "for the secretary (pause): press 1" or "for the secretary (pause) or for more help (pause): press 1"
- Example 2: Setting a default for a human operator as the number 0
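The user-setting recommendations above (slower speech, extra input time, an extended or disabled timeout) could be modelled roughly as follows. This is an illustrative sketch only: `VoicePreferences`, `effective_timeout`, `apply_spoken_command`, the three spoken commands, and the adjustment factors are all assumptions, not part of any voice standard.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class VoicePreferences:
    speech_rate: float = 1.0         # 1.0 = default speed; lower is slower
    timeout_multiplier: float = 1.0  # scales how long the system waits for input
    timeout_disabled: bool = False   # True = never time out


def effective_timeout(base_seconds: float, prefs: VoicePreferences) -> Optional[float]:
    """Return the input timeout in seconds, or None if timeouts are disabled."""
    if prefs.timeout_disabled:
        return None
    return base_seconds * prefs.timeout_multiplier


def apply_spoken_command(prefs: VoicePreferences, command: str) -> VoicePreferences:
    """Update preferences from a short spoken command (hypothetical phrases)."""
    text = command.strip().lower().rstrip("!.")
    if text == "slow down":
        # Slow speech by 20%, but never below half speed (arbitrary floor).
        return replace(prefs, speech_rate=max(0.5, prefs.speech_rate * 0.8))
    if text == "more time":
        return replace(prefs, timeout_multiplier=prefs.timeout_multiplier * 2)
    if text == "no time limit":
        return replace(prefs, timeout_disabled=True)
    return prefs  # Unrecognized commands leave the settings unchanged.
```

Keeping the command set this small reflects the point made above: if preferences are difficult to set, they will not be used.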
Follow best practices in general VUI design
Standard best practices in voice user interface design apply to users with cognitive disabilities, and should be followed.
A good reference is the wiki published by the Association for
Voice Interaction Design [AVIxD].
Another good reference is [ETSI ETR 096].
Some examples of generally accepted best practices in voice user interface design follow.
See the AVIxD wiki cited above for additional recommendations and detail.
- Pauses are important between phrases in order to allow processing time of language and options.
- Options in text should be given before the digit to select, or the instruction to
select that option. This will mean that the user does not need to remember the
digit or instruction whilst processing the term. For example: The
prompt "press 1 for the secretary,"
requires the user to remember the digit 1 while interpreting the term "secretary".
A better prompt is "for the secretary (pause): press 1" or "for the secretary (pause) or for more help (pause): press 1"
- Error recovery should be simple, and take the user to a human operator if the error persists.
Error responses should not end the call or send the user to a more complex menu.
- Advertisements and other extraneous information should not be read, as they can confuse the
user and make it harder to retain attention.
- Terms used should be as simple and jargon-free as possible.
- Tapered prompts should be used to increase the level of prompt detail when the
user does not respond as expected.
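The prompt-ordering and pause recommendations in the list above can be illustrated with a small prompt builder. This is a sketch under stated assumptions: `build_menu_prompt` is a hypothetical helper, and the literal text "(pause)" stands in for whatever pause markup the actual voice platform provides.

```python
from typing import List, Tuple


def build_menu_prompt(options: List[Tuple[str, str]], pause: str = "(pause)") -> str:
    """Build a menu prompt that names each option before its digit.

    `options` is a list of (label, digit) pairs. A pause marker is placed
    between the label and the instruction, and between successive options.
    """
    phrases = [f"For {label} {pause}: press {digit}." for label, digit in options]
    return f" {pause} ".join(phrases)


print(build_menu_prompt([("the secretary", "1"), ("more help", "2")]))
# prints: For the secretary (pause): press 1. (pause) For more help (pause): press 2.
```

Generating prompts from (label, digit) pairs, rather than writing each prompt by hand, makes it harder to slip back into the "press 1 for..." ordering.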
Considerations for Speech Recognition
- For speech-recognition-based systems,
an ETSI standard for voice commands covering many European languages
exists and should be used where possible [ETSI 202 076],
keeping in mind that expecting people to learn more than a few commands places a burden on the user.
- Natural language understanding systems allow users to state their
requests in their own words, and can be useful for users who have difficulty
remembering menu options, or who have difficulty mapping the offered menu options to
their goals. However, natural language interfaces can be difficult to
use for users who have difficulty
producing speech or language. Directed dialog (menu-based) fallback or
transfer to an agent
should be provided.
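The fallback chain described above (natural language, then directed dialog, then a human agent) might be sketched as a simple mode selector. The mode names and the `next_mode` function are assumptions for illustration; a real system would decide `understood` from its recognizer's results.

```python
# Assumed fallback order, from least to most constrained interaction.
DIALOG_MODES = ["natural_language", "directed_dialog", "agent"]


def next_mode(current_mode: str, understood: bool) -> str:
    """Stay in the current mode on success; otherwise step to the next fallback.

    Once the caller has reached a human agent, there is nowhere further to fall back.
    """
    if understood:
        return current_mode
    index = DIALOG_MODES.index(current_mode)
    return DIALOG_MODES[min(index + 1, len(DIALOG_MODES) - 1)]
```

For example, a caller whose open-ended request is not recognized would next hear a menu, and would be transferred to an agent if the menu also fails.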
Follow requirements of legislation
For example, the U.S. Telecommunications Act Section 255 Accessibility Guidelines [Section255] paragraph 1193.41 Input, control, and mechanical functions, clauses
(g), (h) and (i) apply to cognitive disabilities and require that equipment be operable without time-dependent controls, without the ability to
speak, and by persons with limited cognitive skills.
Recent developments in call center technology may be helpful for users with cognitive disabilities.
- Visual IVR. When a call comes in on a smartphone, the system can ask the user if they want to
switch over to a visual interface which mirrors the voice interface. This allows a user to see the prompts
instead of having to remember them.
- Adaptive voice interface. This is a technology that is sensitive to the user's behavior and changes the voice interface dynamically.
For example, it can slow down or speed up to match the user's speech rate [Adaptive].
- Tapered prompts. Best practices in voice user interface design include providing several different prompts for each point in the interaction. The different prompts are used based on the user's behavior. For example, if the user takes a long time to respond to a prompt, a simpler or more explanatory version of the prompt may be used instead of the default.
- Human assistance. Although the user interacts normally with the voice system, in case
the system is unable to process the user's speech, a human agent acts behind the scenes to
perform the necessary processing. This would allow users with a limited ability to speak (whose speech might
not be recognized by a speech recognizer) to
interact with the system.
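Tapered prompts, combined with transfer to a human when errors persist, could be selected along the following lines. This is a hedged sketch: the prompt wording, the `TAPERED_PROMPTS` list, and the `next_action` function are invented for illustration, and the number of attempts before transfer is an arbitrary choice.

```python
from typing import List, Tuple

# Hypothetical prompts for one point in the dialog, from brief to most explicit.
TAPERED_PROMPTS: List[str] = [
    "Which service would you like?",
    "Please say billing, accounts, or sales.",
    "For billing (pause): press 1. For accounts (pause): press 2. "
    "For sales (pause): press 3.",
]


def next_action(failed_attempts: int,
                prompts: List[str] = TAPERED_PROMPTS) -> Tuple[str, str]:
    """Choose what to do after `failed_attempts` timeouts or unrecognized inputs.

    Each failure moves to a more explanatory prompt; once the prompts are
    exhausted, the caller is transferred to a human operator rather than
    being disconnected or sent to a more complex menu.
    """
    if failed_attempts >= len(prompts):
        return ("transfer", "operator")
    return ("prompt", prompts[failed_attempts])
```

With three prompts, a caller hears the default prompt first, then two progressively more detailed versions, and is transferred after the third failure.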
Status of these solutions
Note. The above proposed solutions have been tested with users in the general population and have
been shown to improve the usability of voice systems, although the extent to which they have been tested with users with cognitive disabilities is not clear.
Currently VoiceXML does not directly enforce accessibility for people with cognitive
disabilities. However, a considerable literature on voice user interface design exists and
is in many cases very applicable to cognitive accessibility for voice systems. Developers must
become aware of these resources and of the need to design systems
with these users in mind.