Production Options for Audio Descriptions
Status: This is an incomplete, unapproved draft. The current draft is at wai-media-guide.netlify.com/
Description basics
Audio descriptions are an extra track of narration (audio or text) that conveys information about important visuals, such as body language, changes in scenery and context, charts and diagrams. Audio descriptions can be open or closed:
- Open audio descriptions are integrated into the program audio track and are heard by everyone. They cannot be turned off.
- Closed audio descriptions can be turned on and off by viewers.
Additionally…
- Audio descriptions are usually timed to play during pauses or breaks in narration or dialog, although extended audio descriptions may be implemented where necessary.
- In cases where no pauses are available, a single summary, called a pre-description, can be inserted at the beginning of the presentation.
- Audio-description tracks can be presented as pre-recorded audio, either human narration or text-to-speech (TTS) output, or as text tracks that are delivered invisibly on the fly and read aloud by screen readers.
Most described content today is presented with open descriptions, using one of two options:
- Two separate videos, one with open descriptions, and the other with no descriptions. Authors give users a link or some other method to choose one or the other.
- A single video that contains two audio tracks, one with descriptions and one without. Authors give users a button or menu to switch from one track to the other.
Pre-produced audio descriptions
Describing a video can be a time-consuming and complex process, depending on the subject matter. Before beginning, take a look at the description decision tree to determine if descriptions are even necessary. For longer videos, it may be more time- and cost-efficient to hire a professional audio-description service provider to write and record descriptions.
Descriptions are usually recorded as human narration before being integrated into the video presentation, but technology and markup now exist to convey descriptions as text that is read aloud on the fly by screen readers or other text-to-speech (TTS) methods. Read more about text-to-speech descriptions.
An excellent place to learn the basics about audio descriptions is The Description Key. See an example of a described video, and be sure to select the “Enable audio description” button located just below the player to turn on the descriptions. Other audio-description samples are available from the American Council of the Blind’s Audio Description Project.
Production workflow: audio descriptions (human-recorded narration)
Basic workflow for creating pre-produced audio descriptions:
When recording the descriptions, create the highest-quality audio files possible. Keep these points in mind:
- Use the highest-quality microphone and recording software available.
- Use a microphone stand and speak clearly into the microphone.
- Record the descriptions in a room that is isolated from all external sounds.
- Avoid rooms with hard surfaces (e.g., tile or wood floors).
- When mixing the descriptions into the program audio, lower the program-audio level when the description plays while simultaneously raising the description's audio level. When the description is finished playing, lower the description audio level and raise the program-audio level to its proper setting. Repeat this process (known as "ducking") for every description instance.
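The ducking step in the last point can be sketched as a gain calculation. The JavaScript below is a minimal illustration under stated assumptions, not a mixing tool: the function name duckingGainsAt, the cue shape (start and end in seconds), and the default levels and ramp length are all hypothetical values chosen for the example.

```javascript
// Hypothetical helper: given the time ranges in which descriptions play,
// return the gain to apply to the program audio and to the description
// audio at time t (in seconds). Gains are linear factors (1.0 = full level).
function duckingGainsAt(t, descriptionCues, options = {}) {
  // Assumed defaults: program audio drops to 25% and fades over half a second.
  const { normal = 1.0, ducked = 0.25, ramp = 0.5 } = options;

  // amount: 0 = no ducking, 1 = fully ducked.
  let amount = 0;
  for (const { start, end } of descriptionCues) {
    let a = 0;
    if (t >= start && t <= end) {
      a = 1; // description is playing
    } else if (ramp > 0 && t > start - ramp && t < start) {
      a = (t - (start - ramp)) / ramp; // fading the program audio down
    } else if (ramp > 0 && t > end && t < end + ramp) {
      a = 1 - (t - end) / ramp; // fading the program audio back up
    }
    amount = Math.max(amount, a);
  }

  return {
    program: normal - amount * (normal - ducked),
    description: amount,
  };
}
```

With a description running from 10 to 14 seconds, the program gain stays at its normal level until just before 10 seconds, falls to the ducked level while the description plays, and returns to normal shortly after 14 seconds. An audio editor's volume envelope does the same job by hand.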
Producing TTS audio descriptions
TTS descriptions are not pre-recorded. Instead, they are transmitted at the appropriate intervals to users during playback, and are read aloud by the user's screen reader. Think of them as an invisible text track that screen readers can read aloud as the text is delivered. See examples of TTS descriptions. The basic workflow for TTS audio descriptions generally follows this pattern:
Basic workflow for creating pre-produced TTS audio descriptions:
Below is an image of a caption editor being used to timestamp an audio-description script.
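A timestamped description script is commonly saved as a WebVTT file, the text-track format that browsers accept for the track element. As a rough sketch of what a caption editor produces, the JavaScript below turns a list of timed descriptions into WebVTT text. The function names and the cue shape (start and end in seconds, plus text) are assumptions made for this example, and the sketch does not validate cue text.

```javascript
// Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt).
function toVttTimestamp(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width) => String(n).padStart(width, '0');
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Hypothetical helper: build the contents of a WebVTT description file.
// Assumes each cue's text contains no blank lines and no "-->" sequence.
function buildDescriptionVtt(cues) {
  const blocks = cues.map(
    (cue) => `${toVttTimestamp(cue.start)} --> ${toVttTimestamp(cue.end)}\n${cue.text}`
  );
  // The file starts with the WEBVTT header; blocks are separated by blank lines.
  return ['WEBVTT', ...blocks].join('\n\n') + '\n';
}
```

For example, a cue with start 5, end 9 and the text "A woman opens a laptop." becomes a block whose first line is 00:00:05.000 --> 00:00:09.000, followed by the description text.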
Using the track element and the kind attribute, the descriptions can be delivered at the time of playback and a screen reader will read them aloud. Below is a code sample:
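As a minimal sketch, a description track can be declared in markup or attached from JavaScript; the opening comment shows the markup form and the function shows the scripted form. The helper name addDescriptionTrack, the file name descriptions.vtt, and the label text are illustrative assumptions.

```javascript
// Equivalent markup, placed inside the <video> element:
//   <track kind="descriptions" src="descriptions.vtt" srclang="en"
//          label="Audio descriptions">

// Hypothetical helper. `doc` is normally the global `document`; it is passed
// in as a parameter so the function can also be exercised outside a browser.
function addDescriptionTrack(doc, video, src, srclang = 'en') {
  const track = doc.createElement('track');
  track.kind = 'descriptions'; // marks the cues as descriptions, not captions
  track.src = src;             // URL of the timed text file (assumed WebVTT)
  track.srclang = srclang;     // language of the description text
  track.label = 'Audio descriptions'; // assumed label text
  video.appendChild(track);
  return track;
}

// In a browser:
//   addDescriptionTrack(document, document.querySelector('video'), 'descriptions.vtt');
```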
The kind attribute will cause the description file to be received invisibly (i.e., off-screen) so sighted users will not see it, but screen readers will be aware of it. Screen readers will then read the description text as it is delivered, synchronized at the time of playback. Read more about techniques for delivering TTS descriptions. See functioning examples of TTS descriptions using the track element along with JavaScript to illustrate how screen readers will read off-screen descriptions aloud.
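One way such a script can work is to copy each active cue into an ARIA live region, which screen readers announce when its content changes. The sketch below is an assumption about a plausible approach, not necessarily how the linked examples are built: announceDescriptions is a hypothetical name, and liveRegion is assumed to be a visually hidden element carrying aria-live="polite".

```javascript
// Hypothetical helper: forward description cues to a live region.
// `track` is a TextTrack, for example videoElement.textTracks[0].
// `liveRegion` is assumed to be an off-screen element with aria-live="polite".
function announceDescriptions(track, liveRegion) {
  // "hidden" loads the cues and fires cue events without rendering anything.
  track.mode = 'hidden';

  track.addEventListener('cuechange', () => {
    // activeCues holds the cues whose time range includes the current time.
    const active = Array.from(track.activeCues || []);
    // Writing new text into the live region prompts the screen reader to speak it.
    liveRegion.textContent = active.map((cue) => cue.text).join(' ');
  });
}
```

Because the text reaches the user through their own screen reader, the voice, rate and verbosity follow the settings they have already chosen.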
Extended descriptions
Typically, descriptions are written to fit into natural pauses in narration or dialog. However, there will be circumstances where the pauses are not long enough to accommodate a full description. In these cases, extended descriptions may be implemented. In an extended description, the video and audio tracks are programmatically paused when the description begins playing. When the description is finished playing, the video and audio tracks are programmatically resumed. At the next instance of an extended description, the process is repeated. Note that extended and "regular" descriptions may be mixed in a single multimedia presentation.
The only markup-based method for providing extended audio descriptions is to use SMIL 3.0, a language for writing interactive multimedia presentations. Support for SMIL is very limited, however: implementations will most likely require the use of plug-ins and/or heavily customized approaches. Other non-markup-based methods have been experimented with, such as creating an open-described video with a video track that appears to freeze while a separate open-description track plays. See an example of a video with open extended audio descriptions (created using a non-markup method); press the Play button on the player to watch the video and hear the descriptions. Also, read one method for creating videos with extended open audio descriptions (see chapter 4.9).
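As a rough, script-based illustration of the pause-and-resume behavior that defines an extended description (this is not SMIL, and not necessarily how the linked example was built), the JavaScript below pauses the video, voices the description, and resumes playback when the description ends. The names playExtendedDescription and speak are hypothetical; speak stands for any function that voices text and then calls a completion callback.

```javascript
// Hypothetical helper: play one extended description.
// `video` is a media element (anything with paused, pause() and play()).
// `speak(text, onDone)` is an assumed function that voices the text and
// calls onDone once the description has finished.
function playExtendedDescription(video, text, speak) {
  const wasPlaying = !video.paused;

  // Freeze the program video and audio while the description is voiced.
  video.pause();

  speak(text, () => {
    // Resume only if the user had not already paused the video themselves.
    if (wasPlaying) {
      video.play();
    }
  });
}

// In a browser, speak could be built on the Web Speech API, for example:
//   const speak = (text, onDone) => {
//     const utterance = new SpeechSynthesisUtterance(text);
//     utterance.onend = onDone;
//     speechSynthesis.speak(utterance);
//   };
```

Calling this function at the start time of each extended description repeats the process for every instance, while descriptions that fit into natural pauses can still be delivered without pausing.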
Resources for writing descriptions
The importance of presenting users with high-quality, accurate descriptions cannot be overemphasized. Use the guidelines below to help create descriptions that are informative and useful.
Related WCAG 2.0 resources
These tutorials provide best-practice guidance on implementing accessibility in different situations. This page combines the following WCAG 2.0 success criteria and techniques from different conformance levels:
Success Criteria:
1.2.3 Audio Description or Media Alternative (Prerecorded): An alternative for time-based media or audio description of the prerecorded video content is provided for synchronized media, except when the media is a media alternative for text and is clearly labeled as such. (Level A)
1.2.5 Audio Description (Prerecorded): Audio description is provided for all prerecorded video content in synchronized media. (Level AA)
1.2.7 Extended Audio Description (Prerecorded): Where pauses in foreground audio are insufficient to allow audio descriptions to convey the sense of the video, extended audio description is provided for all prerecorded video content in synchronized media. (Level AAA)
Techniques: