Understanding SC 1.2.4 Captions (Live) (Level AA)

In Brief

Goal: Live videos have captions.
What to do: Provide synchronized text for audio content in real-time videos.
Why it's important: People who are deaf or hard of hearing can understand audio in real-time video content.

Success Criterion (SC)

Captions are provided for all live audio content in synchronized media.

Intent

The intent of this success criterion is to enable people who are deaf or hard of hearing to watch real-time presentations. Captions provide the part of the content available via the audio track. Captions not only include dialogue, but also identify who is speaking and notate sound effects and other significant audio.

This success criterion was intended to apply to broadcasts of synchronized media. It is not intended to require that two-way multimedia calls between two or more individuals through web apps be captioned regardless of the needs of users. Responsibility for providing captions would fall to the content providers (the callers) or the “host” caller, and not the application.

Benefits

People who are deaf or have a hearing loss can access the auditory information in the synchronized media content through captions.

Examples

- A web cast: A news organization provides a live, captioned web cast.
- A music web cast: An orchestra provides Communication Access Realtime Translation (CART) captioning of each real-time web performance. The CART service captures lyrics and dialogue, and identifies non-vocal music by title, movement, composer, and any information that will help the user comprehend the nature of the audio.

Related Resources

Resources are for information purposes only; no endorsement is implied.

See 1.2.2 Captions (Prerecorded).

Techniques

Each numbered item in this section represents a technique or combination of techniques that the Accessibility Guidelines Working Group deems sufficient for meeting this success criterion. A technique may go beyond the minimum requirement of the criterion. There may be other ways of meeting the criterion not covered by these techniques. For information on using other techniques, see Understanding Techniques for WCAG Success Criteria, particularly the "Other Techniques" section.

Sufficient Techniques

1. G9: Creating captions for live synchronized media AND G93: Providing open (always visible) captions
2. G9: Creating captions for live synchronized media AND G87: Providing closed captions using one of the following techniques:
- SM11: Providing captions through synchronized text streams in SMIL 1.0
- SM12: Providing captions through synchronized text streams in SMIL 2.0
- Using any readily available media format that has a video player that supports closed captioning

Note

Captions may be generated using a real-time text translation service.
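
As an illustration of what a closed-caption text stream for live media might carry, the following is a minimal sketch that serializes caption events into WebVTT cues. WebVTT is not named by this document; it is assumed here only as one example of a readily available caption format. The `LiveCue` type, its fields, and the function names are hypothetical. The sketch reflects the points made under Intent: cues carry dialogue, identify the speaker, and notate significant non-speech audio.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class LiveCue:
    """One caption event from a hypothetical live captioning feed."""
    start: float                    # seconds since the start of the stream
    end: float                      # seconds since the start of the stream
    text: str                       # spoken words, or a description of a sound
    speaker: Optional[str] = None   # who is speaking, if known
    is_sound: bool = False          # True for non-speech audio


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def escape(text: str) -> str:
    """Escape characters that have special meaning in WebVTT cue text."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def format_cue(cue: LiveCue) -> str:
    """Render one cue: a timing line followed by the caption text."""
    if cue.end <= cue.start:
        raise ValueError("cue must end after it starts")
    if cue.is_sound:
        # Bracketed text is a common convention for sound effects and music.
        body = f"[{escape(cue.text)}]"
    elif cue.speaker:
        # A WebVTT voice span identifies who is speaking.
        body = f"<v {escape(cue.speaker)}>{escape(cue.text)}"
    else:
        body = escape(cue.text)
    timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
    return f"{timing}\n{body}"


def to_webvtt(cues: Iterable[LiveCue]) -> str:
    """Join the file header and the cues, separated by blank lines."""
    blocks = ["WEBVTT"] + [format_cue(c) for c in cues]
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(to_webvtt([
        LiveCue(0.0, 2.5, "Good evening.", speaker="Anchor"),
        LiveCue(2.5, 4.0, "applause", is_sound=True),
    ]))
```

In a real live stream the cues would be produced and delivered incrementally by the captioning service and the player. The sketch covers only the text format, not transport or synchronization.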

Key Terms

ASCII art

picture created by a spatial arrangement of characters or glyphs (typically from the 95 printable characters defined by ASCII)

assistive technology

hardware and/or software that acts as a user agent, or along with a mainstream user agent, to provide functionality to meet the requirements of users with disabilities that go beyond those offered by mainstream user agents

Note 1

Functionality provided by assistive technology includes alternative presentations (e.g., as synthesized speech or magnified content), alternative input methods (e.g., voice), additional navigation or orientation mechanisms, and content transformations (e.g., to make tables more accessible).

Note 2

Assistive technologies often communicate data and messages with mainstream user agents by using and monitoring APIs.

Note 3

The distinction between mainstream user agents and assistive technologies is not absolute. Many mainstream user agents provide some features to assist individuals with disabilities. The basic difference is that mainstream user agents target broad and diverse audiences that usually include people with and without disabilities. Assistive technologies target narrowly defined populations of users with specific disabilities. The assistance provided by an assistive technology is more specific and appropriate to the needs of its target users. The mainstream user agent may provide important functionality to assistive technologies like retrieving web content from program objects or parsing markup into identifiable bundles.

Example

Assistive technologies that are important in the context of this document include the following:

- screen magnifiers, and other visual reading assistants, which are used by people with visual, perceptual and physical print disabilities to change text font, size, spacing, color, synchronization with speech, etc. in order to improve the visual readability of rendered text and images;
- screen readers, which are used by people who are blind to read textual information through synthesized speech or braille;
- text-to-speech software, which is used by some people with cognitive, language, and learning disabilities to convert text into synthetic speech;
- speech recognition software, which may be used by people who have some physical disabilities;
- alternative keyboards, which are used by people with certain physical disabilities to simulate the keyboard (including alternative keyboards that use head pointers, single switches, sip/puff, and other special input devices);
- alternative pointing devices, which are used by people with certain physical disabilities to simulate mouse pointing and button activations.

audio

the technology of sound reproduction

Note

Audio can be created synthetically (including speech synthesis), recorded from real world sounds, or both.

audio description

narration added to the soundtrack to describe important visual details that cannot be understood from the main soundtrack alone

Note 1

Audio description of video provides information about actions, characters, scene changes, on-screen text, and other visual content.

Note 2

In standard audio description, narration is added during existing pauses in dialogue. (See also extended audio description.)

Note 3

Where all of the important video information is already provided in existing audio, no additional audio description is necessary.

Note 4

Also called "video description" and "descriptive narration."

captions

synchronized visual and/or text alternative for both speech and non-speech audio information needed to understand the media content

Note 1

Captions are similar to dialogue-only subtitles except captions convey not only the content of spoken dialogue, but also equivalents for non-dialogue audio information needed to understand the program content, including sound effects, music, laughter, speaker identification and location.

Note 2

Closed Captions are equivalents that can be turned on and off with some players.

Note 3

Open Captions are any captions that cannot be turned off, for example, captions that are visual equivalent images of text embedded in the video.

Note 4

Captions should not obscure or obstruct relevant information in the video.

Note 5

In some countries, captions are called subtitles.

Note 6

Audio descriptions can be, but do not need to be, captioned since they are descriptions of information that is already presented visually.

extended audio description

audio description that is added to an audiovisual presentation by pausing the video so that there is time to add additional description

Note

This technique is only used when the sense of the video would be lost without the additional audio description and the pauses between dialogue/narration are too short.

human language

language that is spoken, written or signed (through visual or tactile means) to communicate with humans

Note