Automatically Generated Captions

Status: This is an incomplete, unapproved draft. The current draft is at wai-media-guide.netlify.com/


The basics of automatic captions

Tools exist today that use speech-to-text (STT) technology to turn a program's soundtrack into a timed caption file that can accompany the video. In fact, most videos uploaded to YouTube are captioned by Google's automatic-captioning process, something many authors do not realize. Automatic captions are available in a number of languages. However, their accuracy is often low, and the resulting captions frequently contain:

text that does not match words spoken in the audio;
poor timing (e.g., captions that do not appear synchronously with the audio);
spelling errors;
little or no punctuation;
missing capitalization;
occasional unintended obscenities.
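Some of these problems can be detected mechanically. The following Python sketch is a hypothetical helper (the `Cue` class and `find_problems` function are illustrative names, not part of YouTube or any caption editor) that flags cues with no punctuation, no capital letters, or suspicious timing. Treating overlapping cues as a timing problem is an assumption, since some caption formats allow overlaps on purpose. Mismatched words and obscenities still require a human reviewer.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


@dataclass
class Cue:
    """One timed caption: start and end times in seconds, plus its text."""
    start: float
    end: float
    text: str


def find_problems(cues: list[Cue]) -> list[str]:
    """Flag cues that show common signs of unedited automatic captions.

    Heuristics only: they catch missing punctuation, missing capitalization,
    and obvious timing faults, but cannot tell whether the words match the audio.
    """
    problems = []
    for number, cue in enumerate(cues, start=1):
        text = cue.text.strip()
        if cue.end <= cue.start:
            problems.append(f"cue {number}: end time is not after start time")
        # cues[number - 2] is the previous cue because numbering starts at 1.
        if number > 1 and cue.start < cues[number - 2].end:
            problems.append(f"cue {number}: overlaps the previous cue")
        if text and not re.search(r"[.,;:?!]", text):
            problems.append(f"cue {number}: no punctuation")
        # Text that has cased letters, none of which are upper case.
        if text == text.lower() and text != text.upper():
            problems.append(f"cue {number}: no capital letters")
    return problems
```

Running it over the cues of a downloaded auto-caption track gives a rough to-do list for the editing pass; it is not a substitute for watching the video with the captions turned on.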


Using automatic captions responsibly

Automatically generated captions should never be the sole method of producing captions. They can, however, serve as a first pass or rough draft in a workflow that ends with an accurate, high-quality caption track. Below is a sample workflow for incorporating auto-captions into caption production, with YouTube's auto-caption service as the example.

Example:

1. Upload a video to YouTube.
2. Generate automatic captions.
3. Download the caption track.
4. Using a caption editor, correct misrecognized words as well as spelling, grammar, punctuation, capitalization, and timing errors.
5. Export the corrected captions in a format YouTube accepts (for example, SubRip .srt or WebVTT .vtt).
6. Upload the new caption file to YouTube.

Once an accurate caption track has been uploaded, disable the automatic-caption track.
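The download, timing, and export steps in the workflow above are normally done in a caption editor, but their mechanical part is small. The following Python sketch is illustrative only: it assumes the downloaded track is a SubRip (.srt) file, converts it to WebVTT, and optionally shifts every cue by a fixed offset to correct captions that consistently lag or lead the audio. The function names are hypothetical, and the sketch does not fix wording, spelling, or punctuation, which still need a person.

```python
import re

# SubRip timestamps look like 00:01:02,500 (hours:minutes:seconds,milliseconds).
SRT_TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    """Convert the captured parts of an SRT timestamp to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def to_vtt_time(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (HH:MM:SS.mmm), clamped at zero."""
    total_ms = max(0, round(seconds * 1000))
    h, rest = divmod(total_ms, 3_600_000)
    m, rest = divmod(rest, 60_000)
    s, ms = divmod(rest, 1000)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


def srt_to_vtt(srt_text: str, offset: float = 0.0) -> str:
    """Convert SubRip text to WebVTT, shifting every cue by `offset` seconds."""
    out = ["WEBVTT", ""]
    normalized = srt_text.replace("\r\n", "\n").strip()
    # SRT cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", normalized):
        rows = block.split("\n")
        # rows[0] is the cue number, rows[1] the timing line, the rest is text.
        stamps = SRT_TIME.findall(rows[1]) if len(rows) > 1 else []
        if len(rows) < 3 or len(stamps) != 2:
            continue  # skip malformed blocks instead of guessing
        start, end = (to_seconds(*parts) + offset for parts in stamps)
        out.append(f"{to_vtt_time(start)} --> {to_vtt_time(end)}")
        out.extend(rows[2:])
        out.append("")
    return "\n".join(out)
```

For example, `srt_to_vtt(text, offset=-0.25)` moves every cue a quarter of a second earlier. A uniform shift only helps when the delay is constant; drift that grows over the program still has to be fixed cue by cue in an editor.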

Related WCAG 2.0 resources

These tutorials provide best-practice guidance on implementing accessibility in different situations. This page combines the following WCAG 2.0 success criteria and techniques from different conformance levels:

Success Criteria:

1.2.2 Captions (Prerecorded): Captions are provided for all prerecorded audio content in synchronized media, except when the media is a media alternative for text and is clearly labeled as such. (Level A)
