Automatically Generated Captions
Status: This is an incomplete, unapproved draft. The current draft is at wai-media-guide.netlify.com/
The basics of automatic captions
Speech-to-text (STT) tools can turn a program's soundtrack into a timed caption file, ready to be paired with the corresponding video. In fact, most videos uploaded to YouTube are captioned by Google's automatic-captioning process, something many authors do not realize. Automatic captions are available in a number of languages. However, their accuracy is frequently low, and the resulting captions often contain:
- text that does not match words spoken in the audio;
- poor timing (e.g., captions that do not appear synchronously with the audio);
- spelling errors;
- little or no punctuation;
- missing capitalization;
- occasional obscenities (swear words, for example).
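Some of the problems listed above can be flagged mechanically before a human review. The following Python code is a minimal sketch, not a definitive checker: the `Cue` type, the `flag_cue_problems` function, and the optional `blocklist` parameter are all hypothetical names introduced for illustration, and it assumes the caption file has already been parsed into cues with start and end times in seconds. It cannot tell whether the text matches the words spoken in the audio; only a person listening to the soundtrack can verify that.

```python
import re
from typing import NamedTuple


class Cue(NamedTuple):
    """One caption cue (hypothetical structure for this sketch)."""
    start: float  # seconds from the beginning of the media
    end: float    # seconds from the beginning of the media
    text: str


def flag_cue_problems(cues, blocklist=frozenset()):
    """Return (cue_index, problem) pairs for a human reviewer.

    A cue_index of None means the problem applies to the whole track.
    """
    problems = []

    # Little or no punctuation: check the track as a whole.
    if cues and not any(re.search(r"[.,?!]", c.text) for c in cues):
        problems.append((None, "no punctuation anywhere in track"))

    for i, cue in enumerate(cues):
        # Poor timing: a cue must end after it starts ...
        if cue.end <= cue.start:
            problems.append((i, "end time not after start time"))
        # ... and should not begin before the previous cue has ended.
        if i > 0 and cue.start < cues[i - 1].end:
            problems.append((i, "overlaps previous cue"))

        # Missing capitalization: the cue has letters but none are uppercase.
        if re.search(r"[a-z]", cue.text) and cue.text == cue.text.lower():
            problems.append((i, "no capital letters"))

        # Obscenities: compare words against a reviewer-supplied blocklist.
        words = set(re.findall(r"[a-z']+", cue.text.lower()))
        if words & blocklist:
            problems.append((i, "contains blocklisted word"))

    return problems


cues = [
    Cue(0.0, 2.0, "hello and welcome to the show"),
    Cue(1.5, 4.0, "today we talk about captions"),
]
problems = flag_cue_problems(cues)
for index, problem in problems:
    print(index, problem)
```

A report like this only narrows down where to look. Every flagged cue, and every cue that was not flagged, still needs to be checked against the audio.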
Using automatic captions responsibly
Automatically generated captions should never be used as the sole method to produce captions. However, they can be used as a first-pass or rough-draft effort in the workflow that eventually leads to an accurate, high-quality caption track. Below is a sample workflow for using auto-captions as part of the caption-production process, using YouTube's auto-caption service as an example.
Once an accurate caption track has been uploaded, disable the automatic-caption track.
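Once the text and timing have been corrected, the result has to be saved as a caption file that can be uploaded. As an illustration only, the Python sketch below writes corrected cues in WebVTT, one common caption format; the choice of WebVTT and the function names `format_timestamp` and `to_webvtt` are assumptions made for this example, not part of the workflow described here. It assumes each cue is a `(start_seconds, end_seconds, text)` tuple.

```python
def format_timestamp(seconds):
    """Format seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues):
    """Serialize (start, end, text) tuples as a WebVTT document."""
    # A WebVTT file starts with the "WEBVTT" signature and a blank line.
    lines = ["WEBVTT", ""]
    for start, end, text in cues:
        lines.append(f"{format_timestamp(start)} --> {format_timestamp(end)}")
        lines.append(text)
        lines.append("")  # a blank line separates cues
    return "\n".join(lines)


corrected = [
    (0.0, 2.0, "Hello, and welcome to the show."),
    (2.0, 4.5, "Today we talk about captions."),
]
print(to_webvtt(corrected))
```

Keeping the corrected cues in a simple structure like this makes it straightforward to export to whichever format a given video platform accepts.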
Related WCAG 2.0 resources
These tutorials provide best-practice guidance on implementing accessibility in different situations. This page combines the following WCAG 2.0 success criteria and techniques from different conformance levels:
Success Criteria:
1.2.2 Captions (Prerecorded): Captions are provided for all prerecorded audio content in synchronized media, except when the media is a media alternative for text and is clearly labeled as such. (Level A)