Producing Captions

Status: This is an incomplete, unapproved draft. The current draft is at wai-media-guide.netlify.com/

Captions enable people who are deaf or hard of hearing to follow media content, although they can be beneficial to everyone. Captions are a text display of dialog and narration and are written in the same language as the audio (subtitles, on the other hand, are a translation of the audio into another language). They also contain important non-speech information, such as sound effects or speaker cues.

Captions can be open or closed:

Closed captions can be hidden and revealed by users, typically by operating a button or menu on the player's control bar.
Open captions are continually displayed and cannot be turned off.

There are three ways to present captions to viewers:

Pop-on captions appear in discrete blocks and usually contain one to three rows of text. They are normally created for pre-produced material and are used for the majority of captioned online videos.

Roll-up captions scroll up onto the screen, one row at a time. These are normally created for live programming, but can also be used for pre-produced material.

Paint-on captions are text that appears to quickly unfurl onto the screen, one character at a time, as the data are received. Typically, when one row of captions finishes painting onto the screen, the row scrolls up and a new row begins to paint onto the screen.


Note: If a video has no narration or dialog, inform viewers about this so they do not think that captions are missing from the video. Provide a brief caption at the beginning of the video that indicates that no audio is provided.
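One possible way to follow this advice is a caption file that holds a single cue at the start of the video. The sketch below builds such a file in WebVTT. The function name, the five-second duration and the wording of the notice are assumptions chosen for illustration; adjust the wording to describe what the audio track actually contains.

```python
def no_speech_notice(duration_seconds: float = 5.0,
                     text: str = "[No audio]") -> str:
    """Return a minimal WebVTT document with one cue shown at the start.

    The default duration and notice text are illustrative assumptions.
    """
    # Work in whole milliseconds to avoid rounding surprises.
    total_ms = round(duration_seconds * 1000)
    total_s, ms = divmod(total_ms, 1000)
    total_m, s = divmod(total_s, 60)
    h, m = divmod(total_m, 60)
    end = f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"
    # A WebVTT file starts with the line "WEBVTT" followed by a blank line.
    return f"WEBVTT\n\n00:00:00.000 --> {end}\n{text}\n"


print(no_speech_notice())
```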


Production workflow for pre-produced captions

A typical caption file contains timing information and positioning codes in addition to the text representing the audio track. Timing information indicates when each caption should appear or disappear from the screen; positioning codes indicate where on the screen the captions should appear. There are a variety of do-it-yourself tools available for creating captions for pre-produced video and audio clips.
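To make the three parts of a caption concrete, here is a minimal sketch that models one caption as text, timing and positioning, and renders it as a WebVTT cue. The `Caption` class and `format_timestamp` helper are hypothetical names invented for this example; `line` and `align` are WebVTT cue settings, and the values used here are arbitrary.

```python
from dataclasses import dataclass


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    total_ms = round(seconds * 1000)
    total_s, ms = divmod(total_ms, 1000)
    total_m, s = divmod(total_s, 60)
    h, m = divmod(total_m, 60)
    return f"{h:02d}:{m:02d}:{s:02d}.{ms:03d}"


@dataclass
class Caption:
    start: float        # timing: seconds at which the caption appears
    end: float          # timing: seconds at which the caption disappears
    text: str           # caption text; may contain line breaks
    settings: str = ""  # positioning: optional WebVTT cue settings

    def to_webvtt(self) -> str:
        """Render the caption as a single WebVTT cue."""
        timing = f"{format_timestamp(self.start)} --> {format_timestamp(self.end)}"
        if self.settings:
            timing += f" {self.settings}"
        return f"{timing}\n{self.text}"


caption = Caption(
    start=0.67,
    end=6.68,
    text="The genome is a storybook that's been edited\nfor a couple billion years.",
    settings="line:10% align:center",  # near the top of the video, centered
)
print(caption.to_webvtt())
```

Other caption formats express the same information differently, so a real exporter would need one renderer per target format.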


Basic workflow for creating pre-produced captions:


Enter caption text into the editor
Transcribe the audio directly into the caption editor or, if available, import a transcript that has been prepared ahead of time using a text editor.
Edit and break text into captions
Edit for proper spelling and grammar; divide the text into caption blocks.
Time the captions
Assign a timecode to each caption that indicates when it will appear or disappear from the screen.
Review the captions
Watch the captioned video carefully and eliminate any errors in text, timing and positioning. Accuracy is paramount: misspelled or poorly edited and timed captions will only make it harder for viewers to follow what is happening on-screen.
Export a caption file
Export the captions in the player-specific target format so they can be synchronized with the media, or provided as a transcript. See caption formats and examples for more information.
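Part of the review step can be supported by a simple automated check before the captions are watched against the video. The sketch below flags a few mechanical timing and text problems. The tuple layout `(start, end, text)` and the function name are assumptions made for this example, and a check like this does not replace watching the captioned video; overlapping captions, for instance, may be intentional in some productions.

```python
from typing import List, Tuple

# (start_seconds, end_seconds, text): an assumed layout for this sketch,
# not a format defined by any particular caption editor.
Caption = Tuple[float, float, str]


def find_caption_problems(captions: List[Caption]) -> List[str]:
    """Return human-readable descriptions of mechanical caption problems."""
    problems = []
    for index, (start, end, text) in enumerate(captions):
        number = index + 1
        if end <= start:
            problems.append(f"caption {number}: end time is not after start time")
        if not text.strip():
            problems.append(f"caption {number}: empty text")
        # Compare with the previous caption's end time to detect overlap.
        if index > 0 and start < captions[index - 1][1]:
            problems.append(f"caption {number}: overlaps the previous caption")
    return problems


sample = [
    (0.0, 2.0, "Hello."),
    (1.5, 3.0, "Welcome back."),  # starts before the previous caption ends
    (4.0, 3.5, " "),              # ends before it starts and has no text
]
for problem in find_caption_problems(sample):
    print(problem)
```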

Below is an image of a caption editor showing how captions have been formatted and timestamped. Each caption is assigned a start/display time. In this editor, captions that are not assigned end/erase times will simply be replaced when the next caption displays.

A caption editor showing timestamped caption text.
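The replace-on-next-caption behavior described above can be made explicit when a target format requires an end time for every caption. The following sketch, under the assumption that captions are held as `(start, end, text)` tuples with `None` for a missing end time, fills each gap with the start time of the next caption. The function name and the `final_end` parameter are hypothetical.

```python
from typing import List, Optional, Tuple

TimedCaption = Tuple[float, Optional[float], str]


def fill_end_times(captions: List[TimedCaption],
                   final_end: float) -> List[Tuple[float, float, str]]:
    """Give every caption an end time.

    A caption without an end time ends when the next caption starts.
    The last caption has no successor, so it ends at final_end
    (for example, the duration of the video).
    """
    filled = []
    for index, (start, end, text) in enumerate(captions):
        if end is None:
            if index + 1 < len(captions):
                end = captions[index + 1][0]
            else:
                end = final_end
        filled.append((start, end, text))
    return filled


captions = [(0.0, None, "First"), (2.0, 3.0, "Second"), (5.0, None, "Third")]
print(fill_end_times(captions, final_end=8.0))
```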


Caption quality

Always provide the highest-quality captions possible, aiming for 100% accuracy. Keep the following points in mind when writing captions:

Ensure that there are no spelling errors. This includes the names of characters or speakers.
Use conventional grammar rules. After end punctuation (period, question mark, exclamation point, etc.), always begin a new caption block.
Do not edit unless you have a specific reason to do so (e.g., to achieve a specific reading level). Fillers such as "um," "ah," etc., can be deleted to save reading time unless doing so alters the representation of the speaker.
Do not censor: captions should reflect the words that are spoken in the audio track. If objectionable words are used in the audio, the captions should show those words. If the audio is edited to obscure a specific word or phrase (e.g., "bleeped" audio), the captions should reflect the fact that a word or phrase has been obscured.
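The rule about starting a new caption block after end punctuation can be sketched as a first-pass splitter. This is a rough illustration only: the function name is hypothetical, and the pattern treats every period, question mark or exclamation point followed by whitespace as the end of a sentence, so abbreviations such as "Dr." would be split incorrectly and the result still needs human editing.

```python
import re


def split_into_blocks(transcript: str) -> list:
    """Split a transcript into caption blocks after end punctuation."""
    # Collapse runs of whitespace so the split points are predictable.
    cleaned = re.sub(r"\s+", " ", transcript).strip()
    if not cleaned:
        return []
    # Split on whitespace that directly follows ., ? or !
    return re.split(r"(?<=[.?!])\s+", cleaned)


text = "The genome is a storybook.  Who edited it? Nobody knows!"
for block in split_into_blocks(text):
    print(block)
```

Long sentences would still need to be divided further by hand so that each block fits in the rows available on screen.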

See resources for writing quality captions for more information.


Styling captions

Most caption-creation tools let authors style captions in a number of ways, for example by adding color to the text or background, or by specifying different font faces and sizes. However, support for styling information in browsers and other media players is inconsistent and at times unreliable. If the media is being produced for a specific player, style the captions to that player’s capabilities. Otherwise, rely instead on a player’s default presentation style (usually white characters on a black box).

Many media players give users the option to customize captions to meet their personal preferences. These preferences always override author styling. For some users, customizing captions is essential, not just an enhancement: styling captions in a specific manner (for example, yellow text over a black background, with a very large font size) may be the only way for them to make the text readable.

Below is an image showing bold text added to captions in a caption-authoring tool.

A caption editor showing white caption text on a black background. One row of text is bold.

And here is the WebVTT markup for that caption…

Code snippet:

1
00:00:00.670 --> 00:00:06.680
The genome is a storybook that's been edited
<b>for a couple billion years.</b>

…and the TTML markup for the same caption:

Code snippet:

<p xml:id="s_1" begin="00:00:00.67" end="00:00:06.68">
The genome is a storybook that's been edited<br />
<span tts:fontWeight="bold">for a couple billion years.</span></p>
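The correspondence between the two snippets can be expressed as a small conversion. The sketch below rewrites WebVTT bold tags as TTML spans and line breaks as `<br />` elements. The function name is hypothetical, only the `<b>` tag is handled, and the result is a fragment: a complete TTML document would also need the surrounding `<p>` element with timing attributes and a declaration of the `tts` namespace.

```python
def webvtt_bold_to_ttml(cue_text: str) -> str:
    """Convert the text of one WebVTT cue to TTML inline markup.

    Handles only <b>...</b> and line breaks; other WebVTT tags are
    passed through unchanged and would need their own handling.
    """
    converted = cue_text.replace("<b>", '<span tts:fontWeight="bold">')
    converted = converted.replace("</b>", "</span>")
    # WebVTT separates rows with newlines; TTML uses <br /> elements.
    return converted.replace("\n", "<br />\n")


cue = ("The genome is a storybook that's been edited\n"
       "<b>for a couple billion years.</b>")
print(webvtt_bold_to_ttml(cue))
```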


Automatically generated captions

Automatically generated captions should never be used as the sole method to produce captions, but they can be a part of the production workflow. See the discussion about automatic captions for more information.


Resources for writing quality captions

The importance of presenting users with high-quality, accurate captions cannot be overemphasized. Use the guidelines below to help create captions that are informative and easy to read.


Related WCAG 2.0 resources

These tutorials provide best-practice guidance on implementing accessibility in different situations. This page combines the following WCAG 2.0 success criteria and techniques from different conformance levels:

Success Criteria:

1.2.2 Captions (Prerecorded): Captions are provided for all prerecorded audio content in synchronized media, except when the media is a media alternative for text and is clearly labeled as such. (Level A)

Techniques:
