Caption Formats

Status: This is an incomplete, unapproved draft. The current draft is at wai-media-guide.netlify.com/

Nearly all modern browsers and media players support the display of closed captions. However, they do not all support the same caption-file formats. The formats most commonly used for online media are WebVTT, TTML and SRT.

Standalone players typically support WebVTT and/or TTML. Streaming media services typically use TTML to convey captions to users.

WebVTT and TTML contain a full array of markup for styling, timing and placement options. SRT is a bare-bones format that displays unstyled text only, although some user agents may support basic styling commands (such as bold or italic text) if they are present in the caption file.
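To make the timing, styling and placement markup concrete, here is a minimal sketch that serializes cues into WebVTT text. The `Cue` class and the `format_timestamp` and `build_webvtt` helpers are hypothetical names introduced for illustration and are not part of any library. The sketch assumes the cue text and settings are already valid WebVTT and does no validation.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float        # seconds from the start of the media
    end: float          # seconds from the start of the media
    text: str           # cue payload; may contain WebVTT tags such as <i> or <b>
    settings: str = ""  # optional cue settings, e.g. "line:10% align:start"


def format_timestamp(seconds: float) -> str:
    """Render seconds as a WebVTT timestamp of the form hh:mm:ss.ttt."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_webvtt(cues: list[Cue]) -> str:
    """Serialize cues into the text of a WebVTT file (illustrative only)."""
    blocks = ["WEBVTT"]  # every WebVTT file starts with this signature line
    for cue in cues:
        timing = f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}"
        if cue.settings:
            # Placement settings follow the timestamps on the same line.
            timing += " " + cue.settings
        blocks.append(f"{timing}\n{cue.text}")
    # Blocks are separated by a blank line.
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(build_webvtt([
        Cue(1.0, 4.0, "<i>Hello</i> and welcome.", "line:10% align:start"),
        Cue(5.5, 7.25, "This cue uses default placement."),
    ]))
```

In this sketch the first cue carries both inline styling (`<i>`) and placement settings (`line`, `align`), while the second relies on the user agent's defaults.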

Web browsers support various caption formats, as shown in the table below.

Browser         | OS                                     | Supported caption format(s)
Firefox         | Windows, OS X, Android, iOS            | WebVTT
IE 10, 11; Edge | Windows                                | TTML, WebVTT
Safari          | OS X, iOS                              | WebVTT
Chrome          | Windows, OS X, Chrome OS, Android, iOS | WebVTT

SRT is not supported natively by any browser, but it is supported by most other types of media players, including those provided by popular video-hosting services and some social-media platforms, as well as by custom players.
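Because browsers do not play SRT natively, one common workaround is to convert an SRT file to WebVTT before publishing it. The sketch below shows the core of such a conversion: add the `WEBVTT` signature line, drop the numeric cue counters, and change the comma before the milliseconds to a period. The function name `srt_to_webvtt` is hypothetical, and the code assumes well-formed SRT input; it does not translate styling tags or repair malformed cues.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:04,000".
_SRT_TIMING = re.compile(
    r"(\d{2,}:\d{2}:\d{2}),(\d{3})\s*-->\s*(\d{2,}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_webvtt(srt_text: str) -> str:
    """Convert well-formed SRT text to WebVTT text (illustrative sketch)."""
    # Normalize line endings and drop a leading byte-order mark if present.
    text = srt_text.replace("\r\n", "\n").replace("\r", "\n").lstrip("\ufeff")

    cues = []
    for block in re.split(r"\n{2,}", text.strip()):
        lines = block.split("\n")
        # SRT cues start with a numeric counter; WebVTT does not need it.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines:
            continue
        match = _SRT_TIMING.search(lines[0])
        if match is None:
            continue  # skip blocks without a recognizable timing line
        # WebVTT uses a period, not a comma, before the milliseconds.
        timing = (
            f"{match.group(1)}.{match.group(2)} --> "
            f"{match.group(3)}.{match.group(4)}"
        )
        cues.append("\n".join([timing] + lines[1:]))

    return "\n\n".join(["WEBVTT"] + cues) + "\n"


if __name__ == "__main__":
    sample = (
        "1\n00:00:01,000 --> 00:00:04,000\nHello, world.\n\n"
        "2\n00:00:05,500 --> 00:00:07,250\nSecond caption\n"
    )
    print(srt_to_webvtt(sample))
```

Only the timing line is rewritten, so commas inside the caption text itself are left untouched.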

WebVTT, TTML and SRT files are "sidecar" files: they are transmitted separately from their corresponding video files, riding alongside the video data in the delivery stream rather than being embedded in the video file itself. The user agent synchronizes and displays them at the time of playback.

Distributing captions

Captions are distributed to viewers using HTML5's track element, which was created specifically for carrying text tracks, such as captions, subtitles and text-based audio descriptions. track is used as a child element of the video element:

Code snippet:
<video controls>
    <source src="myvideo.mp4" type="video/mp4" />
    <track kind="captions" src="myvideo_captions.vtt" srclang="en" label="Captions" default />
</video>

In the example above, the kind attribute is set to "captions" to identify what type of text track it is. The label attribute is set to "Captions", which is the visible text (or label) that the user agent will display to identify the track to the user. Learn more about attributes for the track element.

These tutorials provide best-practice guidance on implementing accessibility in different situations. This page combines the following WCAG 2.0 success criteria and techniques from different conformance levels:

Success Criteria:

  • 1.2.2 Captions (Prerecorded): Captions are provided for all prerecorded audio content in synchronized media, except when the media is a media alternative for text and is clearly labeled as such. (Level A)

Techniques: