This document compares two timed text formats, TTML and WebVTT, and describes how to map content between them.

This is an editor's draft.

Introduction

In today's media landscape, content reaches viewers in a variety of ways, through traditional avenues such as cinema and television as well as modern internet-enabled alternatives. Content often needs to be transformed into different formats in order to be available across this breadth of delivery options.

When authoring media with captions, content creators can choose a timed text format from the set of available formats. While there are many different formats available to carry captions in media, support for different formats is fragmented, with different content delivery channels supporting different formats.

TTML and WebVTT are two popular formats for captions. The two formats have different histories, and as a result they differ in both supported features and approach.

This document focuses on how to translate captions data between the TTML and WebVTT formats. It is divided into three main sections. The first section provides an overview of TTML and WebVTT formats, and describes a high level strategy for performing a mapping between them. The second section provides a detailed discussion of how to map content from TTML to WebVTT. Finally, the third section provides a similar discussion of how to map content from WebVTT to TTML.

Overview

Before beginning any mapping between TTML and WebVTT, it is necessary to understand the basic constructs used in these formats for the conveyance of timed text information. This section provides an overview of the fundamentals of text carriage in both TTML and WebVTT, and highlights those features most relevant to a mapping discussion. For the complete and authoritative definitions of these formats, please refer to their respective specifications.

TTML

This section briefly describes some of the fundamental constructs in the TTML format.

General

Here are some of the foundational constructs in TTML:
  • Syntactic Structures
  • Language identification
  • Line Breaks
  • Character Encoding

Syntactic Structures

TTML defines the following XML elements as syntactic structures that are used to group text content:

  • <tt>

  • <body>

  • <div>

  • <p>

  • <span>

An XML document forms a tree through the nesting of elements. An element may have a subtree "beneath" itself. Any of the elements listed above may have text content among its descendants. Because of this relationship, they may carry information that applies to all text content in their subtree. For example, this information can be used for:

  • Positioning of text content in a rectangular area, such as the specification of left or right horizontal alignment for Latin text.

  • Styling of text content, such as the specification of text color, with all descendants in the sub-tree inheriting this information.

  • Timing of text content, such as the specification of the begin and end time attributes.

Because the <tt> element is the root of a TTML document, information specified on the <tt> element is significant to all text content in the document. For example, a language specified using the xml:lang attribute of the <tt> element applies to all elements within it. The same is true for the <body> element, because all text content has the <body> element as an ancestor.

Ignoring recursive structures for a moment, the nesting hierarchy of content elements is:

<tt> -> <body> -> <div> -> <p> -> <span>

Amongst these elements, only the <p> and <span> elements may have text content as direct children. In XML, text content is represented by text nodes. Three variations are possible:

  • text content nodes are children of a <p> element

Example for content in a p element:

					<p>Hello world</p>

  • text content nodes are children of a <span> element

Structure for content in span:

					<p><span>Hello world</span></p>

  • a mix of the above, where <p> and <span> elements have both text content and <span> elements as children

Structure for mixed content:

					<p>Hello <span>big <span>wide</span></span> world</p>

There are two content structures that can be nested to build up recursive structures: <div> and <span>. A <div> can have other <div> elements as children, and a <span> can have other <span> elements as children.

Language identification

The TTML specification mandates that TTML documents contain language information, which is provided by @xml:lang on the <tt> element. If the language is not known, the empty string can be used.

Example: TTML English Language Identification:

					<tt xml:lang="en" ...> ...</tt>

					

Example: TTML Unknown Language:

					<tt xml:lang="" ...> ...</tt>

					

The @xml:lang can also be specified on <body>, <div>, <p> and <span>.
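Because xml:lang can appear at several levels, a converter needs the effective language of each piece of text, i.e. the value on the element itself or on its nearest ancestor. The sketch below, using Python's standard xml.etree.ElementTree, computes this for every element carrying an xml:id; the function name effective_languages is hypothetical, and the sketch assumes a well-formed document.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang and xml:id under the XML namespace URI.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

def effective_languages(ttml: str) -> dict[str, str]:
    """Map the xml:id of each identified element to its effective language.

    The effective language is the xml:lang value on the element itself or,
    failing that, on its nearest ancestor.
    """
    result: dict[str, str] = {}

    def walk(elem: ET.Element, inherited: str) -> None:
        lang = elem.get(XML_NS + "lang", inherited)  # a local value overrides the inherited one
        elem_id = elem.get(XML_NS + "id")
        if elem_id is not None:
            result[elem_id] = lang
        for child in elem:
            walk(child, lang)

    walk(ET.fromstring(ttml), "")
    return result


example = (
    '<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en"><body><div>'
    '<p xml:id="p1">Hello <span xml:id="s1" xml:lang="fr">bonjour</span></p>'
    "</div></body></tt>"
)
langs = effective_languages(example)  # {'p1': 'en', 's1': 'fr'}
```

When mapping to WebVTT, a <span> whose effective language differs from its surroundings is a natural candidate for a language span.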

Line Breaks

An explicit line break is introduced in TTML through the <br> element. By default the character codes in the TTML document that represent a line break, such as line feed, carriage return or a pair of both, are not interpreted as a line break for the presentation of the content. This changes only when the @xml:space attribute with the value "preserve" is applied to some content.

Automatic line breaking can occur during presentation due to the limited space of the area where the content is rendered. This behavior can be switched on and off through the tts:wrapOption style attribute, whose default value is "wrap".
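The difference between explicit <br> elements and source line feeds matters when producing WebVTT cue text, where every newline is a line break. The following is a minimal sketch, assuming the default xml:space behavior and discarding span styling; the function name p_to_cue_text is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"

def p_to_cue_text(p_xml: str) -> str:
    """Flatten a TTML <p> into WebVTT cue text.

    Assumes the default xml:space behavior: <br/> becomes a line break, while
    source line feeds and other whitespace runs collapse to a single space.
    """
    parts: list[str] = []

    def walk(elem: ET.Element) -> None:
        if elem.tag == TT + "br":
            parts.append("\n")
        elif elem.text:
            parts.append(re.sub(r"\s+", " ", elem.text))
        for child in elem:
            walk(child)
            if child.tail:
                parts.append(re.sub(r"\s+", " ", child.tail))

    walk(ET.fromstring(p_xml))
    lines = "".join(parts).split("\n")
    # Spaces can double up where text chunks meet; squeeze them and trim line ends.
    return "\n".join(re.sub(" {2,}", " ", line).strip() for line in lines)


cue_text = p_to_cue_text(
    '<p xmlns="http://www.w3.org/ns/ttml">Hello\n    <span>big</span> world<br/>second   line</p>'
)
```

Here the line feed inside the source text is rendered as a space, and only the <br/> produces a new line in the cue.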

Character Encoding

Although UTF-8 is recommended by the TTML spec, a TTML document may use any other character encoding permitted by XML.

Positioning

TTML Regions

In TTML, a region is defined simply as a rectangular area that text can be flowed into. In TTML documents, regions are defined in a <layout> element. TTML regions can have a variety of properties defined, including the following:

Property Definition
id, as an xml:id value An identifier that can be used by other TTML elements to reference the region.
origin, as a tts:origin value The x and y coordinates denoting the ( top, left ) corner of the region, with respect to the Root Container.
extent, as a tts:extent value The width and height of a region area.
writing mode, as a tts:writingMode value Defines the block and inline progression directions.
padding, as a tts:padding value Padding or inset space on all sides of the region area.
inline alignment, as a tts:textAlign value How inline areas are aligned in the inline progression direction within a block.
block alignment, as a tts:displayAlign value How block areas are aligned in the block progression direction.

Notes:

  • TTML regions are not limited to the viewport dimensions. TTML regions can be larger than the viewport, or entirely outside of the viewport.
  • If no region is defined in a TTML document, a default region is implied. This default region has the dimensions of the Root Container Region.

Styling

Syntactic Specification of Styling Information

TTML defines a style attribute for each type of styling information. Different style attributes (e.g. @tts:color) can be grouped on a <style> element. The <region>, <body>, <div>, <p> and <span> elements can reference these style sets by using the id that is defined for a <style>. A <style> element can reference other <style> elements to combine two or more style sets.

Style attributes can also be specified directly on <region>, <body>, <div>, <p> and <span> elements. This is called inline styling.

Application of Style Attributes

To calculate the style information for content and boxes, use the style resolution process defined by TTML. This merges a chain of referenced <style> elements with inline defined style attributes to produce a single style set. It also takes into account initial values and inheritance of values.
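The merge order can be sketched as follows. This is a simplified model rather than the full TTML style resolution process: it assumes that later references override earlier ones, that a referencing <style> overrides the styles it references, that inline attributes override all referenced styles, and that reference chains contain no cycles. The Style class and resolve function are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Style:
    """A <style> element: its own attributes plus the ids of <style> elements it references."""
    attrs: dict[str, str]
    refs: list[str] = field(default_factory=list)

def resolve(refs: list[str], inline: dict[str, str], styles: dict[str, Style]) -> dict[str, str]:
    """Merge referenced style sets in order, then apply the inline attributes on top.

    Assumes reference chains are acyclic.
    """
    result: dict[str, str] = {}
    for ref in refs:
        style = styles[ref]
        # A referenced <style> is itself resolved against the styles it references,
        # with its own attributes taking precedence over those it pulls in.
        result.update(resolve(style.refs, style.attrs, styles))
    result.update(inline)  # inline styling wins over referential styling
    return result


styles = {
    "base": Style({"tts:color": "white", "tts:fontSize": "100%"}),
    "emph": Style({"tts:fontStyle": "italic"}, refs=["base"]),
}
# Models <p style="emph" tts:color="yellow">.
resolved = resolve(["emph"], {"tts:color": "yellow"}, styles)
```

Inheritance from ancestor elements is a separate step and is not modeled here.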

Although all style attributes can be specified on <region>, <body>, <div>, <p> and <span> elements, only a subset of these attributes can be applied to the presentational unit that each of these elements represents.

The table below shows which style attributes apply to which element:

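Style Attribute <region> <body> <div> <p> <span>
backgroundColor x x x x x
color (i*) - - - - x
direction (i) - - - x x
display x x x x x
displayAlign x - - - -
extent x - - - -
fontFamily (i) - - - - x
fontSize (i) - - - - x
fontStyle (i) - - - - x
fontWeight (i) - - - - x
lineHeight (i) - - - x -
opacity x - - - -
origin x - - - -
overflow x - - - -
padding x - - - -
showBackground x - - - -
textAlign (i) - - - x -
textDecoration (i) - - - - x
textOutline (i) - - - - x
unicodeBidi - - - x x
visibility (i) x x x x x
wrapOption (i) - - - - x
writingMode (i) x - - - -
zIndex (i) x - - - -

x: the attribute applies to the element; -: it does not apply.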

* If a style attribute is marked with "(i)" it is inheritable.

At first glance, specifying a style attribute on an element where it does not apply may not seem to make sense. However, through inheritance, the value of a style attribute can be passed down the syntax tree to the elements where it does apply. It is not a syntax error to specify a style attribute where it has no effect; the attribute is simply ignored in such cases.
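Inheritance can be sketched as a walk from <body> down to the target element, where only attributes marked "(i)" in the table above flow from parent to child. This is a minimal sketch: relative values such as percentage font sizes are copied rather than recomputed, initial values are not filled in, and attribute names are written without the tts: prefix for brevity. INHERITABLE and computed_style are hypothetical names.

```python
# Inheritable style attributes, taken from the "(i)" markings in the table above.
INHERITABLE = {
    "color", "direction", "fontFamily", "fontSize", "fontStyle", "fontWeight",
    "lineHeight", "textAlign", "textDecoration", "textOutline", "visibility",
    "wrapOption", "writingMode", "zIndex",
}

def computed_style(specified_chain: list[dict[str, str]]) -> dict[str, str]:
    """Compute the style of the last element in a root-to-element chain.

    Each entry holds the style attributes specified (after resolving referenced
    styles) on one element, starting at <body>.
    """
    computed: dict[str, str] = {}
    for specified in specified_chain:
        # Only inheritable values flow down from the parent element.
        inherited = {name: value for name, value in computed.items() if name in INHERITABLE}
        computed = {**inherited, **specified}
    return computed


span_style = computed_style([
    {"color": "yellow", "backgroundColor": "black"},  # <div>
    {},                                               # <p>
    {"fontStyle": "italic"},                          # <span>
])  # {'color': 'yellow', 'fontStyle': 'italic'}
```

The backgroundColor set on the <div> does not reach the <span>, because it is not inheritable.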

Timing

TTML provides support for a broad set of timing expressions. For some content, mapping to WebVTT will require conversion of timing information. To perform these conversions, it is necessary to understand the different ways of expressing time in TTML; knowing what additional data is required to convert between timing expressions will also help inform any mapping strategy.

Timing Parameters

TTML defines several types in the ttp parameter namespace that carry timing information. Some of the TTML timing parameters may be required to accurately convert TTML time expressions as part of a mapping process. The necessary parameters can be provided in the TTML file using the defined types.

This section describes the relevant TTML timing parameters and provides a brief discussion of each type.

Parameter: Time Base

TTML Parameter Possible Values
ttp:timeBase media, smpte, clock

The Time Base parameter defines the temporal coordinate system used to interpret time expressions.

Notes:

  • When the Time Base value is “smpte”, the Drop Mode parameter may be required to interpret time expressions. In the case of a non-integer frame rate, the Drop Mode parameter must be specified. Otherwise, the default value of nonDrop can be used.
  • When the Time Base value is “clock”, the Clock Mode parameter is required to interpret time expressions.

Parameter: Drop Mode

TTML Parameter Possible Values
ttp:dropMode dropNTSC, dropPAL, nonDrop

When the Time Base is “smpte”, the Drop Mode parameter defines the drop mode used to interpret time expressions.

Note:

  • The TTML specification provides detailed instructions for interpreting time expressions when Drop Mode is set to “dropNTSC” or “dropPAL”. These must be applied when converting time expressions to other formats.
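As an illustration, the following sketch applies the widely used NTSC drop-frame rule, in which frame codes 00 and 01 are skipped at the start of every minute except minutes divisible by 10. It assumes ttp:frameRate="30" with ttp:frameRateMultiplier="1000 1001" and ignores sub-frames; the TTML specification remains the normative source, and the function name is hypothetical.

```python
from fractions import Fraction

def smpte_drop_ntsc_to_seconds(timecode: str) -> Fraction:
    """Convert an 'HH:MM:SS:FF' SMPTE time expression in dropNTSC mode to seconds.

    Assumes ttp:frameRate="30" and ttp:frameRateMultiplier="1000 1001"
    (an effective rate of 30000/1001 frames per second); sub-frames are ignored.
    """
    hours, minutes, seconds, frames = (int(part) for part in timecode.split(":"))
    frame_rate = 30
    counted = (3600 * hours + 60 * minutes + seconds) * frame_rate + frames
    total_minutes = 60 * hours + minutes
    # Two frame codes are dropped per minute, except in minutes divisible by 10.
    dropped = 2 * (total_minutes - total_minutes // 10)
    return Fraction(counted - dropped) / Fraction(30000, 1001)


one_minute = smpte_drop_ntsc_to_seconds("00:01:00:02")   # 3003/50, i.e. 60.06 s
ten_minutes = smpte_drop_ntsc_to_seconds("00:10:00:00")  # 2999997/5000, i.e. 599.9994 s
```

Using Fraction keeps the arithmetic exact until the value is rounded to WebVTT's millisecond precision.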

Parameter: Clock Mode

TTML Parameter Possible Values
ttp:clockMode local, gps, utc

When the Time Base is “clock”, the Clock Mode parameter defines the clock mode used to interpret time expressions.

Notes:

  • The TTML specification provides detailed instructions for converting between “gps” and “utc” values. When the Clock Mode is equal to “gps”, these conversions should be applied to time expressions.

Parameter: Frame Rate

TTML Parameter Possible Values
ttp:frameRate non-zero, positive integer

When the frame rate associated with a TTML document is integral, Frame Rate represents the frame rate to use in interpreting time expressions.

Parameter: Frame Rate Multiplier

TTML Parameter Possible Values
ttp:frameRateMultiplier two non-zero, positive integers: numerator and denominator

When the frame rate associated with a TTML document is not integral, Frame Rate Multiplier provides a numerator and denominator to multiply the frame rate value by in order to calculate the effective frame rate for use in interpreting time expressions.
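The effective frame rate is the product of ttp:frameRate and the multiplier's numerator divided by its denominator. A minimal sketch of this arithmetic follows; the function name is hypothetical and the multiplier is passed as the attribute's space-separated string.

```python
from fractions import Fraction

def effective_frame_rate(frame_rate: int, multiplier: str = "1 1") -> Fraction:
    """Effective frame rate = ttp:frameRate x (numerator / denominator)."""
    numerator, denominator = (int(n) for n in multiplier.split())
    return frame_rate * Fraction(numerator, denominator)


ntsc = effective_frame_rate(30, "1000 1001")  # 30000/1001, about 29.97 frames per second
```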

Parameter: Sub Frame Rate

TTML Parameter Possible Values
ttp:subFrameRate non-zero, positive integer

When the frame rate associated with a TTML document is integral, Sub Frame Rate provides a sub frame rate for dividing frames.

Note:

  • Defaults to a value of 1 when not specified.

Parameter: Tick Rate

TTML Parameter Possible Values
ttp:tickRate non-zero, positive integer

Tick Rate provides a tick rate to use in interpreting time expressions in a TTML document.
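The frame and tick parameters come into play when converting TTML offset-time expressions (a number followed by h, m, s, ms, f or t) into seconds. The sketch below is a minimal illustration; the default values (30 frames per second, one tick per second) are assumptions for the example, since the TTML specification defines how missing parameters default. The regular expression and function name are hypothetical.

```python
import re
from fractions import Fraction

# TTML offset-time: a number followed by a metric (h, m, s, ms, f or t).
OFFSET_TIME = re.compile(r"^(\d+(?:\.\d+)?)(ms|h|m|s|f|t)$")

def offset_time_to_seconds(expr: str, frame_rate: Fraction = Fraction(30),
                           tick_rate: int = 1) -> Fraction:
    """Convert a TTML offset-time expression such as '1.5s', '90f' or '2700t' to seconds."""
    match = OFFSET_TIME.match(expr)
    if match is None:
        raise ValueError(f"not an offset-time expression: {expr!r}")
    value, metric = Fraction(match.group(1)), match.group(2)
    seconds_per_unit = {
        "h": Fraction(3600),
        "m": Fraction(60),
        "s": Fraction(1),
        "ms": Fraction(1, 1000),
        "f": 1 / Fraction(frame_rate),  # frames count at the effective frame rate
        "t": Fraction(1, tick_rate),    # ticks count at ttp:tickRate
    }
    return value * seconds_per_unit[metric]
```

For example, "90f" at 30 frames per second is 3 seconds, and "25000000t" with a tick rate of 10000000 is 2.5 seconds.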

WebVTT

General

Here are some of the foundational constructs in WebVTT:
  • Syntactic Structures
  • Language identification
  • Line Breaks
  • Character Encoding

Syntactic Structures

The main syntactic structure of WebVTT is the WebVTT cue. All content that is defined for presentation belongs to exactly one WebVTT cue. Content inside a WebVTT cue can be further grouped by using specific spans to apply information that is significant for the rendering of the enclosed content. As with TTML, a WebVTT cue can be translated into a tree structure. Ignoring the specific names of the spans, the tree structure could look like the following:

  • cue->span->"text content"

  • cue-> "text content"

  • cue-> mix of span and "text content"

Like TTML, WebVTT has a concept of regions. WebVTT regions are optional, and when no region is specified, the entire video frame is used as a de facto region. WebVTT cues can be affiliated to a region. Cues, with their text content, can signal this affiliation by specifying the identifier of the WebVTT region. As in TTML, this is a one-to-many relationship: a region may have zero to many cue affiliations, but a cue can only have one region affiliation.

If we assume a region with the id "foo", the hierarchical structure would look like the following:

cue(region-id ="foo")->content

When processing WebVTT, the region can be de-referenced and the intermediate tree could look like:

region(id="foo")->cue->content

Language Identification

WebVTT allows the language of text to be specified through the use of a language span.

Example: WebVTT English Language Specification:

					<lang en>...</lang>
					

Line Breaks

In WebVTT an explicit line break is introduced through the Unicode code values for carriage return, line feed or a pair of both. Automatic line breaks can occur during presentation due to limited space in the area where the content is rendered. There is no syntactic construct to influence this behavior.
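In the WebVTT to TTML direction, each of these line terminators can be turned into a TTML <br/> element, which avoids relying on xml:space="preserve". A minimal sketch with xml.etree.ElementTree follows; the function name cue_text_to_p is hypothetical and cue tags such as <b> or <lang> are not handled.

```python
import re
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"

def cue_text_to_p(text: str) -> ET.Element:
    """Build a TTML <p> whose lines are separated by <br/> elements."""
    lines = re.split(r"\r\n|\r|\n", text)  # try CRLF before a lone CR or LF
    p = ET.Element(f"{{{TT_NS}}}p")
    p.text = lines[0]
    for line in lines[1:]:
        br = ET.SubElement(p, f"{{{TT_NS}}}br")
        br.tail = line  # the text after a <br/> is stored as its tail
    return p


example_p = cue_text_to_p("First line\r\nSecond line\nThird")
```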

Character Encoding

Text content in a WebVTT file is always encoded in UTF-8.

Positioning

WebVTT Regions

In WebVTT, a region is also defined as an area that text can be flowed into. However, WebVTT uses different properties to specify a region than TTML. Specifically, WebVTT defines the following properties for a region:

Property Definition
identifier An arbitrary string that can be used in cues to reference the region.
width The width of the region carried as a percentage of the video width. Defaults to 100.
lines value The number of lines of text within the region. Defaults to 3.
region anchor point The x and y coordinates, as percentages of the region area, for the point within the region that is anchored to the viewport. Defaults to ( 0,100 ), or the ( bottom, left ) corner of the region.
region viewport anchor point The x and y coordinates, as percentages of the viewport, for the point within the viewport to which the region anchor point is affixed.
scroll value The scroll value can have one of two values: None or Up. If it is set to None, text remains on the line it was originally drawn upon. If it is set to Up, new cues are added to the bottom of the region, pushing up any text that is already drawn in the region until the new cue is fully displayed.
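A converter that emits WebVTT regions has to serialize these properties. The sketch below writes them using the REGION definition block syntax of the current WebVTT specification (earlier drafts used a different header syntax); the class name is hypothetical, and the default viewport anchor of (0, 100) is taken from the WebVTT specification rather than from this document.

```python
from dataclasses import dataclass

@dataclass
class WebVTTRegion:
    identifier: str
    width: float = 100.0                                 # percent of the video width
    lines: int = 3
    region_anchor: tuple[float, float] = (0.0, 100.0)    # percent of the region area
    viewport_anchor: tuple[float, float] = (0.0, 100.0)  # percent of the viewport
    scroll_up: bool = False                              # False models scroll value None

    def to_block(self) -> str:
        """Serialize the region as a WebVTT REGION definition block."""
        settings = [
            f"id:{self.identifier}",
            f"width:{self.width:g}%",
            f"lines:{self.lines}",
            f"regionanchor:{self.region_anchor[0]:g}%,{self.region_anchor[1]:g}%",
            f"viewportanchor:{self.viewport_anchor[0]:g}%,{self.viewport_anchor[1]:g}%",
        ]
        if self.scroll_up:
            settings.append("scroll:up")
        return "REGION\n" + "\n".join(settings)


region_block = WebVTTRegion("fred", width=40, viewport_anchor=(10, 90), scroll_up=True).to_block()
```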

Notes:

  • WebVTT regions only support horizontal writing modes. Text using vertical writing modes cannot be flowed into a WebVTT region.
  • WebVTT region areas are limited to the viewport area.

Sizing and Collision Control

In WebVTT, cues can be drawn directly into the video viewport, without the use of regions. When this mode is employed, WebVTT defines automatic behavior for renderers to adjust cue positions in order to avoid any overlap. When WebVTT does use regions, overlap can occur if regions are defined to overlap and contain text at the same time.

When WebVTT cues are drawn directly into the video viewport, and no regions are used, properties on cues are used for specifying position and size.

Positioning WebVTT Cues

WebVTT cues can be positioned in one of two ways:

  • by using positioning syntax on the cue itself
  • by placing the cue in a region

When authoring WebVTT without regions, the position of a cue is determined by its "line" and "position" cue settings. The interpretation of these cue settings will be affected by the value of the "vertical" cue setting, the writing direction, and possibly the "size" cue setting.

  • The "line" cue setting may either be a percentage or a line number, counting from either the bottom or the top. When calculating the location as a line number, it is necessary to determine the line height. Line height can be determined from font metrics for some content. In the absence of other ways of determining line height, a default value of 5.33vh, where vh is a CSS unit equal to 1% of the viewport height, can be used.
  • The "position" is configured in the direction orthogonal to the "line" direction. For example, for horizontal cues, position is configured in the horizontal direction.
  • The optional alignment value determines whether the position is calculated relative to the start, middle, or end of the cue box. If the alignment value is not "start", the resulting placement depends upon the size of the cue box, i.e., an "end" value will be equivalent to a "start" value plus the size of the cue box in the appropriate direction. When not specified, alignment defaults to "middle".
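The arithmetic behind these settings can be sketched for horizontal cues. This is an approximation under stated assumptions: left-to-right text (so "start" is the left edge), percentages of the viewport, and the default 5.33vh line height mentioned above; actual renderers perform their own layout, and both function names are hypothetical.

```python
# Default line height from the text above, in percent of the viewport height (vh).
DEFAULT_LINE_HEIGHT = 5.33

def cue_box_left(position: float, size: float, align: str = "middle") -> float:
    """Left edge of a horizontal cue box, in percent of the viewport width.

    Assumes left-to-right text, so "start" is the left edge and "end" the right edge.
    """
    if align == "start":
        return position
    if align == "middle":
        return position - size / 2
    if align == "end":
        return position - size
    raise ValueError(f"unknown alignment: {align!r}")

def line_to_percent(line: int, line_height: float = DEFAULT_LINE_HEIGHT) -> float:
    """Top edge of a line box, in percent of the viewport height.

    Non-negative line numbers count from the top; negative ones count from the bottom.
    """
    if line >= 0:
        return line * line_height
    return 100 + line * line_height
```

For example, a cue with position 50 and size 40 and the default alignment starts at 30% of the viewport width, and line -1 starts about 94.67% of the way down the viewport.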

Sizing WebVTT Cues

The "size" cue setting controls one dimension of the block and is a percentage of the video viewport. For horizontal cues, the size cue setting will be the width of the cue box. For vertical cues, the size cue setting will be the height. The other dimension of the block is determined by the content. The cue box will expand as needed to accommodate the cue text. For horizontal cues, the cue expands down. For vertical cues, the cue expands either left or right, depending on the value of the "vertical" cue setting.

Styling

Style information can be applied through span tags (e.g. the tag "b" for bold) or through reference to CSS style information. CSS style information that is defined outside of the WebVTT document can be matched by id-strings defined for cue boxes, regions or span tags.

Defined span tags for styling are:

  • "b" for bold
  • "i" for italic
  • "u" for underline

CSS properties that apply:

  • color
  • opacity
  • visibility
  • text-decoration
  • text-shadow
  • background-color
  • background-image
  • background-repeat
  • background-attachment
  • background-position
  • outline-color
  • outline-style
  • outline-width
  • font-style
  • font-variant
  • font-weight
  • font-size
  • line-height
  • font-family
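When mapping TTML styles, the resolved style set of an element can be written out as a ::cue rule that uses only the properties listed above. The mapping table and function below are a hypothetical sketch: values are copied verbatim, which is only correct where TTML and CSS share a value syntax (such as named colors); values like tts:textDecoration="noUnderline" or TTML generic font family names would need translation, and attributes without a mapping are silently dropped.

```python
# Hypothetical mapping from TTML style attributes to CSS properties allowed on ::cue.
TTML_TO_CSS = {
    "tts:color": "color",
    "tts:backgroundColor": "background-color",
    "tts:fontFamily": "font-family",
    "tts:fontSize": "font-size",
    "tts:fontStyle": "font-style",
    "tts:fontWeight": "font-weight",
    "tts:lineHeight": "line-height",
    "tts:textDecoration": "text-decoration",
    "tts:visibility": "visibility",
}

def cue_rule(cue_id: str, ttml_style: dict[str, str]) -> str:
    """Emit a ::cue rule for a TTML style set, skipping attributes without a mapping."""
    declarations = [
        f"{TTML_TO_CSS[name]}: {value};"
        for name, value in ttml_style.items()
        if name in TTML_TO_CSS
    ]
    return f"::cue(#{cue_id}) {{ {' '.join(declarations)} }}"


rule = cue_rule("c1", {"tts:color": "yellow", "tts:fontStyle": "italic", "tts:padding": "5px"})
```

The tts:padding attribute is dropped because it has no counterpart in this list.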

Timing

Syntactic Structures

In WebVTT, timing information is applied to cues and spans.

Timing Expressions

In contrast to TTML's support for a large set of timing expressions, WebVTT supports only a single timing expression: hours:minutes:seconds.fractional-seconds. In WebVTT, the hours portion of the timing expression is optional.
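A WebVTT timestamp carries exactly three digits of fractional seconds, so converted times are rounded to the nearest millisecond. The sketch below formats a time in seconds, omitting the optional hours field when it is zero; the function name is hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp ([hh:]mm:ss.ttt); hours are omitted when zero."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    if hours:
        return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"
    return f"{minutes:02d}:{secs:02d}.{millis:03d}"
```

For example, 60.06 seconds becomes "01:00.060", and 3725.5 seconds becomes "01:02:05.500".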

Notable Differences between TTML and WebVTT

While the two formats are both used to carry captions information, there are some important differences between them that should be noted when mapping from one to the other.

General Differences

Syntactic Structures

<div> Structures

WebVTT does not have a component that corresponds to <div>.

Language Identification

TTML allows language identification in different positions in the content hierarchy (e.g. on <tt>, <p> and <span>). WebVTT only permits the specification of a language on the "inline" level.

Positioning Differences

Mapping positioning information between the TTML and WebVTT formats may be the most difficult part of any conversion process, due to some fundamental differences in the ways the two formats express spatial information. This section discusses the differences in the spatial controls offered in TTML and WebVTT.

When mapping spatial information between the two formats, it is important to be aware of these differences in their spatial models, and then apply this awareness when making mapping decisions. The two formats differ in positioning support in five main ways:

  • Syntactic Structures
  • Supported Spatial Units
  • Region Definition
  • Supported Writing Modes
  • Sizing and Collision Control

This section will examine these differences in detail.

Syntactic Structures

TTML provides support for hierarchical elements, and spatial information, including associations with regions, can be applied to elements at different levels of the hierarchy. In WebVTT, spatial information and region associations are provided at the cue level.

In TTML, the following elements may reference regions:

  • <body>
  • <div>
  • <p>
  • <span>

When converting from TTML to WebVTT, all of the spatial information and region references in the hierarchy must be preserved by applying them to elements within, as each hierarchical item is flattened.
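A simplified version of this flattening assigns each <p> the region referenced on itself or on its nearest ancestor. This sketch does not implement every TTML region association rule (for example, <span> elements with their own region references are ignored); the function name paragraph_regions is hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional

TT = "{http://www.w3.org/ns/ttml}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

def paragraph_regions(ttml: str) -> dict[str, Optional[str]]:
    """Map each <p> xml:id to the region referenced on it or on its nearest ancestor."""
    result: dict[str, Optional[str]] = {}

    def walk(elem: ET.Element, region: Optional[str]) -> None:
        region = elem.get("region", region)  # the nearest region reference wins
        if elem.tag == TT + "p":
            result[elem.get(XML_ID, "")] = region
            return  # region references below <p> are not handled in this sketch
        for child in elem:
            walk(child, region)

    body = ET.fromstring(ttml).find(TT + "body")
    if body is not None:
        walk(body, None)
    return result


example_doc = (
    '<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en"><body region="bottom"><div>'
    '<p xml:id="p1">A</p><p xml:id="p2" region="top">B</p>'
    "</div></body></tt>"
)
regions = paragraph_regions(example_doc)  # {'p1': 'bottom', 'p2': 'top'}
```

Each resulting pair can then be carried by the corresponding WebVTT cue's region setting.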

Supported Spatial Units

The TTML and WebVTT specifications use different units to express spatial coordinates or distances. The following table compares support for several units between the two formats:

Spatial Units TTML Supports? WebVTT Supports?
pixel Yes No
em Yes No
cell Yes No
percent Yes Yes
line number No Yes
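Since percent is the only unit shared by both formats, pixel lengths have to be converted relative to the root container. The sketch below assumes the root container extent in pixels is known (for example from tts:extent on <tt>) and handles only "px" pairs; em and cell units are out of scope, and the function name is hypothetical.

```python
def pixels_to_percent(pair: str, root_width: int, root_height: int) -> str:
    """Convert a TTML 'Xpx Ypx' length pair to percentages of the root container extent."""
    x_px, y_px = (float(value.removesuffix("px")) for value in pair.split())
    return f"{100 * x_px / root_width:g}% {100 * y_px / root_height:g}%"


origin = pixels_to_percent("192px 972px", 1920, 1080)  # '10% 90%'
```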

Region Definitions

While both TTML and WebVTT define a construct known as a region, the definition of a region differs significantly from one format to another.

In addition, in WebVTT only a cue can reference a region. In TTML several structures can be associated with a region (e.g. <body>, <div> and <p>).

Supported Writing Modes

Both TTML and WebVTT define properties that denote the block and inline progression directions. TTML uses the tts:writingMode attribute to convey this information. WebVTT uses the "vertical" cue setting to define the writing direction; for vertical text, the value of that setting (rl or lr) denotes whether the block progresses from right to left or from left to right. Below is a table showing how to express various inline and block progression directions in both TTML and WebVTT.

Inline Progression Direction Block Progression Direction TTML WebVTT
Left->Right Top->Bottom lrtb auto or horizontal
Right->Left Top->Bottom rltb auto or horizontal
Top->Bottom Right->Left tbrl vertical:rl
Top->Bottom Left->Right tblr vertical:lr
Left->Right Top->Bottom lr auto or horizontal
Right->Left Top->Bottom rl auto or horizontal
Top->Bottom Right->Left tb vertical:rl

TTML defines lr, rl and tb as equivalent to lrtb, rltb and tbrl respectively.
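These writing modes reduce to a small lookup. In the sketch below, horizontal modes omit the vertical cue setting, since the inline direction of horizontal WebVTT text follows the Unicode bidirectional properties of the text itself; "tb" maps to vertical:rl because TTML defines tb as equivalent to tbrl. The names are hypothetical.

```python
# TTML tts:writingMode value -> value of the WebVTT "vertical" cue setting (None = omit it).
WRITING_MODE_TO_VERTICAL = {
    "lrtb": None, "rltb": None, "lr": None, "rl": None,
    "tbrl": "rl", "tb": "rl",
    "tblr": "lr",
}

def vertical_setting(writing_mode: str) -> str:
    """Return the cue-setting text for a TTML writing mode, or '' for horizontal text."""
    value = WRITING_MODE_TO_VERTICAL[writing_mode]
    return "" if value is None else f"vertical:{value}"
```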

Sizing and Collision Control

The two formats differ in the amount of control available over spatial placement of timed text. In general, the TTML format provides a greater degree of control of spatial positioning, while the WebVTT format provides some control, and combines it with some automatic behaviors. The implementation of specified automatic behaviors may vary from one renderer to the next.

In TTML, <p> elements are flowed into regions. If two regions are defined to overlap spatially, and both display text at the same time, the text may overlap.

In WebVTT, by contrast, renderers automatically adjust the positions of cues drawn directly into the video viewport to avoid overlap; as in TTML, text can still overlap when cues are placed in regions that are defined to overlap and display text at the same time.

Styling Differences

Unsupported Style Features

Some of the style features in TTML and WebVTT are not supported by the other format. The following TTML style attributes have no corresponding CSS property:

  • opacity
  • overflow
  • padding
  • showBackground
  • wrapOption
  • zIndex

The following CSS properties allowed by WebVTT have no corresponding TTML style attribute:

  • background-attachment
  • background-image
  • background-position
  • background-repeat
  • font-variant
  • text-shadow

Although there may be strategies for mapping these unsupported style features, an evaluation of these strategies is outside the scope of this document.

Style Information External to the Document

With TTML, all style information is present in the document itself. In contrast, for WebVTT, all CSS selectors and properties are defined in a context external to the WebVTT document. One common case is the specification of the CSS styles in an HTML context where the WebVTT documents are embedded.

Inline Styling

In contrast to TTML, WebVTT does not allow inline styling. Inline styling is the direct specification of a style attribute on a syntax structure that "wraps" the text content (e.g. a <p> and <span> in TTML or a class span tag in WebVTT).

Styling of Multiple <div> Elements

Since WebVTT defines no structure that corresponds to the TTML <div> element, any style information on <div> cannot be mapped directly to a WebVTT document.

Styling of <region> Elements

In TTML each defined <region> can hold style information. Although regions exist in WebVTT, CSS properties can only be defined for all WebVTT regions in a file, and cannot be tied to a specific region individually.

Style Inheritance Between Style Definitions

In TTML, <style> elements can reference other <style> elements to merge the style sets. In WebVTT and CSS, it is not possible to establish a similar relationship between ::cue pseudo elements.

Multiple Styles on <body>, <div> and <p>

In TTML, multiple <style> elements can be referenced by <body>, <div> and <p> elements. In WebVTT, only one style set can be applied to the complete document or to a cue.

Value Scope of Some Style Features

Some of the style features have a slightly different value scope. These differences are described in greater detail in the following sections of this document.

Timing Differences

The two formats provide different models for expressing timing information, and support different timing capabilities. For the most part, the timing support in TTML is a superset of the timing support in WebVTT, with the exception of some intra-cue timing constructs in WebVTT that do not exist in the same form in TTML. This section discusses the differences between timing support in the two formats.

TTML and WebVTT differ in timing support in three ways:

  • Syntactic Structure
  • Ordering
  • Timing Expressions

This section will examine the different functionality offered by the two formats in detail.

Syntactic Structure

TTML provides support for hierarchical elements, and timing information can be applied at most levels of the hierarchy. In contrast, WebVTT has a flat structure, with no ability to nest caption cues within other elements. In addition, WebVTT defines some intra-cue timing concepts which are not present in TTML.

In TTML, the following elements may contain timing information:

  • <body>
  • <region>
  • <div>
  • <p>
  • <span>

When converting from TTML to WebVTT, all of the timing information in this hierarchy must be preserved by applying it to elements within, as each hierarchical item is flattened. In addition, whether a parent element is parallel or sequential must be taken into account when adjusting the timing for child elements during the flattening process.
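The flattening step can be sketched with a small tree model. Under the assumptions of this sketch, begin and end are offsets from the same sync base (the parent's begin in a par container, the previous sibling's end in a seq container), every child of a seq container carries an explicit end, and an element without an end lasts until its parent ends. TimedNode and flatten are hypothetical names, and clock or SMPTE expressions are assumed to have been converted to seconds already.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TimedNode:
    begin: float = 0.0              # offset from the node's sync base
    end: Optional[float] = None     # offset from the same sync base; None = until the parent ends
    seq: bool = False               # True for timeContainer="seq", False for "par"
    text: Optional[str] = None      # set on leaf <p> nodes only
    children: list["TimedNode"] = field(default_factory=list)

def flatten(node: TimedNode, sync_base: float, parent_end: float,
            out: list[tuple[float, float, str]]) -> float:
    """Append absolute (begin, end, text) cues for all leaves; return the node's end."""
    begin = sync_base + node.begin
    end = parent_end if node.end is None else min(sync_base + node.end, parent_end)
    if node.text is not None:
        out.append((begin, end, node.text))
    child_base = begin
    for child in node.children:
        child_end = flatten(child, child_base, end, out)
        if node.seq:
            child_base = child_end  # the next child in a sequence is timed from this one's end
    return end


body = TimedNode(children=[
    TimedNode(begin=10, end=100, seq=True, children=[
        TimedNode(end=2, text="one"),
        TimedNode(begin=1, end=3, text="two"),
    ]),
    TimedNode(begin=5, end=8, text="three"),
])
cues: list[tuple[float, float, str]] = []
flatten(body, 0.0, float("inf"), cues)
# cues == [(10.0, 12.0, 'one'), (13.0, 15.0, 'two'), (5.0, 8.0, 'three')]
```

Note that the flattened cues are not yet in start-time order.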

While WebVTT does not include the support for hierarchical elements found in TTML, it introduces some additional concepts for intra-cue timing. These can be employed when expressing display modes for text such as roll-up, and when styling text with the :past and :future pseudo-classes.

Ordering

WebVTT requires that cues be represented in sequential order, with the earliest cue preceding later cues. TTML does not have this requirement. When converting from TTML to WebVTT, TTML <p> elements must be sorted into sequential order.
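The sorting requirement can be met at serialization time. The sketch below orders (begin, end, text) cues by start time, relying on Python's stable sort to keep ties in document order, and writes a minimal WebVTT file; it assumes cue text contains no blank lines and no "-->" sequence, and the function name is hypothetical.

```python
def to_webvtt(cues: list[tuple[float, float, str]]) -> str:
    """Serialize (begin, end, text) cues as a WebVTT file, ordered by start time."""

    def stamp(seconds: float) -> str:
        ms = round(seconds * 1000)
        hours, ms = divmod(ms, 3_600_000)
        minutes, ms = divmod(ms, 60_000)
        secs, ms = divmod(ms, 1000)
        return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"

    blocks = ["WEBVTT"]
    # sorted() is stable, so cues with equal start times keep their document order.
    for begin, end, text in sorted(cues, key=lambda cue: cue[0]):
        blocks.append(f"{stamp(begin)} --> {stamp(end)}\n{text}")
    return "\n\n".join(blocks) + "\n"


webvtt = to_webvtt([(13.0, 15.0, "two"), (5.0, 8.0, "three")])
```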

Timing Expressions

Simply put, timing expressions are the ways in which timing information may be specified in a timed text document. TTML supports a greater set of timing expressions than WebVTT. The following table shows the set of timing expressions available and the support for each expression in TTML and WebVTT.

Support for Timing Expressions in TTML and WebVTT

Timing Expression TTML Supports? WebVTT Supports?
hours:minutes:seconds Yes No
hours:minutes:seconds.fractional-seconds Yes Yes
hours:minutes:seconds:frames Yes No
hours:minutes:seconds:frames.sub-frames Yes No
hours.fractional-hours Yes No
minutes.fractional-minutes Yes No
seconds.fractional-seconds Yes No
milliseconds.fractional-milliseconds Yes No
frames.fractional-frames Yes No
ticks.fractional-ticks Yes No

TTML to WebVTT

When transforming captions from TTML to WebVTT, it is necessary to take into account the differences in the feature sets of the two formats, and to develop some strategies for handling them. The TTML format provides a broader set of options than WebVTT for authoring captions. In addition, the TTML format allows for more complex hierarchical relationships between elements than can be achieved in WebVTT.

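Based on these differences, the following strategy for mapping emerges:

  • Define a new TTML profile that constrains TTML to features that map directly onto WebVTT.
  • Pre-process a general TTML document into a document that conforms to this profile.
  • Map the conforming document to WebVTT.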

Alternatively, the pre-processing step can be omitted, and the new profile can be used as a guideline when mapping directly from a general TTML document to a WebVTT document.

The TTML To WebVTT Profile

The TTML To WebVTT ( TVTT ) mapping profile constrains a TTML document structure to make the mapping between TTML and WebVTT simple and transparent. Many TTML documents will not conform to this profile.

Feature Provisions
Relative to the TT Feature namespace

#animation

SHALL NOT be used.

#backgroundColor-block

MAY be used.

  • The hex notation of a color with an alpha channel (#rrggbbaa) SHALL NOT be used.

#backgroundColor-inline

MAY be used.

  • The hex notation of a color with an alpha channel (#rrggbbaa) SHALL NOT be used.

#backgroundColor-region

SHALL NOT be used.

#backgroundColor

MAY be used.

  • The hex notation of a color with an alpha channel (#rrggbbaa) SHALL NOT be used.

#bidi

MAY be used.

#cellResolution

MAY be used.

  • The value of the attribute ttp:cellResolution SHALL be set to "1 1".

#clockMode-gps

SHALL NOT be used.

#clockMode-local

SHALL NOT be used.

#clockMode-utc

SHALL NOT be used.

#clockMode

SHALL NOT be used.

#color

MAY be used.

  • The hex notation of a color with an alpha channel (#rrggbbaa) SHALL NOT be used.
  • The value "transparent" SHALL NOT be used.

#content

MAY be used.

  • A document SHALL NOT have more than one tt:div element.
  • Every tt:p element SHALL have one region attribute.
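The two #content provisions can be checked mechanically before mapping. The following is a minimal sketch of such a check, not a complete profile validator; the function name content_violations is hypothetical.

```python
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"

def content_violations(ttml: str) -> list[str]:
    """Report violations of the two #content provisions of the TVTT profile."""
    root = ET.fromstring(ttml)
    problems: list[str] = []
    if len(root.findall(f".//{TT}div")) > 1:
        problems.append("more than one tt:div element")
    for index, p in enumerate(root.iter(f"{TT}p")):
        if "region" not in p.attrib:
            problems.append(f"tt:p #{index} has no region attribute")
    return problems


report = content_violations(
    '<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en"><body><div>'
    '<p region="r1">A</p><p>B</p></div></body></tt>'
)
```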

#core

MAY be used.

  • xml:lang SHALL NOT appear on any element other than tt:tt and tt:span.

#direction

MAY be used.

#display-block

SHALL NOT be used.

#display-inline

SHALL NOT be used.

#display-region

SHALL NOT be used.

#display

SHALL NOT be used.

#displayAlign

MAY be used.

#dropMode-dropNTSC

SHALL NOT be used.

#dropMode-dropPAL

SHALL NOT be used.

#dropMode-nonDrop

SHALL NOT be used.

#dropMode

SHALL NOT be used.

#extent-region

MAY be used.

  • tts:extent attribute when applied to a region element SHALL use "percentage" representation, and SHALL NOT use em or px units.
  • tts:extent attribute SHALL be present on all region elements.

#extent-root

SHALL NOT be used.

#extent

MAY be used.

#fontFamily-generic

MAY be used.

  • The following generic font family names SHALL NOT be used: "default", "sansSerif", "serif", "monospaceSansSerif", "monospaceSerif", "proportionalSansSerif", "proportionalSerif".

#fontFamily-non-generic

MAY be used.

#fontFamily

MAY be used.

#fontSize-anamorphic

SHALL NOT be used.

#fontSize-isomorphic

MAY be used.

#fontSize

MAY be used.

#fontStyle-italic

MAY be used.

#fontStyle-oblique

MAY be used.

#fontStyle

MAY be used.

#fontWeight-bold

MAY be used.

#fontWeight

MAY be used.

#frameRate

SHALL NOT be used.

#frameRateMultiplier

SHALL NOT be used.

#layout

MAY be used.

  • No two presented regions in a given intermediate synchronic document SHALL overlap, i.e. the intersection of the sets of coordinates within each region (including its boundary) is empty.
  • No region SHALL extend beyond the root container, i.e. the intersection of the set of coordinates belonging to a region (including its boundary) and the set of coordinates belonging to the root container (including its boundary) is the set of coordinates belonging to the region (including its boundary).
  • The number of presented regions in a given intermediate synchronic document SHALL be smaller than or equal to 4.

#length-cell

SHALL NOT be used.

#length-em

MAY be used.

#length-integer

MAY be used.

#length-negative

SHALL NOT be used.

#length-percentage

MAY be used.

#length-pixel

MAY be used.

#length-positive

MAY be used.

#length-real

MAY be used.

#length

MAY be used.

#lineBreak-uax14

MAY be used.

#lineHeight

MAY be used.

#markerMode-continuous

SHALL NOT be used.

#markerMode-discontinuous

SHALL NOT be used.

#markerMode

SHALL NOT be used.

#metadata

MAY be used.

#nested-div

SHALL NOT be used.

#nested-span

MAY be used.

#opacity

MAY be used.

#origin

MAY be used.

  • The tts:origin attribute SHALL use "percentage" representation, and SHALL NOT use em or px units.
  • The tts:origin attribute SHALL be specified on the region element.

#overflow-visible

SHALL NOT be used.

#overflow

SHALL NOT be used.

#padding-1

SHALL NOT be used.

#padding-2

SHALL NOT be used.

#padding-3

SHALL NOT be used.

#padding-4

SHALL NOT be used.

#padding

SHALL NOT be used.

#pixelAspectRatio

SHALL NOT be used.

#presentation

MAY be used.

#profile

MAY be used.

#showBackground

SHALL NOT be used.

#structure

MAY be used.

#styling-chained

SHALL NOT be used.

#styling-inheritance-content

MAY be used.

#styling-inheritance-region

SHALL NOT be used.

#styling-inline

MAY be used.

  • Inline Styling SHALL NOT be used unless explicitly required or allowed.
  • The following style attributes SHALL be specified inline: tts:textAlign on the tt:p element, and tts:extent, tts:origin, tts:displayAlign and tts:writingMode on the tt:region element.

#styling-nested

SHALL NOT be used.

#styling-referential

MAY be used.

  • A tt:body element SHALL NOT reference more than one style.
  • A tt:div element SHALL NOT reference any style.
  • A tt:region element SHALL NOT reference any style.

#styling

MAY be used.

#subFrameRate

SHALL NOT be used.

#textAlign-absolute

MAY be used.

#textAlign-relative

MAY be used.

#textAlign

MAY be used.

  • The tts:textAlign attribute SHALL be present on every tt:p element.
  • The tts:textAlign attribute SHALL NOT be present on any element other than the tt:p element.

#textDecoration-over

MAY be used.

#textDecoration-through

MAY be used.

#textDecoration-under

MAY be used.

#textDecoration

MAY be used.

#textOutline-blurred

MAY be used.

#textOutline-unblurred

MAY be used.

#textOutline

MAY be used.

#tickRate

SHALL NOT be used.

#time-clock-with-frames

SHALL NOT be used.

#time-clock

MAY be used.

#time-offset-with-frames

SHALL NOT be used.

#time-offset-with-ticks

SHALL NOT be used.

#time-offset

SHALL NOT be used.

#timeBase-clock

SHALL NOT be used.

#timeBase-media

MAY be used.

#timeBase-smpte

SHALL NOT be used.

#timeContainer

MAY be used.

#timing

MAY be used.

  • The begin and end attributes SHALL NOT be specified on tt:div or tt:body.
  • Sequential timing SHALL NOT be used.
  • If a begin attribute is specified, an end attribute SHALL be specified on the same element.
  • If an end attribute is specified, a begin attribute SHALL be specified on the same element.
  • The dur attribute SHALL NOT be used.
  • The time expressions in the begin and end attributes of tt:p SHALL have the format HH:MM:SS.MS, where MS SHALL have exactly three digits.

#transformation

MAY be used.

#unicodeBidi

MAY be used.

#visibility-block

MAY be used.

#visibility-inline

MAY be used.

#visibility-region

SHALL NOT be used.

#visibility

MAY be used.

#wrapOption

SHALL NOT be used.

#writingMode-horizontal-lr

MAY be used.

#writingMode-horizontal-rl

MAY be used.

#writingMode-horizontal

MAY be used.

#writingMode-vertical

MAY be used.

#writingMode

MAY be used.

  • The attribute tts:writingMode SHALL only be specified inline on a tt:region element.

#zIndex

SHALL NOT be used.

Pre-processing: Converting TTML to TVTT

To ease conversion, a source TTML document can be transformed into a TTML document that conforms to the TVTT (TTML To WebVTT Document Profile). The steps below describe how to pre-process a source TTML document so that it is valid against the TVTT profile.

General Pre-processing

WebVTT does not define hierarchical elements such as the <body> or <div> elements found in TTML. Similarly, the TVTT profile constrains the use of hierarchical elements in documents that conform to it. As a result, when converting TTML documents to TVTT documents, all the information provided in the hierarchical elements must be applied to the captions within those elements. Through this process, individual captions may have their timing, styling, layout and other information adjusted to take into account values inherited from the elements that contain them. Within this document, the term “flattening” is used to designate this process.

Flattening

In order to flatten a TTML document, work through any hierarchy of <body> and <div> elements in a TTML document starting from the <body> element and apply values from that section to each of the child elements within it. This process can be applied iteratively to create a set of timed text elements with adjusted values that reflect the values inherited from <body> or <div> elements. For simple TTML documents without much hierarchy, this step may not be necessary.

Note that once this process has been completed, some information from the original TTML file has been lost, such as the grouping of timed text elements. In addition, some information that was expressed succinctly in TTML is now repeated.

Merge Multiple <div>

TTML documents that contain more than one <div> should be mapped to a document with just one <div>. All <p> elements have to be copied into the outermost <div> element, in document order, and all other <div> elements should be pruned.
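As an illustration only, the merge could be sketched in Python with the standard xml.etree.ElementTree module. The sketch assumes the TTML1 namespace (http://www.w3.org/ns/ttml), treats the first top-level <div> as the outermost one, and assumes that flattening has already copied timing, styling and region information from each <div> onto its <p> descendants, because pruning a <div> discards its attributes. The function name merge_divs is hypothetical.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
DIV = f"{{{TT_NS}}}div"
P = f"{{{TT_NS}}}p"


def merge_divs(body: ET.Element) -> None:
    """Collapse every <div> under <body> into the first top-level <div>."""
    top_divs = body.findall(DIV)
    if not top_divs:
        return
    outer = top_divs[0]
    # body.iter() walks depth-first, so this list is in document order.
    paragraphs = list(body.iter(P))
    # Keep any non-<div>, non-<p> children of the outer <div> (e.g. metadata).
    kept = [child for child in outer if child.tag not in (DIV, P)]
    outer[:] = kept + paragraphs
    # Prune the remaining top-level <div> siblings.
    for div in top_divs[1:]:
        body.remove(div)
```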

Example: Merging Multiple <div> elements

Before:

After:

Add @xml:id to <p>

If a <p> element does not yet have an @xml:id, one should be added whose value is a unique identifier.

Example: Add @xml:id to <p>

Before:

<p ...>...</p>
<p ...>...</p>
<p ...>...</p>
			

After:

<p xml:id="p1">...</p>
<p xml:id="p2">...</p>
<p xml:id="p3">...</p>
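A minimal Python sketch of this step, again using xml.etree.ElementTree, might look as follows. The "p1", "p2", ... naming scheme and the function name add_paragraph_ids are assumptions for illustration; the only real requirement is that each new value is unique among all xml:id values in the document.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
P = f"{{{TT_NS}}}p"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def add_paragraph_ids(root: ET.Element, prefix: str = "p") -> None:
    """Give every <p> without an xml:id a new, document-unique id."""
    # xml:id values must be unique across the whole document, not just <p>.
    used = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}
    counter = 0
    for p in root.iter(P):
        if p.get(XML_ID) is not None:
            continue
        counter += 1
        while f"{prefix}{counter}" in used:  # skip ids that are already taken
            counter += 1
        new_id = f"{prefix}{counter}"
        p.set(XML_ID, new_id)
        used.add(new_id)
```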
			

Push Down xml:lang to <p>

For every <p>, the value of @xml:lang needs to be resolved, taking into account the @xml:lang values of its ancestors. If the resolved value is not the empty string, then a <span> child should be added to the <p>. This <span> should enclose the complete content of the <p>, and its @xml:lang shall be set to the resolved language. As shown in the example below, @xml:lang is then removed from the <p>, and the @xml:lang of <tt> is set to the empty string.

Example: Push Down xml:lang to <p>

Before:

<tt xml:lang="en">
<div>
	<p>
		<span>...</span>
	</p>
	<p xml:lang="de">
		<span>....</span><span xml:lang="fr">....</span>
	</p>        
</div>
</tt>
					

After:

<tt xml:lang="">
<div>
	<p>
		<span xml:lang="en"><span>...</span></span>
	</p>
	<p>
		<span xml:lang="de"><span>....</span><span xml:lang="fr">....</span></span>
	</p>        
</div>
</tt>
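The push-down can be sketched recursively in Python with xml.etree.ElementTree, carrying the inherited language down the tree. This is a sketch under stated assumptions: it expects to be called on the <tt> element, it removes xml:lang from every element above the <p> level (including any in <head>), and it sets the value on <tt> to the empty string, mirroring the example above. The name push_down_lang is hypothetical.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
TT, P, SPAN = (f"{{{TT_NS}}}{name}" for name in ("tt", "p", "span"))
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def push_down_lang(elem: ET.Element, inherited: str = "") -> None:
    """Move the resolved xml:lang of each <p> onto a wrapping <span>."""
    lang = elem.get(XML_LANG, inherited)  # resolved language of this element
    if elem.tag == P:
        elem.attrib.pop(XML_LANG, None)
        if lang:
            wrapper = ET.Element(SPAN, {XML_LANG: lang})
            # Move the <p>'s leading text and all of its children into the
            # wrapper; each child's tail text travels with the child.
            wrapper.text, elem.text = elem.text, None
            wrapper[:] = list(elem)
            elem[:] = [wrapper]
        return  # nested <span> elements keep their own xml:lang
    for child in elem:
        push_down_lang(child, lang)
    # Ancestors of <p> no longer carry a language; <tt> keeps an empty value.
    if elem.tag == TT:
        elem.set(XML_LANG, "")
    else:
        elem.attrib.pop(XML_LANG, None)
```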
					

Add <region> for Default Region

TTML defines a "default region" that applies if no <region> can be resolved. This default region should be explicitly defined in a TVTT document as follows:

<region xml:id="defaultRegion" tts:extent="100% 100%" tts:origin="0% 0%" />  
					

Region Resolution for <p>

For every <p>, a @region should be specified. If no @region is specified on the <p>, then @region should be set to the id of the <region> referenced by the nearest ancestor of the <p>. If @region is not specified on the <p> or on any of its ancestors, the @region of the <p> should be set to the id of the default region.

Example: Region Resolution for <p>

Before:

<div region="r1">
	<p ...>...</p>
	<p region="r2" ...>...</p>
	<p ...>...</p>
</div>
					

After:

<div>
	<p region="r1" ...>...</p>
	<p region="r2" ...>...</p>
	<p region="r1" ...>...</p>
</div>
					

Note: TTML also allows the @region attribute to be set on <span> elements. The discussion of how to map a TTML document containing such elements is out of the scope of this document.
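A recursive Python sketch of this resolution might look like this. It assumes the default region id "defaultRegion" from the <region> definition above and uses the hypothetical function name resolve_regions. It is meant to be called on the <body> element, and it leaves any @region on <span> elements untouched.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
P = f"{{{TT_NS}}}p"
DEFAULT_REGION = "defaultRegion"  # id of the explicit default <region>


def resolve_regions(elem: ET.Element, inherited: str = DEFAULT_REGION) -> None:
    """Set @region on every <p>, inheriting from the nearest ancestor."""
    region = elem.get("region", inherited)
    if elem.tag == P:
        elem.set("region", region)
        return  # @region on <span> is out of scope for this sketch
    for child in elem:
        resolve_regions(child, region)
    # The reference now lives on each <p>, so remove it from the ancestor.
    elem.attrib.pop("region", None)
```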

Translate @xml:space Preserve

If the resolved @xml:space value of a <p> or <span> is preserve, then every linefeed character in the <p> or <span> should be replaced by a <br/> element, and every space character should be replaced by a non-breaking space (&nbsp;).

Example: Translate @xml:space Preserve

Before:

<p xml:space="preserve" ...>
   Good morning!
   - Good morning!
</p>
					

After:

<p ...><br/>&nbsp;&nbsp;&nbsp;Good&nbsp;morning!<br/>&nbsp;&nbsp;&nbsp;-&nbsp;Good&nbsp;morning!<br/></p>
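The substitution can be sketched as a string transformation in Python. This assumes the text of the <p> or <span> is available as a plain string with no nested markup, and translate_preserved is a hypothetical name. Note that &nbsp; is an HTML entity rather than one of XML's predefined entities, so a strict XML serializer would need &#160; or a declared entity instead.

```python
from html import escape


def translate_preserved(text: str) -> str:
    """Rewrite text whose resolved xml:space is "preserve" as TTML markup.

    Markup characters are escaped first, so that only the inserted <br/>
    elements and &nbsp; entities are interpreted as markup.
    """
    text = text.replace("\r\n", "\n")  # treat CRLF as a single linefeed
    text = escape(text, quote=False)   # escape &, < and >
    return text.replace(" ", "&nbsp;").replace("\n", "<br/>")
```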
					

Whitespace Normalization

All text content of <p> and <span> elements should be whitespace normalized: leading and trailing whitespace (including whitespace adjacent to a <br/>) is deleted, and every remaining sequence of whitespace characters is replaced by a single space character.

Example: Whitespace Normalization

Before:

<p xml:space="default" ...>
   Good evening!<br/>
   - Good evening!
</p>
					

After:

<p xml:space="default" ...>Good evening!<br/>- Good evening!</p>
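A simplified Python sketch of the normalization follows. It assumes the content of the <p> is a string whose only markup is <br/>, which is a hypothetical simplification since real content may contain nested <span> elements; normalize_whitespace is an illustrative name. Only XML whitespace (space, tab, carriage return, linefeed) is collapsed, so non-breaking spaces produced when translating xml:space="preserve" survive.

```python
import re

# XML whitespace only: U+00A0 (&nbsp;) is deliberately not included.
XML_WS = re.compile(r"[ \t\r\n]+")


def normalize_whitespace(content: str) -> str:
    """Normalize the text of a <p> whose only markup is <br/>."""
    lines = content.split("<br/>")
    # Collapse runs to one space, then trim each line next to every <br/>.
    return "<br/>".join(XML_WS.sub(" ", line).strip(" ") for line in lines)
```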
					

Pre-processing: Positioning

Apply Hierarchical Spatial Information to <p> Elements

In order to conform to the TVTT profile, all references to regions must be applied to <p>. If a <p> element is nested in other elements, any region references that exist on its parent elements should be moved to the <p> element.

Remove Nested <style> Elements from <region> Elements

In order to conform to the TVTT profile, all nested <style> elements in <region> element definitions must be removed, with the attributes applied directly to the <region> itself. The sections below describe the details of this process.

Pre-processing: Styling

Convert Inline Styling

Inline styles shall not be used in the TVTT apart from @tts:textAlign specified on <p> elements and @tts:extent, @tts:origin, @tts:displayAlign and @tts:writingMode on <region> elements. All other style attributes specified inline on <body>, <div>, <p> or <span> must be mapped to a <style> element that is then referenced by this content element.

Example: Conversion of Inline Styling

Before:

<tt ...>
	...
	<body>
		<div>
			<p xml:id="p1" tts:fontFamily="monospace" tts:color="white" >
				<span tts:backgroundColor="black">Whose house?</span><br/>
				<span tts:color="lime" tts:backgroundColor="black">- My master´s</span>
			</p>
		</div>
	</body>
</tt>
					

After:

<tt ...>
<head>
  <styling>
	  <style xml:id="p1_style" tts:color="white" tts:fontFamily="monospace" />    
	  <style xml:id="background_black" tts:backgroundColor="black"/>
	  <style xml:id="color_lime" tts:color="lime"/>
  </styling>
</head>
<body>
	<div>
		<p xml:id="p1" style="p1_style">
			<span style="background_black">Whose house?</span><br/>
			<span style="color_lime background_black">- My master´s</span>
		</p>
	</div>
</body>
</tt>
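The conversion could be sketched in Python with xml.etree.ElementTree as follows. The sketch collects the new styles in a plain dictionary instead of writing <style> elements, does not share identical styles the way the example above shares background_black, and uses a hypothetical inline_N naming scheme. Appending the new style id after any existing references is intended to keep the former inline values taking precedence, assuming TTML's rule that the last referenced style wins.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
TTS_NS = "http://www.w3.org/ns/ttml#styling"
CONTENT = {f"{{{TT_NS}}}{name}" for name in ("body", "div", "p", "span")}
# Inline style attributes the TVTT profile still permits on content elements.
ALLOWED_INLINE = {f"{{{TT_NS}}}p": {f"{{{TTS_NS}}}textAlign"}}


def extract_inline_styles(root: ET.Element, styles: dict) -> None:
    """Move disallowed inline tts:* attributes into new entries of `styles`.

    `styles` maps a style id to {attribute: value}; the caller is assumed
    to serialize it as <style> elements inside <styling>.
    """
    counter = 0
    for elem in root.iter():
        if elem.tag not in CONTENT:  # leave <style>, <region> etc. alone
            continue
        allowed = ALLOWED_INLINE.get(elem.tag, set())
        moved = {name: value for name, value in elem.attrib.items()
                 if name.startswith(f"{{{TTS_NS}}}") and name not in allowed}
        if not moved:
            continue
        for name in moved:
            del elem.attrib[name]
        counter += 1
        style_id = f"inline_{counter}"  # hypothetical naming scheme
        styles[style_id] = moved
        refs = elem.get("style", "").split()
        elem.set("style", " ".join(refs + [style_id]))
```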
					

<style> Elements that Reference Other <style> Elements

If a <style> element references another <style> element, the style values that result from this reference, or from a continuing chain of style references, have to be resolved and merged into the set of style attributes of the referencing <style> element. If the same style attribute is defined in both a referenced <style> element and the referencing <style> element, the value of the attribute in the referencing <style> element is used.

Example: Conversion of Referenced Style Elements

Before:

      <styling>
	  <style xml:id="s3" tts:color="blue" tts:backgroundColor="white" tts:fontFamily="monospace" />    
	  <style xml:id="s2" tts:color="white" tts:backgroundColor="black" style="s3"/>
	  <style xml:id="s1" tts:color="lime" style="s2"/>
      </styling>
					

After:

      <styling>
	 <style xml:id="s3" tts:color="blue" tts:backgroundColor="white" tts:fontFamily="monospace" />    
	 <style xml:id="s2" tts:color="white" tts:backgroundColor="black" tts:fontFamily="monospace" />
	 <style xml:id="s1" tts:color="lime" tts:backgroundColor="black"  tts:fontFamily="monospace"/>
      </styling>
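Resolving a chain amounts to a depth-first merge in which referenced styles are applied first and the referencing style's own attributes last. The Python sketch below models each <style> as a dictionary of attributes, with a hypothetical "style" key holding its space-separated references; resolve_style and flatten_style_chains are illustrative names, and the cycle check is an added safeguard.

```python
def resolve_style(style_id: str, styles: dict, _chain: tuple = ()) -> dict:
    """Return the fully resolved attributes of one <style>."""
    if style_id in _chain:
        raise ValueError("circular style reference: " + " -> ".join(_chain + (style_id,)))
    resolved = {}
    # Referenced styles are merged first, in order of reference ...
    for ref in styles[style_id].get("style", "").split():
        resolved.update(resolve_style(ref, styles, _chain + (style_id,)))
    # ... then the referencing style's own attributes override them.
    resolved.update({k: v for k, v in styles[style_id].items() if k != "style"})
    return resolved


def flatten_style_chains(styles: dict) -> dict:
    """Resolve every <style> so that none references another."""
    return {style_id: resolve_style(style_id, styles) for style_id in styles}
```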
					

Multiple Styles that Apply to <body>

If more than one <style> is referenced by <body>, a new <style> needs to be created where all style attributes of the referenced styles are merged.

If no style is referenced by <body>, an empty <style> is created and referenced by the <body>.

Example: Conversion of Multiple Styles

Before:


  <head>
	  <styling>
		  <style xml:id="fontStyles" tts:fontFamily="monospace" tts:fontSize="200%" tts:lineHeight="normal"/>    
		  <style xml:id="colorStyles" tts:color="white" tts:backgroundColor="black"/>
	  </styling>
  </head>
  <body style="colorStyles fontStyles">
		
  </body>

					

After:


  <head>
	  <styling>
		  <style xml:id="defaultStyle" tts:color="white" tts:backgroundColor="black" tts:fontFamily="monospace" tts:fontSize="200%" tts:lineHeight="normal"/>
		  <style xml:id="fontStyles" tts:fontFamily="monospace" tts:fontSize="200%" tts:lineHeight="normal"/>    
		  <style xml:id="colorStyles" tts:color="white" tts:backgroundColor="black"/>
	  </styling>
	  
  </head>
  <body style="defaultStyle">
		
  </body>
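A small Python sketch of the merge, using a dictionary model of <style> elements, follows. It assumes the referenced styles have already had their own chains resolved, and that when two referenced styles set the same attribute the later reference wins; merge_body_styles and the default id "defaultStyle" (taken from the example above) are illustrative.

```python
def merge_body_styles(body_refs: str, styles: dict, new_id: str = "defaultStyle") -> str:
    """Merge the styles referenced by <body> into one new <style>.

    Returns the id that <body> should now reference. With no references,
    the new style is simply empty.
    """
    merged = {}
    for ref in body_refs.split():
        merged.update(styles[ref])  # later references win on conflicts
    styles[new_id] = merged
    return new_id
```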
 
					

Styles Applied to <div> Elements

The resolved style set of the first <div> in the document is merged into the <style> element referenced by the <body>. If a style attribute is already set in the <style> referenced by the <body>, its value is overwritten by the value applied to the <div>.

Example: Conversion of Styles Applied to <div> Elements

Before:

   <head>
	<styling>
		<style xml:id="defaultStyle" tts:color="white" tts:fontFamily="monospace" tts:fontSize="200%"/>
		<style xml:id="newFont" tts:fontFamily="Verdana" tts:fontSize="160%"/>
	</styling>
   </head>
   <body style="defaultStyle">
		<div style="newFont"> .... </div>
   </body>
					

After:

    <head>
	<styling>
		<style xml:id="defaultStyle" tts:color="white" tts:fontFamily="Verdana" tts:fontSize="160%"/>
		<style xml:id="newFont" tts:fontFamily="Verdana" tts:fontSize="160%"/>
	</styling>
    </head>
   <body style="defaultStyle">
		<div> .... </div>
   </body>
					

Styles that are Specified for <region> Elements

In a TVTT document, only the style attributes @tts:extent, @tts:origin, @tts:displayAlign and @tts:writingMode shall be specified on a <region>. The <region> shall not contain any style references or <style> child elements. If a source TTML document does not comply with this constraint, then all style references have to be resolved and merged, taking into account the style values of the <style> children of the <region>. A new <style> is created for the resolved set of style values, and every <p> that references that <region> should reference this <style>.

Pruning of Unreferenced Styles

All <style> elements that are not referenced should be pruned from the document.

Translation of Incompatible Values

As mentioned above, there are a few incompatibilities between the sets of styling attributes available in TTML and in WebVTT. These are expressed in the TVTT profile. Below are some recommendations on how to handle them when translating a general TTML document into the TVTT profile.

  • tts:color

    • the color name "magenta" shall be mapped to "fuchsia"
    • the color name "cyan" shall be mapped to "aqua"
    • the color name "transparent" shall be mapped to the value "rgba(255,255,255,0)"
    • a hex notated color value with alpha channel shall be mapped to an rgba notated value
  • tts:backgroundColor

    • the color name "magenta" shall be mapped to "fuchsia"
    • the color name "cyan" shall be mapped to "aqua"
    • a hex notated color value with alpha channel shall be mapped to an rgba notated value.
  • font-family names

    • the generic font family names monospaceSansSerif, monospaceSerif shall be mapped to monospace
    • the generic font family names sansSerif and serif shall be mapped to sans-serif
  • length metric "c":

    • the length metric "c" shall be converted to percentage
  • font-size:

    • font-size with two values shall be converted to one value that corresponds to the height of the font.
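These value mappings can be sketched as two small Python functions. The names are hypothetical, the font family mapping follows the list above as written, and the rgba output uses TTML's rgba() notation, in which the alpha component is an integer from 0 to 255 rather than a fraction as in CSS.

```python
import re

COLOR_NAMES = {"magenta": "fuchsia", "cyan": "aqua"}
HEX_WITH_ALPHA = re.compile(r"#([0-9a-fA-F]{2})([0-9a-fA-F]{2})([0-9a-fA-F]{2})([0-9a-fA-F]{2})")
GENERIC_FONTS = {
    "monospaceSansSerif": "monospace",
    "monospaceSerif": "monospace",
    "sansSerif": "sans-serif",
    "serif": "sans-serif",  # as listed in the recommendations above
}


def map_color(value: str, is_background: bool = False) -> str:
    """Map a tts:color or tts:backgroundColor value to a TVTT-compatible one."""
    v = value.strip()
    name = v.lower()
    if name in COLOR_NAMES:
        return COLOR_NAMES[name]
    if name == "transparent" and not is_background:
        return "rgba(255,255,255,0)"
    match = HEX_WITH_ALPHA.fullmatch(v)
    if match:  # #rrggbbaa -> rgba(r,g,b,a) with 0-255 components
        r, g, b, a = (int(component, 16) for component in match.groups())
        return f"rgba({r},{g},{b},{a})"
    return v


def map_font_family(value: str) -> str:
    """Map the disallowed generic names in a tts:fontFamily list."""
    families = [family.strip() for family in value.split(",")]
    return ", ".join(GENERIC_FONTS.get(family, family) for family in families)
```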

Cell Resolution

The cell resolution should be set to "1 1". If tts:fontSize is not specified on the region element and no font size is specified on the parent element, then a percentage value of tts:fontSize is relative to the computed size of 1c. Because a cell resolution of "1 1" makes 1c extend over the full height of the video viewport, percentage values of tts:fontSize become relative to the viewport height, and therefore map directly to the CSS font-size property in WebVTT expressed with the 'vh' unit.
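The arithmetic behind the cell conversions can be sketched in Python. A length in cells is converted to a percentage by dividing by the number of columns (horizontal) or rows (vertical) given by ttp:cellResolution, whose TTML default is "32 15". The second function shows why "1 1" is convenient: a percentage font size relative to 1c then equals the same number of vh units. Both function names are hypothetical.

```python
from fractions import Fraction


def _cells(resolution: str) -> tuple:
    columns, rows = (int(n) for n in resolution.split())
    return columns, rows


def cell_length_to_percent(length: str, axis: str, cell_resolution: str = "32 15") -> str:
    """Convert a length such as "3c" to a percentage of the root container.

    axis is "x" for horizontal components and "y" for vertical ones.
    """
    if not length.endswith("c"):
        raise ValueError(f"not a cell length: {length!r}")
    columns, rows = _cells(cell_resolution)
    cells = Fraction(length[:-1])
    percent = cells * 100 / (columns if axis == "x" else rows)
    return f"{float(percent):g}%"


def font_size_percent_to_vh(font_size: str, cell_resolution: str = "1 1") -> str:
    """Map a percentage tts:fontSize, relative to a 1c font size, to vh."""
    _, rows = _cells(cell_resolution)
    percent = Fraction(font_size.rstrip("%"))
    # 1c is 100/rows vh tall, so N% of 1c is N/rows vh ("1 1" gives N vh).
    return f"{float(percent / rows):g}vh"
```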

Initial Values

As default values for corresponding style structures (e.g. font-size) may differ between TTML and WebVTT, the document structure should be pre-processed to apply explicitly defined values, rather than relying on default values.

Pre-processing: Timing

Converting Timing Expressions

Because TTML supports many more timing expressions than those included in the TVTT profile, it may be necessary to perform a pre-processing step to convert a TTML document into a document that conforms to the TVTT profile. In many cases, this type of processing will require timing parameters to convert timing expressions into the format supported by TVTT.

As a first step in mapping timing information from TTML to TVTT, convert all times in the TTML document into the time format supported by TVTT, hours:minutes:seconds.fractional-seconds, and limit the fractional seconds to three decimal places. This conversion simplifies the mapping to WebVTT, as it results in all timing information being expressed in the units supported by WebVTT. This section steps through the supported TTML time expressions and describes how to convert each of them into the format that is included in the TVTT profile.

Time Expression: hours:minutes:seconds

TTML Time Expression | TVTT Time Expression | Relevant Parameters
hours:minutes:seconds | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode

For example:

TTML Time Expression | TVTT Time Expression
00:00:40 | 00:00:40.000 or 00:40.000

Notes:

  • If the TTML document contains a Time Base parameter equal to “clock”, and Clock Mode parameter is present and has a value of “gps”, the TTML time must be converted to “utc” time, per the instructions in the TTML specification.
  • If the TTML document contains a Time Base parameter equal to “clock” and a Clock Mode parameter equal to “local”, external information may be required to convert wall-clock time into playback time.
  • In WebVTT, the hours unit is optional in a timing expression, in the case where its value is equal to zero.

Time Expression: hours:minutes:seconds.fractional-seconds

TTML Time Expression | TVTT Time Expression | Relevant Parameters
hours:minutes:seconds.fractional-seconds | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode

This second case requires no transformation, except to limit the fractional-seconds portion of the time expression to three decimal places.

For example:

TTML Time Expression | TVTT Time Expression
01:02:43.0345555 | 01:02:43.035

Notes:

  • If the TTML document contains a Time Base parameter equal to “clock”, and the Clock Mode parameter is present and has a value of “gps”, the TTML time must be converted to “utc” time, per the instructions in the TTML specification.
  • If the TTML document contains a Time Base parameter equal to “clock” and a Clock Mode parameter equal to “local”, external information may be required to convert wall-clock time into playback time.
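Both clock-time forms without frames reduce to the same arithmetic. The Python sketch below assumes the media time base (so no clock mode conversion is needed) and uses exact fractions to avoid floating-point error; clock_to_tvtt and to_tvtt are hypothetical names. Rounding to milliseconds uses Python's Fraction rounding, which rounds halves to even; other rounding policies are equally defensible.

```python
import re
from fractions import Fraction

# hours:minutes:seconds with an optional fraction, e.g. 01:02:43.0345555
CLOCK = re.compile(r"(\d{2,}):(\d{2}):(\d{2})(\.\d+)?")


def to_tvtt(seconds: Fraction) -> str:
    """Format a time in seconds as HH:MM:SS.mmm (three fractional digits)."""
    total_ms = round(seconds * 1000)  # Fraction rounding: nearest, halves to even
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def clock_to_tvtt(expr: str) -> str:
    """Convert hh:mm:ss or hh:mm:ss.fraction (media time base) to TVTT form."""
    match = CLOCK.fullmatch(expr)
    if match is None:
        raise ValueError(f"not a clock time without frames: {expr!r}")
    hours, minutes, secs, fraction = match.groups()
    seconds = 3600 * int(hours) + 60 * int(minutes) + int(secs) + Fraction(fraction or "0")
    return to_tvtt(seconds)
```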

Time Expression: hours:minutes:seconds:frames

TTML Time Expression | TVTT Time Expression | Relevant Parameters
hours:minutes:seconds:frames | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode, Drop Mode, Marker Mode, Frame Rate, Frame Rate Multiplier

When converting from time expressions that contain frames, it is necessary to know the frame rate that the TTML document uses. This information may be provided as parameters within the TTML document. TTML specifies two parameter types for carrying frame rate: ttp:frameRate and ttp:frameRateMultiplier.

In addition, in the case where the ttp:timeBase is equal to smpte and the ttp:markerMode is either not set or set to discontinuous, it will be necessary to account for any discontinuities in timing expressions when converting.

For example, in a TTML file with a frame rate of 30:

TTML Time Expression | TVTT Time Expression
01:02:43:07 | 01:02:43.233

As another example, in a TTML file with a frame rate of 30 and a frame rate multiplier of 1000:1001:

TTML Time Expression | TVTT Time Expression
01:02:43:07 | 01:02:43.234

Notes:

  • If the TTML document contains Time Base parameter equal to “smpte”, then Drop Mode parameters must be applied, per the instructions in the TTML specification.
  • If the TTML document contains Time Base parameter equal to “clock”, and the Clock Mode parameter is present and has a value of “gps”, the TTML time must be converted to “utc” time, per the instructions in the TTML specification.
  • It is possible that caption time expressions will contain an offset. External information will be required to identify when an offset is being used and to account for it when translating time expressions.

Time Expression: hours:minutes:seconds:frames.sub-frames

TTML Time Expression | TVTT Time Expression | Relevant Parameters
hours:minutes:seconds:frames.sub-frames | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode, Drop Mode, Frame Rate, Sub Frame Rate

When converting from time expressions that contain frames and sub-frames, it is necessary to know the frame rate and the sub-frame rate that the TTML document uses. This information may be provided as parameters within the TTML document, or as external data that is input to the mapping process. TTML specifies two parameter types for carrying frame rate: ttp:frameRate and ttp:subFrameRate.

For example, in a TTML file with a frame rate of 30 and a sub-frame rate of 2:

TTML Time Expression | TVTT Time Expression
01:02:43:07.1 | 01:02:43.250

Notes:

  • If the TTML document contains Time Base parameter equal to “smpte”, then Drop Mode parameter must be applied, per the instructions in the TTML specification.
  • If the TTML document contains Time Base parameter equal to “clock”, and the Clock Mode parameter is present and has a value of “gps”, the TTML time must be converted to “utc” time, per the instructions in the TTML specification.
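The two frame-based clock-time forms can be sketched together in Python. This is a minimal sketch assuming the media time base, no drop-frame counting and no SMPTE marker discontinuities: the frame count, plus any sub-frames divided by the sub-frame rate, is divided by the effective frame rate, i.e. ttp:frameRate multiplied by ttp:frameRateMultiplier. The function names are hypothetical.

```python
import re
from fractions import Fraction

# hours:minutes:seconds:frames with an optional .sub-frames suffix
FRAMES = re.compile(r"(\d{2,}):(\d{2}):(\d{2}):(\d{2,})(?:\.(\d+))?")


def to_tvtt(seconds: Fraction) -> str:
    """Format a time in seconds as HH:MM:SS.mmm (three fractional digits)."""
    total_ms = round(seconds * 1000)  # Fraction rounding: nearest, halves to even
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def frame_clock_to_tvtt(expr: str, frame_rate: int = 30,
                        multiplier: Fraction = Fraction(1),
                        sub_frame_rate: int = 1) -> str:
    """Convert hh:mm:ss:ff or hh:mm:ss:ff.sub to TVTT form (media time base)."""
    match = FRAMES.fullmatch(expr)
    if match is None:
        raise ValueError(f"not a clock time with frames: {expr!r}")
    hours, minutes, secs, frames, sub = match.groups()
    effective_rate = frame_rate * multiplier  # e.g. 30 * 1000/1001
    frame_count = int(frames) + Fraction(int(sub or 0), sub_frame_rate)
    seconds = 3600 * int(hours) + 60 * int(minutes) + int(secs) + frame_count / effective_rate
    return to_tvtt(seconds)
```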

Time Expression: hours.fractional-hours

TTML Time Expression | TVTT Time Expression | Relevant Parameters
hours.fractional-hours | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode

To convert a duration in hours to a duration in seconds, multiply by the number of seconds in an hour: 3600.

For example, for a duration of 3 hours:

TTML Time Expression | TVTT Time Expression
3h | 03:00:00.000

Similarly, for a duration of 3.45 hours:

TTML Time Expression | TVTT Time Expression
3.45h | 03:27:00.000

Note:

  • If the TTML document contains Time Base parameter equal to “clock”, and the Clock Mode parameter is present and has a value of “gps”, the TTML time must be converted to “utc” time, per the instructions in the TTML specification.

Time Expression: minutes.fractional-minutes

TTML Time Expression | TVTT Time Expression | Relevant Parameters
minutes.fractional-minutes | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode

To convert a duration in minutes to a duration in seconds, multiply by the number of seconds in a minute: 60.

For example, for a duration of 3 minutes:

TTML Time Expression | TVTT Time Expression
3m | 00:03:00.000

Similarly, for a duration of 3.45 minutes:

TTML Time Expression | TVTT Time Expression
3.45m | 00:03:27.000

Note:

  • If the TTML document contains Time Base parameter equal to “clock”, and the Clock Mode parameter is present and has a value of “gps”, the TTML time must be converted to “utc” time, per the instructions in the TTML specification.

Time Expression: seconds.fractional-seconds

TTML Time Expression | TVTT Time Expression | Relevant Parameters
seconds.fractional-seconds | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode

This case requires very little transformation: the number of seconds only needs to be expressed in the hours:minutes:seconds.fractional-seconds form, appending zero-valued fractional seconds where necessary.

For example, for a duration of 3 seconds:

TTML Time Expression | TVTT Time Expression
3s | 00:00:03.000

Similarly, for a duration of 3.45 seconds:

TTML Time Expression | TVTT Time Expression
3.45s | 00:00:03.450

Note:

  • If the TTML document contains a Time Base parameter equal to “clock”, and the Clock Mode parameter is present and has a value of “gps”, the TTML time must be converted to “utc” time, per the instructions in the TTML specification.

Time Expression: milliseconds.fractional-milliseconds

TTML Time Expression | TVTT Time Expression | Relevant Parameters
milliseconds.fractional-milliseconds | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode

To convert a duration in milliseconds to a duration in seconds, divide by the number of milliseconds in a second: 1000.

For example, for a duration of 3 milliseconds:

TTML Time Expression | TVTT Time Expression
3ms | 00:00:00.003

Similarly, for a duration of 3.45 milliseconds:

TTML Time Expression | TVTT Time Expression
3.45ms | 00:00:00.003

Notes:

  • If the TTML document contains Time Base parameter equal to “clock”, and the Clock Mode parameter is present and has a value of “gps”, the TTML time must be converted to “utc” time, per the instructions in the TTML specification.
  • Some rounding of timing expression values may be necessary if the value expressed in the TTML document has greater precision than is supported in the timing expression format supported by TVTT.

Time Expression: frames.fractional-frames

TTML Time Expression | TVTT Time Expression | Relevant Parameters
frames.fractional-frames | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode, Drop Mode, Frame Rate, Frame Rate Multiplier

When converting from time expressions that contain frames, it is necessary to know the frame rate that the TTML document uses. This information may be provided as parameters within the TTML document, or as external data that is input to the mapping process. TTML specifies two parameter types for carrying frame rate: ttp:frameRate and ttp:frameRateMultiplier.

For example, in a TTML file with a frame rate of 30:

TTML Time Expression | TVTT Time Expression
75f | 00:00:02.500

As another example, in a TTML file with a frame rate of 30 and a frame rate multiplier of 1000:1001:

TTML Time Expression | TVTT Time Expression
75f | 00:00:02.502

Notes:

  • If the TTML document contains Time Base parameter equal to “smpte”, then the Drop Mode parameter must be applied, per the instructions in the TTML specification.
  • If the TTML document contains Time Base parameter equal to “clock”, and the Clock Mode parameter is present and has a value of “gps”, the TTML time must be converted to “utc” time, per the instructions in the TTML specification.

Time Expression: ticks.fractional-ticks

TTML Time Expression | TVTT Time Expression | Relevant Parameters
ticks.fractional-ticks | hours:minutes:seconds.fractional-seconds | Time Base, Clock Mode, Tick Rate

When converting from time expressions that contain ticks, it is necessary to know the tick rate that the TTML document uses. This information may be provided as parameters within the TTML document, or as external data that is input to the mapping process. TTML specifies the following parameter type for carrying tick rate: ttp:tickRate.

For example, given a Tick Rate of 15:

TTML Time Expression | TVTT Time Expression
50t | 00:00:03.333

Similarly, for a duration of 50.45 ticks:

TTML Time Expression | TVTT Time Expression
50.45t | 00:00:03.363

Note:

  • If the TTML document contains Time Base parameter equal to “clock”, and the Clock Mode parameter is present and has a value of “gps”, the TTML time must be converted to “utc” time, per the instructions in the TTML specification.
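All of the offset-time forms above can be handled by one small Python function that scales the value by its metric. The sketch assumes the media time base; the frame rate, multiplier and tick rate are passed in as arguments whose default values are placeholders chosen for the examples above, and offset_to_tvtt is a hypothetical name. With halves rounded to even, the sketch reproduces the 00:00:02.502 value in the frame example above.

```python
import re
from fractions import Fraction

OFFSET = re.compile(r"(\d+(?:\.\d+)?)(h|m|s|ms|f|t)")


def to_tvtt(seconds: Fraction) -> str:
    """Format a time in seconds as HH:MM:SS.mmm (three fractional digits)."""
    total_ms = round(seconds * 1000)  # Fraction rounding: nearest, halves to even
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def offset_to_tvtt(expr: str, frame_rate: int = 30,
                   multiplier: Fraction = Fraction(1), tick_rate: int = 1) -> str:
    """Convert an offset time such as "3.45h", "75f" or "50t" to TVTT form."""
    match = OFFSET.fullmatch(expr)
    if match is None:
        raise ValueError(f"not an offset time: {expr!r}")
    value, metric = Fraction(match.group(1)), match.group(2)
    if metric == "h":
        seconds = value * 3600
    elif metric == "m":
        seconds = value * 60
    elif metric == "s":
        seconds = value
    elif metric == "ms":
        seconds = value / 1000
    elif metric == "f":
        seconds = value / (frame_rate * multiplier)  # effective frame rate
    else:  # "t"
        seconds = value / tick_rate
    return to_tvtt(seconds)
```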

Converting Durations to End Times

The TVTT profile does not support durations, but rather requires that cue timings be expressed as begin and end times. Therefore, as part of transforming general TTML documents to conform with the TVTT profile, any timing expressed as duration must be transformed into an end time.

Example: Duration to End Time Conversion

This example starts with an excerpt from a TTML file that uses "dur" to express the amount of time that <p> elements should be displayed.

<body begin="00:00:20.000" end="00:00:50.000">
   <div begin="00:00:01.000" dur="10.000s">
	 <p begin="00:00:00.000" dur="5s">Appears at 21 secs<br/>
	 and remains visible to 26 secs</p>
	 <p begin="00:00:05.000" dur="5s">Appears at 26 secs<br/>
	 and remains visible to 31 secs</p>
   </div>
</body>
					

Transform timing information from "dur" to "end".

<body begin="00:00:20.000" end="00:00:50.000">
   <div begin="00:00:01.000" end="00:00:11.000">
	 <p begin="00:00:00.000" end="00:00:05.000">Appears at 21 secs<br/>
	 and remains visible to 26 secs</p>
	 <p begin="00:00:05.000" end="00:00:10.000">Appears at 26 secs<br/>
	 and remains visible to 31 secs</p>
   </div>
</body>
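The transformation itself is an addition, end = begin + dur, with both values measured from the same sync base. A Python sketch under simplifying assumptions follows: it only parses hours:minutes:seconds clock times and offsets in seconds, assumes an element carries dur instead of end, and uses the hypothetical names parse_time and dur_to_end.

```python
from fractions import Fraction


def to_tvtt(seconds: Fraction) -> str:
    """Format a time in seconds as HH:MM:SS.mmm (three fractional digits)."""
    total_ms = round(seconds * 1000)  # Fraction rounding: nearest, halves to even
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{millis:03d}"


def parse_time(expr: str) -> Fraction:
    """Parse an hh:mm:ss(.fraction) clock time or an offset in seconds ("5s")."""
    if ":" in expr:
        hours, minutes, secs = expr.split(":")
        return 3600 * int(hours) + 60 * int(minutes) + Fraction(secs)
    if expr.endswith("s") and not expr.endswith("ms"):
        return Fraction(expr[:-1])
    raise ValueError(f"unsupported time expression: {expr!r}")


def dur_to_end(attrs: dict) -> dict:
    """Replace a dur attribute with the equivalent end attribute."""
    result = dict(attrs)
    if "dur" in result:
        begin = parse_time(result.get("begin", "00:00:00.000"))  # begin defaults to 0
        result["end"] = to_tvtt(begin + parse_time(result.pop("dur")))
    return result
```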
					

Calculate Cue Timing With Respect to the TTML Hierarchy

Once all timing expressions have been converted to be valid against the TVTT profile, the next step is to preserve and apply timing information from parent elements in order to calculate the correct timing to use for the WebVTT cue.

Below are several examples of TTML excerpts containing <body>, <div> and <p> elements, with timing information included in each element.

Example: Parallel Timing

This example starts with an excerpt from a TTML file that does not specify a timeContainer attribute on any element. When not specified, the timeContainer defaults to parallel timing of child elements.

<body begin="00:00:20.000" end="00:00:50.000">
   <div begin="00:00:01.000" end="00:00:11.000">
	 <p begin="00:00:00.000" end="00:00:05.000">Appears at 21 secs<br/>
	 and remains visible to 26 secs</p>
	 <p begin="00:00:05.000" end="00:00:10.000">Appears at 26 secs<br/>
	 and remains visible to 31 secs</p>
   </div>
</body>
				

Step 1: Apply the timing information from the <body> to the <div>.

The <div> begin and end times must be adjusted to account for the <body> element's begin and end times, so that:

<div begin="00:00:21.000" end="00:00:31.000">
	 <p begin="00:00:00.000" end="00:00:05.000">Appears at 21 secs<br/>
	 and remains visible to 26 secs</p>
	 <p begin="00:00:05.000" end="00:00:10.000">Appears at 26 secs<br/>
	 and remains visible to 31 secs</p>
</div>
				

Step 2: Apply the timing information from the <div> to the <p>.

<p begin="00:00:21.000" end="00:00:26.000">Appears at 21 secs<br/>
	and remains visible to 26 secs</p>
<p begin="00:00:26.000" end="00:00:31.000">Appears at 26 secs<br/>
	and remains visible to 31 secs</p>
				

Note:

  • In this example, the duration of the <body> and <div> elements did not impact the timestamps of the <p> elements during the flattening process. If one of the <p> elements had had an end time beyond either the <body> or <div> durations, it would have been truncated to the enclosing elements’ durations.

Example: Sequential Timing

This example starts with an excerpt from a TTML file in which the <div> element has a timeContainer attribute with the value "seq" (sequential timing).

<body begin="00:00:20.000" end="00:00:50.000">
   <div timeContainer="seq" begin="00:00:01.000" end="00:00:21.000">
	 <p begin="00:00:00.000" end="00:00:05.000">Appears at 21 secs<br/>
	 and remains visible to 26 secs</p>
	 <p begin="00:00:05.000" end="00:00:10.000">Appears at 31 secs<br/>
	 and remains visible to 36 secs</p>
   </div>
</body>
				

Step 1: Apply the timing information from the body to the div element.

The <div> begin and end times must be offset by the <body> element's begin time, so that:

<div timeContainer="seq" begin="00:00:21.000" end="00:00:41.000">
	<p begin="00:00:00.000" end="00:00:05.000">Appears at 21 secs<br/>
	and remains visible to 26 secs</p>
	<p begin="00:00:05.000" end="00:00:10.000">Appears at 31 secs<br/>
	and remains visible to 36 secs</p>
</div>
				

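Step 2: Apply the timing information from the <div> to the <p>. Because the <div> is a sequential time container, the first <p> is timed from the begin of the <div>, and each following <p> is timed from the end of its previous sibling.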

<p begin="00:00:21.000" end="00:00:26.000">Appears at 21 secs<br/>
	and remains visible to 26 secs</p>
<p begin="00:00:31.000" end="00:00:36.000">Appears at 31 secs<br/>
	and remains visible to 36 secs</p>
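The sequential case can be sketched the same way. This Python function is an illustration under stated assumptions, not a normative algorithm: times are already in seconds, and, as in this example, a child's end offset is measured from the same point as its begin offset (the end of the previous sibling). The name flatten_seq is hypothetical, and the call uses a <div> interval of 21 to 41 seconds, that is, its end offset of 21 seconds added to the <body> begin of 20 seconds.

```python
from typing import List, Tuple


def flatten_seq(container_begin: float, container_end: float,
                children: List[Tuple[float, float]]) -> List[Tuple[float, float]]:
    """Convert the (begin, end) offsets of the children of a sequential (seq)
    time container into absolute times in seconds. Both offsets of a child
    are measured from the end of its previous sibling, or from the
    container's begin for the first child. Ends are truncated to the
    container's end, and children that would start after it are dropped."""
    result = []
    sync_base = container_begin
    for child_begin, child_end in children:
        begin = sync_base + child_begin
        if begin >= container_end:
            break  # this child and all later siblings fall outside the container
        end = min(sync_base + child_end, container_end)
        result.append((begin, end))
        sync_base = end  # the next sibling is timed from this child's end
    return result


print(flatten_seq(21.0, 41.0, [(0.0, 5.0), (5.0, 10.0)]))
# [(21.0, 26.0), (31.0, 36.0)]
```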
				

Converting from TVTT to WebVTT

General

Mapping TTML <head> Elements to WebVTT

The TTML <head> element contains metadata as well as Styling and Layout elements. Other sections of this document provide detailed descriptions for mapping information from the <head> element to WebVTT.

Mapping TTML <body> Elements to WebVTT

TTML documents contain a <body> element. This element holds the captions, and makes reference to the styling, timing, layout and other information defined in the TTML <head> element. The <body> element can also use <div> elements to organize captions into groups. Captions inherit timing, layout or styling information from the elements that contain them.

In order to map the contents of a TTML <body> to WebVTT, several processes must be applied. These processes attempt to preserve information while transforming the captions data into a form that can be represented in WebVTT.

Ordering

TTML documents can have captions listed in arbitrary order with respect to time, while WebVTT documents must have captions listed according to their display time, ordered from earliest time to latest time. Therefore, captions from a TTML document must be put into display time order prior to mapping them into WebVTT. Once the flattening step is finished, the next step is to re-order the captions based on the timing of each <p>.

Converting

For many attributes, including spatial and timing values, TTML supports a larger set of representations and units than WebVTT does. As a result, many TTML documents will require unit conversions to be transformed into valid WebVTT documents.

Later sections of this document describe in detail how to map positioning, styling and timing information between the two formats.

Mapping of <p>

Every <p> is mapped to a WebVTT cue. The value of @xml:id of <p> is mapped to the id of the corresponding cue. The text content of the <p> is mapped to cue text. See the styling section for how to map the @style of a <p>.

Example: Mapping the <p> Element

Before:

<p begin="00:00:00.000" end="00:00:02.000" xml:id="p1" ...>
Good morning!
</p>
					

After:

p1
00:00:00.000 --> 00:00:02.000
Good morning!
					

Mapping of <span>

Every <span> that has a style attribute is mapped to a class span tag in WebVTT where the values of the @style are mapped to applicable classes of the class span tag. Every <span> with @xml:lang is mapped to a language span tag with the corresponding value.

Example: Mapping the <span> Element

Before:

<span xml:lang="en"><span style="s1 s2">Good morning</span></span>
					

After:

<lang en><c.s1.s2>Good morning</c></lang>
					

Mapping of <br>

A <br> is mapped to a WebVTT line terminator.

Example: Mapping the <br> Element

Before:

<p ...>What a day!<br/>- Yes!</p>
					

After:

What a day!
- Yes!
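The <p>, <span> and <br> rules above can be combined into one small converter. The Python sketch below uses the standard library's xml.etree.ElementTree; it assumes the TTML namespace http://www.w3.org/ns/ttml, that cue timestamps have already been computed and formatted, and a simplified whitespace model that collapses runs of whitespace (xml:space="preserve" is not honored). The helper names are hypothetical, and the content of elements other than <span> and <br> is dropped.

```python
import re
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"              # TTML namespace
XML = "{http://www.w3.org/XML/1998/namespace}"  # xml:id, xml:lang


def text_chunk(text):
    """Collapse XML whitespace and escape the characters WebVTT reserves."""
    text = re.sub(r"\s+", " ", text or "")
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def inline_to_vtt(elem):
    """Serialize the content of a TTML <p> or <span> as WebVTT cue text."""
    out = [text_chunk(elem.text)]
    for child in elem:
        if child.tag == TT + "br":
            out.append("\n")                    # <br/> -> line terminator
        elif child.tag == TT + "span":
            inner = inline_to_vtt(child)
            classes = child.get("style", "").split()
            if classes:                         # @style -> class span tag
                inner = "<c." + ".".join(classes) + ">" + inner + "</c>"
            lang = child.get(XML + "lang")
            if lang:                            # @xml:lang -> language span tag
                inner = "<lang " + lang + ">" + inner + "</lang>"
            out.append(inner)
        out.append(text_chunk(child.tail))
    return "".join(out)


def p_to_cue(p, begin, end):
    """Build a WebVTT cue block from a TTML <p>; begin and end are
    already-formatted WebVTT timestamps."""
    text = "\n".join(line.strip() for line in inline_to_vtt(p).split("\n")).strip()
    classes = p.get("style", "").split()
    if classes:  # @style on <p> -> class span tag around the whole cue text
        text = "<c." + ".".join(classes) + ">" + text + "</c>"
    cue_id = p.get(XML + "id")  # @xml:id -> cue identifier
    lines = ([cue_id] if cue_id else []) + [begin + " --> " + end, text]
    return "\n".join(lines)


p = ET.fromstring('<p xmlns="http://www.w3.org/ns/ttml" xml:id="p1">'
                  'What a day!<br/>- Yes!</p>')
print(p_to_cue(p, "00:00:00.000", "00:00:02.000"))
```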
					

Mapping Positioning Information

The position and dimension of the TTML Root Container region may differ from the dimensions of the video. In other words, there may be some padding around the Root Container region. In this case, the padding must be taken into account when computing WebVTT percentages. Ultimately, the WebVTT values must be expressed relative to the video viewport dimensions.

Block Size

To convert tts:extent, when applied to a TTML region, to WebVTT cue settings:

  • Determine whether the width or height of the TTML extent attribute corresponds to the WebVTT size setting. If the TTML block progression direction is top-to-bottom, the width corresponds to the "size" setting; otherwise the height corresponds to the "size" setting. Based on this determination, calculate the size cue setting as a percentage of the corresponding video dimension.
  • WebVTT does not provide a way to define the extent in the block progression dimension. Instead, WebVTT relies on automatic behavior to handle resizing in this dimension based on the text content.

To convert tts:extent, when applied to a TTML region, to a WebVTT region:

  • Convert the TTML extent width to a percentage of the video width. This percentage will be the value of the "width" setting in the region setting list.
  • Convert the TTML extent height to a number of lines. (This requires converting the height to a percentage of the video height, then dividing by the line height, which the examples below assume to be 5.33vh, and rounding to the nearest integer.) This will be the value of the "lines" setting in the region setting list.

Block Position

To convert tts:origin to WebVTT cue settings:

  • Convert the tts:origin x and y values to percentages of the video dimensions.
  • Determine the block progression direction of the TTML region from the tts:writingMode attribute. This will determine the appropriate value for the "vertical" cue setting.
  • Determine the writing direction of the text from the tts:writingMode attribute.
  • If the block progression direction is top to bottom:
    • The value of the "line" cue setting is the percentage value for the second coordinate of the origin attribute.
    • The value of the "position" cue setting is the percentage value for the first coordinate of the origin attribute.
  • If the block progression direction is left to right:
    • The value of the "line" cue setting is the percentage value for the first coordinate of the origin attribute.
    • The value of the "position" cue setting is the percentage value for the second coordinate of the origin attribute.
  • Always provide the optional position alignment value "start".
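The cue-setting rules above can be sketched as follows. This Python function is illustrative only: it assumes the tts:origin and tts:extent values have already been converted to percentages of the video viewport (including any Root Container padding), handles only the lrtb, rltb and tblr writing modes, and its name is hypothetical. For the region used in the example that follows, it yields the same cue settings.

```python
def fmt(value):
    """Format a percentage without a trailing '.0' (25.0 -> '25')."""
    return format(value, "g")


def region_to_cue_settings(origin, extent, writing_mode="lrtb"):
    """origin and extent are (first, second) percentages of the video
    viewport, taken from tts:origin and tts:extent."""
    x, y = origin
    width, height = extent
    if writing_mode in ("lrtb", "rltb"):    # top-to-bottom block progression
        settings = ["position:%s%%" % fmt(x), "line:%s%%" % fmt(y),
                    "size:%s%%" % fmt(width)]
    elif writing_mode == "tblr":            # left-to-right block progression
        settings = ["vertical:lr", "position:%s%%" % fmt(y),
                    "line:%s%%" % fmt(x), "size:%s%%" % fmt(height)]
    else:
        raise ValueError("writing mode not covered by this sketch: " + writing_mode)
    settings.append("align:start")
    return " ".join(settings)


print(region_to_cue_settings((25, 80), (50, 16)))
# position:25% line:80% size:50% align:start
```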

To convert tts:origin to WebVTT region settings:

  • WebVTT region positioning is controlled by the region anchor setting and the viewport anchor setting.
  • WebVTT regions only support horizontal cues, which correspond to a vertical TTML block progression direction.
  • Set the "regionanchor" setting value to 0%,0%, which is the upper left corner of the region.
  • Set the "viewportanchor" setting value to the percentage values for the two origin attribute coordinates.
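Similarly, the region rules above, including the conversion of the extent height into a number of lines, can be sketched in Python. The function name is hypothetical, the 5.33vh line height is the same assumption made in the examples below, and the output follows the "Region:" syntax used in this document's examples; the current WebVTT specification instead defines regions in a REGION block with colon-separated settings (for example, width:50%), so the output format may need adapting.

```python
DEFAULT_LINE_HEIGHT_VH = 5.33  # assumed line height, as in the examples below


def fmt(value):
    """Format a percentage without a trailing '.0' (25.0 -> '25')."""
    return format(value, "g")


def region_to_vtt_region(region_id, origin, extent, line_height=DEFAULT_LINE_HEIGHT_VH):
    """Build a region definition line; origin and extent are percentages of
    the video viewport taken from tts:origin and tts:extent."""
    x, y = origin
    width, height = extent
    lines = max(1, round(height / line_height))  # nearest whole number of lines
    return ("Region: id=%s width=%s%% lines=%d regionanchor=0%%,0%% "
            "viewportanchor=%s%%,%s%% scroll=up"
            % (region_id, fmt(width), lines, fmt(x), fmt(y)))


print(region_to_vtt_region("regionExample1", (0, 0), (50, 16)))
print(region_to_vtt_region("regionExample2", (25, 80), (50, 32)))
```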

Example: Converting from a TTML Region to a WebVTT Cue

In this example, the TTML is converted to WebVTT without using WebVTT regions. It begins with a few fragments of TTML, the first containing a region definition, and the second containing a <p> element:

Before:

<layout>
  <region xml:id="reg3" tts:origin="25% 80%" tts:extent="50% 16%" >
  </region>
</layout>
				
  <p region="reg3" begin="00:00:00.000" end="00:00:10.000">A simple caption example.</p>
				

After: Converted to WebVTT syntax

00:00:00.000 --> 00:00:10.000 position:25% line:80% size:50% align:start
A simple caption example.
				

Notes:

  • In WebVTT, text alignment defaults to the middle of the cue box. In order to have cue text aligned to the left, it is necessary to add the align:start, or align:left value to the cue.

  • In WebVTT, there is no way to directly specify the size of the dimension in the block progression direction, in this case, the vertical direction. This dimension is determined by the amount of text in the cue box.

Example: Convert a Region from TTML to WebVTT, Positioned at the (top, left) Corner of the Viewport

This example begins with a TTML region definition:

<layout>
  <region xml:id="regionExample1" tts:origin="0% 0%" tts:extent="50% 16%" >
  </region>
</layout>
				

Step 1: Convert the vertical extent value from a percentage to a number of lines

WebVTT specifies the vertical size of a region in terms of integer lines of text. Assuming a default line height of 5.33vh (5.33% of the video height),

number of lines = 16% / 5.33vh ≈ 3
				

Step 2: Convert from TTML to WebVTT syntax

Region: id=regionExample1 width=50% lines=3 regionanchor=0%,0% viewportanchor=0%,0% scroll=up
				

Notes:

  • By setting both the regionanchor and the viewportanchor to ( 0%, 0% ), the ( top, left ) corner of the region is anchored to the ( top, left ) corner of the video viewport, matching the origin specified in the TTML region definition above.

Example: Converting a Region from TTML to WebVTT, Positioned in the Middle of the Viewport

This example begins with a TTML region definition:

<layout>
  <region xml:id="regionExample2" tts:origin="25% 80%" tts:extent="50% 32%">
  </region>
</layout>
				

Step 1: Convert the vertical extent value from a percentage to a number of lines

WebVTT specifies the vertical size of a region in terms of integer lines of text. Assuming a default line height of 5.33vh (5.33% of the video height),

number of lines = 32% / 5.33vh ≈ 6
				

Step 2: Convert from TTML to WebVTT syntax

Region: id=regionExample2 width=50% lines=6 regionanchor=0%,0% viewportanchor=25%,80% scroll=up
				

Mapping Styling Information

Creation of a CSS file

When starting from a TTML document that conforms to the TVTT profile, the first step is to create a CSS file where all of the <style> elements are mapped to ::cue pseudo elements with the value of @xml:id of the <style> element as a class name.

Example: Creating CSS Style Class from TTML Style Element

Starting with the following TTML snippet:

<style xml:id="fontStyles" tts:fontFamily="monospace" tts:fontSize="200%"/>    
<style xml:id="colorStyles" tts:color="white" tts:backgroundColor="black"/>
				

The corresponding CSS classes look like this:

::cue(.fontStyles) {
  font-family: monospace;
  font-size: 200%;
}

::cue(.colorStyles) {
  color: white;
  background-color: black;
}
				

Mapping of Style Attributes to CSS Properties

The table below shows how to map TTML style attributes to CSS properties.

TTML style attribute    CSS property
tts:backgroundColor     background-color
tts:color               color
tts:fontFamily          font-family
tts:fontSize            font-size
tts:fontStyle           font-style
tts:fontWeight          font-weight
tts:lineHeight          line-height
tts:textDecoration      text-decoration
tts:textOutline         outline-color
tts:visibility          visibility
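Combining the CSS file creation step with the table above, a converter might look like the following Python sketch. It uses xml.etree.ElementTree and assumes the TTML and TTML styling namespaces (http://www.w3.org/ns/ttml and http://www.w3.org/ns/ttml#styling); the function and dictionary names are hypothetical. Attribute values are copied verbatim, so values whose syntax differs between TTML and CSS, such as rgba() colors or tts:textOutline, still need separate conversion.

```python
import xml.etree.ElementTree as ET

TT = "{http://www.w3.org/ns/ttml}"
TTS = "{http://www.w3.org/ns/ttml#styling}"
XML = "{http://www.w3.org/XML/1998/namespace}"

# TTML style attribute (local name) -> CSS property, following the table above
TTS_TO_CSS = {
    "backgroundColor": "background-color", "color": "color",
    "fontFamily": "font-family", "fontSize": "font-size",
    "fontStyle": "font-style", "fontWeight": "font-weight",
    "lineHeight": "line-height", "textDecoration": "text-decoration",
    "textOutline": "outline-color", "visibility": "visibility",
}


def styles_to_css(styling):
    """Emit one ::cue(.id) rule per TTML <style> element, using the style's
    xml:id as the class name. Attribute values are copied unchanged."""
    rules = []
    for style in styling.iter(TT + "style"):
        decls = ["  %s: %s;" % (TTS_TO_CSS[name[len(TTS):]], value)
                 for name, value in style.attrib.items()
                 if name.startswith(TTS) and name[len(TTS):] in TTS_TO_CSS]
        rules.append("::cue(.%s) {\n%s\n}" % (style.get(XML + "id"), "\n".join(decls)))
    return "\n\n".join(rules)


styling = ET.fromstring(
    '<styling xmlns="http://www.w3.org/ns/ttml" '
    'xmlns:tts="http://www.w3.org/ns/ttml#styling">'
    '<style xml:id="fontStyles" tts:fontFamily="monospace" tts:fontSize="200%"/>'
    '<style xml:id="colorStyles" tts:color="white" tts:backgroundColor="black"/>'
    '</styling>')
print(styles_to_css(styling))
```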

Mapping of an RGBA Color Value

As part of the mapping process, it is necessary to convert colors in rgba notation from the form used in TTML to the form used in CSS. TTML expresses the alpha component as an integer from 0 to 255, while CSS expresses it as a number from 0 to 1, so the conversion divides the last value by 255 and rounds the result to one decimal place.

Example: Translating TTML Background Color to CSS

TTML: tts:backgroundColor="rgba(0,0,0,178)"
CSS: background-color: rgba(0,0,0,0.7);
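The alpha conversion is a single calculation. The following Python function is a sketch with a hypothetical name; it accepts only the integer rgba(r,g,b,a) form and, like the rule above, keeps one decimal place, so some precision is lost.

```python
import re


def ttml_rgba_to_css(value):
    """Convert a TTML rgba(r,g,b,a) color, whose alpha is 0-255, into CSS
    rgba() notation, whose alpha is 0-1, rounded to one decimal place."""
    match = re.fullmatch(
        r"\s*rgba\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)\s*", value)
    if match is None:
        raise ValueError("not a TTML rgba() color: %r" % value)
    r, g, b, a = (int(part) for part in match.groups())
    return "rgba(%d,%d,%d,%s)" % (r, g, b, round(a / 255, 1))


print(ttml_rgba_to_css("rgba(0,0,0,178)"))  # rgba(0,0,0,0.7)
```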

Mapping of the @style on <body>

Any <style> elements referenced by the <body> of a TTML document should be mapped to a ::cue pseudo element containing the corresponding CSS properties and values.

Example: Mapping Style Elements Applied to <body>
TTML:

  <style xml:id="defaultStyle" tts:fontWeight="normal" tts:fontSize="100%" .../>
				

CSS:

::cue {
  font-weight: normal;
  font-size: 100%;
}
				

Mapping of @style on <p>

If a <p> contains references to one or more <style> elements, the text of the corresponding cue should be wrapped in a class span tag (<c>), where the style references are mapped to the applied classes.

Example: Mapping Style Elements Applied to <p>

TTML:

<p style="s1 s2 s3" ...>Good morning!</p>
				

WebVTT:

<c.s1.s2.s3>Good morning!</c>
				

Mapping of @style on <span>

A <span> that references one or more <style> elements can be mapped to a class span tag (<c>), where the style references are mapped to the applicable CSS class names.

Example: Mapping Style Elements Applied to <span>

<span style="speaker1">What a day!</span><br/>
<span style="speaker2">Yes!</span>
				

Example WebVTT:

<c.speaker1>What a day!</c>
<c.speaker2>Yes!</c>
				

Mapping Timing Information

Most of the transformation of timing information occurs during pre-processing. Once a document conforms to the TVTT format, only a few remaining transformations must be handled during the mapping to WebVTT:

  • Convert span elements that contain timing information
  • Convert from TTML syntax to WebVTT syntax
  • Order the cues

Calculate Timing for <span> Elements

Documents that employ <span> elements with timing information will require additional processing when mapping from TVTT to WebVTT.

Example: <span> Elements

This example starts with an excerpt from a TTML file that includes some <span> elements within <p> elements.

<body timeContainer="par">
   <div timeContainer="par">
	   <p begin="00:00:10.000" end="00:00:40.000">
		  <span end="00:00:24.400">Appears at 10 seconds and 
		  disappears at 24.4 seconds</span>
		  <br/>
		  <span begin="00:00:25.000" end="00:00:35.000">Appears at 25 seconds and 
		  disappears at 35 seconds</span>
	   </p>
   </div>
</body>
				

Step 1: Define a CSS class for hidden text

::cue(.invisible_text) { color: rgba(0, 0, 0, 0);} 
				

Step 2: Transform the spans into separate <p> elements

<p begin="00:00:10.000" end="00:00:24.400">Appears at 10 seconds and
disappears at 24.4 seconds</p>
<p begin="00:00:25.000" end="00:00:35.000">Appears at 25 seconds and
disappears at 35 seconds</p>
				

Step 3: Convert from TTML to WebVTT syntax

00:00:10.000 --> 00:00:24.400
Appears at 10 seconds and disappears at 24.4 seconds
<c.invisible_text>Appears at 25 seconds and disappears at 35 seconds</c>

00:00:25.000 --> 00:00:35.000
<c.invisible_text>Appears at 10 seconds and disappears at 24.4 seconds</c>
Appears at 25 seconds and disappears at 35 seconds
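The separate-cue approach of Step 3 generalizes to any number of timed spans, including spans whose intervals overlap, by splitting the <p> at every span boundary. The Python sketch below assumes the span times have already been made absolute, that the span text is already valid WebVTT cue text, and that the invisible_text class from Step 1 exists; the function names are hypothetical.

```python
def vtt_time(seconds):
    """Format seconds as a WebVTT timestamp hh:mm:ss.ttt."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return "%02d:%02d:%02d.%03d" % (hours, minutes, secs, ms)


def split_timed_spans(spans):
    """spans: (begin, end, text) tuples with absolute times in seconds, in
    the order the lines appear in the <p>. Emits one cue per interval in
    which at least one span is active; inactive spans are kept but wrapped
    in the invisible_text class so the layout of the lines does not shift."""
    times = sorted({t for begin, end, _ in spans for t in (begin, end)})
    cues = []
    for start, stop in zip(times, times[1:]):
        active = [begin <= start and stop <= end for begin, end, _ in spans]
        if not any(active):
            continue  # no text is visible in this interval
        lines = [text if on else "<c.invisible_text>%s</c>" % text
                 for on, (_, _, text) in zip(active, spans)]
        cues.append("%s --> %s\n%s" % (vtt_time(start), vtt_time(stop), "\n".join(lines)))
    return "\n\n".join(cues)


print(split_timed_spans([
    (10.0, 24.4, "Appears at 10 seconds and disappears at 24.4 seconds"),
    (25.0, 35.0, "Appears at 25 seconds and disappears at 35 seconds"),
]))
```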
				

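An alternative approach is to use intra-cue timings, that is, WebVTT cue timestamps inside a single cue. Cue timestamps mark when portions of the cue text change from future to past, and these states can be styled with the ::cue(:future) and ::cue(:past) pseudo-classes. Because cue timestamps are designed for progressively revealing text, they cannot remove the first line at 24.4 seconds while the cue remains active, so this approach only approximates the original timing.

00:00:10.000 --> 00:00:40.000
Appears at 10 seconds and disappears at 24.4 seconds
<00:00:24.400><00:00:25.000>Appears at 25 seconds and disappears at 35 seconds<00:00:35.000>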
				

Converting from TTML to WebVTT Syntax

TTML:

<p begin="00:00:21.000" end="00:00:26.000">Appears at 21 secs<br/>
	and remains visible to 26 seconds</p>
<p begin="00:00:31.000" end="00:00:36.000">Appears at 31 secs<br/>
	and remains visible to 36 secs</p>
				

WebVTT:

00:00:21.000 --> 00:00:26.000
Appears at 21 secs
and remains visible to 26 seconds

00:00:31.000 --> 00:00:36.000
Appears at 31 secs 
and remains visible to 36 secs
				

Ordering

Completing these steps results in a document with a list of WebVTT cues. The last step is to sort these cues from earliest to latest time, based on each cue's beginning timestamp.
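The ordering step, together with the WEBVTT file header, can be sketched as follows. The function names are hypothetical and the cues are assumed to be (begin, end, text) tuples with times in seconds; Python's sorted() is stable, so cues that begin at the same time keep their document order.

```python
def vtt_time(seconds):
    """Format seconds as a WebVTT timestamp hh:mm:ss.ttt."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return "%02d:%02d:%02d.%03d" % (hours, minutes, secs, ms)


def serialize_vtt(cues):
    """Write the WEBVTT header followed by the cues sorted by begin time."""
    blocks = ["WEBVTT"]
    for begin, end, text in sorted(cues, key=lambda cue: cue[0]):
        blocks.append("%s --> %s\n%s" % (vtt_time(begin), vtt_time(end), text))
    return "\n\n".join(blocks) + "\n"


print(serialize_vtt([
    (31.0, 36.0, "Appears at 31 secs\nand remains visible to 36 secs"),
    (21.0, 26.0, "Appears at 21 secs\nand remains visible to 26 seconds"),
]))
```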

WebVTT to TTML

General

Mapping WebVTT Cues to TTML <p> Elements

Before mapping the syntax of WebVTT cues to the syntax of TTML <p> elements, it can be useful to assemble WebVTT cues into groups that can be transformed into TTML <div> elements. The hierarchical elements of TTML's syntactic structure can provide opportunities for consolidating expressions of style and layout.

Converting Positioning

TTML Position from a WebVTT Cue Box

When mapping positioning information from WebVTT to TTML, start by generating a TTML region definition for each WebVTT region or cue that has a different block size or location. After developing these independent regions, it may be possible to optimize the TTML by sharing or merging region definitions.

WebVTT has the automatic behavior that cue positions are subject to adjustment if cues overlap as positioned by the cue settings. TTML does not have analogous automatic behavior. To avoid overlap in the TTML version of a document, adjust the positioning of WebVTT cues and regions as part of the mapping process.

In WebVTT, the position of a cue is determined by its "position" and "line" cue settings. When interpreting these cue settings, it is necessary to take into account the values of other WebVTT cue settings, including:

  • The vertical cue setting

  • The writing direction cue setting

  • The size cue setting

Here are additional guidelines for interpreting WebVTT cue settings:

  • The line cue setting may either be a percentage or a line number. Positive line numbers are counted from the top, negative line numbers are counted from the bottom.

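  • The position cue setting is measured along the writing direction, and the line cue setting is measured perpendicular to it, along the block progression direction. For example, with horizontal cues, the position is a horizontal offset and the line is a vertical offset.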

  • The optional alignment value determines whether the position is calculated relative to the start, middle, or end of the cue box. If the alignment value is not "start", then alignment depends upon the size of the cue box. For example, an alignment value of "end" will be equivalent to an alignment value of "start" plus the size of the cue box in the appropriate direction.

Calculate Cue Box Coordinates

In order to calculate the coordinates of the upper left vertex of a WebVTT cue box as percentages of the viewport, use the following steps.

The "position" cue setting will determine one coordinate, referred to as the position_coordinate.

  • Calculate the computed text position alignment.
  • If the computed text position alignment is "start", the position_coordinate is the value of the "position" cue setting.
  • If the computed text position alignment is "end", the position_coordinate is the value of the "position" cue setting minus the value of the "size" cue setting.
  • If the computed text position alignment is "middle", the position_coordinate is the value of the "position" cue setting minus half the value of the "size" cue setting.

The "line" cue setting will determine the other coordinate, referred to as the line_coordinate.

  • The cue height as a percentage of the viewport is the number of lines in the cue multiplied by the computed line height. Because WebVTT auto-wraps text, the number of lines of text must be computed by laying out the text and determining where line breaks will occur.
  • If the "line" value is a line number, convert it to a percentage value.
    • If the line number is positive, and thus counted from the top, the corresponding percentage value is (line number * computed line height)
    • If the line number is negative, and thus counted from the bottom, the corresponding percentage value is 100 + (line number * computed line height)
  • The cue line alignment, with a value of "start", "middle" or "end", determines the interpretation of the line percentage value. The direction of block growth controls whether the "start" or "end" corresponds to the upper left coordinate.
    • For horizontal cues, a line alignment of "start" corresponds to the top of the cue box. For a line alignment of "middle" subtract half of the cue height from the line percentage value. For a line alignment of "end", subtract all of the cue height from the line percentage value.
    • For vertical cues, if the cue grows left to right, a line alignment of "start" corresponds to the left edge of the cue box. For a line alignment of "middle", subtract half of the cue height from the line percentage value. For a line alignment value of "end", subtract all of the cue height from the line percentage value.
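The two coordinate calculations above can be sketched in Python. The function names are hypothetical; the number of lines in the cue must come from laying out the text, which this sketch does not attempt, and only horizontal cues and vertical cues that grow left to right are covered, matching the rules above.

```python
def position_coordinate(position, size, alignment):
    """Start edge of the cue box (percent), from the "position" rules."""
    if alignment == "start":
        return position
    if alignment == "middle":
        return position - size / 2
    if alignment == "end":
        return position - size
    raise ValueError("unknown alignment: " + alignment)


def line_coordinate(line, cue_lines, line_height, line_alignment="start",
                    is_line_number=False):
    """Upper (or left) edge of the cue box (percent), from the "line" rules.
    cue_lines must come from laying out the cue text; line_height is the
    computed line height as a percentage of the viewport."""
    if is_line_number:
        # positive line numbers count from the top, negative from the bottom
        line = line * line_height if line >= 0 else 100 + line * line_height
    cue_extent = cue_lines * line_height  # cue height for horizontal cues
    if line_alignment == "start":
        return line
    if line_alignment == "middle":
        return line - cue_extent / 2
    if line_alignment == "end":
        return line - cue_extent
    raise ValueError("unknown line alignment: " + line_alignment)


print(position_coordinate(50, 50, "middle"))              # 25.0
print(line_coordinate(-1, 1, 5.0, is_line_number=True))   # 95.0
```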

Block Size

When converting from WebVTT cue settings to a tts:extent value, the goal is to arrive at extent values expressed as percentages of the viewport's width and height. In cases where WebVTT cues express spatial dimensions solely using percentages of the viewport, there will be no need to convert into different units, as TTML supports percentages. In cases where WebVTT uses line numbers for vertical dimensions, it will be necessary to convert the line numbers into percentages of the viewport's height. As discussed above, this conversion depends upon determining the correct line height to use. Assuming WebVTT dimensions have been converted into percentages, the TTML extent can be calculated in the following way:

When the cue is horizontal (it has no vertical setting), the first value of the tts:extent pair will be equal to the cue's size setting. The second value of the tts:extent pair must be synthesized, as WebVTT does not specify the size of the cue in the block progression direction. This second value can be computed by looking at the number of lines of text in the cue, and multiplying it by the computed line height, to achieve a value as a percentage of the viewport height.

tts:extent = ( size, computed line height * number of lines )
				

When the cue's vertical setting is "rl" or "lr" (vertical text), the first value in the tts:extent pair must be synthesized, while the second value in the pair will be equal to the cue's size setting. The first value should be computed based on the number of lines in the cue and some font metrics.

tts:extent = ( computed line width * number of lines, size ) 
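The two extent formulas can be captured in one helper. This Python sketch has a hypothetical name and assumes the line height or line width has already been computed as a percentage of the viewport; the result is rounded to two decimal places purely for readability.

```python
def tts_extent(size, number_of_lines, line_size, vertical=False):
    """Synthesize a tts:extent value (percentages of the viewport).
    size: the WebVTT "size" cue setting; line_size: the computed line
    height for horizontal cues, or line width for vertical cues."""
    block = round(number_of_lines * line_size, 2)  # synthesized block dimension
    width, height = (block, size) if vertical else (size, block)
    return "%s%% %s%%" % (format(width, "g"), format(height, "g"))


print(tts_extent(50, 3, 5.33))                # 50% 15.99%
print(tts_extent(40, 2, 5.0, vertical=True))  # 10% 40%
```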
				

Block Position

To convert from WebVTT cue position settings to TTML, it is necessary to set both the tts:origin and tts:writingMode attributes on a TTML region.

As a first step, determine the TTML writing mode, based on the WebVTT vertical text cue setting and writing direction. If the setting is not present, the default value is horizontal. If the setting is vertical, it will have a writing direction of either left to right or right to left associated with it. Refer to the table above to map WebVTT values to TTML values.

Next, calculate the tts:origin value. The tts:origin will correspond to the value of the top, left corner of the WebVTT cue box.

In the case of a horizontal or auto writing mode:

tts:writingMode = "lrtb"
tts:origin = ( position, line )
				

Note:

  • WebVTT does not distinguish between left to right and right to left for its horizontal writing mode. The text direction is determined by applying the Unicode Bidirectional Algorithm to the cue text. If the WebVTT content uses a right to left writing direction, this can be expressed explicitly in TTML by setting tts:writingMode to "rltb".

In the case of a vertical writing mode, with a left to right writing direction:

tts:writingMode = "tblr"
tts:origin = ( line, position )
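The choice of tts:writingMode and the ordering of the tts:origin coordinates can be sketched as follows. The function name is hypothetical, the coordinates are assumed to be the upper-left vertex of the cue box computed as described earlier, and horizontal right-to-left text ("rltb") is not detected, since that depends on the text itself.

```python
def tts_region_position(position_coord, line_coord, vertical=None):
    """Map the cue box's upper-left coordinates (percentages of the viewport)
    to TTML region attributes. vertical: None for horizontal cues, or the
    WebVTT vertical setting value "lr" or "rl"."""
    if vertical is None:
        mode, x, y = "lrtb", position_coord, line_coord
    elif vertical == "lr":
        mode, x, y = "tblr", line_coord, position_coord
    elif vertical == "rl":
        mode, x, y = "tbrl", line_coord, position_coord
    else:
        raise ValueError("unknown vertical setting: %r" % vertical)
    return {"tts:writingMode": mode,
            "tts:origin": "%s%% %s%%" % (format(x, "g"), format(y, "g"))}


print(tts_region_position(25, 0))
# {'tts:writingMode': 'lrtb', 'tts:origin': '25% 0%'}
```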
				

Example: Converting from a WebVTT Cue to TTML

This example begins with a WebVTT fragment containing a cue that does not use a region:

00:00:00.000 --> 00:00:10.000 position:50% line:0% size:50%
A cue with no region.
				

Step 1: Define a TTML Region

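Start by defining a TTML Region for this caption. The cue has no "align" setting, so its computed text position alignment is "middle", and the left edge of the cue box lies at 50% - (50% / 2) = 25% of the viewport width.

<layout>
<region xml:id="reg4" tts:origin="25% 0%" tts:extent="50% 16%" tts:writingMode="lrtb" >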
</region>
</layout>
				

Step 2: Convert from WebVTT to TTML Syntax

Reference the region created above using its id.

  <p region="reg4" begin="00:00:00.000" end="00:00:10.000" tts:textAlign="center">A cue with no region.</p>
				

Notes:

  • WebVTT does not specify a vertical dimension, so in the process of mapping from WebVTT to TTML, it is necessary to synthesize a value for this dimension. Choosing 16%, assuming a default line height of 5.33 vh, gives us a vertical dimension roughly equal to three lines of text. This choice may be appropriate for some WebVTT content where there are no cues with more than three lines.

  • In WebVTT, text alignment defaults to the middle of the cue box. In order to replicate this behavior in TTML, it is necessary to set the tts:textAlign attribute to "center", either on the <p> element, as in this example, or on the region.

In the case of a vertical writing mode, with a right to left writing direction:

tts:writingMode = "tbrl"
tts:origin = ( line, position )
				

TTML Position from WebVTT Region

When determining the TTML origin based on a WebVTT region, it is necessary to take into account the two different types of anchors defined for WebVTT regions. The TTML origin will always correspond to the top, left corner of the region, while the WebVTT anchor point may correspond to some other point within the region.

To convert from WebVTT region settings to a tts:origin value:

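  • The region anchor is expressed as a percentage of the region's own width and height, while the viewport anchor is expressed as a percentage of the viewport. With the region anchor mapped to the viewport anchor, determine the viewport values that correspond to the region location 0,0. Let the region anchor values be a,b, the viewport anchor values be c,d, and the region width and height, as percentages of the viewport, be w,h.
  • tts:origin="(c - a*w/100)% (d - b*h/100)%"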
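The anchor arithmetic can be checked with a short Python function. It uses the same inputs as the example that follows (a region 30% wide and, with the 5.33vh line height, about 16% high); the function name is hypothetical.

```python
def region_origin(region_anchor, viewport_anchor, width, height):
    """Top-left corner of a WebVTT region as percentages of the viewport.
    region_anchor is relative to the region's own size; viewport_anchor,
    width and height are relative to the viewport."""
    a, b = region_anchor
    c, d = viewport_anchor
    return (c - a * width / 100, d - b * height / 100)


# regionanchor=50%,50% viewportanchor=25%,40%, 30% wide, about 16% high
x, y = region_origin((50, 50), (25, 40), 30, 16)
print('tts:origin="%s%% %s%%"' % (format(x, "g"), format(y, "g")))
# tts:origin="10% 32%"
```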

Example: Converting from a WebVTT Cue with Region to TTML

This example begins with a WebVTT fragment containing a cue that does use a region:

Region: id=reg5 width=30% lines=3 regionanchor=50%,50% viewportanchor=25%,40% scroll=up

00:00:00.000 --> 00:00:10.000 region:reg5
A cue that uses a region.
				

Step 1: Convert Height from Number of Lines to Percent

Once again, the default line height of 5.33vh will be used for this conversion.

region height = 3 lines * 5.33vh ≈ 16%
				

Step 2: Calculate Coordinates of Top, Left Corner from Anchors

Given that the region anchor is in the middle of the region, the viewport anchor provides the coordinates for the center of the region. From that information, and the calculated region height, the coordinates for the top, left corner can be calculated.

The first coordinate of the tts:origin pair measures the horizontal position of the origin. To calculate it, the horizontal values of both the viewport anchor and the region anchor are needed, along with the width of the region.

origin horizontal = horizontal viewport anchor - horizontal region anchor * region width
origin horizontal = 25 - 0.50 * 30
origin horizontal = 10
				

The second coordinate of the tts:origin pair measures the vertical position of the origin. To calculate it, the vertical values of both the viewport anchor and the region anchor are needed, along with the height of the region, converted into a percentage of the viewport.

origin vertical = vertical viewport anchor - vertical region anchor * region height
origin vertical = 40 - 0.50 * 16
origin vertical = 32
				

Step 3: Convert Region Definition to TTML

Start by defining a TTML Region for this caption.

<layout>
<region xml:id="reg5" tts:origin="10% 32%" tts:extent="30% 16%" tts:textAlign="center">
</region>
</layout>
				

Step 4: Convert WebVTT Cue to TTML <p>

Reference the region created above using its id.

  <p region="reg5" begin="00:00:00.000" end="00:00:10.000">A cue that uses a region.</p>
				

Converting Styling

Mapping WebVTT CSS Styles to TTML

WebVTT uses CSS to carry styling information. Information held in CSS can be translated into TTML <style> elements, contained in the <styling> element of the TTML <head>.

Translation of style information from WebVTT to TTML has to be done in two steps:

  • Translation of the CSS properties associated with the ::cue pseudo selector to <style> elements with corresponding TTML style attributes and values.

  • References to these <style> elements by one of the TTML content elements <body>, <p> or <span>.

The following table shows how to map CSS properties and WebVTT span tags to TTML style attributes.

WebVTT CSS property or span tag    TTML style attribute
background-attachment              -
background-color                   tts:backgroundColor
background-image                   -
background-position                -
background-repeat                  -
color                              tts:color
font-family                        tts:fontFamily
font-size                          tts:fontSize
font-style                         tts:fontStyle
font-variant                       -
font-weight                        tts:fontWeight
line-height                        tts:lineHeight
opacity                            tts:opacity
outline-color                      tts:textOutline
outline-style                      tts:textOutline
outline-width                      tts:textOutline
text-decoration                    tts:textDecoration
text-shadow                        -
visibility                         tts:visibility
<b> span tag                       tts:fontWeight
<i> span tag                       tts:fontStyle
<u> span tag                       tts:textDecoration
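The table above can drive a simple declaration-by-declaration translation. The Python sketch below is a hypothetical helper, not a CSS parser: it splits a rule body on semicolons, maps only the properties that have a one-to-one TTML counterpart, and copies values unchanged, so values with different syntax in the two formats (for example the generic family sans-serif versus TTML's sansSerif, or rgba() alpha ranges) still need converting.

```python
from xml.sax.saxutils import quoteattr

# CSS property -> TTML style attribute, for table rows that map one property
# to one attribute. The outline-* properties, which must be combined into a
# single tts:textOutline value, are left out of this sketch.
CSS_TO_TTS = {
    "background-color": "tts:backgroundColor",
    "color": "tts:color", "font-family": "tts:fontFamily",
    "font-size": "tts:fontSize", "font-style": "tts:fontStyle",
    "font-weight": "tts:fontWeight", "line-height": "tts:lineHeight",
    "opacity": "tts:opacity", "text-decoration": "tts:textDecoration",
    "visibility": "tts:visibility",
}


def cue_rule_to_style(style_id, declarations):
    """Turn the body of a ::cue rule (e.g. "color: cyan;") into a TTML
    <style> element; properties without a TTML counterpart are dropped."""
    attrs = ["xml:id=" + quoteattr(style_id)]
    for declaration in declarations.split(";"):
        if ":" not in declaration:
            continue
        prop, value = (part.strip() for part in declaration.split(":", 1))
        if prop.lower() in CSS_TO_TTS:
            attrs.append("%s=%s" % (CSS_TO_TTS[prop.lower()], quoteattr(value)))
    return "<style %s/>" % " ".join(attrs)


print(cue_rule_to_style("cyanColor", "color: cyan;"))
# <style xml:id="cyanColor" tts:color="cyan"/>
```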

Mapping of default values and ::cue pseudo element without arguments

If a WebVTT document references CSS using a ::cue pseudo element without arguments, a <style> element should be created in the TTML <head> section, to hold styling information. This <style> element should then be referenced in the <body> element. If there is no use of ::cue pseudo elements in the WebVTT document, the TTML <style> element should be set according to the initial values that apply by default in WebVTT.

<style xml:id="bodyStyle" tts:color="rgba(255,255,255,255)" tts:fontFamily="sansSerif" ..../>
					

If a ::cue pseudo element without arguments is applied to a WebVTT file, then all CSS properties in it that override the default values specified in WebVTT should also be set in the "bodyStyle" <style> element referenced by the <body> element.

Example: Translating CSS associated with VTT to TTML

This example begins with a CSS fragment that defines a ::cue rule.

::cue {
  font-family: Verdana;
}
					

This information can be translated into a TTML <style> element, with a synthesized id, to be used to reference in the <body> element of the TTML document.

<style xml:id="bodyStyle" tts:color="rgba(255,255,255,255)" tts:fontFamily="Verdana" ..../>
					

Mapping of ::cue pseudo element with cue-id as argument

For every ::cue pseudo element where the argument is a cue-id a corresponding <style> element in TTML should be created and should be referenced by the <p> element that corresponds to that cue.

Example CSS associated with VTT

::cue(#id1) {
  font-family: Verdana;
}
					

Example VTT

id1
00:00:00.000 --> 00:00:02.000
Some text
					

Example TTML

....
<style xml:id="pStyleId1" tts:fontFamily="Verdana" ..../>
....
<p style="pStyleId1" begin="00:00:00.000" end="00:00:02.000">Some text</p>
					

Mapping of ::cue pseudo element with class name as argument

For every ::cue pseudo element where the argument is a class name a corresponding <style> element in TTML should be created and should be referenced by the <span> element that corresponds to the tag in WebVTT that uses this classname.

Example CSS associated with VTT

::cue(.cyanColor) {
  color: cyan;
}
					

Example VTT

00:00:00.000 --> 00:00:02.000
<c.cyanColor>Some text</c>
					

Example TTML

....
<style xml:id="cyanColor" tts:color="cyan" />
....
<span style="cyanColor">Some text</span>
					

Mapping of span tags for bold, italic and underline

Three <style> elements should be created to map the WebVTT tags for bold, italic and underline:

Example TTML

<style xml:id="bold" tts:fontWeight="bold" />
<style xml:id="italic" tts:fontStyle="italic" />
<style xml:id="underline" tts:textDecoration="underline" />
				

The <style> elements should be referenced by the <span> elements that correspond to the span tags <b>, <i> and <u>.

Converting Timing

WebVTT offers fewer options for Timing Expressions than TTML and does not provide a means to hierarchically group cues. These restrictions simplify the process of mapping from WebVTT to TTML. In fact, it is possible to use WebVTT timing information without any transformation, provided the correct values are specified for timing parameters and attributes in the destination TTML file.

This section describes the best values to use for TTML timing parameters and attributes and then provides some conversion examples.

Timing Parameters

Setting TTML timing parameters to the following values will allow WebVTT cues to be transformed into TTML <p> elements with no conversion of timing information required.

Parameter: Time Base

TTML Parameter Value
ttp:timeBase media

Note:

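  • Setting the time base to "media" avoids the additional timing parameters associated with the other time base values, such as ttp:clockMode for the "clock" time base, or ttp:markerMode and ttp:dropMode for the "smpte" time base.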

Timing Attributes

TTML Attribute Value
timeContainer par

Notes:

  • Setting the time container to “par” or parallel on all elements within the TTML body will specify that the times from WebVTT cues can be used in TTML without any adjustments.
  • The default value for timeContainer is par, so it does not need to be explicitly stated in TTML.

Example: WebVTT to TTML Conversion

This example begins with a short WebVTT file:

WEBVTT

00:00:00.000 --> 00:00:10.000
This caption starts at 0s and remains for 10s.

00:00:15.000 --> 00:00:20.000
This caption starts at 15s and remains for 5s.
				

Step 1: Convert from WebVTT to TTML syntax

Transform the cues into TTML <p> elements.

<p begin="00:00:00.000" end="00:00:10.000"> 
This caption starts at 0s and remains for 10s.</p>

<p begin="00:00:15.000" end="00:00:20.000"> 
This caption starts at 15s and remains for 5s.</p>
				

Step 2: Add the TTML Hierarchical Elements

<body>
   	<div>
		<p begin="00:00:00.000" end="00:00:10.000" > 
		This caption starts at 0s and remains for 10s.</p>

		<p begin="00:00:15.000" end="00:00:20.000" > 
		This caption starts at 15s and remains for 5s.</p>
	</div>
</body>
				

or, to be explicit, state the timeContainer attribute for the containing elements:

<body timeContainer="par">
	<div timeContainer="par">
		<p begin="00:00:00.000" end="00:00:10.000" > 
		This caption starts at 0s and remains for 10s.</p>

		<p begin="00:00:15.000" end="00:00:20.000" > 
		This caption starts at 15s and remains for 5s.</p>
	</div>
</body>
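The whole conversion in this example can be sketched for simple files. The Python function below is a hypothetical helper under several assumptions: cue text is plain text (markup such as <c> or <b> is escaped rather than converted, after character references are decoded with html.unescape), cue identifiers and cue settings are discarded, and the TTML document relies on the media time base and the default par time container described above, so timestamps are copied after normalizing them to hh:mm:ss.fff.

```python
import html
import re
from xml.sax.saxutils import escape


def clock(timestamp):
    """WebVTT timestamp ([hh:]mm:ss.ttt) -> TTML clock time hh:mm:ss.fff."""
    parts = timestamp.split(":")
    if len(parts) == 2:            # the hours field is optional in WebVTT
        parts.insert(0, "00")
    return "%02d:%s:%s" % (int(parts[0]), parts[1], parts[2])


def vtt_to_ttml_body(vtt):
    """Convert plain-text WebVTT cues into TTML <p> elements inside <body>
    and <div>, one <p> per cue, with cue lines joined by <br/>."""
    paragraphs = []
    blocks = re.split(r"\n[ \t]*\n", vtt.replace("\r\n", "\n").strip())
    for block in blocks[1:]:                 # blocks[0] is the WEBVTT header
        lines = block.strip("\n").split("\n")
        if "-->" not in lines[0]:
            lines = lines[1:]                # cue identifier, NOTE, STYLE, ...
        if not lines or "-->" not in lines[0]:
            continue                         # not a cue block
        start, rest = lines[0].split("-->", 1)
        end = rest.split()[0]                # drop any cue settings
        text = "<br/>".join(escape(html.unescape(line)) for line in lines[1:])
        paragraphs.append('    <p begin="%s" end="%s">%s</p>'
                          % (clock(start.strip()), clock(end), text))
    return "<body>\n  <div>\n" + "\n".join(paragraphs) + "\n  </div>\n</body>"


print(vtt_to_ttml_body("WEBVTT\n\n00:00.000 --> 00:10.000\n"
                       "This caption starts at 0s and remains for 10s.\n"))
```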
				

References

TTML Specification

WebVTT Specification

CSS Specification