This specification defines SyncMedia, an XML format for synchronized media presentations. A presentation consists of different types of media, orchestrated in a timeline. SyncMedia presentations are rendered to a user by a SyncMedia-aware player.

Relationship to Other Specifications

SyncMedia is an evolution of EPUB3 Media Overlays and, like Media Overlays, is built on [[SMIL3]]. Compared to Media Overlays, SyncMedia incorporates additional SMIL concepts, and also includes custom features.

A more detailed comparison of SyncMedia to both SMIL3 and EPUB3 Media Overlays can be found in the SyncMedia Explainer.

SyncMedia

SyncMedia is an XML format for synchronized media presentations. It uses a subset of [[SMIL3]] and also defines its own custom features. SyncMedia files use the filename extension .sync.

The default namespace for SyncMedia is that of SMIL: http://www.w3.org/ns/SMIL.

SyncMedia custom features use the SyncMedia namespace, which is https://w3.github.io/sync-media-pub.

This is a placeholder namespace URL; see issue 36

This section defines SyncMedia's elements and attributes, and gives examples.

Definitions

Media Object
A media resource in the [=Sync Media Document=]
Media Parameters
Named parameters to communicate options to the [=Media Object Renderer=]
Media Object Renderer
A component used by the [=Sync Media Player=] to render [=Media Objects=]. Different media types may necessitate different renderers.
Parallel Time Container
A [=Time Container=] in which children are rendered in parallel
Role
Gives the structural semantics for the item
Sequential Time Container
A [=Time Container=] in which children are rendered in sequence
Sync Media Document
The document containing the SyncMedia presentation.
Sync Media Player
A user agent that knows how to process and play back [=Sync Media Documents=]
Time Container
The container that dictates the playback order for its children
Track
An organizational concept that defines a purposeful virtual rendering space for media objects, not tied to a visual layout, with default properties

Document Structure

Each SyncMedia document MUST have a smil element as its root.

A SyncMedia document contains two parts: a head and a body. The head contains metainformation and track information. The temporal presentation of media objects is laid out in the body. Time containers can be used to render media in parallel or to arrange sequences.

A SyncMedia document MUST have a body. It MAY have a head.

Element Description
smil Root element
head Information not related to temporal behavior
body Main [=sequential time container=] for the presentation.
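
The structural requirements above can be checked mechanically. The following is a minimal sketch, not a normative validator: it assumes the document is available as a string, and the function name `check_structure` and its message strings are hypothetical.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"

def check_structure(xml_text):
    """Return a list of structural problems; an empty list means none were found."""
    root = ET.fromstring(xml_text)
    # The root MUST be a smil element (in the SMIL default namespace).
    if root.tag != f"{{{SMIL_NS}}}smil":
        return ["root element is not smil"]
    problems = []
    # A body is required; a head is optional, so its absence is not reported.
    if root.find(f"{{{SMIL_NS}}}body") is None:
        problems.append("missing required body element")
    return problems
```

A document consisting of a smil root with only a body passes this check; one with only a head is reported as missing its body.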

Time Containers

Media objects are arranged in time containers, which determine whether they are rendered together (in parallel) or one after the other (in sequence). Time containers MAY be nested in other time containers (but MUST NOT be nested in media objects).

Element Description
seq A [=sequential time container=] for media and/or time containers.
par A [=parallel time container=] for media and/or time containers.

<smil xmlns="http://www.w3.org/ns/SMIL">
    <body>
        <par>
            <audio src="chapter01.mp3" clipBegin="30" clipEnd="40"/>
            <text src="chapter01.html#heading_01"/>
        </par>
        <par>
            <audio src="chapter01.mp3" clipBegin="40" clipEnd="50"/>
            <text src="chapter01.html#para_01"/>
        </par>
        <par>
            <audio src="chapter01.mp3" clipBegin="50" clipEnd="60"/>
            <text src="chapter01.html#para_02"/>
        </par>
    </body>
</smil>

Structural semantics

Structural semantics MAY be added to time containers via the sync:role attribute. Values MUST come from WAI-ARIA Document Structure or DPUB-ARIA.

Benefits of structural semantics

Applying structural semantics to time containers allows a richer playback experience. User agents that understand semantic role values MAY customize the user experience, for example by letting users skip secondary content that interrupts the flow of narration (such as page number announcements, often included to provide a point of reference between print and digital editions), or by letting users escape complex structures, such as tables or charts.

Attributes

Attribute Description
sync:role One or more semantic role(s)

TODO Issue 12

<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:sync="https://w3.github.io/sync-media-pub">
    <body>
        <par>
            <audio src="chapter01.mp3" clipBegin="50" clipEnd="60"/>
            <text src="chapter01.html#para_02"/>
        </par>
        <par sync:role="doc-pagebreak">
            <audio src="chapter01.mp3" clipBegin="60" clipEnd="62"/>
            <text src="chapter01.html#pg_04"/>
        </par>
        <par>
            <audio src="chapter01.mp3" clipBegin="62" clipEnd="70"/>
            <text src="chapter01.html#para_03"/>
        </par>
    </body>
</smil>
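
As an illustration of how a player might act on roles, the sketch below filters out time containers whose role the user has chosen to skip. It is one possible approach, not a required behavior. It assumes the sync prefix is bound with `xmlns:sync`, that multiple roles are whitespace-separated, and that skipping applies to direct children of body only; `containers_to_play` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"
SYNC_NS = "https://w3.github.io/sync-media-pub"  # placeholder namespace URL

def containers_to_play(body, skipped_roles):
    """Return the children of body whose sync:role values are not skipped."""
    kept = []
    for child in body:
        # Assumption: multiple roles are separated by whitespace.
        roles = child.get(f"{{{SYNC_NS}}}role", "").split()
        if not any(role in skipped_roles for role in roles):
            kept.append(child)
    return kept

SAMPLE = """
<smil xmlns="http://www.w3.org/ns/SMIL"
      xmlns:sync="https://w3.github.io/sync-media-pub">
    <body>
        <par><text src="chapter01.html#para_02"/></par>
        <par sync:role="doc-pagebreak"><text src="chapter01.html#pg_04"/></par>
        <par><text src="chapter01.html#para_03"/></par>
    </body>
</smil>
"""

body = ET.fromstring(SAMPLE).find(f"{{{SMIL_NS}}}body")
kept = containers_to_play(body, {"doc-pagebreak"})
# Collect the text sources that remain after the page break is skipped.
sources = [par.find(f"{{{SMIL_NS}}}text").get("src") for par in kept]
```

With `doc-pagebreak` in the skipped set, only the two paragraph containers remain in `sources`.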

Media Objects

Media resources are included in SyncMedia via media objects. The actual media resource is an external file, or quite commonly, a segment of a file, such as an audio or video clip, or part of an HTML document.

The table below describes the media objects in SyncMedia. The ref element can represent any media, but authors often prefer the media type-specific synonyms.

Element Description
audio References audio media.
image References image media.
ref Generic media reference
text References content in an external text-based document.
video References video media.

Attributes

Attributes on media objects are used to

  • express the location of the media source, including segment
  • assign a media object to a [=track=]
  • indicate that a media object repeats

Attribute Description
clipBegin Start of a timed media clip, as in SMIL3's clipBegin
clipEnd End of a timed media clip, as in SMIL3's clipEnd
panZoom Rectangular portion of media object, as in SMIL3's panZoom
repeatCount Specifies the number of iterations of a timed media object. Values are a number, or "indefinite", as in SMIL3's repeatCount
src URL of media file, optionally including a media fragment [[media-frags]]
sync:track ID of a sync:track element.

EPUB Media Overlays clock values are considered valid clip begin and end values, because the SMIL MediaClipping Module states that if no metric specifier is given, Normal Play Time (npt) is assumed (not smpte).
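
To make the accepted value forms concrete, here is a rough sketch of a parser for unprefixed clock values: full clock values (`h:mm:ss`), partial clock values (`mm:ss`), and timecounts with an optional `h`, `min`, `s`, or `ms` metric, defaulting to seconds. The function name `parse_clock_value` is hypothetical, and values carrying an explicit `npt=` or `smpte=` prefix are deliberately not handled.

```python
import re

# Full (h:mm:ss[.f]) or partial (mm:ss[.f]) clock value.
_CLOCK = re.compile(r"^(?:(\d+):)?([0-5]\d):([0-5]\d(?:\.\d+)?)$")
# Timecount with an optional metric; no metric means seconds.
_TIMECOUNT = re.compile(r"^(\d+(?:\.\d+)?)(h|min|s|ms)?$")
_UNIT_SECONDS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001, None: 1.0}

def parse_clock_value(value):
    """Convert an unprefixed clock value to seconds, or raise ValueError."""
    value = value.strip()
    match = _CLOCK.match(value)
    if match:
        hours = int(match.group(1) or 0)
        minutes = int(match.group(2))
        seconds = float(match.group(3))
        return hours * 3600 + minutes * 60 + seconds
    match = _TIMECOUNT.match(value)
    if match:
        return float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
    raise ValueError(f"not a supported clock value: {value!r}")
```

Under this sketch, `clipBegin="30"`, `clipBegin="30s"`, and `clipBegin="00:30"` all denote the same offset.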

If both an src with a media fragment and clipBegin/clipEnd attributes are present, clipping MUST be applied to the resource with respect to the media fragment offset(s), as defined in All Media Fragment Clients.
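
The offset rule can be sketched as follows. This is an illustrative reading, not a definitive implementation: it assumes a temporal fragment of the form `#t=start,end` with plain seconds (optionally prefixed by `npt:`), takes clip values already converted to seconds, and clamps the result to the fragment end, which is an assumption of this sketch. The helper names are hypothetical.

```python
from urllib.parse import urlsplit

def fragment_interval(src):
    """Return (start, end) in seconds from a '#t=start,end' fragment; end may be None."""
    fragment = urlsplit(src).fragment
    if not fragment.startswith("t="):
        return 0.0, None  # no temporal fragment: offsets are relative to the whole resource
    spec = fragment[2:]
    if spec.startswith("npt:"):
        spec = spec[4:]
    start_text, _, end_text = spec.partition(",")
    start = float(start_text) if start_text else 0.0
    end = float(end_text) if end_text else None
    return start, end

def resolve_clip(src, clip_begin=None, clip_end=None):
    """Return (begin, end) in resource time, applying clips relative to the fragment start."""
    frag_start, frag_end = fragment_interval(src)
    begin = frag_start + (clip_begin or 0.0)
    end = frag_start + clip_end if clip_end is not None else frag_end
    # Assumption: a clip never extends past the end of the fragment.
    if frag_end is not None and end is not None:
        end = min(end, frag_end)
    return begin, end
```

For example, `src="chapter01.mp3#t=100,200"` with `clipBegin="5"` and `clipEnd="15"` resolves to the interval from 105 to 115 seconds of the underlying resource.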

It is RECOMMENDED to use a media fragment on src to refer to a large chunk of media, and to use clipBegin and clipEnd to define fine-grained clips. This separates the requirement on the client of retrieving the resource, perhaps done using a URI request to a server, from locating a segment of the resource, done with Media Fragments clip start/end points. Otherwise, a client that fetches every phrase individually would have to implement complex caching to avoid glitches between clips.

Embedded media

Embedded media, such as a video in an HTML document, MAY be referenced by the URL of its embedding document plus a selector.

Therefore, [=media object renderer=]s SHOULD support opening an HTML document and dereferencing content based on a selector.

<par>
    <text src="doc.html#para1"/>
    <video src="doc.html#video1" clipBegin="0" clipEnd="10"/>
</par>

Parameters

SyncMedia uses SMIL3's param to send parameters to [=media object renderer=]s.

Element Description
param Media object rendering parameter.

The attributes for param are:

Attribute Description
name Parameter name
value Parameter value

The following parameter name values are defined:

Name | Allowed value(s) | Description | For media object(s)
cssClass | One or more strings | Indicates class name(s) to apply | Media that can be styled with CSS
clipPath | As defined by the SVG path data attribute | The shape that will be used to apply a clip mask to the media | Visual media
pan | Between -1 (full left) and 1 (full right) | Indicates the left/right pan | Audible media
playbackRate | 1.0 (normal rate), less, or more | Indicates the playback rate. Values SHOULD align with HTML's {{HTMLMediaElement/playbackRate}}. | Timed media
volume | Between 0 and 1 | Indicates the volume | Audible media

clipPath specifies a clipping path using an SVG path definition. The clipping is applied to the visible region of the Media Object on which it is defined. When combined with panZoom, the clipPath SHOULD be applied inside the rect defined by the panZoom attribute.

<smil xmlns="http://www.w3.org/ns/SMIL">
    <body>
        <par>
            <audio src="chapter01.mp3" clipBegin="30" clipEnd="40"/>
            <text src="chapter01.html#heading_01">
                <param name="cssClass" value="highlight"/>
            </text>
        </par>
        <par>
            <audio src="chapter01.mp3" clipBegin="40" clipEnd="50"/>
            <text src="chapter01.html#para_01">
                <param name="cssClass" value="highlight"/>
            </text>
        </par>
        <par>
            <audio src="chapter01.mp3" clipBegin="50" clipEnd="60"/>
            <text src="chapter01.html#para_02">
                <param name="cssClass" value="highlight"/>
            </text>
        </par>
    </body>
</smil>
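
A renderer may want to check numeric parameter values before applying them. The sketch below covers only the two parameters for which the table above states a closed range (volume and pan); every other parameter is passed through unchecked. The name `check_param` and the decision to reject out-of-range values rather than clamp them are assumptions of this sketch.

```python
# Closed ranges stated for numeric parameters; other parameters have no stated range.
RANGES = {"volume": (0.0, 1.0), "pan": (-1.0, 1.0)}

def check_param(name, value):
    """Return True when value is acceptable for the named parameter."""
    if name not in RANGES:
        return True  # no numeric range to enforce (e.g. cssClass, clipPath)
    low, high = RANGES[name]
    try:
        number = float(value)
    except ValueError:
        return False  # not a number at all
    return low <= number <= high
```

With this check, `<param name="volume" value="0.5"/>` is accepted and `<param name="volume" value="1.5"/>` is not.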

Tracks

SyncMedia presentations organize media objects of the same type into virtual spaces called "tracks". Tracks MUST be placed in the SyncMedia document head. Tracks have several useful features:

  1. A track MAY provide default params that then get applied to any media object on that track.
  2. A track MAY be set as the default for a given media object type (e.g. all the audio media objects can be automatically assigned to a track).
  3. A track MAY have a default source for all its media objects to use, in combination with any fragment specifier on the media object itself.

All of these features reduce verbosity, because otherwise these properties would have to be stated explicitly on each media object.

Element Description
sync:track A virtual space to which [=Media Objects=] are assigned. A user agent MAY offer interface controls on a per-track basis (e.g. adjust volume on the narration track). A sync:track MAY have [=media parameters=], which act as defaults for [=Media Objects=] on that track.

Attributes

Attribute Description
sync:label The track's label
sync:defaultSrc URL of the default file that media objects on this track will use.
sync:defaultFor Media objects of the type specified (one of: audio, image, video, text, ref) are automatically assigned to this track.
sync:trackType Indicates which presentation feature is embodied by this track.

TODO: Issue 31

<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:sync="https://w3.github.io/sync-media-pub">
    <head>
        <sync:track sync:label="Page" sync:defaultFor="text" 
            sync:defaultSrc="chapter01.html" sync:trackType="contentDocument">
            <param name="cssClass" value="highlight"/>
        </sync:track>
    </head>
    <body>
        <par>
            <audio src="chapter01.mp3" clipBegin="30" clipEnd="40"/>
            <text src="#heading_01"/>
        </par>
        <par>
            <audio src="chapter01.mp3" clipBegin="40" clipEnd="50"/>
            <text src="#para_01"/>
        </par>
        <par>
            <audio src="chapter01.mp3" clipBegin="50" clipEnd="60"/>
            <text src="#para_02"/>
        </par>
    </body>
</smil>

<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:sync="https://w3.github.io/sync-media-pub">
    <head>
        <sync:track id="background-music" sync:trackType="backgroundAudio">
            <param name="volume" value="0.5"/>
        </sync:track>
        <sync:track sync:label="Narration" sync:defaultFor="audio" sync:trackType="audioNarration"/>
        <sync:track sync:label="Page" sync:defaultFor="text" sync:trackType="contentDocument">
            <param name="cssClass" value="highlight"/>
        </sync:track>
    </head>
    <body>
        <par>
            <audio sync:track="background-music" src="bkmusic.mp3" repeatCount="indefinite"/>
            <seq>
                <par>
                    <audio src="chapter01.mp3" clipBegin="30" clipEnd="40"/>
                    <text src="chapter01.html#heading_01"/>
                </par>
                <par>
                    <audio src="chapter01.mp3" clipBegin="40" clipEnd="50"/>
                    <text src="chapter01.html#para_01"/>
                </par>
                <par>
                    <audio src="chapter01.mp3" clipBegin="50" clipEnd="60"/>
                    <text src="chapter01.html#para_02"/>
                </par>
            </seq>
        </par>
    </body>
</smil>

A narration sync:track is included, even though it supplies no default values, so that a user agent can offer separate controls for narration audio and background music audio.
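
Track assignment as used in this example can be sketched as below. The sketch assumes that an explicit sync:track reference takes precedence over a sync:defaultFor match (as the background music audio suggests), that tracks are identified by an unprefixed `id` attribute as in the example, and that the sync prefix is bound with `xmlns:sync`. The helper name `find_track` is hypothetical.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"
SYNC_NS = "https://w3.github.io/sync-media-pub"  # placeholder namespace URL

def find_track(media, tracks):
    """Return the track a media object belongs to, or None if it has no track."""
    explicit = media.get(f"{{{SYNC_NS}}}track")
    if explicit is not None:
        # Assumption: an explicit reference wins over any defaultFor match.
        return next((t for t in tracks if t.get("id") == explicit), None)
    kind = media.tag.rsplit("}", 1)[-1]  # local element name, e.g. "audio"
    return next((t for t in tracks if t.get(f"{{{SYNC_NS}}}defaultFor") == kind), None)

SAMPLE = """
<smil xmlns="http://www.w3.org/ns/SMIL"
      xmlns:sync="https://w3.github.io/sync-media-pub">
    <head>
        <sync:track id="background-music" sync:trackType="backgroundAudio"/>
        <sync:track sync:label="Narration" sync:defaultFor="audio"/>
    </head>
    <body>
        <par>
            <audio sync:track="background-music" src="bkmusic.mp3"/>
            <audio src="chapter01.mp3" clipBegin="30" clipEnd="40"/>
            <text src="chapter01.html#heading_01"/>
        </par>
    </body>
</smil>
"""

root = ET.fromstring(SAMPLE)
tracks = root.findall(f"{{{SMIL_NS}}}head/{{{SYNC_NS}}}track")
music, narration, text = list(root.find(f"{{{SMIL_NS}}}body/{{{SMIL_NS}}}par"))
```

Here the first audio resolves to the background music track, the second to the narration track by default, and the text element to no track, since no track declares `sync:defaultFor="text"` in this sample.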

Metadata

SyncMedia has a generic mechanism for incorporating metadata but does not define any specific metadata. Metadata MUST go in the SyncMedia document head.

Element Description
metadata Extension point that allows the inclusion of metadata from any metainformation structuring language

Playback

Processing

Applying track values to media objects

[=Tracks=] MAY provide defaults for [=media objects=]. This section gives the rules for how to apply these values.

Track attribute Impact on media object
sync:defaultSrc Provides the src for the media object. If the media object has an src which is only a selector, then the selector is appended to the track's sync:defaultSrc. Any other value for a media object src overrides the track's sync:defaultSrc.

In addition, any [=media parameters=] defined for a track are inherited by any media objects on that track. The exception is when the media objects themselves provide a parameter of the same name, in which case, the media object's parameter value overrides the track's parameter value.
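
The two rules above can be sketched as a pair of small functions. This is a minimal illustration that assumes a "selector" is a src value beginning with `#`, and that parameters are held as name/value dictionaries; `resolve_src` and `merge_params` are hypothetical names.

```python
def resolve_src(media_src, default_src):
    """Combine a media object's src with its track's sync:defaultSrc."""
    if not media_src:
        return default_src  # the track provides the src
    if media_src.startswith("#") and default_src:
        return default_src + media_src  # selector appended to the default
    return media_src  # any other value overrides the default

def merge_params(track_params, media_params):
    """Return track parameters overridden by same-named media object parameters."""
    merged = dict(track_params)
    merged.update(media_params)
    return merged
```

For a track with `sync:defaultSrc="chapter01.html"`, a media object with `src="#para_01"` resolves to `chapter01.html#para_01`, while one with `src="other.html#x"` keeps its own value.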

Rendering

After the SyncMedia document has been processed, it is ready to be rendered.

Element Rendering behavior
body Render like seq
seq Render each child in order, each starting after the previous completes. Done when the last child is finished.
par Render each child at the same time. Done when all the children are finished.
audio Play the referenced portion of audio media and apply params. Done when the referenced portion has finished.
image Load the image file or segment and apply params. Not timed, so considered done immediately.
ref Infer the media type and, if supported, render the file or segment, and apply params. If timed, done when the segment is finished; if untimed, done immediately.
text Display the HTML document, ensure the referenced element is visible, and apply params. Not timed, so considered done immediately.
video Play the video file or segment and apply params. Done when the segment is finished.
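
The completion rules in this table imply a simple way to estimate how long a container takes to render. The sketch below is illustrative only: it assumes clip values are plain seconds, requires clipEnd on audio and video (since the length of the underlying media is not known here), treats ref like untimed media, and ignores repeatCount. The function name `duration` is hypothetical.

```python
import xml.etree.ElementTree as ET

def duration(element):
    """Estimate rendering time in seconds for a time container or media object."""
    name = element.tag.rsplit("}", 1)[-1]  # local element name
    if name in ("body", "seq"):
        # Children play one after another: durations add up.
        return sum(duration(child) for child in element)
    if name == "par":
        # Children play together: done when the longest child is done.
        return max((duration(child) for child in element), default=0.0)
    if name in ("audio", "video"):
        end = element.get("clipEnd")
        if end is None:
            raise ValueError("clipEnd is needed when the media length is unknown")
        return float(end) - float(element.get("clipBegin", "0"))
    return 0.0  # text, image, and anything else are treated as done immediately

SAMPLE = """
<smil xmlns="http://www.w3.org/ns/SMIL">
    <body>
        <par>
            <audio src="chapter01.mp3" clipBegin="30" clipEnd="40"/>
            <text src="chapter01.html#heading_01"/>
        </par>
        <par>
            <audio src="chapter01.mp3" clipBegin="40" clipEnd="50"/>
            <text src="chapter01.html#para_01"/>
        </par>
    </body>
</smil>
"""

body = ET.fromstring(SAMPLE).find("{http://www.w3.org/ns/SMIL}body")
total = duration(body)
```

Each par here lasts as long as its audio clip (10 seconds), and the body, rendered like a seq, lasts the sum of its children.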

Note about media with repeatCount and when it's considered done

User Interaction

TODO: how much to cover here?

XML

Additional attributes

In addition to the attributes already covered, this section adds the following standard XML attributes:

Attribute Description
xml:base Document base URL, as defined in [[XMLBASE]]
xml:id Unique identifier for an element, as defined in [[XML-ID]]
xml:lang Language identifier, as defined in [[XML]]

Content model

This is the XML content model for SyncMedia. Required elements and attributes are indicated.

Element Attributes Content
`smil` (required) In this order:
`head` In any order:
`metadata` 0 or more elements from any namespace
`sync:track`
`param` Empty
`body` In any order:
`seq` In any order:
`par` In any order:
`audio`
`image`
`ref`
`text`
`video`

Acknowledgements

At the time of publication, the members of the Synchronized Multimedia for Publications Community Group were:

Avneesh Singh (DAISY Consortium), Ben Dugas (Rakuten, Inc.), Chris Needham (British Broadcasting Corporation), Daniel Weck (DAISY Consortium), Didier Gehrer, Farrah Little (BC Libraries Cooperative), George Kerscher (DAISY Consortium), Ivan Herman (W3C), James Donaldson, Lars Wallin (Colibrio), Livio Mondini, Lynn McCormack (CAST, Inc), Marisa DeMeglio (DAISY Consortium, chair), Markku Hakkinen (Educational Testing Service), Matt Garrish (DAISY Consortium), Michiel Westerbeek (Tella), Nigel Megitt (British Broadcasting Corporation), Romain Deltour (DAISY Consortium), Wendy Reid (Rakuten, Inc.), Zheng Xu (Rakuten, Inc.)