This specification defines SyncMedia, an XML format for synchronized media presentations. A presentation consists of different types of media, orchestrated in a timeline. SyncMedia presentations are rendered to a user by a SyncMedia-aware player.
SyncMedia is an evolution of EPUB3 Media Overlays and, like Media Overlays, is built on [[SMIL3]]. Compared to Media Overlays, SyncMedia incorporates additional SMIL concepts, and also includes custom features.
A more detailed comparison of SyncMedia to both SMIL3 and EPUB3 Media Overlays can be found in the SyncMedia Explainer.
SyncMedia is an XML format for synchronized media presentations. It uses a subset of [[SMIL3]] and also defines its own custom features. SyncMedia files use the filename extension `.sync`.
The default namespace for SyncMedia is that of SMIL: `http://www.w3.org/ns/SMIL`.
SyncMedia custom features use the SyncMedia namespace, which is `https://w3.github.io/sync-media-pub`.
This is a placeholder namespace URL; see issue 36.
This section defines SyncMedia's elements and attributes, and gives examples.
Each SyncMedia document MUST have a `smil` element as its root.
A SyncMedia document contains two parts: a `head` and a `body`. The head contains metainformation and track information. The temporal presentation of media objects is laid out in the body. Time containers can be used to render media in parallel or to arrange sequences.
A SyncMedia document MUST have a `body`. It MAY have a `head`.
Element | Description |
---|---|
`smil` | Root element |
`head` | Information not related to temporal behavior |
`body` | Main [=sequential time container=] for the presentation. |
Media objects are arranged in time containers, which determine whether they are rendered together (in parallel) or one after the other (in sequence). Time containers MAY be nested in other time containers (but MUST NOT be nested in media objects).
Element | Description |
---|---|
`seq` | A [=sequential time container=] for media and/or time containers. |
`par` | A [=parallel time container=] for media and/or time containers. |
<smil xmlns="http://www.w3.org/ns/SMIL">
  <body>
    <par>
      <audio src="chapter01.mp3" clipBegin="30" clipEnd="40"/>
      <text src="chapter01.html#heading_01"/>
    </par>
    <par>
      <audio src="chapter01.mp3" clipBegin="40" clipEnd="50"/>
      <text src="chapter01.html#para_01"/>
    </par>
    <par>
      <audio src="chapter01.mp3" clipBegin="50" clipEnd="60"/>
      <text src="chapter01.html#para_02"/>
    </par>
  </body>
</smil>
Structural semantics MAY be added to time containers via the `sync:role` attribute. Values MUST come from WAI-ARIA Document Structure or DPUB-ARIA.
There are benefits to applying structural semantics to time containers in SyncMedia. User agents that understand semantic role values MAY customize the user experience, for example by enabling the skipping of types of secondary content that interferes with the flow of narration (such as page number announcements, often included to provide a point of reference between print and digital editions); or escaping complex structures, such as tables or charts.
Attribute | Description |
---|---|
`sync:role` | One or more semantic role(s) |
TODO Issue 12
<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:sync="https://w3.github.io/sync-media-pub">
  <body>
    <par>
      <audio src="chapter01.mp3" clipBegin="50" clipEnd="60"/>
      <text src="chapter01.html#para_02"/>
    </par>
    <par sync:role="doc-pagebreak">
      <audio src="chapter01.mp3" clipBegin="60" clipEnd="62"/>
      <text src="chapter01.html#pg_04"/>
    </par>
    <par>
      <audio src="chapter01.mp3" clipBegin="62" clipEnd="70"/>
      <text src="chapter01.html#para_03"/>
    </par>
  </body>
</smil>
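As a rough illustration of how a user agent might act on role values, the following Python sketch parses a document like the example above and leaves out any top-level `par` whose role the user has chosen to skip. The function name `playable_text_targets`, the idea of a user-selected set of skipped roles, and the treatment of multiple roles as a space-separated list are assumptions made for this sketch; they are not defined by SyncMedia.

```python
import xml.etree.ElementTree as ET

SMIL_NS = "http://www.w3.org/ns/SMIL"
SYNC_NS = "https://w3.github.io/sync-media-pub"  # placeholder namespace URL

DOC = """
<smil xmlns="http://www.w3.org/ns/SMIL"
      xmlns:sync="https://w3.github.io/sync-media-pub">
  <body>
    <par>
      <audio src="chapter01.mp3" clipBegin="50" clipEnd="60"/>
      <text src="chapter01.html#para_02"/>
    </par>
    <par sync:role="doc-pagebreak">
      <audio src="chapter01.mp3" clipBegin="60" clipEnd="62"/>
      <text src="chapter01.html#pg_04"/>
    </par>
    <par>
      <audio src="chapter01.mp3" clipBegin="62" clipEnd="70"/>
      <text src="chapter01.html#para_03"/>
    </par>
  </body>
</smil>
"""


def playable_text_targets(xml_text, skipped_roles):
    """Return the text src of each top-level par that is not skipped.

    skipped_roles is a hypothetical user preference: a set of role names
    whose time containers should be left out of playback.
    """
    root = ET.fromstring(xml_text)
    body = root.find(f"{{{SMIL_NS}}}body")
    targets = []
    for par in body.findall(f"{{{SMIL_NS}}}par"):
        # Assumption: multiple roles are separated by whitespace.
        roles = par.get(f"{{{SYNC_NS}}}role", "").split()
        if any(role in skipped_roles for role in roles):
            continue
        text = par.find(f"{{{SMIL_NS}}}text")
        if text is not None:
            targets.append(text.get("src"))
    return targets
```

Calling `playable_text_targets(DOC, {"doc-pagebreak"})` omits the page number announcement, while an empty set keeps all three `par` elements.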
Media resources are included in SyncMedia via media objects. The actual media resource is an external file, or quite commonly, a segment of a file, such as an audio or video clip, or part of an HTML document.
The table below describes the media objects in SyncMedia. `ref` can be used to represent any media, but authors often prefer to use media type-specific synonyms.
Element | Description |
---|---|
`audio` | References audio media. |
`image` | References image media. |
`ref` | Generic media reference |
`text` | References content in an external text-based document. |
`video` | References video media. |
Media objects take the following attributes:
Attribute | Description |
---|---|
`clipBegin` | Start of a timed media clip, as in SMIL3's clipBegin |
`clipEnd` | End of a timed media clip, as in SMIL3's clipEnd |
`panZoom` | Rectangular portion of media object, as in SMIL3's panZoom |
`repeatCount` | Specifies the number of iterations of a timed media object. Values are a number, or "indefinite", as in SMIL3's repeatCount |
`src` | URL of media file, optionally including a media fragment [[media-frags]] |
`sync:track` | ID of a `sync:track` element. |
EPUB Media Overlays clock values are considered valid clip begin and end values, because the SMIL MediaClipping Module states that if no metric specifier is given, Normal Play Time (`npt`) is assumed (not `smpte`).
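To make the clock value forms concrete, here is a minimal Python sketch that converts a clip value to seconds. It assumes the SMIL clock value forms (full clock `hh:mm:ss`, partial clock `mm:ss`, and timecounts with an optional `h`, `min`, `s`, or `ms` unit) and an optional `npt=` prefix; `smpte` values are deliberately not handled. The function name `clock_value_to_seconds` is hypothetical.

```python
import re

# Seconds per unit for timecount values; a bare number means seconds.
_UNITS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 0.001}
_TIMECOUNT = re.compile(r"(\d+(?:\.\d+)?)(h|min|s|ms)?")


def clock_value_to_seconds(value):
    """Convert an npt clock value such as '0:01:30.5', '2min' or '30' to seconds."""
    value = value.strip()
    if value.startswith("npt="):
        value = value[len("npt="):]
    if ":" in value:
        parts = value.split(":")
        if len(parts) not in (2, 3):
            raise ValueError(f"not a clock value: {value!r}")
        seconds = 0.0
        for part in parts:
            # Each field is worth 60 times the field to its right.
            seconds = seconds * 60 + float(part)
        return seconds
    match = _TIMECOUNT.fullmatch(value)
    if match is None:
        raise ValueError(f"not a clock value: {value!r}")
    number, unit = match.groups()
    return float(number) * _UNITS[unit or "s"]
```

With this sketch, the unitless values used in the examples in this document (such as `clipBegin="30"`) are read as 30 seconds of Normal Play Time.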
If both an `src` with a media fragment and `clipBegin`/`clipEnd` attributes are present, clipping MUST be applied to the resource with respect to the media fragment offset(s), as defined in All Media Fragment Clients.
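One possible reading of this rule is sketched below in Python: the clip values are treated as offsets from the start of the temporal media fragment, and the result is never allowed to run past the fragment's end. This interpretation, the function name `effective_clip`, and the restriction to plain-seconds `t=start,end` fragments are all assumptions made for illustration; the normative behavior is the one defined by Media Fragments.

```python
from urllib.parse import urldefrag


def effective_clip(src, clip_begin=None, clip_end=None):
    """Return (url, start, end) in seconds on the timeline of the whole resource.

    clip_begin and clip_end are seconds; end is None when nothing bounds it.
    Only temporal fragments written as plain seconds (t=start,end) are handled.
    """
    url, fragment = urldefrag(src)
    frag_start, frag_end = 0.0, None
    if fragment.startswith("t="):
        start_text, _, end_text = fragment[2:].partition(",")
        if start_text:
            frag_start = float(start_text)
        if end_text:
            frag_end = float(end_text)
    # Assumption: clip values count from the fragment's start offset.
    start = frag_start + (clip_begin or 0.0)
    end = frag_end if clip_end is None else frag_start + clip_end
    if frag_end is not None and end is not None:
        end = min(end, frag_end)  # stay inside the fragment
    return url, start, end
```

For example, `effective_clip("audio.mp3#t=100,200", 5, 15)` yields the range 105 to 115 seconds of `audio.mp3` under this reading, and a `src` without a fragment is clipped exactly as its `clipBegin` and `clipEnd` state.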
It is RECOMMENDED to use a media fragment on `src` to refer to a large chunk of media; and to use `clipBegin` and `clipEnd` for defining fine-grained clips. This is to separate the requirement on the client of retrieving the resource, perhaps done using a URI request to a server, from locating a segment of the resource, done with Media Fragments clip start/end points. Otherwise, if a client is fetching every phrase individually, it would then have to implement complex caching to smooth out playback so as to remove glitching between clips.
Embedded media, such as a video in an HTML document, MAY be referenced by the URL of its embedding document plus a selector.
Therefore, [=media object renderer=]s SHOULD support opening an HTML document and dereferencing content based on a selector.
<par>
  <text src="doc.html#para1"/>
  <video src="doc.html#video1" clipBegin="0" clipEnd="10"/>
</par>
SyncMedia uses SMIL3's param to send parameters to [=media object renderer=]s.
Element | Description |
---|---|
`param` | Media object rendering parameter. |

The attributes for `param` are:

Attribute | Description |
---|---|
`name` | Parameter name |
`value` | Parameter value |

The following parameter `name` values are defined:
Name | Allowed value(s) | Description | For media object(s) |
---|---|---|---|
`cssClass` | One or more strings | Indicates class name(s) to apply | Media that can be styled with CSS |
`clipPath` | As defined by the SVG path data attribute | The shape that will be used to apply a clip mask to the media | Visual media |
`pan` | Between -1 (full left) and 1 (full right) | Indicates the left/right pan | Audible media |
`playbackRate` | 1.0 (normal rate), less, or more | Indicates the playback rate. Values SHOULD align with HTML's {{HTMLMediaElement/playbackRate}}. | Timed media |
`volume` | Between 0 and 1 | Indicates the volume | Audible media |
`clipPath` specifies a clipping path using an SVG path definition. The clipping is applied to the visible region of the media object on which it is defined. When combined with `panZoom`, the `clipPath` SHOULD be applied inside the rect defined by the `panZoom` attribute.
<smil xmlns="http://www.w3.org/ns/SMIL">
  <body>
    <par>
      <audio src="chapter01.mp3" clipBegin="30" clipEnd="40"/>
      <text src="chapter01.html#heading_01">
        <param name="cssClass" value="highlight"/>
      </text>
    </par>
    <par>
      <audio src="chapter01.mp3" clipBegin="40" clipEnd="50"/>
      <text src="chapter01.html#para_01">
        <param name="cssClass" value="highlight"/>
      </text>
    </par>
    <par>
      <audio src="chapter01.mp3" clipBegin="50" clipEnd="60"/>
      <text src="chapter01.html#para_02">
        <param name="cssClass" value="highlight"/>
      </text>
    </par>
  </body>
</smil>
SyncMedia presentations organize media objects of the same types into virtual spaces called "tracks". Tracks MUST be placed in the SyncMedia document `head`. Tracks have several useful features: they can supply default [=media parameters=] and a default source file for their media objects, and media objects of a given type can be assigned to them automatically (e.g. all `audio` media objects can be automatically assigned to a track). All of these features reduce verbosity, as otherwise these properties would have to be explicitly stated on each media object.
Element | Description |
---|---|
`sync:track` | A virtual space to which [=Media Objects=] are assigned. A user agent MAY offer interface controls on a per-track basis (e.g. adjust volume on the narration track). A `sync:track` MAY have [=media parameters=], which act as defaults for [=Media Objects=] on that track. |
Attribute | Description |
---|---|
`sync:label` | The track's label |
`sync:defaultSrc` | URL of the default file that media objects on this track will use. |
`sync:defaultFor` | Media objects of the type specified (one of: `audio`, `image`, `video`, `text`, `ref`) are automatically assigned to this track. |
`sync:trackType` | Indicates which presentation feature is embodied by this track. |
TODO: Issue 31
<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:sync="https://w3.github.io/sync-media-pub">
  <head>
    <sync:track sync:label="Page" sync:defaultFor="text" sync:defaultSrc="chapter01.html" sync:trackType="contentDocument">
      <param name="cssClass" value="highlight"/>
    </sync:track>
  </head>
  <body>
    <par>
      <audio src="chapter01.mp3" clipBegin="30" clipEnd="40"/>
      <text src="#heading_01"/>
    </par>
    <par>
      <audio src="chapter01.mp3" clipBegin="40" clipEnd="50"/>
      <text src="#para_01"/>
    </par>
    <par>
      <audio src="chapter01.mp3" clipBegin="50" clipEnd="60"/>
      <text src="#para_02"/>
    </par>
  </body>
</smil>
<smil xmlns="http://www.w3.org/ns/SMIL" xmlns:sync="https://w3.github.io/sync-media-pub">
  <head>
    <sync:track id="background-music" sync:trackType="backgroundAudio">
      <param name="volume" value="0.5"/>
    </sync:track>
    <sync:track sync:label="Narration" sync:defaultFor="audio" sync:trackType="audioNarration"/>
    <sync:track sync:label="Page" sync:defaultFor="text" sync:trackType="contentDocument">
      <param name="cssClass" value="highlight"/>
    </sync:track>
  </head>
  <body>
    <par>
      <audio sync:track="background-music" src="bkmusic.mp3" repeatCount="indefinite"/>
      <seq>
        <par>
          <audio src="chapter01.mp3" clipBegin="30" clipEnd="40"/>
          <text src="chapter01.html#heading_01"/>
        </par>
        <par>
          <audio src="chapter01.mp3" clipBegin="40" clipEnd="50"/>
          <text src="chapter01.html#para_01"/>
        </par>
        <par>
          <audio src="chapter01.mp3" clipBegin="50" clipEnd="60"/>
          <text src="chapter01.html#para_02"/>
        </par>
      </seq>
    </par>
  </body>
</smil>
A narration `sync:track` is included, even though it supplies no default values, because it enables a user agent to offer separate controls for narration audio and background music audio.
SyncMedia has a generic mechanism for incorporating metadata but does not define any specific metadata. Metadata MUST go in the SyncMedia document `head`.
Element | Description |
---|---|
`metadata` | Extension point that allows the inclusion of metadata from any metainformation structuring language |
[=Tracks=] MAY provide defaults for [=media objects=]. This section gives the rules for how to apply these values.
Track attribute | Impact on media object |
---|---|
`sync:defaultSrc` | Provides the `src` for the media object. If the media object has an `src` which is only a selector, then the selector is appended to the track's `sync:defaultSrc`. Any other value for a media object `src` overrides the track's `sync:defaultSrc`. |
In addition, any [=media parameters=] defined for a track are inherited by any media objects on that track. The exception is when the media objects themselves provide a parameter of the same `name`, in which case the media object's parameter `value` overrides the track's parameter `value`.
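The rules above can be sketched in Python as follows. This is only an illustration under stated assumptions: media objects and tracks are modeled as plain dictionaries (the keys `type`, `src`, `track`, `params`, `id`, `defaultFor`, and `defaultSrc` are hypothetical stand-ins for the corresponding elements and attributes), a "selector" is taken to be an `src` that starts with `#`, and a media object without an explicit `sync:track` is matched to the first track whose `sync:defaultFor` names its type.

```python
def resolve_media_object(media, tracks):
    """Apply track defaults to one media object and return the resolved result."""
    if media.get("track"):
        # Explicit assignment via sync:track.
        track = next((t for t in tracks if t.get("id") == media["track"]), None)
    else:
        # Automatic assignment via sync:defaultFor.
        track = next(
            (t for t in tracks if t.get("defaultFor") == media["type"]), None
        )

    src = media.get("src", "")
    params = {}
    if track is not None:
        default_src = track.get("defaultSrc")
        # A missing src or a bare selector is completed by the track's default;
        # any other src value overrides the default.
        if default_src and (not src or src.startswith("#")):
            src = default_src + src
        params.update(track.get("params", {}))
    # Parameters on the media object override track parameters of the same name.
    params.update(media.get("params", {}))
    return {"type": media["type"], "src": src, "params": params}


page_track = {
    "defaultFor": "text",
    "defaultSrc": "chapter01.html",
    "params": {"cssClass": "highlight"},
}
resolved = resolve_media_object({"type": "text", "src": "#para_01"}, [page_track])
```

Here `resolved["src"]` becomes `chapter01.html#para_01` and the `cssClass` parameter is inherited from the track, which mirrors the first track example in this section.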
After the SyncMedia document has been processed, it is ready to be rendered.
Element | Rendering behavior |
---|---|
`body` | Render like `seq` |
`seq` | Render each child in order, each starting after the previous completes. Done when the last child is finished. |
`par` | Render each child at the same time. Done when all the children are finished. |
`audio` | Play the referenced portion of audio media and apply params. Done when the referenced portion has finished. |
`image` | Load the image file or segment and apply params. Not timed, so considered done immediately. |
`ref` | Infer the media type and, if supported, render the file or segment, and apply params. If timed, done when the segment is finished; if untimed, done immediately. |
`text` | Display the HTML document, ensure the referenced element is visible, and apply params. Not timed, so considered done immediately. |
`video` | Play the video file or segment and apply params. Done when the segment is finished. |
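The "done" rules in the table can be illustrated with a small Python sketch that computes when each node finishes. It is a simplification under stated assumptions: nodes are plain dictionaries with a hypothetical `kind` key, timed media carry numeric `clipBegin` and `clipEnd` values in seconds, a `ref` is treated as timed only when it has both clip values, and `repeatCount` is ignored.

```python
def done_time(node, start=0.0):
    """Return the time in seconds at which node is done, if it starts at start."""
    kind = node["kind"]
    children = node.get("children", [])
    if kind in ("body", "seq"):
        # Each child starts when the previous one is done.
        t = start
        for child in children:
            t = done_time(child, t)
        return t
    if kind == "par":
        # All children start together; done when the longest one is done.
        return max((done_time(child, start) for child in children), default=start)
    if "clipBegin" in node and "clipEnd" in node and kind in ("audio", "video", "ref"):
        return start + (node["clipEnd"] - node["clipBegin"])
    # image, text, and untimed ref are considered done immediately.
    return start


def phrase(begin, end, target):
    """Build a par with one audio clip and one text reference (example helper)."""
    return {
        "kind": "par",
        "children": [
            {"kind": "audio", "src": "chapter01.mp3", "clipBegin": begin, "clipEnd": end},
            {"kind": "text", "src": target},
        ],
    }


presentation = {
    "kind": "body",
    "children": [
        phrase(30, 40, "chapter01.html#heading_01"),
        phrase(40, 50, "chapter01.html#para_01"),
        phrase(50, 60, "chapter01.html#para_02"),
    ],
}
total = done_time(presentation)
```

For the three ten-second phrases above, `total` is 30 seconds: each `par` lasts as long as its audio clip, and the `body` plays them one after the other.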
Note about media with `repeatCount` and when it's considered done
TODO: how much to cover here?
`trackType`) that they might not make sense for, e.g. speeding up a presentation but not speeding up the background music.
In addition to the attributes already covered, this section adds the following standard XML attributes:
Attribute | Description |
---|---|
`xml:base` | Document base URL, as defined in [[XMLBASE]] |
`xml:id` | Unique identifier for an element, as defined in [[XML-ID]] |
`xml:lang` | Language identifier, as defined in [[XML]] |
This is the XML content model for SyncMedia. Required elements and attributes are indicated.
Element | Attributes | Content |
---|---|---|
`smil` (required) | | In this order: |
`head` | | In any order: |
`metadata` | | 0 or more elements from any namespace |
`sync:track` | | |
`param` | | Empty |
`body` | | In any order: |
`seq` | | In any order: |
`par` | | In any order: |
`audio` | | |
`image` | | |
`ref` | | |
`text` | | |
`video` | | |
At the time of publication, the members of the Synchronized Multimedia for Publications Community Group were:
Avneesh Singh (DAISY Consortium), Ben Dugas (Rakuten, Inc.), Chris Needham (British Broadcasting Corporation), Daniel Weck (DAISY Consortium), Didier Gehrer, Farrah Little (BC Libraries Cooperative), George Kerscher (DAISY Consortium), Ivan Herman (W3C), James Donaldson, Lars Wallin (Colibrio), Livio Mondini, Lynn McCormack (CAST, Inc), Marisa DeMeglio (DAISY Consortium, chair), Markku Hakkinen (Educational Testing Service), Matt Garrish (DAISY Consortium), Michiel Westerbeek (Tella), Nigel Megitt (British Broadcasting Corporation), Romain Deltour (DAISY Consortium), Wendy Reid (Rakuten, Inc.), Zheng Xu (Rakuten, Inc.)