This document collects use cases and requirements for improved support for building web applications that allow end-users to manipulate professional media assets, including audio-visual masters for television and motion pictures, and perform media production steps such as quality checking, versioning, or timed text authoring.
Professional media assets, including audio-visual masters for television and motion pictures, are increasingly being stored in the cloud.
There is a correspondingly growing interest in building web applications that allow end-users to manipulate these assets, for example for quality checking, versioning, and timed text authoring. While the web platform has evolved to support consumer media applications, professional applications require additional capabilities, such as precise timing, wider color gamut and high dynamic range, and high-fidelity timed text.
This document analyses gaps in web platform technologies for media production through use cases and requirements.
Concrete use cases are welcome through GitHub issues or pull requests.
This list of gaps is driven by use cases and will be re-evaluated as the list of use cases is completed.
Web applications measure time values with respect to a monotonic clock [[HR-TIME]]. The [[[HTML]]] [[HTML]] does not expose any precise mechanism to assess the time value, with respect to that clock, at which a particular media frame is going to be rendered. A web application can only estimate this by reading the media element's {{HTMLMediaElement/currentTime}} property to infer which frame is being rendered and when the user will see the next frame. This has several limitations:
In addition, {{HTMLMediaElement/currentTime}} accepts only a time value in seconds. It cannot be set to a frame number or SMPTE time code [[SMPTE12-1]], and no alternative API offers that ability.
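To illustrate the conversion an application currently has to perform itself, the following is a minimal sketch of mapping a non-drop-frame time code to a value suitable for {{HTMLMediaElement/currentTime}}. The names `FrameRate`, `frameToSeconds`, `timecodeToFrame` and `seekToTimecode` are hypothetical helpers, not part of any web API. The sketch assumes that the frame rate is constant and known to the application, and that time code 00:00:00:00 corresponds to media time 0. Neither assumption is guaranteed by the platform.

```typescript
// Hypothetical helper type: a frame rate expressed as a rational number,
// e.g. 30000/1001 for 29.97 fps. Not part of any web API.
interface FrameRate {
  numerator: number;
  denominator: number;
}

// Convert a zero-based frame number to a media time in seconds.
function frameToSeconds(frame: number, rate: FrameRate): number {
  return (frame * rate.denominator) / rate.numerator;
}

// Parse a non-drop-frame time code "HH:MM:SS:FF" into a frame count.
// `framesPerSecond` is the integer counting base of the time code
// (e.g. 24, 25 or 30), which may differ from the true frame rate.
function timecodeToFrame(timecode: string, framesPerSecond: number): number {
  const match = /^(\d{2}):(\d{2}):(\d{2}):(\d{2})$/.exec(timecode);
  if (match === null) {
    throw new Error(`Invalid time code: ${timecode}`);
  }
  const hours = Number(match[1]);
  const minutes = Number(match[2]);
  const seconds = Number(match[3]);
  const frames = Number(match[4]);
  if (minutes > 59 || seconds > 59 || frames >= framesPerSecond) {
    throw new RangeError(`Time code field out of range: ${timecode}`);
  }
  return ((hours * 60 + minutes) * 60 + seconds) * framesPerSecond + frames;
}

// Seek any object exposing a writable `currentTime` (such as a media
// element) to the start of the frame identified by the time code.
// Assumes time code 00:00:00:00 maps to media time 0.
function seekToTimecode(
  media: { currentTime: number },
  timecode: string,
  framesPerSecond: number,
  rate: FrameRate
): void {
  const frame = timecodeToFrame(timecode, framesPerSecond);
  media.currentTime = frameToSeconds(frame, rate);
}
```

Even with such helpers, the resulting seek is only as precise as the media element's handling of the time value, so the frame actually displayed may differ from the one requested.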
The media element does not provide a mechanism to seek by individual frames. This can be worked around by using the media's frame rate and the media element's {{HTMLMediaElement/currentTime}} property to seek by one frame duration from the reported {{HTMLMediaElement/currentTime}} value, but the {{HTMLMediaElement/currentTime}} property does not guarantee frame-level precision. In addition, the frame rate of the media may vary over time, may be rounded internally by the browser, and is not exposed.
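The workaround described above can be sketched as follows. This is an illustration under stated assumptions, not a reliable solution: it assumes a constant frame rate that the application knows out of band, and the names `FrameRate`, `frameIndexAt`, `frameMidpoint` and `stepFrames` are hypothetical. Seeking to the middle of the target frame, rather than to its start, is one way to reduce the risk that rounding lands the seek on the neighboring frame.

```typescript
// Hypothetical helper type: a constant frame rate as a rational number.
interface FrameRate {
  numerator: number;
  denominator: number;
}

// Estimate the index of the frame displayed at a given media time.
// The small epsilon absorbs rounding in the reported time value.
function frameIndexAt(time: number, rate: FrameRate): number {
  return Math.floor((time * rate.numerator) / rate.denominator + 1e-6);
}

// Media time in seconds at the middle of the given frame.
function frameMidpoint(frame: number, rate: FrameRate): number {
  return ((frame + 0.5) * rate.denominator) / rate.numerator;
}

// Step forward (positive delta) or backward (negative delta) by whole
// frames. `media` is any object with a writable `currentTime`, such as
// a media element. The target frame is clamped at frame 0.
function stepFrames(
  media: { currentTime: number },
  rate: FrameRate,
  delta: number
): void {
  const current = frameIndexAt(media.currentTime, rate);
  const target = Math.max(0, current + delta);
  media.currentTime = frameMidpoint(target, rate);
}
```

With a video element, a call such as `stepFrames(video, { numerator: 30000, denominator: 1001 }, 1)` would request the next frame. Whether that frame is the one actually presented still depends on the limitations listed in this section.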
When appending segments using [[[MSE]]] [[MSE]], the {{SourceBuffer/timestampOffset}} property does not provide enough precision to identify frame boundaries. This suffers the same limitation as {{HTMLMediaElement/currentTime}}, where the value is represented as a double, which does not allow it to identify individual frames due to rounding errors.
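The effect of representing frame positions as doubles can be explored with a short sketch. It assumes a constant frame rate of 30000/1001 fps, chosen only for illustration, and all function names are hypothetical. It compares a truncating time-to-frame mapping, which can be thrown off by rounding, with a tolerant mapping that works only when the time is known to sit on a frame boundary.

```typescript
// Assumed constant frame rate of 30000/1001 fps, chosen for illustration.
const RATE_NUMERATOR = 30000;
const RATE_DENOMINATOR = 1001;

// Time in seconds of the start of a frame, computed in a single step
// from an integer frame count.
function frameStartTime(frame: number): number {
  return (frame * RATE_DENOMINATOR) / RATE_NUMERATOR;
}

// The same position obtained by repeatedly adding a frame duration, as
// an application might do when tracking an offset segment by segment.
function accumulatedTime(frames: number): number {
  const frameDuration = RATE_DENOMINATOR / RATE_NUMERATOR;
  let time = 0;
  for (let i = 0; i < frames; i++) {
    time += frameDuration;
  }
  return time;
}

// Naive mapping: truncation can report the previous frame whenever the
// double lands just below the true frame boundary.
function truncatedFrameIndex(time: number): number {
  return Math.floor((time * RATE_NUMERATOR) / RATE_DENOMINATOR);
}

// Tolerant mapping, valid only for times known to sit on a boundary.
function nearestFrameIndex(time: number): number {
  return Math.round((time * RATE_NUMERATOR) / RATE_DENOMINATOR);
}

// Count the frame boundaries in [0, frames) that the truncating mapping
// assigns to the wrong frame.
function countTruncationMismatches(frames: number): number {
  let mismatches = 0;
  for (let f = 0; f < frames; f++) {
    if (truncatedFrameIndex(frameStartTime(f)) !== f) {
      mismatches++;
    }
  }
  return mismatches;
}
```

Keeping positions as integer frame counts, or as integer ticks of a media timescale, and converting to seconds only when calling the API is one way for an application to limit the accumulation of such errors. It does not remove the rounding that occurs once the value is handed to the platform as a double.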
For playback of remote content (e.g., via the [[[REMOTE-PLAYBACK]]] [[REMOTE-PLAYBACK]] or [[[PICTURE-IN-PICTURE]]] [[PICTURE-IN-PICTURE]]), whether the content can be synchronized remains an open question. This has been reported by the Second Screen Community Group.