Requirements for Media Production

This document collects use cases and requirements for improved support for building web applications that allow end-users to manipulate professional media assets, including audio-visual masters for television and motion pictures, and perform media production steps such as quality checking, versioning, or timed text authoring.

Introduction

Professional media assets, including audio-visual masters for television and motion pictures, are increasingly being stored in the cloud.

There is a correspondingly growing interest in building web applications that allow end-users to manipulate these assets, for example for quality checking, versioning, or timed text authoring. While the web platform has evolved to support consumer media applications, professional applications require additional capabilities, such as precise timing, wider color gamut and high dynamic range, and high-fidelity timed text.

This document analyses gaps in web platform technologies for media production through use cases and requirements.

Functional gaps

This list of gaps is driven by use cases and will be re-evaluated as the list of use cases is completed.

Frame identification

Web applications measure time values with respect to a monotonic clock [[HR-TIME]]. The [[[HTML]]] [[HTML]] does not expose any precise mechanism to assess the time value, with respect to that clock, at which a particular media frame is going to be rendered. A web application can only read the media element's {{HTMLMediaElement/currentTime}} property and, from that value, infer the frame being rendered and the time at which the user will see the next frame. This has several limitations:

{{HTMLMediaElement/currentTime}} is represented as a double value, which does not allow it to identify individual frames due to rounding errors. This is a known issue.
{{HTMLMediaElement/currentTime}} is updated at a user-agent defined rate (typically the rate at which the time marches on algorithm runs), and is kept stable while scripts are running. When a web application reads {{HTMLMediaElement/currentTime}}, it cannot tell when this property was last updated, and thus cannot reliably assess whether this property still represents the frame currently being rendered.
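The rounding limitation can be illustrated with a minimal sketch. The helper names, the 100 fps frame rate, and the tolerance value below are assumptions chosen for illustration; they are not part of any web API. The sketch shows how truncating a double time value can land on the previous frame, and how a small tolerance hides (but does not eliminate) the problem.

```typescript
// Hypothetical helpers for illustration only; not part of any web API.

/** Naive mapping from a time in seconds to a frame index. */
function timeToFrameNaive(timeSeconds: number, fps: number): number {
  return Math.floor(timeSeconds * fps);
}

/**
 * Same mapping, with a small tolerance added before truncation so that a
 * time value sitting just below a frame boundary still maps to that frame.
 * The default tolerance is an arbitrary assumption.
 */
function timeToFrameWithTolerance(
  timeSeconds: number,
  fps: number,
  epsilon: number = 1e-6
): number {
  return Math.floor(timeSeconds * fps + epsilon);
}

// At 100 fps, frame 435 nominally starts at 4.35 s. The value 4.35 has no
// exact binary representation, and 4.35 * 100 evaluates to
// 434.99999999999994, so the naive mapping reports the previous frame.
const naiveFrame = timeToFrameNaive(4.35, 100); // 434
const tolerantFrame = timeToFrameWithTolerance(4.35, 100); // 435
```

The tolerance is a workaround that relies on the application knowing the frame rate, which the media element does not expose.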

In addition, {{HTMLMediaElement/currentTime}} only accepts a time value. It cannot be set to a frame number or SMPTE time code [[SMPTE12-1]], and no alternative API exists to do so.
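In the absence of such an API, applications have to convert between time codes, frame numbers and time values themselves. The following sketch shows one way to do so under strong assumptions: non-drop-frame time code, a constant integer frame rate known out of band, and hypothetical function names. Drop-frame time code and fractional frame rates are deliberately not handled.

```typescript
// Hypothetical helpers; non-drop-frame time code at an integer frame rate only.

function pad2(n: number): string {
  return (n < 10 ? "0" : "") + String(n);
}

/** Formats a zero-based frame number as HH:MM:SS:FF. */
function frameToTimecode(frame: number, fps: number): string {
  const ff = frame % fps;
  const totalSeconds = Math.floor(frame / fps);
  const ss = totalSeconds % 60;
  const mm = Math.floor(totalSeconds / 60) % 60;
  const hh = Math.floor(totalSeconds / 3600);
  return [hh, mm, ss].map((v) => pad2(v)).join(":") + ":" + pad2(ff);
}

/** Parses HH:MM:SS:FF into a zero-based frame number. */
function timecodeToFrame(timecode: string, fps: number): number {
  const parts = timecode.split(":").map((p) => parseInt(p, 10));
  if (parts.length !== 4 || parts.some((p) => isNaN(p))) {
    throw new Error("Expected a time code of the form HH:MM:SS:FF");
  }
  const [hh, mm, ss, ff] = parts as [number, number, number, number];
  return ((hh * 60 + mm) * 60 + ss) * fps + ff;
}

/** Nominal start time, in seconds, of the frame a time code designates. */
function timecodeToSeconds(timecode: string, fps: number): number {
  return timecodeToFrame(timecode, fps) / fps;
}
```

An application could assign the result of `timecodeToSeconds("00:01:00:00", 25)` to {{HTMLMediaElement/currentTime}}, but the resulting seek remains subject to the precision limitations described above.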

Seeking to next/previous frame

The media element does not provide a mechanism to seek by individual frames. A workaround is to derive a frame duration from the media's frame rate and seek by that duration from the reported {{HTMLMediaElement/currentTime}} value, but the {{HTMLMediaElement/currentTime}} property does not guarantee frame-level precision. In addition, the frame rate of the media may vary over time, may be rounded internally by the browser, and is not exposed.
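The workaround described above might be sketched as follows. The function name, the tolerance, and the choice to target the middle of the destination frame (to stay away from frame boundaries) are all assumptions, as is a constant frame rate supplied by the application.

```typescript
// Hypothetical helper: computes the time to seek to in order to step by
// `deltaFrames` frames, assuming a constant frame rate known out of band.
function stepFrameTime(
  currentTime: number,
  fps: number,
  deltaFrames: number
): number {
  const epsilon = 1e-6; // arbitrary tolerance against rounding just below a boundary
  const currentFrame = Math.floor(currentTime * fps + epsilon);
  const targetFrame = Math.max(0, currentFrame + deltaFrames);
  // Aim for the middle of the target frame rather than its first instant.
  return (targetFrame + 0.5) / fps;
}

// Intended use in a page (not executed here):
//   video.currentTime = stepFrameTime(video.currentTime, 25, +1);
```

This only approximates frame stepping: if the actual frame rate varies or differs from the value supplied, the computed time may fall in a different frame than intended.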

Indeterminate frame boundaries for segments in Media Source Extensions

When appending segments using [[[MSE]]] [[MSE]], the timestampOffset property does not provide enough precision to identify frame boundaries. It suffers from the same limitation as {{HTMLMediaElement/currentTime}}: the value is represented as a double, which does not allow it to identify individual frames due to rounding errors.
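The effect is easiest to see when offsets are built up by repeated addition. The sketch below, which uses hypothetical function names and assumes a constant frame rate expressed as a rational number, contrasts accumulating a frame duration with deriving the offset from an integer frame count in a single step. The second approach limits the error to one rounding, but the result is still a double and may not coincide exactly with a frame boundary.

```typescript
// Hypothetical helpers for computing a value to assign to timestampOffset.

/** Adds the frame duration once per frame; rounding errors accumulate. */
function offsetByAccumulation(frameDuration: number, frames: number): number {
  let t = 0;
  for (let i = 0; i < frames; i++) {
    t += frameDuration;
  }
  return t;
}

/**
 * Derives the offset from an integer frame count and a rational frame rate
 * (fpsNum / fpsDen), so that only the final division can round.
 */
function offsetFromFrameCount(
  frames: number,
  fpsNum: number,
  fpsDen: number
): number {
  return (frames * fpsDen) / fpsNum;
}

// Ten frames at 10 fps should span exactly one second.
const accumulated = offsetByAccumulation(0.1, 10); // 0.9999999999999999
const derived = offsetFromFrameCount(10, 10, 1); // 1
```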

Synchronization with remote playback