Well-deployed technologies

Audio/Video rendering

Few media-based services can usefully function without rendering audio or video content; the HTML5 specification provides widely deployed support for this essential feature. Video content can be rendered in any Web page via the <video> element.

Likewise, audio content can be rendered in any Web page via the <audio> element.

Beyond the declarative approach enabled by the <audio> element, the Web Audio API provides a full-fledged audio processing API, which includes support for low-latency playback of audio content.

Rendering of captions

WebVTT is a file format for captions and subtitles. The specification is still at the Candidate Recommendation stage, but browsers already support the format to varying degrees, which makes it possible to render text tracks through a <video> element. The Timed Text Markup Language (TTML) specification provides a richer language for describing timed text. Version 2 extends the previous version with advanced features for animation, styling, embedded content and metadata. TTML is used both as an interchange format among authoring systems and for the delivery of subtitles and captions worldwide, in particular through profiles such as IMSC1.1 (Internet Media Subtitles and Captions). Mainstream browsers do not support IMSC1 natively, but Web applications can still take advantage of it through libraries such as imscJS, a JavaScript polyfill that provides a complete implementation of the IMSC1 profile and renders IMSC1 documents to HTML5.
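To give a feel for how simple the WebVTT format is, the sketch below serializes a list of cues into a WebVTT document. The `Cue` shape and the function names are hypothetical and chosen for illustration only. The sketch assumes cue text contains neither blank lines nor the `-->` sequence, and it ignores cue settings, identifiers, regions and styles.

```typescript
// Hypothetical cue shape for this sketch; times are expressed in seconds.
interface Cue {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros up to the given width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) s = "0" + s;
  return s;
}

// Format seconds as a WebVTT timestamp (hh:mm:ss.ttt).
function formatTimestamp(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSeconds = Math.floor(totalMs / 1000);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)}.${pad(ms, 3)}`;
}

// Serialize cues into a WebVTT document: a "WEBVTT" header line,
// followed by cue blocks separated by blank lines.
function toWebVTT(cues: Cue[]): string {
  const blocks = cues.map(
    (cue) =>
      `${formatTimestamp(cue.start)} --> ${formatTimestamp(cue.end)}\n${cue.text}`
  );
  return ["WEBVTT"].concat(blocks).join("\n\n") + "\n";
}
```

A file produced this way can be attached to a <video> element through a <track> child element, in browsers that support WebVTT.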

Rendering of protected media

For the distribution of media whose content needs specific protection against copying, Encrypted Media Extensions (EME) enables Web applications to render encrypted media streams using Content Decryption Modules (CDM).
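As a rough illustration of the first step of an EME flow, the sketch below builds the list of configurations that an application would pass to `navigator.requestMediaKeySystemAccess()`. The interfaces are simplified local mirrors of the EME dictionaries, and `buildKeySystemConfig` is a hypothetical helper. The content types and the Clear Key system are example values, not recommendations.

```typescript
// Simplified local mirrors of the EME dictionaries, declared here so that the
// sketch type-checks without DOM typings.
interface MediaCapability {
  contentType: string;
  robustness?: string;
}

interface KeySystemConfig {
  initDataTypes: string[];
  videoCapabilities: MediaCapability[];
  audioCapabilities: MediaCapability[];
}

// Hypothetical helper: describe the media the application intends to play.
function buildKeySystemConfig(videoType: string, audioType: string): KeySystemConfig[] {
  return [
    {
      initDataTypes: ["cenc"],
      videoCapabilities: [{ contentType: videoType }],
      audioCapabilities: [{ contentType: audioType }],
    },
  ];
}

// Example values only (assumptions for this sketch).
const clearKeyConfig = buildKeySystemConfig(
  'video/mp4; codecs="avc1.42E01E"',
  'audio/mp4; codecs="mp4a.40.2"'
);

// In a browser, the configuration would then be used along these lines:
//   navigator.requestMediaKeySystemAccess("org.w3.clearkey", clearKeyConfig)
//     .then((access) => access.createMediaKeys())
//     .then((mediaKeys) => video.setMediaKeys(mediaKeys));
```

The browser resolves the request only if it has a CDM for the requested key system that supports at least one of the listed configurations, which is why applications typically try several key systems in turn.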

Rendering of media fragments

Users often want to share a pointer to a specific position within the timeline of a video (or an audio) feed with friends on social networks, and expect media players to jump to the requested position right away. The Media Fragments URI specification defines a syntax for constructing media fragment URIs and explains how Web browsers can use this information to render the media fragment.
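The temporal part of that syntax looks like `#t=10,20` (from second 10 to second 20), `#t=10` (from second 10 to the end) or `#t=,20` (from the start to second 20). The sketch below parses it, which can be useful in custom players. The function names are hypothetical, and the sketch makes simplifying assumptions: it only handles Normal Play Time values (seconds, `mm:ss` or `hh:mm:ss`, with an optional `npt:` prefix), skips percent-decoding, and keeps the last valid `t` parameter when several are present.

```typescript
// Parsed temporal range in seconds; `end` is Infinity when the fragment is open-ended.
interface TimeRange {
  start: number;
  end: number;
}

// Parse "ss(.fff)", "mm:ss(.fff)" or "hh:mm:ss(.fff)" into seconds; NaN if malformed.
function parseNptTime(value: string): number {
  const parts = value.split(":");
  if (parts.length > 3) return NaN;
  let seconds = 0;
  for (let i = 0; i < parts.length; i++) {
    const isLast = i === parts.length - 1;
    // Only the seconds component may carry a fractional part.
    const pattern = isLast ? /^\d+(\.\d*)?$/ : /^\d+$/;
    if (!pattern.test(parts[i])) return NaN;
    seconds = seconds * 60 + parseFloat(parts[i]);
  }
  return seconds;
}

// Extract the temporal dimension from a media fragment URI, or null if absent or invalid.
function parseTemporalFragment(url: string): TimeRange | null {
  const hashIndex = url.indexOf("#");
  if (hashIndex < 0) return null;
  let result: TimeRange | null = null;
  for (const pair of url.slice(hashIndex + 1).split("&")) {
    const eq = pair.indexOf("=");
    if (eq < 0 || pair.slice(0, eq) !== "t") continue;
    let value = pair.slice(eq + 1);
    if (value.indexOf("npt:") === 0) value = value.slice(4);
    const bounds = value.split(",");
    if (bounds.length > 2) continue;
    const start = bounds[0] === "" ? 0 : parseNptTime(bounds[0]);
    const end = bounds.length < 2 ? Infinity : parseNptTime(bounds[1]);
    // Skip malformed values and empty or reversed ranges.
    if (isNaN(start) || isNaN(end) || start >= end) continue;
    result = { start: start, end: end };
  }
  return result;
}
```

Browsers that implement the specification apply the fragment on their own when such a URI is used as the source of a media element; a custom player could instead set the element's `currentTime` to `start` and pause playback once `end` is reached.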

| Feature | Specification | Group | Maturity |
|---|---|---|---|
| Audio/Video rendering | video element in HTML Standard | WHATWG | Living Standard |
| Audio/Video rendering | audio element in HTML Standard | WHATWG | Living Standard |
| Audio/Video rendering | Web Audio API | Audio Working Group | Candidate Recommendation |
| Rendering of captions | WebVTT: The Web Video Text Tracks Format | Timed Text Working Group | Candidate Recommendation |
| Rendering of captions | Timed Text Markup Language 2 (TTML2) (2nd Edition) | Timed Text Working Group | Candidate Recommendation |
| Rendering of captions | TTML Profiles for Internet Media Subtitles and Captions 1.1 | Timed Text Working Group | Recommendation |
| Rendering of protected media | Encrypted Media Extensions | Media Working Group | Recommendation |
| Rendering of media fragments | Media Fragments URI 1.0 (basic) | Media Fragments Working Group | Recommendation |

Specifications in progress

Rendering in different color spaces

Wide-gamut displays are becoming more and more common. CSS Color Module Level 4 adds color spaces beyond the classical sRGB so that style sheets can use the newly available colors. Similarly, the CSS Media Queries Level 4 specification includes means to detect wide-gamut displays and adapt the rendering of the application to these improved color spaces.

CSS Media Queries Level 5, to be published after Level 4, will introduce a video- prefix for some features. The prefixed features detect differences between the video plane and the graphics plane on devices that render video separately from the rest of an HTML document, for example devices that support High Dynamic Range (HDR) for video but only Standard Dynamic Range (SDR) for other types of content.
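The sketch below shows how an application might probe these media features from script. The matcher is injected so that the logic can run outside a browser; in a page it would typically wrap `window.matchMedia`. The helper names are hypothetical, and the sketch assumes the `color-gamut`, `dynamic-range`, `video-color-gamut` and `video-dynamic-range` features as drafted in Media Queries Level 4 and Level 5, which may still change.

```typescript
// Function that tells whether a media query currently matches.
// In a browser: (query) => window.matchMedia(query).matches
type MediaMatcher = (query: string) => boolean;

type Gamut = "rec2020" | "p3" | "srgb";
type Plane = "graphics" | "video";

// Return the widest gamut reported for the given plane, falling back to sRGB.
function widestGamut(matches: MediaMatcher, plane: Plane = "graphics"): Gamut {
  const feature = plane === "video" ? "video-color-gamut" : "color-gamut";
  const candidates: Gamut[] = ["rec2020", "p3"]; // widest first
  for (const gamut of candidates) {
    if (matches(`(${feature}: ${gamut})`)) return gamut;
  }
  return "srgb";
}

// Tell whether the given plane reports High Dynamic Range support.
function supportsHdr(matches: MediaMatcher, plane: Plane = "graphics"): boolean {
  const feature = plane === "video" ? "video-dynamic-range" : "dynamic-range";
  return matches(`(${feature}: high)`);
}
```

On a device that renders video on a separate plane, `widestGamut(matcher, "video")` and `widestGamut(matcher)` may legitimately return different values, which is the situation the video- prefix is meant to expose.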

Rendering of protected media

Four extensions to Encrypted Media Extensions (EME) are under development in the Media Working Group: Persistent Usage Record sessions, HDCP Detection, Encryption scheme capability detection and an API to find existing sessions.

Rendering of captions

The Audio Description profile is a profile of TTML2 that incorporates audio features. It is intended to support audio description workflows worldwide, including description creation, script delivery and exchange, and distribution of generated audio descriptions.

Distributed rendering

As users own more and more connected devices, the need to get these devices to work together increases as well:

  • The Presentation API offers the possibility for a Web page to open and control a page located on another screen, opening the road for multi-screen Web applications.
  • The Remote Playback API focuses more specifically on controlling the rendering of media on a separate device.
  • The Open Screen Protocol is a suite of network protocols that allow user agents to implement the Presentation API and the Remote Playback API in an interoperable fashion for browsers and presentation displays connected via the same local area network.
  • The Picture-in-Picture specification allows applications to initiate and control the rendering of a video in a separate miniature window that is viewable above all other activities.
  • The Audio Output Devices API offers similar functionality for audio streams, enabling a Web application to pick the audio output devices on which a given sound should be played.
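Since implementation status differs across these specifications, applications usually feature-detect them before offering the corresponding controls. The sketch below is one way to do that, with the environment injected so that the logic can be exercised outside a browser. The `DetectionEnv` shape and the function name are hypothetical. The detected members (`PresentationRequest`, `remote`, `pictureInPictureEnabled`, `setSinkId`) come from the respective specifications. The Open Screen Protocol is a network protocol and is not detectable from script.

```typescript
// Hypothetical environment shape. In a browser this would be:
//   { globalScope: window, documentLike: document,
//     mediaElementPrototype: HTMLMediaElement.prototype }
interface DetectionEnv {
  globalScope: object;
  documentLike: { pictureInPictureEnabled?: boolean };
  mediaElementPrototype: object;
}

interface DistributedRenderingSupport {
  presentation: boolean;
  remotePlayback: boolean;
  pictureInPicture: boolean;
  audioOutputSelection: boolean;
}

function detectDistributedRendering(env: DetectionEnv): DistributedRenderingSupport {
  return {
    // Presentation API: PresentationRequest constructor on the global object.
    presentation: "PresentationRequest" in env.globalScope,
    // Remote Playback API: `remote` attribute on media elements.
    remotePlayback: "remote" in env.mediaElementPrototype,
    // Picture-in-Picture: document.pictureInPictureEnabled.
    pictureInPicture: env.documentLike.pictureInPictureEnabled === true,
    // Audio Output Devices API: setSinkId() on media elements.
    audioOutputSelection: "setSinkId" in env.mediaElementPrototype,
  };
}
```

Detecting a feature only tells that the API is exposed; whether a second screen, a remote playback device or an alternative audio output is actually available is something the APIs themselves report at run time.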

Rendering in VR/AR headsets

The WebXR Device API specification is a low-level API that allows applications to access and control head-mounted displays (HMD) using JavaScript and create compelling Virtual Reality (VR) / Augmented Reality (AR) experiences. It is a critical enabler for rendering 360° video content in Virtual Reality headsets.

| Feature | Specification | Group | Maturity |
|---|---|---|---|
| Rendering in different color spaces | profiled device-dependent colors in CSS Color Module Level 4 | CSS Working Group | Working Draft |
| Rendering in different color spaces | color-gamut media query in Media Queries Level 4 | CSS Working Group | Candidate Recommendation |
| Rendering in different color spaces | Media Queries Level 5 | CSS Working Group | Working Draft |
| Rendering of protected media | Persistent usage record session in Encrypted Media Extensions | Media Working Group | Editor's Draft |
| Rendering of protected media | HDCP detection in Encrypted Media Extensions | Media Working Group | Editor's Draft |
| Rendering of protected media | Encryption scheme detection in Encrypted Media Extensions | Media Working Group | Editor's Draft |
| Rendering of protected media | API to find existing sessions in Encrypted Media Extensions | Media Working Group | Editor's Draft |
| Rendering of captions | Proposal for an Exchange Format to support Audio Description | Timed Text Working Group | Editor's Draft |
| Distributed rendering | Presentation API | Second Screen Working Group | Candidate Recommendation |
| Distributed rendering | Remote Playback API | Second Screen Working Group | Candidate Recommendation |
| Distributed rendering | Open Screen Protocol | Second Screen Working Group | Editor's Draft |
| Distributed rendering | Picture-in-Picture | Media Working Group | Working Draft |
| Distributed rendering | Audio Output Devices API | WebRTC Working Group | Candidate Recommendation |
| Rendering in VR/AR headsets | WebXR Device API | Immersive Web Working Group | Working Draft |

Exploratory work

Audio rendering

The Audio Device Client specification proposes to define an intermediate layer between the Web Audio API and actual audio devices used by the browser. It exposes various low-level properties that have been completely hidden or unavailable to developers, including I/O device selection, multi-channel I/O support, and configurable sample rates. The Audio Working Group may adopt the specification as one of its next deliverables.

Rendering in different color spaces

To take advantage of wide-gamut displays, all the graphical systems of the Web will need to support these broader color spaces. This includes the need to make HTML canvas color-managed.

More generally, the High Dynamic Range and Wide Gamut Color on the Web note, developed by the Color on the Web Community Group, analyzes gaps and candidate next steps for enabling support for High Dynamic Range (HDR) and Wide Color Gamut (WCG) on the Web, such as mechanisms to allow color and luminance matching between HDR video content and surrounding or overlaid graphic and textual content in Web pages.

Rendering of captions

Providing an alternative transcript to media content is a well-known best practice; a transcript extension to HTML has been proposed to make an explicit link between media content and its transcript and thus facilitate discovery and consumption.

Some advanced closed captioning scenarios cannot be expressed using WebVTT. In such cases, Web applications need to render cues on their own using JavaScript. By definition, this means that the resulting captions cannot benefit from integration with the underlying platform, e.g. to apply user style sheets or take part in Picture-in-Picture scenarios. Early discussions on TextTrackCue enhancements could pave the way for a generic solution in that field.

Rendering of captions in VR/AR scenarios poses unique challenges on the rendering side (where to position these captions in 3D, whether to follow the user's head movements, how to indicate the source of a caption in a 360° video when the user is currently looking elsewhere) as well as on the distribution side (how to encode these captions interoperably in timed text files). The Immersive Web Community Group is exploring use cases for subtitles in 360° videos and requirements for subtitles and text in WebXR. The Immersive Captions Community Group is investigating best practices for access, activation, and display settings for captions with different types of Immersive Media (AR, VR, Games).

Beyond traditional closed captions, video sharing platforms may implement a feature known as Bullet Chatting or Danmaku whereby comments, which may be generated by users in real-time, and annotations get overlaid and animated on top of videos at specific points of the media timeline. The Bullet Chatting proposal explores possible interoperability requirements and technical gaps in that space.

Distributed rendering

The Multi-Device Timing Community Group is exploring another aspect of multi-device media rendering: its Timing Object specification makes it possible to keep video, audio and other data streams in close synchrony, across devices and independently of the network topology. This effort needs support from interested parties to progress.
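The core idea can be sketched in a few lines: a timing object is described by a state vector (position, velocity, acceleration) sampled at a given clock time, and any device that shares the vector and a synchronized clock can compute the current position locally. The code below is a minimal sketch of that computation under those assumptions. The interface and function names are illustrative and are not taken from the specification's API.

```typescript
// State of a timing object, sampled at `timestamp` on a shared clock (in seconds).
interface StateVector {
  position: number;      // e.g. media time in seconds
  velocity: number;      // e.g. 1 for normal playback, 0 when paused
  acceleration: number;  // usually 0 for media playback
  timestamp: number;     // clock time at which the vector was sampled
}

// Extrapolate the state vector to clock time `now`, assuming constant acceleration.
function queryVector(vector: StateVector, now: number): StateVector {
  const d = now - vector.timestamp;
  return {
    position:
      vector.position + vector.velocity * d + 0.5 * vector.acceleration * d * d,
    velocity: vector.velocity + vector.acceleration * d,
    acceleration: vector.acceleration,
    timestamp: now,
  };
}
```

A player on each device could periodically compare its media element's current time with the extrapolated `position` and seek or adjust its playback rate to stay aligned. Only vector updates (play, pause, seek) need to travel over the network, not continuous time signals.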

Features not covered by ongoing work

Native support for 360° video rendering
While it is already possible to render 360° videos within a <video> element, integrated support for the rendering of 360° videos would hide the complexity of the underlying adaptive streaming logic from applications, letting Web browsers optimize streaming and rendering on their own.
Further extensions to Encrypted Media Extensions (EME)
Further extensions to the Encrypted Media Extensions specification have been proposed, including defining a virtual environment in which CDMs can run to improve CDM portability across operating systems, mappings between EME and underlying DRM-specific security levels, and protection of media content when played in a VR headset. Development of these features for EME is out of scope for the Media Working Group. Per charter, this group can only work on Persistent Usage Record sessions, HDCP Detection, Encryption scheme capability detection and an API to find existing sessions.

Discontinued features

Network service discovery
The Network Service Discovery API was to offer a lower-level approach to the establishment of multi-device operations, by providing integration with local network-based media renderers, such as those enabled by DLNA, UPnP, etc. This effort was discontinued because of privacy concerns and lack of interest from implementers. The current approach is to let the user agent handle network discovery under the hood, as done in the Presentation API and Remote Playback API.
WebVR
Development of the WebVR specification, which allowed access to and control of Virtual Reality (VR) devices and is supported in some browsers, has halted in favor of the WebXR Device API, which extends the scope of the work to Augmented Reality (AR) devices.