To ensure the user experience of streamed media remains resilient in the face of changing network conditions, media providers make use of adaptive streaming, where content gets split into chunks at different quality levels, and where the client requests chunks at the appropriate quality level based on networking conditions during playback. Several technologies are available to satisfy this approach (such as MPEG DASH) but Web browsers may not offer native support for them. The Media Source Extensions (MSE) specification enables Web application developers to create libraries that will consume these different adaptive streaming formats and protocols.
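The quality selection step described above can be sketched as follows. This is a minimal illustration rather than the algorithm of any particular player: the `Rendition` shape, the `selectRendition` name, the bitrate ladder and the 0.8 safety factor are all assumptions made for the example.

```typescript
// One entry per quality level the provider has encoded (illustrative values).
interface Rendition {
  id: string;
  bitrateKbps: number; // average bitrate needed to play this level
}

/**
 * Picks the highest-bitrate rendition that fits within the measured
 * throughput, keeping a safety margin so that small dips do not stall
 * playback. Falls back to the lowest-bitrate rendition when nothing fits.
 */
function selectRendition(
  renditions: Rendition[],
  measuredKbps: number,
  safetyFactor = 0.8, // hypothetical margin, not taken from any specification
): Rendition {
  if (renditions.length === 0) {
    throw new Error("at least one rendition is required");
  }
  const sorted = [...renditions].sort((a, b) => a.bitrateKbps - b.bitrateKbps);
  const budget = measuredKbps * safetyFactor;
  let choice = sorted[0];
  for (const rendition of sorted) {
    if (rendition.bitrateKbps <= budget) choice = rendition;
  }
  return choice;
}

// Hypothetical bitrate ladder, deliberately listed out of order.
const ladder: Rendition[] = [
  { id: "1080p", bitrateKbps: 5000 },
  { id: "360p", bitrateKbps: 800 },
  { id: "720p", bitrateKbps: 2500 },
];
```

A client would typically re-run such a selection before requesting each chunk, using a throughput estimate derived from the download time of the previous chunks.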
Users often want to share a pointer to a specific position within the timeline of a video or audio feed with friends on social networks, and expect media players to jump to the requested position right away. The Media Fragments URI specification defines a syntax for constructing URIs that reference fragments of the media content, and explains how Web browsers can use this information to render the media fragment, so that only the relevant part of the media content needs to be streamed to the client device.
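As an illustration, the temporal dimension of a media fragment is written as `#t=start,end`, for instance `video.mp4#t=90,120`. The sketch below builds and parses such fragments. It covers only a subset of the syntax (plain seconds and colon-separated normal play time, with an optional `npt:` prefix), and the function names and example URL are hypothetical.

```typescript
interface TimeRange {
  start: number; // seconds
  end: number | undefined; // seconds; undefined means "until the end"
}

// Converts "90", "90.5", "01:30" or "0:01:30" to a number of seconds.
function nptToSeconds(value: string): number {
  const parts = value.split(":");
  if (parts.length > 3 || !parts.every((p) => /^\d+(\.\d+)?$/.test(p))) {
    throw new Error(`invalid time value: ${value}`);
  }
  return parts.reduce((total, part) => total * 60 + Number(part), 0);
}

// Appends a temporal fragment to a URL, replacing any existing fragment.
function buildTemporalFragment(url: string, start: number, end?: number): string {
  const base = url.split("#")[0];
  return end === undefined ? `${base}#t=${start}` : `${base}#t=${start},${end}`;
}

// Extracts the temporal range from a URL, or undefined if there is none
// or if it cannot be parsed by this simplified sketch.
function parseTemporalFragment(url: string): TimeRange | undefined {
  const hashIndex = url.indexOf("#");
  if (hashIndex < 0) return undefined;
  let value: string | undefined;
  for (const pair of url.slice(hashIndex + 1).split("&")) {
    if (pair.startsWith("t=")) value = pair.slice(2); // keep the last "t=" found
  }
  if (value === undefined) return undefined;
  if (value.startsWith("npt:")) value = value.slice(4);
  const [startText = "", endText = ""] = value.split(",");
  try {
    const start = startText === "" ? 0 : nptToSeconds(startText);
    const end = endText === "" ? undefined : nptToSeconds(endText);
    if (end !== undefined && end <= start) return undefined;
    return { start, end };
  } catch {
    return undefined;
  }
}
```

A player could pass the parsed `start` value to the media element's `currentTime` and stop playback once `end` is reached.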
Specifications in progress
Capabilities and quality
To improve the user experience and take advantage of advanced device capabilities when they are available, media providers need to know the decoding (and encoding) capabilities of the user's device. Can the device decode a particular codec at a given resolution, bitrate and framerate? Will the playback be smooth and power efficient? Can the display render high dynamic range (HDR) and wide color gamut content? The Media Capabilities specification defines an API to expose that information, with a view to replacing the more basic and vague canPlayType() and isTypeSupported() functions defined in HTML and MSE, respectively.
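The questions above map onto the configuration object that an application passes to the API and onto the three booleans it gets back (`supported`, `smooth`, `powerEfficient`). The sketch below shows one possible way to use those answers. The interfaces are local stand-ins for the browser types so that the example is self-contained, and the ranking policy in `pickVariant` is an assumption, not something the specification mandates.

```typescript
// Local stand-ins for the browser's configuration and result dictionaries.
interface VideoVariant {
  contentType: string;
  width: number;
  height: number;
  bitrate: number; // bits per second
  framerate: number;
}
interface DecodingConfig {
  type: "file" | "media-source";
  video: VideoVariant;
}
interface DecodingInfo {
  supported: boolean;
  smooth: boolean;
  powerEfficient: boolean;
}

// Builds the object to hand to the capability query for an MSE-based player.
function toDecodingConfig(variant: VideoVariant): DecodingConfig {
  return { type: "media-source", video: { ...variant } };
}

// Hypothetical policy: drop variants that are unsupported or not smooth,
// then prefer power-efficient decoding, then the highest bitrate.
function pickVariant(
  candidates: Array<{ variant: VideoVariant; info: DecodingInfo }>,
): VideoVariant | undefined {
  const playable = candidates.filter((c) => c.info.supported && c.info.smooth);
  if (playable.length === 0) return undefined;
  playable.sort(
    (a, b) =>
      Number(b.info.powerEfficient) - Number(a.info.powerEfficient) ||
      b.variant.bitrate - a.variant.bitrate,
  );
  return playable[0].variant;
}
```

In a browser, each `info` value would come from `navigator.mediaCapabilities.decodingInfo(toDecodingConfig(variant))`, which returns a promise.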
On top of getting answers to "Can" questions, media providers also want answers to "Should" questions, such as "Should I send HDR content if I have both HDR and SDR variants?". The Media Capabilities specification does not answer these questions, which rather fall within the realm of CSS extensions describing the capability of the display. Notably, CSS Media Queries Level 4 includes means to detect wide-gamut displays and adapt the rendering of the application to them. CSS Media Queries Level 5, to be published after Level 4, will introduce a video- prefix to some features to detect differences between the video plane and the graphics plane on devices that render video separately from the rest of an HTML document and that may, for example, support High Dynamic Range (HDR) for video and only Standard Dynamic Range (SDR) for other types of content.
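A script can evaluate such media features through `matchMedia()`. The sketch below assumes the feature names `color-gamut`, `dynamic-range` and `video-dynamic-range` as they appear in current Media Queries drafts; the Level 5 names may still change. The `matchMedia` function is injected so that the example does not depend on a browser environment, and the `DisplayProfile` shape and the policy in `chooseVariant` are hypothetical.

```typescript
// Minimal shape of window.matchMedia needed by this sketch.
type MatchMedia = (query: string) => { matches: boolean };

interface DisplayProfile {
  wideGamut: boolean; // graphics plane can show roughly the P3 gamut
  hdrGraphics: boolean; // graphics plane supports high dynamic range
  hdrVideo: boolean; // video plane supports high dynamic range
}

function probeDisplay(matchMedia: MatchMedia): DisplayProfile {
  return {
    wideGamut: matchMedia("(color-gamut: p3)").matches,
    hdrGraphics: matchMedia("(dynamic-range: high)").matches,
    // Assumed Level 5 feature name using the video- prefix.
    hdrVideo: matchMedia("(video-dynamic-range: high)").matches,
  };
}

// Hypothetical policy: send the HDR variant only when the video plane,
// which is where the content is rendered, reports HDR support.
function chooseVariant(profile: DisplayProfile): "hdr" | "sdr" {
  return profile.hdrVideo ? "hdr" : "sdr";
}
```

In a page, the call would be `probeDisplay((query) => window.matchMedia(query))`. A device that renders video on a separate HDR-capable plane would then report `hdrVideo` as true even when `hdrGraphics` is false.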
Media providers also need some mechanism to assess the user's perceived playback quality in order to alter the quality of content transmitted using adaptive streaming. The Media Playback Quality specification, initially part of MSE, exposes metrics on the number of frames that were displayed or dropped.
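For example, an adaptive streaming library might sample these counters periodically and lower the quality level when too many frames are dropped. The sketch below uses the `totalVideoFrames` and `droppedVideoFrames` fields returned by `getVideoPlaybackQuality()`; the function names and the 10% threshold are assumptions made for illustration.

```typescript
// Subset of the fields exposed by video.getVideoPlaybackQuality().
interface PlaybackQualitySample {
  totalVideoFrames: number;
  droppedVideoFrames: number;
}

// Ratio of dropped frames between two samples (0 if no new frame was counted).
function droppedFrameRatio(
  previous: PlaybackQualitySample,
  current: PlaybackQualitySample,
): number {
  const total = current.totalVideoFrames - previous.totalVideoFrames;
  const dropped = current.droppedVideoFrames - previous.droppedVideoFrames;
  return total > 0 ? dropped / total : 0;
}

// Hypothetical rule: step down one quality level above the given threshold.
function shouldStepDown(
  previous: PlaybackQualitySample,
  current: PlaybackQualitySample,
  threshold = 0.1,
): boolean {
  return droppedFrameRatio(previous, current) > threshold;
}
```

Working on the difference between two samples, rather than on the absolute counters, keeps the decision focused on recent playback instead of the whole session.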
A media stream may be composed of different programs that use different encodings, or may need to embed ads that have different characteristics. Applications cannot easily and smoothly change from one media stream encoding to another during playback using the first version of Media Source Extensions (MSE), as it does not support this use case. Content providers need to work around this limitation, e.g. transcoding and re-packaging the ad content to be consistent with the program content prior to distribution where possible. Another workaround is to use multiple video elements and script to show or hide them at the appropriate times. The Media Working Group has started work on a codec switching feature for MSE to address this issue.
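In current MSE drafts, codec switching takes the form of a `changeType()` method on `SourceBuffer`. The sketch below shows how a player could use it when an ad with a different encoding is inserted into a program. `SourceBufferLike` is a local stand-in for the browser interface, and `createAppender` and the `Segment` shape are hypothetical names.

```typescript
// Stand-in for the two SourceBuffer methods used by this sketch.
interface SourceBufferLike {
  changeType(type: string): void;
  appendBuffer(data: ArrayBuffer): void;
}

interface Segment {
  mimeType: string; // e.g. 'video/mp4; codecs="avc1.64001f"'
  data: ArrayBuffer;
}

/**
 * Returns a function that appends segments to the buffer and announces
 * a new container or codec only when it differs from the previous one.
 * A real player must also wait for the buffer's "updateend" event
 * between two appends; that part is omitted here.
 */
function createAppender(buffer: SourceBufferLike, initialType: string) {
  let currentType = initialType;
  return (segment: Segment): void => {
    if (segment.mimeType !== currentType) {
      buffer.changeType(segment.mimeType);
      currentType = segment.mimeType;
    }
    buffer.appendBuffer(segment.data);
  };
}
```

With such a method, the same source buffer and the same video element can be reused across the program and the ad, which avoids the transcoding and multiple-element workarounds mentioned above.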
When broadcasting popular live events on an IP network, content distribution should introduce as little latency as possible. WebRTC technologies may be used to reach sub-second latency. Additionally, these technologies may be used to create a peer-to-peer content delivery network and spread the load of video distribution among viewers. Extensions to the base WebRTC specification allow Web applications to further refine media parameters:
- Content Hints lets Web applications advertise the type of media content that is being consumed (e.g. speech or music, movie or screencast) so that user agents may optimize encoding or processing parameters;
- Scalable Video Coding (SVC) allows Web applications to leverage SVC (whereby subset video streams can be derived from the larger video stream by dropping packets to reduce bandwidth consumption), which makes it easier to provide video at different qualities to multiple destinations from the same initial video stream.
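Both extensions surface as simple values set by the application: a `contentHint` string on a media track, and a `scalabilityMode` string (such as `L1T3`, one spatial layer and three temporal layers) in the encoding parameters of a sender. The sketch below illustrates the two. The mapping from content kinds to hint values is an assumption made for the example, and `TrackLike` stands in for `MediaStreamTrack`.

```typescript
type ContentKind = "speech" | "music" | "movie" | "screencast";

// Stand-in for the MediaStreamTrack properties used here.
interface TrackLike {
  kind: "audio" | "video";
  contentHint: string;
}

// Illustrative mapping onto hint values defined by Content Hints.
const HINTS: Record<ContentKind, { kind: "audio" | "video"; hint: string }> = {
  speech: { kind: "audio", hint: "speech" },
  music: { kind: "audio", hint: "music" },
  movie: { kind: "video", hint: "motion" }, // favor smooth motion
  screencast: { kind: "video", hint: "detail" }, // favor sharp detail
};

function applyContentHint(track: TrackLike, content: ContentKind): void {
  const entry = HINTS[content];
  if (entry.kind !== track.kind) {
    throw new Error(`"${content}" does not apply to ${track.kind} tracks`);
  }
  track.contentHint = entry.hint;
}

// Builds an encoding entry requesting the given number of layers.
// This sketch only covers the basic modes with one to three layers each.
function svcEncoding(spatialLayers: number, temporalLayers: number): { scalabilityMode: string } {
  for (const n of [spatialLayers, temporalLayers]) {
    if (!Number.isInteger(n) || n < 1 || n > 3) {
      throw new Error("layer counts must be integers between 1 and 3");
    }
  }
  return { scalabilityMode: `L${spatialLayers}T${temporalLayers}` };
}
```

In a browser, the returned object would be passed in the `sendEncodings` list when adding a transceiver to an `RTCPeerConnection`.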
As media experiences become more and more interactive, the question of how to bundle interactive content along with media content arises again. The media industry has mainly adopted a bundling approach for media content whereby all tracks and material that compose the media content get embedded in some media container that can then be distributed as a media package. The digital publishing industry follows a different approach based on the EPUB format, a suite of technologies for representing, packaging and encoding Web content (including multimedia) for distribution in a single-file container. The Publishing Working Group also develops the Publication Manifest, which defines a manifest format that includes metadata about the digital publication, the list of resources that belong to the digital publication and a default reading order (how it connects resources into a single contiguous work). The Audiobooks specification notably describes a profile of the Publication Manifest to create audio books. The Web community at large takes another approach, based on the Web App Manifest, which lets developers specify application metadata in a single JSON file, and the Web Packaging effort, which explores technical solutions to bundle together the resources that make up the application.
The WebTransport proposal allows data to be sent and received between a browser and server, implementing pluggable protocols underneath with common APIs on top, notably based on QUIC. The API is similar to WebSocket in that it exposes bidirectional connections between a client and a server, but it can further reduce the latency of network communications and also supports multiple streams, unidirectional streams, out-of-order delivery, and unreliable transport. Usage scenarios include low-latency streaming of media chunks from a server to a client, and cloud-based scenarios (e.g. cloud gaming) where most of the application logic runs on the server.
An early proposal to add a HTMLMediaElement.renderingBufferHint attribute is under discussion to allow web applications to disable the internal rendering buffer that user agents use to create a cushion against playback stalls, which may be needed in applications where the overall latency is critical.
Some devices (e.g. TV sets) provide access to non-IP broadcast media; the Tuner API in the TV Control specification brings these non-IP media streams to Web browsers. The API also explores ways to expose and launch broadcast-related applications that are sometimes transmitted within the transport stream. Work on the API itself has been discontinued while scope, use cases and requirements get refined in the Media and Entertainment Interest Group, so the scope of the API may change in the future.
Different media containers, used to transport media over the network, embed different kinds of in-band tracks, which must be mapped onto video, audio and text tracks in HTML5 so that Web applications can access them interoperably across Web browsers. The Sourcing In-band Media Resource Tracks from Media Containers into HTML specification provides such mapping guidelines.
Devices such as HDMI dongles and lightweight set-top boxes may not have the power needed to run a Web browser locally. Updating the firmware of these devices may also be difficult. One possible solution is to bring the processing and rendering of the browser to the cloud and to stream the result to the device. The Cloud Browser Architecture specification describes the concepts and architecture for such a Cloud Browser to provide building blocks for potential further standardization needs.
|Feature|Specification / Group|Implementation intents|
| |WebTransport Working Group| |
|Broadcast|TV Control API, TV Control Working Group| |
|Mappings|Sourcing In-band Media Resource Tracks from Media Containers into HTML, Media Resource In-band Tracks Community Group| |
|Cloud browsing|Cloud Browser Architecture, Media and Entertainment Interest Group| |
Features not covered by ongoing work
- Streaming HTTP media on HTTPS pages: While it is possible to read audio/video content served on HTTP in <video> elements served on HTTPS pages (despite the usual mixed content restrictions), this does not extend to the usage of streaming enabled by Media Source Extensions. There have been early discussions on how this could be solved. The consensus was not to include possible solutions in the current version of the specification.
- Multicast distribution: In typical unicast distribution schemes, the cost of the infrastructure increases with the number of viewers, and it becomes impractical to stream large live events, such as world championship finals that attract millions of viewers at once. The ability to distribute content using multicast would address network scaling issues by allowing media stream resources to be pushed to many clients at once. The Media and Entertainment Interest Group discussed possible extensions to fetch that could enable server push and multicast distribution.
- Bundling and playback of interactive media content: The current approach to packaging web applications, based on Web App Manifest for metadata, Service Workers for offline execution, and possibly Web Packaging in the future, may not integrate well with media production pipelines that start from a more media-centric perspective. For instance, the Carriage of Web Resource in ISOBMFF proposal, developed at MPEG, takes the opposite approach and defines ways to embed interactive content within a media container file. More work may be needed to investigate a possible common solution.