Audio Session

Editor’s Draft,

This version:
https://w3c.github.io/audio-session/
Issue Tracking:
GitHub
Editors:
(Apple)
(Mozilla)

Abstract

This document defines an API surface for controlling how audio is rendered and how it interacts with other audio-playing applications.

Status of this document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

Feedback and comments on this specification are welcome. GitHub Issues are preferred for discussion of this specification. Alternatively, you can send comments to the Media Working Group’s mailing list, public-media-wg@w3.org (archives). This draft highlights some of the pending issues that are still to be discussed in the working group. No decision has been taken on the outcome of these issues, including whether they are valid.

This document was published by the Media Working Group as an Editor’s Draft. This document is intended to become a W3C Recommendation.

Publication as an Editor’s Draft does not imply endorsement by W3C and its Members.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 12 June 2023 W3C Process Document.

1. Introduction

People consume a lot of media (audio and video), and the Web is one of the primary means of consuming this type of content. However, media on the Web does not integrate well with the underlying platform. The Audio Session API helps close this gap on platforms that have a notion of an audio session or audio focus, such as Android and iOS. It improves how audio from websites mixes with native applications, so that they can play on top of each other or play exclusively.

Additionally, on some platforms the user agent automatically manages the audio session for the site, based on whether media elements are playing and on which APIs are used to play audio. In some cases this may not match user expectations; this API therefore provides overrides for authors.
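For example, a page that only plays short notification sounds might prefer that its audio mix with, rather than interrupt, music playing in another application. A minimal sketch of such an override follows; the feature check and the sound file name are illustrative assumptions, not part of this specification.

// Illustrative: declare up front that this page's audio is transient,
// so short sounds mix with other playback audio instead of pausing it.
if ('audioSession' in navigator) {
  navigator.audioSession.type = 'transient';
}
const ping = new Audio('ping.mp3'); // hypothetical notification sound
ping.play();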

2. The AudioSession interface

The AudioSession is the main interface for this API. By convention, there are several audio session types for different purposes:

"auto": the user agent chooses the session type based on how the page plays audio.
"playback": audio playback such as music, video, or podcasts, which is not expected to mix with other playback audio.
"transient": short sounds such as notification pings, which usually mix with playback audio.
"transient-solo": short exclusive audio such as spoken driving directions, during which other audio should not play.
"ambient": audio that mixes with all other audio.
"play-and-record": audio that is played while audio is also being captured, for example in a video-conferencing application.

An audio session can be in one of the following states:

"inactive": the session is neither playing nor capturing audio.
"active": the session is playing or capturing audio.
"interrupted": the session has been interrupted (for example, by an incoming call) and is neither playing nor capturing audio until the interruption ends.

The page has a default audio session which the user agent uses to automatically set up the audio session parameters. The user agent will request and abandon audio focus when media elements start and finish playing on the page. This default audio session is represented by an AudioSession object exposed as navigator.audioSession.

enum AudioSessionState {
  "inactive",
  "active",
  "interrupted"
};

enum AudioSessionType {
  "auto",
  "playback",
  "transient",
  "transient-solo",
  "ambient",
  "play-and-record"
};

[Exposed=Window]
partial interface Navigator {
  // The default audio session that the user agent will use when media elements start/stop playing.
  readonly attribute AudioSession audioSession;
};

[Exposed=Window]
interface AudioSession : EventTarget {
  attribute AudioSessionType type;

  readonly attribute AudioSessionState state;
  attribute EventHandler onstatechange;
};
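The snippet below is a non-normative sketch of how a page might inspect the default audio session and observe state changes through the statechange event.

const session = navigator.audioSession;
console.log(session.type);  // "auto" unless the page has set another type
console.log(session.state); // "inactive", "active", or "interrupted"

// statechange fires on the AudioSession whenever its state changes,
// for example when the session is interrupted by an incoming call.
session.onstatechange = () => {
  console.log(`audio session state is now ${session.state}`);
};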

3. Privacy considerations

4. Security considerations

5. Examples

5.1. A site sets its audio session type proactively to "play-and-record"

navigator.audioSession.type = 'play-and-record';
// From now on, volume might be set based on 'play-and-record'.
...
// Start playing remote media
remoteVideo.srcObject = remoteMediaStream;
remoteVideo.play();
// Start capturing
navigator.mediaDevices.getUserMedia({ audio: true, video: true }).then(stream => {
    localVideo.srcObject = stream;
});

5.2. A site reacts upon interruption

navigator.audioSession.type = 'play-and-record';
// From now on, volume might be set based on 'play-and-record'.
...
// Start playing remote media
remoteVideo.srcObject = remoteMediaStream;
remoteVideo.play();
// Start capturing
navigator.mediaDevices.getUserMedia({ audio: true, video: true }).then(stream => {
    localVideo.srcObject = stream;
});
let isInterrupted = false;
navigator.audioSession.onstatechange = () => {
    if (navigator.audioSession.state === 'interrupted') {
        isInterrupted = true;
        localVideo.pause();
        remoteVideo.pause();
        // Make it clear to the user that the call is interrupted.
        showInterruptedBanner();
        localVideo.srcObject.getTracks().forEach(track => track.enabled = false);
        return;
    }
    if (isInterrupted) {
        isInterrupted = false;
        // Let user decide when to restart the call.
        showOptionalRestartBanner().then((result) => {
            if (!result)
                return;
            localVideo.srcObject.getTracks().forEach(track => track.enabled = true);
            localVideo.play();
            remoteVideo.play();
        });
    }
};
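Note, this example disables the captured tracks rather than stopping them: a disabled track stays attached to the capture device and can simply be re-enabled when the interruption ends, whereas a stopped track would require calling getUserMedia() again.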

6. Acknowledgements

TODO

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

Index

Terms defined by this specification

Terms defined by reference

References

Normative References

[DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[WEBIDL]
Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/

IDL Index

enum AudioSessionState {
  "inactive",
  "active",
  "interrupted"
};

enum AudioSessionType {
  "auto",
  "playback",
  "transient",
  "transient-solo",
  "ambient",
  "play-and-record"
};

[Exposed=Window]
partial interface Navigator {
  // The default audio session that the user agent will use when media elements start/stop playing.
  readonly attribute AudioSession audioSession;
};

[Exposed=Window]
interface AudioSession : EventTarget {
  attribute AudioSessionType type;

  readonly attribute AudioSessionState state;
  attribute EventHandler onstatechange;
};