Picture-in-Picture

Editor’s Draft

This version:
https://w3c.github.io/picture-in-picture/
Latest published version:
https://www.w3.org/TR/picture-in-picture/
Feedback:
GitHub
Editor:
(Google LLC)
Former Editor:
(Google LLC)
Web Platform Tests:
permissions-policy/
picture-in-picture/

Abstract

This specification provides APIs to allow websites to create a floating video window that is always on top of other windows, so that users may continue consuming media while they interact with other content, sites, or applications on their device.

Status of this document

This section describes the status of this document at the time of its publication. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.

Feedback and comments on this specification are welcome. GitHub Issues are preferred for discussion on this specification. Alternatively, you can send comments to the Media Working Group’s mailing-list, public-media-wg@w3.org (archives). This draft highlights some of the pending issues that are still to be discussed in the working group. No decision has been taken on the outcome of these issues including whether they are valid.

This document was published by the Media Working Group as an Editor’s Draft. This document is intended to become a W3C Recommendation.

Publication as an Editor’s Draft does not imply endorsement by W3C and its Members.

This document was produced by a group operating under the W3C Patent Policy. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

This document is governed by the 03 November 2023 W3C Process Document.

1. Introduction

This section is non-normative.

Many users want to continue consuming media while they interact with other content, sites, or applications on their device. A common UI affordance for this type of activity is Picture-in-Picture (PiP), where the video is contained in a separate miniature window that is always on top of other windows. This window stays visible even when the user agent is not visible. Picture-in-Picture is a common platform-level feature among desktop and mobile OSs.

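This specification extends HTMLVideoElement, allowing websites to initiate and control this behavior. The extensions let a website request Picture-in-Picture for a video, exit it, be notified when a video enters or leaves Picture-in-Picture mode, read the size of the Picture-in-Picture window and observe its changes, and check whether Picture-in-Picture is available.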

2. Examples

2.1. Add a custom Picture-in-Picture button

<video id="video" src="https://example.com/file.mp4"></video>

<button id="togglePipButton"></button>

<script>
  const video = document.getElementById("video");
  const togglePipButton = document.getElementById("togglePipButton");

  // Hide button if Picture-in-Picture is not supported or disabled.
  togglePipButton.hidden =
    !document.pictureInPictureEnabled || video.disablePictureInPicture;

  togglePipButton.addEventListener("click", async () => {
    // If there is no element in Picture-in-Picture yet, let's request
    // Picture-in-Picture for the video, otherwise leave it.
    try {
      if (document.pictureInPictureElement) {
        await document.exitPictureInPicture();
      } else {
        await video.requestPictureInPicture();
      }
    } catch (err) {
      // Video failed to enter/leave Picture-in-Picture mode.
    }
  });
</script>

2.2. Monitor video Picture-in-Picture changes

<video id="video" src="https://example.com/file.mp4"></video>

<script>
  const video = document.getElementById("video");

  video.addEventListener("enterpictureinpicture", (event) => {
    // Video entered Picture-in-Picture mode.
    const pipWindow = event.pictureInPictureWindow;
    console.log(`Picture-in-Picture window width: ${pipWindow.width}`);
    console.log(`Picture-in-Picture window height: ${pipWindow.height}`);
  });

  video.addEventListener("leavepictureinpicture", () => {
    // Video left Picture-in-Picture mode.
  });
</script>

2.3. Update video size based on Picture-in-Picture window size changes

<video id="video" src="https://example.com/file.mp4"></video>

<button id="pipButton"></button>

<script>
  const video = document.getElementById("video");
  const pipButton = document.getElementById("pipButton");

  pipButton.addEventListener("click", async () => {
    try {
      await video.requestPictureInPicture();
    } catch (error) {
      // Video failed to enter Picture-in-Picture mode.
    }
  });

  video.addEventListener("enterpictureinpicture", (event) => {
    // Video entered Picture-in-Picture mode.
    const pipWindow = event.pictureInPictureWindow;
    updateVideoSize(pipWindow.width, pipWindow.height);
    pipWindow.addEventListener("resize", onPipWindowResize);
  });

  video.addEventListener("leavepictureinpicture", (event) => {
    // Video left Picture-in-Picture mode.
    const pipWindow = event.pictureInPictureWindow;
    pipWindow.removeEventListener("resize", onPipWindowResize);
  });

  function onPipWindowResize(event) {
    // Picture-in-Picture window has been resized.
    const { width, height } = event.target;
    updateVideoSize(width, height);
  }

  function updateVideoSize(width, height) {
    // TODO: Update video size based on pip window width and height.
  }
</script>

3. Concepts

3.1. Internal Slot Definitions

A user agent has:

  1. An initiators of active Picture-in-Picture sessions list of zero or more origins, which is initially empty.

Note: In case a user agent supports multiple Picture-in-Picture windows, the list allows duplicates.

An origin is said to have an active Picture-in-Picture session if any of the origins in initiators of active Picture-in-Picture sessions are same origin-domain with origin.
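
The following non-normative sketch models this bookkeeping. The class and method names are hypothetical and nothing here is exposed to the web; plain string equality stands in for the same origin-domain check.

```javascript
// Hypothetical model of the user agent's internal list; not a web-exposed API.
class PipSessionInitiators {
  constructor() {
    // One entry per active session, so the same origin may appear several times.
    this.origins = [];
  }

  // Mirrors the append performed by the request Picture-in-Picture algorithm.
  add(origin) {
    this.origins.push(origin);
  }

  // Mirrors the exit algorithm: remove a single matching entry only.
  removeOne(origin) {
    const index = this.origins.indexOf(origin);
    if (index !== -1) {
      this.origins.splice(index, 1);
    }
  }

  // Simplification: string equality stands in for "same origin-domain".
  hasActiveSession(origin) {
    return this.origins.includes(origin);
  }
}
```

Allowing duplicates means that closing one of two windows opened by the same origin leaves that origin with an active session.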

3.2. Request Picture-in-Picture

When the request Picture-in-Picture algorithm with video is invoked, the user agent MUST run the following steps:

  1. If Picture-in-Picture support is false, throw a NotSupportedError and abort these steps.

  2. If the document is not allowed to use the policy-controlled feature named "picture-in-picture", throw a SecurityError and abort these steps.

  3. If video’s readyState attribute is HAVE_NOTHING, throw an InvalidStateError and abort these steps.

  4. If video has no video track, throw an InvalidStateError and abort these steps.

  5. If video’s disablePictureInPicture is true, the user agent MAY throw an InvalidStateError and abort these steps.

  6. If pictureInPictureElement is null and the relevant global object of this does not have transient activation, throw a NotAllowedError and abort these steps.

  7. If video is pictureInPictureElement, abort these steps.

  8. Set pictureInPictureElement to video.

  9. Let Picture-in-Picture window be a new instance of PictureInPictureWindow associated with pictureInPictureElement.

  10. Append relevant settings object’s origin to initiators of active Picture-in-Picture sessions.

  11. Queue a task to fire an event named enterpictureinpicture using PictureInPictureEvent at the video with its bubbles attribute initialized to true and its pictureInPictureWindow attribute initialized to Picture-in-Picture window.

  12. If pictureInPictureElement is fullscreenElement, it is RECOMMENDED to exit fullscreen.
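
The precondition checks in steps 1 to 6 can be sketched as a pure function. This is a non-normative illustration: the function name and the fields of the state object are hypothetical, and the sketch assumes a user agent that chooses to throw in step 5, which the algorithm only permits.

```javascript
// Hypothetical model: returns the name of the exception the algorithm would
// throw for the given state, or null when the request may proceed.
const HAVE_NOTHING = 0; // value of HTMLMediaElement.HAVE_NOTHING

function checkPipRequest(state) {
  if (!state.pipSupported) return "NotSupportedError"; // step 1
  if (!state.policyAllows) return "SecurityError"; // step 2
  if (state.readyState === HAVE_NOTHING) return "InvalidStateError"; // step 3
  if (!state.hasVideoTrack) return "InvalidStateError"; // step 4
  // Step 5 is a MAY; this sketch assumes the user agent opts to throw.
  if (state.disablePictureInPicture) return "InvalidStateError";
  // Step 6: a user gesture is only needed when nothing is in Picture-in-Picture.
  if (state.currentPipElement === null && !state.hasTransientActivation) {
    return "NotAllowedError";
  }
  return null;
}
```

Note that the order of the checks determines which exception a page observes when several conditions fail at once.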

It is RECOMMENDED that video frames are not rendered in the page and in the Picture-in-Picture window at the same time but if they are, they MUST be kept in sync.

When a video is played in Picture-in-Picture mode, the states SHOULD transition as if it was played inline. That means that the events SHOULD fire at the same time, calling methods SHOULD have the same behaviour, etc. However, the user agent MAY transition out of Picture-in-Picture when the video element enters a state that is considered not compatible with Picture-in-Picture.

Styles applied to video (such as opacity, visibility, transform, etc.) MUST NOT apply in the Picture-in-Picture window. Its aspect ratio is based on the video size.

It is also RECOMMENDED that the Picture-in-Picture window has a maximum and minimum size. For example, it could be restricted to be between a quarter and a half of one dimension of the screen.
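
As a non-normative illustration, a user agent could derive the window size as sketched below. The function name is hypothetical, and the bounds (a quarter to a half of the screen width) are simply the example given above, not a requirement.

```javascript
// Hypothetical sizing policy: clamp the requested width to between a quarter
// and a half of the screen width, then derive the height from the video's
// intrinsic aspect ratio (videoWidth / videoHeight).
function fitPipWindow(requestedWidth, video, screenWidth) {
  const minWidth = screenWidth / 4;
  const maxWidth = screenWidth / 2;
  const width = Math.round(
    Math.min(Math.max(requestedWidth, minWidth), maxWidth)
  );
  const height = Math.round((width * video.videoHeight) / video.videoWidth);
  return { width, height };
}
```

For a 1920x1080 video on a 1600 pixel wide screen, a requested width of 100 is raised to 400 and a requested width of 2000 is lowered to 800; in both cases the 16:9 ratio is preserved.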

3.3. Exit Picture-in-Picture

When the exit Picture-in-Picture algorithm is invoked, the user agent MUST run the following steps:

  1. If pictureInPictureElement is null, throw an InvalidStateError and abort these steps.

  2. Run the close window algorithm with the Picture-in-Picture window associated with pictureInPictureElement.

  3. Queue a task to fire an event named leavepictureinpicture using PictureInPictureEvent at the video with its bubbles attribute initialized to true and its pictureInPictureWindow attribute initialized to Picture-in-Picture window associated with pictureInPictureElement.

  4. Unset pictureInPictureElement.

  5. Remove one item matching relevant settings object’s origin from initiators of active Picture-in-Picture sessions.

It is NOT RECOMMENDED that the video playback state changes when the exit Picture-in-Picture algorithm is invoked. The website SHOULD be in control of the experience if it is website initiated. However, the user agent MAY expose Picture-in-Picture window controls that change video playback state (e.g., pause).

As one of the unloading document cleanup steps, run the exit Picture-in-Picture algorithm.

3.4. Disable Picture-in-Picture

Some pages may want to disable Picture-in-Picture mode for a video element; for example, they may want to prevent the user agent from suggesting a Picture-in-Picture context menu in some cases. To support these use cases, a new disablePictureInPicture attribute is added to the list of content attributes for video elements.

The disablePictureInPicture IDL attribute MUST reflect the content attribute of the same name.

If the disablePictureInPicture attribute is present on the video element, the user agent MAY prevent the video element from playing in Picture-in-Picture mode and MAY omit any UI for doing so.

When the disablePictureInPicture attribute is added to a video element, the user agent MAY run these steps:

  1. Reject any pending promises returned by the requestPictureInPicture() method with InvalidStateError.

  2. If video is pictureInPictureElement, run the exit Picture-in-Picture algorithm.
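
Because these steps are optional for the user agent, a page that needs the window closed cannot rely on setting the attribute alone. The following non-normative sketch (the helper name is hypothetical, and doc stands for the video's Document) disables Picture-in-Picture and then exits explicitly if the video is still the Picture-in-Picture element.

```javascript
// Sketch: disable Picture-in-Picture for a video and close its window if the
// user agent has not already done so. Resolves to true if the video was in
// Picture-in-Picture when the function checked.
async function disablePip(doc, video) {
  video.disablePictureInPicture = true;
  if (doc.pictureInPictureElement !== video) {
    return false;
  }
  try {
    await doc.exitPictureInPicture();
  } catch (err) {
    // The user agent may have closed the window in the meantime, in which
    // case the exit algorithm rejects with InvalidStateError.
    if (err.name !== "InvalidStateError") throw err;
  }
  return true;
}
```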

3.5. Interaction with Fullscreen

It is RECOMMENDED to run the exit Picture-in-Picture algorithm when the pictureInPictureElement fullscreen flag is set.

3.6. Interaction with Remote Playback

The [Remote-Playback] specification defines a local playback device and a local playback state. For the purpose of Picture-in-Picture, the playback is local regardless of whether the video is played in the page or in Picture-in-Picture.

3.7. Interaction with Media Session

The API will have to be used with the [MediaSession] API for customizing the available controls on the Picture-in-Picture window.
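
A non-normative sketch of such a setup follows. It uses the setActionHandler() method of [MediaSession]; the helper name and the 10 second default skip are assumptions made for illustration, and which of these controls actually appear in the Picture-in-Picture window is up to the user agent.

```javascript
// Sketch: register Media Session action handlers for a video. Returns the
// list of actions that were accepted.
function installPipControls(mediaSession, video, defaultSkipSeconds = 10) {
  const handlers = {
    play: () => video.play(),
    pause: () => video.pause(),
    seekbackward: (details) => {
      const offset = details.seekOffset || defaultSkipSeconds;
      video.currentTime = Math.max(video.currentTime - offset, 0);
    },
    seekforward: (details) => {
      const offset = details.seekOffset || defaultSkipSeconds;
      const target = video.currentTime + offset;
      // duration is NaN until metadata has loaded; only clamp to a finite value.
      video.currentTime = Number.isFinite(video.duration)
        ? Math.min(target, video.duration)
        : target;
    },
  };
  const installed = [];
  for (const [action, handler] of Object.entries(handlers)) {
    try {
      mediaSession.setActionHandler(action, handler);
      installed.push(action);
    } catch (err) {
      // Some user agents throw for actions they do not support; skip those.
    }
  }
  return installed;
}
```

In a page this would typically be called as installPipControls(navigator.mediaSession, video) once the video has been set up.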

3.8. Interaction with Page Visibility

When pictureInPictureElement is set, the Picture-in-Picture window MUST be visible, even when the Document is not in focus or hidden. The user agent SHOULD provide a way for users to manually close the Picture-in-Picture window.

The Picture-in-Picture window visibility MUST NOT be taken into account by the user agent to determine if the system visibility state of a traversable navigable has changed.

3.9. One Picture-in-Picture window

Operating systems with a Picture-in-Picture API usually restrict Picture-in-Picture mode to only one window. Whether only one window is allowed in Picture-in-Picture mode will be left to the implementation and the platform. However, because of the one Picture-in-Picture window limitation, the specification assumes that a given Document can only have one Picture-in-Picture window.

What happens when there is a Picture-in-Picture request while a window is already in Picture-in-Picture will be left as an implementation detail: the current Picture-in-Picture window could be closed, the Picture-in-Picture request could be rejected or even two Picture-in-Picture windows could be created. Regardless, the User Agent will have to fire the appropriate events in order to notify the website of the Picture-in-Picture status changes.
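
Since the outcome of a second request is implementation-defined, a page is better off deriving its state from the events than from assumptions about which request succeeded. A minimal non-normative sketch, with a hypothetical helper name:

```javascript
// Sketch: maintain the set of videos currently in Picture-in-Picture purely
// from the enterpictureinpicture and leavepictureinpicture events.
function trackPipVideos(videos) {
  const active = new Set();
  for (const video of videos) {
    video.addEventListener("enterpictureinpicture", () => active.add(video));
    video.addEventListener("leavepictureinpicture", () => active.delete(video));
  }
  return active;
}
```

Whether the user agent closes the first window, rejects the second request, or opens two windows, the returned set stays accurate as long as the events are fired as described above.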

4. API

4.1. Extensions to HTMLVideoElement

partial interface HTMLVideoElement {
  [NewObject] Promise<PictureInPictureWindow> requestPictureInPicture();

  attribute EventHandler onenterpictureinpicture;
  attribute EventHandler onleavepictureinpicture;

  [CEReactions] attribute boolean disablePictureInPicture;
};

The requestPictureInPicture() method, when invoked, MUST return a new promise promise and run the following steps in parallel:

  1. Let video be the video element on which the method was invoked.

  2. Run the request Picture-in-Picture algorithm with video.

  3. If the previous step threw an exception, reject promise with that exception and abort these steps.

  4. Resolve promise with the Picture-in-Picture window associated with pictureInPictureElement.
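
From the page's point of view, the name of the rejection reason identifies which step of the request Picture-in-Picture algorithm failed. The following non-normative sketch maps each name to a short hint; the helper, the table, and the hint wording are hypothetical.

```javascript
// Hypothetical hint table keyed by the exception names used by the request
// Picture-in-Picture algorithm.
const PIP_ERROR_HINTS = {
  NotSupportedError: "Picture-in-Picture is not supported or is disabled by the user.",
  SecurityError: "The picture-in-picture policy-controlled feature is not allowed here.",
  InvalidStateError: "The video has no data or video track yet, or Picture-in-Picture is disabled for it.",
  NotAllowedError: "A user gesture is required to enter Picture-in-Picture.",
};

// Sketch: request Picture-in-Picture and report either the window or a hint.
async function requestPip(video) {
  try {
    const pipWindow = await video.requestPictureInPicture();
    return { pipWindow, hint: null };
  } catch (err) {
    return { pipWindow: null, hint: PIP_ERROR_HINTS[err.name] ?? err.message };
  }
}
```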

4.2. Extensions to Document

partial interface Document {
  readonly attribute boolean pictureInPictureEnabled;

  [NewObject] Promise<undefined> exitPictureInPicture();
};

The pictureInPictureEnabled attribute’s getter must return true if Picture-in-Picture support is true and this is allowed to use the policy-controlled feature named "picture-in-picture", and false otherwise.

Picture-in-Picture support is false if there’s a user preference that disables it or a platform limitation. It is true otherwise.
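
A page can combine this attribute with a check for the method itself, so that user agents that do not implement this specification are handled as well. A non-normative sketch with a hypothetical helper name:

```javascript
// Sketch: decide whether to offer a Picture-in-Picture control for a video.
function canOfferPip(doc, video) {
  return (
    // Undefined in user agents that do not implement this specification.
    doc.pictureInPictureEnabled === true &&
    typeof video.requestPictureInPicture === "function" &&
    !video.disablePictureInPicture
  );
}
```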

The exitPictureInPicture() method, when invoked, MUST return a new promise promise and run the following steps in parallel:

  1. Run the exit Picture-in-Picture algorithm.

  2. If the previous step threw an exception, reject promise with that exception and abort these steps.

  3. Resolve promise.
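
Because the exit Picture-in-Picture algorithm throws an InvalidStateError when there is no Picture-in-Picture element, a page may want to check first. A non-normative sketch with a hypothetical helper name:

```javascript
// Sketch: exit Picture-in-Picture only when something is in it. Resolves to
// true if an exit was performed and false if there was nothing to exit.
async function exitPipIfActive(doc) {
  if (!doc.pictureInPictureElement) {
    return false;
  }
  await doc.exitPictureInPicture();
  return true;
}
```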

4.3. Extension to DocumentOrShadowRoot

partial interface mixin DocumentOrShadowRoot {
  readonly attribute Element? pictureInPictureElement;
};

The pictureInPictureElement attribute’s getter must run these steps:

  1. If this is a shadow root and its host is not connected, return null and abort these steps.

  2. Let candidate be the result of retargeting Picture-in-Picture element against this.

  3. If candidate and this are in the same tree, return candidate and abort these steps.

  4. Return null.
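
As a consequence of retargeting, document.pictureInPictureElement returns the shadow host when the video lives inside a shadow tree. The following non-normative sketch (the helper name is hypothetical) follows open shadow roots down to the innermost element; it cannot see into closed shadow roots.

```javascript
// Sketch: starting from a Document or ShadowRoot, follow retargeted results
// through open shadow roots to the innermost Picture-in-Picture element.
function deepPipElement(root) {
  let element = root.pictureInPictureElement ?? null;
  while (
    element &&
    element.shadowRoot &&
    element.shadowRoot.pictureInPictureElement
  ) {
    element = element.shadowRoot.pictureInPictureElement;
  }
  return element;
}
```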

4.4. Interface PictureInPictureWindow

[Exposed=Window]
interface PictureInPictureWindow : EventTarget {
  readonly attribute long width;
  readonly attribute long height;

  attribute EventHandler onresize;
};

A PictureInPictureWindow instance represents a Picture-in-Picture window associated with an HTMLVideoElement. When instantiated, an instance of PictureInPictureWindow has its state set to opened.

When the close window algorithm with an instance of PictureInPictureWindow is invoked, its state is set to closed.

The width attribute MUST return the width in CSS pixels of the Picture-in-Picture window associated with pictureInPictureElement if the state is opened. Otherwise, it MUST return 0.

The height attribute MUST return the height in CSS pixels of the Picture-in-Picture window associated with pictureInPictureElement if the state is opened. Otherwise, it MUST return 0.

When the size of the Picture-in-Picture window associated with pictureInPictureElement changes, the user agent MUST queue a task to fire an event named resize at the PictureInPictureWindow instance associated with pictureInPictureElement.
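
One use of width and of the resize event is to match the quality of the video to the size of the window. In the non-normative sketch below, the helper name and the shape of the rendition list (objects with a width in pixels) are assumptions made for illustration.

```javascript
// Sketch: pick the smallest rendition that is at least as wide as the
// Picture-in-Picture window in device pixels, or the widest one available.
// Returns undefined for an empty list.
function pickRendition(renditions, pipWidth, pixelRatio = 1) {
  const target = pipWidth * pixelRatio;
  const sorted = [...renditions].sort((a, b) => a.width - b.width);
  return sorted.find((r) => r.width >= target) ?? sorted[sorted.length - 1];
}
```

A resize handler could call pickRendition(renditions, event.target.width, window.devicePixelRatio). Once the window is closed, width is 0 and the smallest rendition is selected.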

4.5. Event types

[Exposed=Window]
interface PictureInPictureEvent : Event {
    constructor(DOMString type, PictureInPictureEventInit eventInitDict);
    [SameObject] readonly attribute PictureInPictureWindow pictureInPictureWindow;
};

dictionary PictureInPictureEventInit : EventInit {
    required PictureInPictureWindow pictureInPictureWindow;
};

enterpictureinpicture

Fired on an HTMLVideoElement when it enters Picture-in-Picture mode.

leavepictureinpicture

Fired on an HTMLVideoElement when it leaves Picture-in-Picture mode.

resize

Fired on a PictureInPictureWindow when it changes size.

4.6. Task source

The task source for all the tasks queued in this specification is the media element event task source of the video element in question.

4.7. CSS pseudo-class

The :picture-in-picture pseudo-class MUST match the Picture-in-Picture element. It is different from the pictureInPictureElement as it does NOT apply to the shadow host chain.
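
From script, the pseudo-class can be used with querySelector(). The non-normative sketch below (the helper name is hypothetical) also guards against user agents that do not recognize the selector, which throw a SyntaxError. Since selectors do not match across shadow boundaries, root needs to be the Document or ShadowRoot whose tree contains the video.

```javascript
// Sketch: find the element matching :picture-in-picture within root, or
// return null when there is none or the selector is not supported.
function findPipElement(root) {
  try {
    return root.querySelector(":picture-in-picture");
  } catch (err) {
    if (err.name === "SyntaxError") {
      return null;
    }
    throw err;
  }
}
```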

5. Security considerations

This section is non-normative.

To limit potential abuse through spoofing, the API applies only to HTMLVideoElement. User interaction with the Picture-in-Picture window is intentionally limited so that the only effect is on the Picture-in-Picture window itself or the media being played.

5.1. Secure Context

The API is not limited to [SECURE-CONTEXTS] because it exposes a feature to web applications that user agents usually offer natively on all media regardless of the browsing context.

5.2. Permissions Policy

This specification defines a policy-controlled feature named "picture-in-picture" that controls whether the request Picture-in-Picture algorithm may throw a SecurityError and whether pictureInPictureEnabled is true or false.

The default allowlist for this feature is *.
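
With a default allowlist of *, embedded documents are allowed to use the feature unless the embedder opts out. The following sketch shows how an embedder could set the iframe allow attribute either way; the helper name and its parameters are hypothetical.

```javascript
// Sketch: create an iframe and state explicitly whether the embedded document
// may use Picture-in-Picture.
function createPlayerFrame(doc, src, allowPip) {
  const frame = doc.createElement("iframe");
  // An allowlist of 'none' disables the feature in the embedded document.
  frame.allow = allowPip ? "picture-in-picture" : "picture-in-picture 'none'";
  frame.src = src;
  return frame;
}
```

In a frame created with allowPip set to false, pictureInPictureEnabled is false and the request Picture-in-Picture algorithm throws a SecurityError.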

6. Acknowledgments

Thanks to Jennifer Apacible, Zouhir Chahoud, Marcos Cáceres, Philip Jägenstedt, Jeremy Jones, Chris Needham, Jer Noble, Justin Uberti, Yoav Weiss, and Eckhart Wörner for their contributions to this document.

Conformance

Document conventions

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

Conformant Algorithms

Requirements phrased in the imperative as part of algorithms (such as "strip any leading space characters" or "return false and abort these steps") are to be interpreted with the meaning of the key word ("must", "should", "may", etc) used in introducing the algorithm.

Conformance requirements phrased as algorithms or specific steps can be implemented in any manner, so long as the end result is equivalent. In particular, the algorithms defined in this specification are intended to be easy to understand and are not intended to be performant. Implementers are encouraged to optimize.

References

Normative References

[CSS-VALUES-4]
Tab Atkins Jr.; Elika Etemad. CSS Values and Units Module Level 4. URL: https://drafts.csswg.org/css-values-4/
[DOM]
Anne van Kesteren. DOM Standard. Living Standard. URL: https://dom.spec.whatwg.org/
[FULLSCREEN]
Philip Jägenstedt. Fullscreen API Standard. Living Standard. URL: https://fullscreen.spec.whatwg.org/
[HTML]
Anne van Kesteren; et al. HTML Standard. Living Standard. URL: https://html.spec.whatwg.org/multipage/
[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/
[PERMISSIONS-POLICY-1]
Ian Clelland. Permissions Policy. URL: https://w3c.github.io/webappsec-permissions-policy/
[Remote-Playback]
Mark Foltz. Remote Playback API. URL: https://w3c.github.io/remote-playback/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[SELECTORS-4]
Elika Etemad; Tab Atkins Jr. Selectors Level 4. URL: https://drafts.csswg.org/selectors/
[WEBIDL]
Edgar Chen; Timothy Gu. Web IDL Standard. Living Standard. URL: https://webidl.spec.whatwg.org/

Informative References

[MediaSession]
Thomas Steimel; youenn fablet. Media Session. URL: https://w3c.github.io/mediasession/
[SECURE-CONTEXTS]
Mike West. Secure Contexts. URL: https://w3c.github.io/webappsec-secure-contexts/

IDL Index

partial interface HTMLVideoElement {
  [NewObject] Promise<PictureInPictureWindow> requestPictureInPicture();

  attribute EventHandler onenterpictureinpicture;
  attribute EventHandler onleavepictureinpicture;

  [CEReactions] attribute boolean disablePictureInPicture;
};

partial interface Document {
  readonly attribute boolean pictureInPictureEnabled;

  [NewObject] Promise<undefined> exitPictureInPicture();
};

partial interface mixin DocumentOrShadowRoot {
  readonly attribute Element? pictureInPictureElement;
};

[Exposed=Window]
interface PictureInPictureWindow : EventTarget {
  readonly attribute long width;
  readonly attribute long height;

  attribute EventHandler onresize;
};

[Exposed=Window]
interface PictureInPictureEvent : Event {
    constructor(DOMString type, PictureInPictureEventInit eventInitDict);
    [SameObject] readonly attribute PictureInPictureWindow pictureInPictureWindow;
};

dictionary PictureInPictureEventInit : EventInit {
    required PictureInPictureWindow pictureInPictureWindow;
};