This specification defines a file format and processing model for packaging the set of related resources and associated metadata that comprise a digital Publication into a single-file container, the Lightweight Publication Package.

This document is a draft of a Lightweight Packaging Format for Publications. Many details are under active consideration within the Publishing Working Group and are subject to change. The most prominent known issues are identified in this document, with links provided for commenting on them.

Introduction

A Lightweight Publication Package is used:

Terminology

This document uses terminology defined by the W3C Note "Publishing and Linking on the Web" [[publishing-linking]], including, in particular, user and user agent.

In addition, the following terminology is defined for use in this specification:

Codec

Codec refers to content types that have intrinsic binary format qualities, such as video and audio media types which are already designed for optimum compression, or which provide optimized streaming capabilities.

Non-Codec

Non-Codec refers to content types that benefit from compression due to the nature of their internal data structure, such as file formats based on character strings (for example, HTML, CSS, etc.).

Package

In this specification, an alias for Lightweight Publication Package.

Primary Entry Page

The Primary Entry Page represents the preferred starting resource for a Web Publication. The structure of a Primary Entry Page is defined in [[wpub]].

Publication

A digital Publication is a collection of one or more constituent resources and associated metadata, organized together in a uniquely identifiable grouping.

Publication Manifest

A Publication Manifest represents structured information about a publication, such as informative metadata, a list of all resources, and a default reading order. The structure of a Publication Manifest is defined in [[wpub]].

Root Directory

The base of the Package file system. This directory is virtual in nature: a user agent might or might not generate a physical root directory for the contents of the Package if such contents are unzipped.

Web Publication

A Web Publication is a digital Publication which is discoverable on the Web and presentable using Open Web Platform technologies. Web Publications are defined in [[wpub]].

Only the first instance of a term in a section is linked to its definition.

Specification

Packaging format

A Lightweight Publication Package uses the ZIP format as specified in ISO/IEC 21320-1:2015 ([[zip]]).

File and Directory Structure

A Package MUST include at least one of the following files in the Root Directory: publication.json (the Publication Manifest) or index.html (the Primary Entry Page).

The contents of both files are specified in [[wpub]]; they MUST NOT be encrypted.
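As an illustration, the requirement above can be checked with Python's standard zipfile module. This is a minimal sketch, not a normative procedure: the file names come from the "Obtaining a Publication Manifest" section, the function names are hypothetical, and the encryption test only looks at the traditional ZIP encryption flag.

```python
import io
import zipfile
from typing import Dict, List

# File names taken from the "Obtaining a Publication Manifest" section.
ROOT_FILES = ("publication.json", "index.html")


def root_entry_files(package: zipfile.ZipFile) -> List[str]:
    """Return which of the required files sit directly in the Root Directory."""
    names = set(package.namelist())
    return [name for name in ROOT_FILES if name in names]


def has_required_root_file(package: zipfile.ZipFile) -> bool:
    """True if at least one required file is present and none of them is encrypted."""
    found = root_entry_files(package)
    # Bit 0 of the general purpose flag marks an encrypted ZIP entry.
    return bool(found) and all(
        package.getinfo(name).flag_bits & 0x1 == 0 for name in found
    )


def build_demo_package(files: Dict[str, str]) -> zipfile.ZipFile:
    """Hypothetical helper: build an in-memory Package for illustration only."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, text in files.items():
            archive.writestr(name, text)
    buffer.seek(0)
    return zipfile.ZipFile(buffer)
```

A file named index.html in a subdirectory (for example content/index.html) does not satisfy the check, because only entries directly in the Root Directory count.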

A Package MUST also include all resources within the bounds of the Publication, i.e. the finite set of resources obtained from the union of resources listed in the default reading order and resource list of the Publication Manifest.
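The bounds computation can be sketched as a set union. The sketch below assumes that the default reading order and the resource list appear in the manifest under the property names readingOrder and resources, with items that are either URL strings or objects carrying a url property; these names belong to [[wpub]], not to this specification, and should be checked against it. It also assumes that each URL is a plain relative path identical to a ZIP entry name.

```python
from typing import Any, Dict, Iterable, List, Set


def _urls(entries: Any) -> List[str]:
    """Normalize a manifest value into a list of URL strings."""
    if entries is None:
        return []
    if isinstance(entries, (str, dict)):
        entries = [entries]
    urls = []
    for entry in entries:
        # Items are assumed to be URL strings or objects with a "url" key.
        url = entry if isinstance(entry, str) else entry.get("url")
        if url:
            urls.append(url)
    return urls


def publication_bounds(manifest: Dict[str, Any]) -> Set[str]:
    """Union of the default reading order and the resource list."""
    reading_order = _urls(manifest.get("readingOrder"))  # assumed property name
    resources = _urls(manifest.get("resources"))  # assumed property name
    return set(reading_order) | set(resources)


def missing_resources(manifest: Dict[str, Any], entry_names: Iterable[str]) -> Set[str]:
    """Resources within the bounds that have no matching entry in the Package."""
    return publication_bounds(manifest) - set(entry_names)
```

An empty result from missing_resources suggests that the Package contains every resource within the bounds, under the stated assumptions.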

These resource files MAY be in any location descendant from the Root Directory.

Files within the Package MUST reference each other via relative URL strings [[!url]], resolving to resources within the Package (i.e. at or below the Root Directory).
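One possible way to test this constraint is to resolve each reference against the path of the referencing file and reject anything that leaves the Root Directory. The following is a sketch under stated assumptions: the function name is hypothetical, and references with a scheme, an authority, or a leading slash are conservatively rejected, which is a simplification rather than something this specification states.

```python
import posixpath
from typing import Optional
from urllib.parse import unquote, urlsplit


def resolve_in_package(base_entry: str, reference: str) -> Optional[str]:
    """Resolve `reference` against the ZIP entry `base_entry`.

    Returns the target entry path, or None when the reference is not a
    path-relative URL or would escape the Root Directory.
    """
    parts = urlsplit(reference)
    # Simplification: reject absolute URLs, scheme-relative URLs and
    # references that start with a slash.
    if parts.scheme or parts.netloc or parts.path.startswith("/"):
        return None
    if not parts.path:
        # A fragment-only or empty reference points at the referencing file.
        return base_entry
    base_dir = posixpath.dirname(base_entry)
    target = posixpath.normpath(posixpath.join(base_dir, unquote(parts.path)))
    # A leading ".." segment means the target is above the Root Directory.
    if target == ".." or target.startswith("../"):
        return None
    return target
```

For example, a reference to ../css/style.css from content/c1.html resolves to css/style.css, while ../../secret.txt from index.html is rejected.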

The [[zip]] specification places few constraints on the characters allowed in file and directory names. When crafting such names, authors need to take care to use characters that are broadly interoperable across operating systems.

Compression of resources

When stored in a Package, resources with Non-Codec content types SHOULD be compressed; when they are compressed, the Deflate compression algorithm MUST be used. Compression reduces the size of the file entries stored in the Package.

Resources with Codec content types SHOULD be stored without compression. In such cases, compression would introduce unnecessary processing overhead at production time (especially with large resource files) and would impact audio/video playback performance at consumption time.

In some cases, combining compression with certain encryption schemes might even hinder the ability of user agents to handle partial content requests (e.g. HTTP byte ranges), because it is technically difficult to determine the length of the full resource ahead of media playback (e.g. HTTP Content-Length header).
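The compression rules in this section might be applied at production time as sketched below, using Python's zipfile module. The extension list is an illustrative, non-exhaustive assumption about which files carry Codec content types; a real tool would more likely decide from the media type of each resource. The function names are hypothetical.

```python
import io
import posixpath
import zipfile
from typing import Dict

# Assumption: illustrative audio/video extensions treated as Codec content.
CODEC_EXTENSIONS = {".mp3", ".m4a", ".mp4", ".webm", ".ogg", ".opus"}


def compression_for(name: str) -> int:
    """Store Codec resources as-is; compress everything else with Deflate."""
    extension = posixpath.splitext(name)[1].lower()
    if extension in CODEC_EXTENSIONS:
        return zipfile.ZIP_STORED
    return zipfile.ZIP_DEFLATED


def write_package(target, files: Dict[str, bytes]) -> None:
    """Write `files` (entry name -> content) into a ZIP archive at `target`."""
    with zipfile.ZipFile(target, "w") as archive:
        for name, data in files.items():
            archive.writestr(name, data, compress_type=compression_for(name))


if __name__ == "__main__":
    buffer = io.BytesIO()
    write_package(buffer, {
        "index.html": b"<p>hello</p>" * 100,
        "audio/track1.mp3": b"\x00\x01" * 100,
    })
    with zipfile.ZipFile(buffer) as archive:
        for info in archive.infolist():
            print(info.filename, info.compress_type, info.compress_size)
```

Choosing the method per entry keeps text resources small while leaving media files byte-addressable inside the archive.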

Obtaining a Publication Manifest

If the Package contains a publication.json file located in the Root Directory, the Publication Manifest is obtained by opening and parsing this file.

Otherwise, if the Package contains an index.html file in the Root Directory, the Publication Manifest is obtained through the following steps:

  1. Let Document be the result of the extraction of the index.html file from the Package.
  2. If Document does not have the media type text/html or application/xhtml+xml, terminate this algorithm.
  3. Let manifest link be the first link element in tree order in Document whose rel attribute contains the publication token.
  4. If manifest link is null, terminate this algorithm.
  5. If manifest link's href attribute's value is the empty string, terminate this algorithm.
  6. If the href attribute value of manifest link has a non-null fragment identifying an identifier id in Document:

    1. Let embedded manifest script be the first script element in tree order, whose id attribute is equal to id and whose type attribute is equal to application/ld+json.
    2. If embedded manifest script is null, terminate this algorithm.
    3. Let text be the child text content of embedded manifest script.

    Explanation

    This branch is used when the manifest is embedded in the Primary Entry Page. The algorithm locates the script element and extracts the manifest itself.

  7. Otherwise:
    1. Let manifest URL be the value of the href attribute.
    2. If manifest URL is not a relative URL string, terminate this algorithm.
    3. Extract the Manifest from the Package using manifest URL.
    4. Open and read the Manifest file, letting text be the result.
    Explanation

    This branch is used when the manifest is in a separate file. It performs the standard operations to retrieve the manifest from the Package.
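The steps above can be sketched with Python's standard library as follows. This is an illustration under several assumptions, not a reference implementation: step 2 is skipped because ZIP entries carry no media type, files are assumed to be UTF-8 encoded, step 6 is interpreted as applying to fragment-only href values, and the names ManifestFinder and manifest_text are hypothetical. The function returns the manifest text, or None wherever the algorithm terminates.

```python
import posixpath
import zipfile
from html.parser import HTMLParser
from typing import Dict, Optional
from urllib.parse import unquote, urlsplit


class ManifestFinder(HTMLParser):
    """Records the first manifest link and the text of JSON-LD scripts with an id."""

    def __init__(self) -> None:
        super().__init__()
        self.link_seen = False
        self.href: Optional[str] = None
        self.scripts: Dict[str, str] = {}  # id -> child text content
        self._open_id: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "link" and not self.link_seen:
            # Step 3: first link whose rel contains the "publication" token.
            if "publication" in (attributes.get("rel") or "").lower().split():
                self.link_seen = True
                self.href = attributes.get("href")
        elif tag == "script":
            script_id = attributes.get("id")
            script_type = (attributes.get("type") or "").strip().lower()
            if (script_id and script_type == "application/ld+json"
                    and script_id not in self.scripts):
                self.scripts[script_id] = ""
                self._open_id = script_id

    def handle_data(self, data):
        if self._open_id is not None:
            self.scripts[self._open_id] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._open_id = None


def manifest_text(package: zipfile.ZipFile) -> Optional[str]:
    """Return the manifest text, or None where the algorithm terminates."""
    names = set(package.namelist())
    if "publication.json" in names:
        return package.read("publication.json").decode("utf-8")
    if "index.html" not in names:
        return None
    # Step 1. Step 2 is skipped: ZIP entries carry no media type.
    finder = ManifestFinder()
    finder.feed(package.read("index.html").decode("utf-8"))
    finder.close()
    # Steps 4 and 5: no manifest link, or an empty href.
    if not finder.href:
        return None
    parts = urlsplit(finder.href)
    # Step 6 (interpretation): a fragment-only href names an embedded script.
    if parts.fragment and not (parts.scheme or parts.netloc or parts.path):
        return finder.scripts.get(unquote(parts.fragment))
    # Step 7: the href has to be a relative reference to a file in the Package.
    if parts.scheme or parts.netloc or parts.path.startswith("/"):
        return None
    entry = posixpath.normpath(unquote(parts.path))
    if entry.startswith("../") or entry not in names:
        return None
    return package.read(entry).decode("utf-8")
```

A caller would typically pass the returned text to a JSON parser, for example json.loads, and treat a None result as "no manifest found".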

If both index.html and publication.json are present in the Package, then the Primary Entry Page SHOULD contain a reference to the publication.json file, following the rules defined in this section.

User Agent Conformance

A user agent conforms to this specification if it meets the following criteria:

The hyperlinks specified here will have to be updated when the Publication Manifest specification has been finalized.

Examples

The application/lpf+zip Media Type

This appendix registers the media type application/lpf+zip for the Lightweight Packaging Format (LPF).

A Lightweight Packaging Format (or LPF) file is a container technology based on the [[zip]] archive format. It is used to encapsulate Web Publications. LPF and its related standards are maintained and defined by the World Wide Web Consortium (W3C).

MIME media type name:

application

MIME subtype name:

lpf+zip

Required parameters:

None.

Optional parameters:

None.

Encoding considerations:

LPF files are binary files encoded in the application/zip media type.

Security considerations:

User agents that read LPF files should rigorously check the size and validity of data retrieved.

In addition, because of the various content types that can be embedded in LPF files, application/lpf+zip may describe content that poses security implications beyond those noted here. However, security issues would potentially arise only in cases where the user agent recognizes and processes the additional content, or where further processing of that content is dispatched to other user agents. In such cases, matters of security would fall outside the domain of this registration document.

Security considerations that apply to application/zip also apply to LPF files.

Interoperability considerations:

Any format based on LPF, if using content encryption, MUST choose a different MIME media type and file extension than those defined in this specification.

Published specification:

This media type registration is for the Lightweight Packaging Format (LPF), as described by the Lightweight Packaging Format (LPF) specification located at https://w3c.github.io/lpf/.

Applications that use this media type:

This media type is intended to be used by multiple interoperable applications for the distribution and consumption of ebooks, audiobooks, digital visual narratives and other types of digital publications.

Additional information:
Magic number(s):

0: PK 0x03 0x04

File extension(s):

LPF files are most often identified with the extension .lpf.

Macintosh file type code(s):

ZIP

Fragment identifiers:

None

Person & email address to contact for further information:

Ivan Herman (ivan@w3.org)

Intended usage:

COMMON

Author/change controller:

The published specification is a work product of the World Wide Web Consortium (W3C)’s Publishing Working Group. The W3C has change control over this specification.