This document is also available in this non-normative format: ePub
Copyright © 2016 W3C® (MIT, ERCIM, Keio, Beihang). W3C liability, trademark and document use rules apply.
A Portable Web Publication (PWP) is
This document describes the use cases that correspond to the requirements for a Portable Web Publication. It provides the basis for the technical considerations in the companion document, “Portable Web Publications for the Open Web Platform” [pwp].
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at https://www.w3.org/TR/.
This is work in progress. The final version of this document is planned to be published as an Interest Group Note in a few months. The current version is the first Public Working Draft.
This document was published by the Digital Publishing Interest Group as a First Public Working Draft. If you wish to make comments regarding this document, please send them to public-digipub-ig@w3.org (subscribe, archives). All comments are welcome.
Publication as a First Public Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The group does not expect this document to become a W3C Recommendation. W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
This document is governed by the 1 September 2015 W3C Process Document.
The Web emerged in 1994, based on a model of individual pages loosely joined by hyperlinks. Clustering within domains and with explicit navigation elements built into them, webpages evolved into websites. This model inherited very little from an existing, powerful and much older page-based medium: books.
Over centuries, “books” have assumed many forms: journals, magazines, pamphlets of long-form articles and essays, newspapers, atlases, comics, notebooks, albums of all sorts. We can define these different manifestations as “publications”: bound editions of meaningful media, made public.
We believe there is great value in combining this older tradition of portable, bounded publications with the pervasive accessibility, addressability, and interconnectedness of the Open Web Platform (OWP). New models of economic sustainability, innovative experiences of knowledge and invigorated socio-cultural engagement depend on this.
It is the task of this W3C Digital Publishing Interest Group to explore the uniqueness, desirability, and feasibility of bringing these two great models of publishing together. This document explores requirements based on examples of real world use cases and scenarios. The fundamental, baseline requirements that form the heart of what is expected from a PWP are described first, followed by requirements and use cases that describe additional, strongly desired scenarios. The complete list of requirements is also collected in a separate table in A. List of Requirements.
The terms “online” and “offline” are used in this document, but the borderline between these is not always clear cut. For example, a PWP can be on a local disc but accessed through a web server running on that machine (i.e., through a http://localhost URL). The behavior of the client in this case is identical to the situation when the publication is genuinely online, although, technically, the publication is clearly offline. Similarly, a remote file system can be mounted as a local disc, in which case a PWP can be accessed as a file though technically online. A more precise terminology would use terms like “protocol” and “file” states for what is colloquially called “online” and “offline” in this document.
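The distinction between the “protocol” and “file” states can be illustrated with a minimal sketch. It assumes, purely for illustration, that the state can be decided from the URL scheme alone; the function name `access_state` and the example locators are hypothetical and are not defined by this document.

```python
from urllib.parse import urlsplit

# Assumption: these schemes go through a network protocol stack, even on localhost.
PROTOCOL_SCHEMES = {"http", "https"}


def access_state(locator: str) -> str:
    """Classify a locator as being in the "protocol" or the "file" state."""
    scheme = urlsplit(locator).scheme.lower()
    if scheme in PROTOCOL_SCHEMES:
        return "protocol"
    if scheme in ("file", ""):
        # A file: URL or a bare path is read directly from a (possibly mounted) disc.
        return "file"
    raise ValueError(f"unsupported scheme: {scheme!r}")


# On a local disc but served by a local web server: "protocol" state,
# although the publication is colloquially "offline".
print(access_state("http://localhost/book/index.html"))
# On a mounted remote file system: "file" state,
# although the publication is colloquially "online".
print(access_state("file:///mnt/remote/book/index.html"))
```

In this sketch the client behavior follows the state reported by the locator, not the physical location of the bytes, which is the point made in the paragraph above.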
Req. 1: The publication should be readable in a browser.
Reading publications of any length should not be restricted to specific devices or applications; publications should be equally available in a browser.
Req. 2: PWPs should be able to make use of all facilities offered by the OWP.
There is a formidable development of visualization systems, interactive tools, and other powerful facilities that are built on top of the OWP, including accessing external services like Wolfram Alpha. These tools have been traditionally developed for browsers, and provide possibilities that traditional publications, such as books, magazines, scholarly papers, and educational materials, should also benefit from. That requires publications to become first class citizens on the web platforms.
Req. 3: It should be possible to see the publication in a “paginated” view.
Whereas a “scrolling” view is the dominating approach on the Web in Web browsers, publications must provide the possibility to switch to a paginated view if the user so desires or as the author suggests. Pagination may automatically adapt page sizes to the device’s or the browser’s viewport, and may contain separate headers, footers, and/or page numbers.
For more detailed requirements on pagination, see here.
Req. 4: The same PWP should be available both online and offline.
The same content of the PWP should be accessible offline, if circumstances so dictate, without the necessity for the reader to take any particular, technical actions.
Req. 5: There should be a smooth transition between offline and online states of the same publication.
Accessing a document online or offline should not require conversions from one storage format to the other. The transition should be as transparent as possible to the reader, requiring only a very minimal (ideally no) interaction on her part.
Req. 6: It should be possible to create and distribute a PWP as a uniquely identified single resource unit.
A PWP, no matter how many pieces it consists of, must be distributable to readers or consumers as a single unit so that users can consume the necessary content that is identified by the PWP.
See also Archiving.
Req. 7: A publication may consist of a collection of resources.
A PWP is likely to consist of a variety of web resources, including HTML, SVG, or CSS material. It may also contain data files and executable resources like JavaScript, iPython scripts, or Java. This reflects current practice on the web, which is driven by needs such as memory constraints on small devices, cacheability, or independent development.
Req. 8: The notion of a PWP should enable specific publications like audio books, graphics books, and mixed media.
All concepts and structures related to a PWP should enable the creation and/or production of video or audio rendering; all the audio, video, and graphics content must be treated with the same attention as all other content.
See also section on Accessibility.
Req. 9: The reader must have the possibility to personalize his or her reading experience. This may include, for example, controlling such features as font size, choice of fonts, background and foreground color, tone of voice, etc. This should be done via a proper interactive dialogue and/or a choice among pre-defined possibilities.
Req. 10: The publication must be discoverable.
During the discovery phase, as a reader decides whether to acquire and download a publication, he or she will want to determine if this book has the appropriate features they are looking for.
See also Req. 42: Accessibility of a PWP must be discoverable.
Req. 11: There should be a way to control versioning and revisioning.
There should be the capability for providing revisions or new versions to users. The online and offline version should be able to be in sync.
See also Req. 20: The distribution of a PWP should not affect its versioning.
Req. 12: There should be a way to differentiate between essential and non-essential resources.
Preserving essential content is the job of the reading system. However, having a clear indication within the publication format to mark which items are critical, and which instead need a fallback for limited connectivity/storage situations, would provide critical input to the reading system to do its job. This would also give more control to the publisher, allowing them to ensure a consistent user experience while consuming the publication. When changing the state of a PWP from, e.g., online to offline, an implementation knows which PWP resources are essential for the display of the content, and therefore must be included in the offline version, and which may be skipped (see 2.4 Online and Offline and 2.5 State Transitions).
Non-essential content, which is not required to be available in certain states, should have a predefined fallback that will allow the user to continue consumption (even in a potentially degraded, but author-controlled manner) (see also 4. States of a PWP).
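The following minimal sketch illustrates how a reading system might use such markings when moving a publication offline. The `Resource` structure, its `essential` flag, and its `fallback` field are hypothetical names chosen for illustration; this document does not define how essential resources or fallbacks are expressed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    url: str
    essential: bool
    fallback: Optional[str] = None  # hypothetical: locator of an author-provided fallback


def plan_offline_copy(resources):
    """Return (urls to include offline, {skipped url: fallback url or None})."""
    included, skipped = [], {}
    for res in resources:
        if res.essential:
            included.append(res.url)
        else:
            skipped[res.url] = res.fallback
            if res.fallback is not None and res.fallback not in included:
                # The fallback must travel with the offline copy to be usable.
                included.append(res.fallback)
    return included, skipped


resources = [
    Resource("chapter1.html", essential=True),
    Resource("style.css", essential=True),
    Resource("https://example.org/live-chart.js", essential=False,
             fallback="static-chart.png"),
    Resource("https://example.org/trailer.mp4", essential=False),
]
included, skipped = plan_offline_copy(resources)
print(included)
print(skipped)
```

A non-essential resource without a fallback (the last entry) is simply skipped here; how a user agent reports that to the reader is left open.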
Req. 13: A PWP should allow for access control and write protections of the resource.
Req. 14: The publication should conform to all the requirements of horizontal dependencies.
Web content has to be consumed under different circumstances: it must be available to the largest possible audience in a secure manner, providing the necessary protection of the reader’s privacy. Publication content must be able to answer to a number of principles like accessibility, internationalization, device independence, security, or privacy. (These are usually referred to, in the W3C context, as “horizontal” dependencies.) These principles are, in general terms:
These principles correspond to technical requirements on the underlying technologies (i.e., OWP, and its possible extension for PWP) insofar as the technologies must empower the authors (writers, editors, publishers, etc.) to produce content that follow them. Whether authors use the possibilities of these technologies or not is not addressed in this document.
All these constraints are usually formalized in the context of the usage on the Web, but they are also valid for publications in general regardless of whether they are online or offline, or whether the publication is distributed as a single unit or not. In some cases, for example due to legislative reasons, the demands on digital publications may be more stringent than for generic web sites. The use cases below provide some examples for the publication-specific situations.
Req. 15: User agents must treat a PWP, regardless of the number of components, as a single unit as opposed to individual documents.
Req. 16: The information regarding the constituent resources of a PWP must be easily discovered.
A PWP will likely be composed of multiple web documents. A more complicated PWP may have many more components, meaning that extracting in advance all the references to other constituent resources may be prohibitive. It is therefore necessary for the reading system to have easy access to the list of constituent resources, and to some of their characteristics like their media types or sizes.
Req. 17: Find the (default) reading order of the resources of a PWP easily.
A user agent needs to know the sequence in which to present components of a PWP to the user. A PWP will likely be composed of multiple web documents. A typical simple PWP will contain anywhere from one to fifteen HTML documents and several image files, in one location or many. A more complicated PWP may have many more components, meaning that extracting the exact order from within the resources (i.e., parsing them in advance to extract the information) may be prohibitive. It is therefore necessary for the reading system to have easy access to the reading order of the constituent resources. In particular, the user agent should also have the information on what the starting point of the publication rendering is.
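As an illustration only, the sketch below shows how a user agent could obtain the list of constituent resources, the default reading order, and the starting point without parsing any content document. The JSON layout and the field names (`resources`, `readingOrder`, `type`, `size`) are assumptions made for this example; this document does not prescribe any format.

```python
import json

# Hypothetical description of a small PWP; the field names are illustrative only.
MANIFEST_JSON = """
{
  "resources": [
    {"url": "cover.html", "type": "text/html", "size": 2048},
    {"url": "chapter1.html", "type": "text/html", "size": 40960},
    {"url": "chapter2.html", "type": "text/html", "size": 38912},
    {"url": "figure1.svg", "type": "image/svg+xml", "size": 10240}
  ],
  "readingOrder": ["cover.html", "chapter1.html", "chapter2.html"]
}
"""


def reading_order(manifest: dict) -> list:
    """Return the default reading order, checking it against the listed resources."""
    known = {res["url"] for res in manifest["resources"]}
    order = manifest["readingOrder"]
    missing = [url for url in order if url not in known]
    if missing:
        raise ValueError(f"reading order refers to unlisted resources: {missing}")
    return order


manifest = json.loads(MANIFEST_JSON)
order = reading_order(manifest)
start = order[0]  # starting point of the publication rendering
total_size = sum(res["size"] for res in manifest["resources"])
print(start, order, total_size)
```

Note that `figure1.svg` is a constituent resource but not part of the reading order, which is why the two lists are kept separate in this sketch.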
Req. 18: There should be a way to uniquely identify a publication regardless of its state.
A unique identification of a specific publication, regardless of whether it is online or offline, or whether it is part of a web site or a single (packaged) file is essential. This unique identification should be mapped onto the “real” location of the publication smoothly, without requiring the author’s interaction.
During the consumption of a publication, a user may change the “state” of their PWP. The states of a PWP reflect whether the document is online or offline, or whether it is packed (i.e., all constituents are packaged, for example, in a ZIP file) or not. These different states require a different behavior from the user agent, while some of the characteristics of the publication may be invariant across states. The table below shows the same publication (PWP) in the most typical states:
 | Online | Offline |
---|---|---|
Packed | PWP as one archive file on a Web Server | PWP as one archive file on a local disc |
Unpacked | PWP spread over several files on a Web Server | PWP spread over several files on a local disc |
The concept of states is related to 2.4 Online and Offline, 2.5 State Transitions and 2.12 Essential and Non-essential Resources.
See also Locating the Same Publication Across States, and State-Independent and State-Dependent Locators
Req. 19: The PWP needs to have an explicit “offline mode” alternative.
It is possible that certain items will be dynamic, requiring the use of an external resource (or server) to provide the data. In offline mode, the user may want to be alerted that content could not be obtained, or be shown some fallback set of data. In this case, being able to specify an explicit “no-connectivity” or "offline-mode" alternative would allow the publication author to have more control over the user's experience and replace a potential error-display with a limited subset of a good experience.
Req. 20: The distribution of a PWP should not affect its versioning.
Simply distributing or sharing a PWP to multiple destinations and devices should not result in multiple versions of the PWP. Those items should not be different versions of the source PWP unless they contain modifications that make them different PWPs.
Req. 21: The distribution of PWPs should conform to the standard processes and expectations of commercial publishing channels.
Req. 22: PWPs should support cross-references that can be resolved locally or externally.
A user should have an option to access their local copy of a PWP when there is a choice between a local copy and an external source.
Req. 23: Several PWPs may share external resources.
In order to make serial publications lighter and speed up processing, a PWP should support the injection of external resources.
Req. 24: PWPs should be able to access external data.
This is related to Req. 12: There should be a way to differentiate between essential and non-essential resources. That requirement states that essential resources must be included in the offline version; non-essential resources may be either included, too, or accessed online when possible. It is up to the packaging software to decide which resources will be included, and which will not. This requirement adds the possibility to specify that some data must stay external to the packaged publication.
In this document, 'manifest' refers to an abstract place, typically one or several files, that contains information necessary for the proper management, rendering, and so on, of the publication. This is as opposed to metadata, which contains information on the content of the publication like author, publication date, and so on.
Some fundamental use cases and requirements already imply the usage of manifests. For example:
Req. 25: Manifests should include the technical and descriptive metadata, and basic characteristics of the constituent resources.
A user agent requires information about the package and its components in order to process it. For example, performance and memory requirements may prevent a user agent from parsing a large number of content documents in order to discover the necessary components and their relationships. A user agent may need to make some decisions about how to present content before displaying it.
Some necessary features are listed among the fundamental features of a PWP (see 3.1 Horizontal Dependencies); the use cases below provide a (non-exhaustive) list of further information.
Req. 26: Manifest should make it possible to provide a streamlined access to disjoint parts of the publication.
It should be possible for the author to convey several potential reading orders that may go beyond the “default” for the content of the publication. Such an alternative reading order may include only specific parts of the publication rather than the full content of the PWP.
Req. 27: Manifest should include information of new content.
The manifest should include information that makes it possible for a user agent to find out whether a specific content has changed since a last access or not. This may or may not directly reflect versioning (as referred to in the requirement on versioning), insofar as the granularity may be different.
See also "Archiving"
Req. 28: Manifest should include means to use links to resources, regardless of location.
The fundamental requirement on identification already states that there should be a way to uniquely identify a publication (see 3.5 Uniquely Identifying a PWP). This requirement should be extended to resources within a publication, such as a chapter, an image, or a mathematical formula. To achieve these, the manifest should contain the necessary information to make the mapping of URIs possible.
Req. 29: The manifest may include alternative reading orders.
One of the fundamental requirements is that the default reading order should be made available to the user agent (see 3.4 Default Reading Order). In addition, the manifest should also provide means to define alternative reading orders. It is up to the user agent how this alternative reading order is conveyed to the reader (via some suitable user interface techniques).
Req. 30: The access methods for retrieving a manifest should allow for significant flexibility.
A manifest may be a file in some predefined format, such as XML or JSON. There should be several ways to get hold of that manifest: sourcing the file from a well known location, using an HTML link element, using an HTTP LINK header, etc. A PWP should provide a high level of flexibility, including possible alternatives on how that manifest should be found.
Req. 31: There should be a possibility to combine manifests from several origins.
The default approach for a user agent is to get hold of the manifest (file) as one unit via some flexible means (see 6.6 Multiple Access Options). However, an even more flexible approach is to allow several manifest files that the user agent would combine, following some rules, to yield the final, overall manifest for the publication.
The Interest Group recognizes that the proper definition/implementation of this requirement may lead to major technical complications and therefore may not be fulfilled.
Req. 32: A common, state-independent locator needs to exist. This is necessary to connect the same publication across states. This state-independent locator (also known as the “canonical” locator [rfc6596]) should be part of the PWP.
Req. 33: There must also be a separation between state-independent and state-dependent locators. It must be possible (and necessary) to use, for all cross-references, the canonical locator.
Req. 34: It should be possible to use, in all circumstances, a relative locator to refer to content within a PWP. Relative locators are abundant on the Web; state-independent locators should not break this mechanism. A PWP processor must be able to combine a relative locator with the canonical as well as state dependent locators of a PWP.
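A minimal sketch of combining one relative locator with both the canonical and the state-dependent locators follows. It uses the standard URI reference resolution implemented by Python's `urllib.parse.urljoin`; all locators shown are hypothetical examples, and the sketch covers only unpacked states (addressing content inside a packed archive is not attempted here).

```python
from urllib.parse import urljoin

# Hypothetical locators of one and the same publication.
CANONICAL = "https://example.org/pub/moby-dick/"
STATE_DEPENDENT = {
    "online, unpacked": "https://cdn.example.org/mirror/moby-dick/",
    "offline, unpacked": "file:///home/reader/books/moby-dick/",
}


def resolve(base: str, relative: str) -> str:
    """Combine a relative locator with a base locator."""
    return urljoin(base, relative)


relative = "chapter1.html#whale"
print(resolve(CANONICAL, relative))
for state, base in STATE_DEPENDENT.items():
    print(state, "->", resolve(base, relative))
```

The same relative locator yields a valid absolute locator in every case, so cross-references written relative to the publication do not need to change when the state changes.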
Req. 35: When providing a pointer to any or all of a publication, this should be robust across states. Locating a resource within a PWP should not depend on the PWP's state.
Some use cases, documented separately [dpub-annotation-uc] for the purpose of annotations, imply this issue:
Highlights are, in this case, Web Annotations [annotation-model], i.e., stored online with links to the highlighted text.
Req. 36: Identifiers must be persistent and usable across states, and not conflict with locators.
Identification of a publication is orthogonal to the issue of locators. There are no specific requirements on identification of PWPs in this document in general; however, any such identifier must be usable across states and must not conflict with locators. Identifiers need to be persistent across PWP instances.
We take for granted the relative durability of print artifacts, many of which have survived with little more than benign neglect. In contrast, digital documents are unlikely to persist without more active interventions, such as making copies, monitoring software dependencies, and validating integrity. Since future consumers of publications represent the most open-ended user group, it is desirable that digital documents be instilled with more of the inherent durability that characterizes print artifacts. The PWP offers this potential, by making it easier for archiving services to locate, harvest, update, and describe digital publications. Long-term preservation of digital publications ensures that they may continue to be accessible, beyond the tenure of individual authors, file formats, publishers, or publishing platforms.
Req. 37: The locations of all PWP components should be discoverable.
An archiving service needs a reliable way to learn where all of the components that constitute a PWP are located in order to be able to archive it. Without such a mechanism, the archiving service will have to develop and maintain publisher- and/or platform-specific heuristics for packaging collections of interlinked resources into discrete publications, making archiving more expensive and error-prone.
See also Discovery
Req. 38: There should be a way to discover that a new version of one or more PWP components has been published.
In order to be able to archive a PWP, an archiving service needs a reliable way to learn that a new version of one or more PWP components has been published at the same locations as previously published. Without such a mechanism, the archiving service will need to periodically re-download and re-checksum all PWP components to determine whether any updates have transpired, unnecessarily increasing the load on archiving service and publisher servers and delaying updated PWP components from being archived.
Req. 39: There should be a way to discover that one or more new components have been added to a PWP.
An archiving service needs a reliable way to learn that one or more PWP components have been added to a PWP in order to be able to archive them. Without such a mechanism, there is a possibility that the archiving service will not know that the new component belongs to the PWP, because the publisher- and/or platform-specific heuristics have not been updated.
Req. 40: There should be a way to discover that one or more PWP components have been removed from a PWP.
An archiving service needs a reliable way to learn that one or more PWP components have been removed from a PWP in order to be able to propagate this change to the archive. Without such a mechanism, it is possible that the archiving service will mistakenly make PWP components accessible that should not be.
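The three discovery requirements above (updated, added, and removed components) can be sketched as a comparison of two snapshots of a publication's component list. The sketch assumes, hypothetically, that each snapshot maps a component's location to a SHA-256 digest of its content; this document does not specify how such information would be published.

```python
import hashlib


def digest(content: bytes) -> str:
    """Hex SHA-256 digest recorded for a component (an assumed convention)."""
    return hashlib.sha256(content).hexdigest()


def diff_components(old: dict, new: dict) -> dict:
    """Compare two {location: digest} snapshots of a PWP's components."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(loc for loc in old.keys() & new.keys()
                          if old[loc] != new[loc]),
    }


old_snapshot = {
    "chapter1.html": digest(b"Call me Ishmael."),
    "chapter2.html": digest(b"The Carpet-Bag"),
    "errata.html": digest(b"None yet."),
}
new_snapshot = {
    "chapter1.html": digest(b"Call me Ishmael. (revised)"),
    "chapter2.html": digest(b"The Carpet-Bag"),
    "appendix.html": digest(b"Cetology"),
}
changes = diff_components(old_snapshot, new_snapshot)
print(changes)
```

With such snapshots available, an archiving service would only need to fetch the components listed under "added" and "changed", and could propagate the "removed" entries to the archive, rather than re-downloading everything.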
Req. 41: There should be a way to indicate whether one or more PWP components contain structured descriptive metadata.
An archiving service needs a reliable way to determine which, if any, PWP components contain structured descriptive metadata. Without such a mechanism, the archiving service will have to develop and maintain publisher- and/or platform-specific heuristics for locating or parsing out descriptive metadata, making archiving more expensive and decreasing the reliability of reporting.
Some fundamental use cases and requirements already imply the usage of accessibility. For example:
Req. 42: Accessibility of a PWP must be discoverable. This ensures that users know how (or whether) a PWP or any of its parts is accessible. Granular accessibility of a PWP must be discoverable so that users know how (or whether) a specific chapter or element within a PWP is accessible. See also 6.1 Manifests, Metadata, and Resources.
Req. 43: A PWP must support the ability to include multiple renditions of a publication. Within their publication, in addition to the print rendition, a publisher may include a fully narrated rendition, or a video with described audio and captioning.
Req. 44: A PWP needs to support the ability to construct a limited package with only a subset of the necessary content.
See also Req. 6: It should be possible to create and distribute a PWP as a uniquely identified single resource unit and Req. 9: The reader must have the possibility to personalize his or her reading experience.
Req. 45: A PWP needs to support both time-based media and text.
A PWP needs to support time-based media, such as synchronized video, audio, captions or transcript, or sign language interpretation. A PWP must enable a synchronized media experience while navigating through the book, with sufficient level of granularity.
Req. 46: When annotations are distributed and associated with a PWP, the content of the annotation must be compatible with assistive technology.
Req. 47: User agents must be allowed to limit the capabilities of a PWP.
The compatibility and interoperability requirements of the specification must not prevent user agents from taking measures to protect the security, safety, or privacy of the user. Security-conscious systems that interlink an unusually large number of important services and have an unusually large attack surface (such as web browsers or web services) have stricter security requirements than standalone apps that are siloed from the web. They must be allowed to continue to fulfill their pre-existing security requirements as they implement support for the PWP format.
Req. 48: It should be possible to discover the capabilities a PWP will have access to. A document’s access to features and APIs will vary from platform to platform, app to app. Document authors benefit from being able to discover these capabilities.
Req. 49: PWP authors should be able to embed guidance policies in their documents that inform the user agent of their preferences as to how the integrity and security of the document itself should be preserved. Indeed, scripted documents are dynamic by nature; long-lived authored documents are vulnerable to alteration by a variety of external factors. This mechanism should be based on the pre-existing Content Security Policy (CSP) [csp2] and Subresource Integrity [sri] specifications and not be a new invention incompatible with web browser CSP implementations.
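To illustrate the kind of integrity information involved, the sketch below computes and checks an integrity string in the style used by Subresource Integrity: the hash algorithm name, a hyphen, and the base64-encoded digest. It is a simplified illustration, not an implementation of [sri]; the function names and the sample script are hypothetical.

```python
import base64
import hashlib


def integrity_value(content: bytes) -> str:
    """Build an SRI-style integrity string: "sha384-" + base64(SHA-384 digest)."""
    raw = hashlib.sha384(content).digest()
    return "sha384-" + base64.b64encode(raw).decode("ascii")


def matches(content: bytes, expected: str) -> bool:
    """Check content against an expected integrity string."""
    return integrity_value(content) == expected


script = b"console.log('hello from a PWP');"
expected = integrity_value(script)
print(expected)
print(matches(script, expected))                    # unchanged content
print(matches(script + b" // tampered", expected))  # altered content
```

An author could record such a value alongside a resource so that a user agent can detect whether a long-lived document has been altered after publication.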
Req. 50: User agents may provide a method for escalating trust. By providing such methods, user agents may regain access to more capabilities, while otherwise the agent would impose limitations for security reasons. Platform vendors have sometimes offered methods for otherwise untrusted local scripts to become trusted and regain API privileges that they had lost while untrusted.
Number and reference | Short description |
---|---|
Req. 1 | The publication should be readable in a browser |
Req. 2 | PWPs should be able to make use of all facilities offered by the OWP |
Req. 3 | It should be possible to see the publication in a “paginated” view |
Req. 4 | The same PWP should be available both online and offline |
Req. 5 | There should be a smooth transition between offline and online states of the same publication |
Req. 6 | It should be possible to create and distribute a PWP as a uniquely identified single resource unit |
Req. 7 | A publication may consist of a collection of resources |
Req. 8 | The notion of a PWP should enable specific publications like audio books, graphics books, and mixed media |
Req. 9 | The reader must have the possibility to personalize his or her reading experience |
Req. 10 | The publication must be discoverable |
Req. 11 | There should be a way to control versioning and revisioning |
Req. 12 | There should be a way to differentiate between essential and non-essential resources |
Req. 13 | A PWP should allow for access control and write protections of the resource |
Req. 14 | The publication should conform to all the requirements of horizontal dependencies |
Req. 15 | User agents must treat a PWP, regardless of the number of components, as a single unit as opposed to individual documents |
Req. 16 | The information regarding the constituent resources of a PWP must be easily discovered |
Req. 17 | Find the (default) reading order of the resources of a PWP easily |
Req. 18 | There should be a way to uniquely identify a publication regardless of its state |
Req. 19 | The PWP needs to have an explicit “offline mode” alternative |
Req. 20 | The distribution of a PWP should not affect its versioning |
Req. 21 | The distribution of PWPs should conform to the standard processes and expectations of commercial publishing channels |
Req. 22 | PWPs should support cross-references that can be resolved locally or externally |
Req. 23 | Several PWPs may share external resources |
Req. 24 | PWPs should be able to access external data |
Req. 25 | Manifests should include the technical and descriptive metadata, and basic characteristics of the constituent resources |
Req. 26 | Manifest should make it possible to provide a streamlined access to disjoint parts of the publication |
Req. 27 | Manifest should include information of new content |
Req. 28 | Manifest should include means to use links to resources, regardless of location |
Req. 29 | The manifest may include alternative reading orders |
Req. 30 | The access methods for retrieving a manifest should allow for significant flexibility |
Req. 31 | There should be a possibility to combine manifests from several origins |
Req. 32 | A common, state-independent locator needs to exist |
Req. 33 | There must also be a separation between state-independent and state-dependent locators |
Req. 34 | It should be possible to use, in all circumstances, a relative locator to refer to content within a PWP |
Req. 35 | When providing a pointer to any or all of a publication, this should be robust across states |
Req. 36 | Identifiers must be persistent and usable across states, and not conflict with locators |
Req. 37 | The locations of all PWP components should be discoverable |
Req. 38 | There should be a way to discover that a new version of one or more PWP components has been published |
Req. 39 | There should be a way to discover that one or more new components have been added to a PWP |
Req. 40 | There should be a way to discover that one or more PWP components have been removed from a PWP |
Req. 41 | There should be a way to indicate whether one or more PWP components contain structured descriptive metadata |
Req. 42 | Accessibility of a PWP must be discoverable |
Req. 43 | A PWP must support the ability to include multiple renditions of a publication |
Req. 44 | A PWP needs to support the ability to construct a limited package with only a subset of the necessary content |
Req. 45 | A PWP needs to support both time-based media and text |
Req. 46 | When annotations are distributed and associated with a PWP, the content of the annotation must be compatible with assistive technology |
Req. 47 | User agents must be allowed to limit the capabilities of a PWP |
Req. 48 | It should be possible to discover the capabilities a PWP will have access to |
Req. 49 | PWP authors should be able to embed guidance policies in their documents that inform the user agent of their preferences as to how the integrity and security of the document itself should be preserved |
Req. 50 | User agents may provide a method for escalating trust |
The following people have been instrumental in providing thoughts, feedback, reviews, content, criticism, and input in the creation of this document: