Publications, from corporate memos to newsletters to electronic books to scholarly journal articles, must be considered first-class content on the Web, equal to the more common forms of Web pages available today. This document describes the various use cases highlighting the problems users and publishers face when these publications are to be used in a digital, Web environment. The requirements that come from those use cases provide the basis for the technical considerations in a companion document, currently entitled “Web Publications” [[wpub]].

A previous version of this document was published by the Digital Publishing Interest Group as an Interest Group Note. After the closure of that Interest Group, the Publishing Working Group took it over for further work.

Introduction

The Web emerged in the early 1990s, based on a model of individual pages loosely joined by hyperlinks. Clustering within domains and with explicit navigation elements built into them, webpages evolved into websites. Despite the Web's strong connections to print media (e.g. web resources are “pages” and the in-memory model for Web applications is the “Document Object Model”), this document argues that the web platform may still not be meeting certain requirements from print media that users desire.

Over centuries, “books” have assumed many forms: journals, magazines, pamphlets of long-form articles and essays, newspapers, atlases, comics, notebooks, albums of all sorts. We can define these different manifestations as “publications”: bound editions of meaningful media, made public.

Another form of publication with a long history in both the printed and the digital world is the document. Documents are publications that are written and distributed in a more ad-hoc manner, such as legal briefs, corporate memos, and even the definitions of standards, such as the one currently being read.

We believe there is great value in combining this older tradition of portable, bounded publications with the pervasive accessibility, addressability, and interconnectedness of the Open Web Platform (OWP). New models of economic sustainability and innovative experiences of knowledge depend on this.

It is the task of the W3C Digital Publishing Interest Group to explore the uniqueness, desirability, and feasibility of bringing these two great models of publishing together. This document explores requirements based on examples of real-world use cases and scenarios. Requirements for publications on the Web are explored first, without referring to any packaging aspect that would correspond to current practices like EPUB. This is followed by requirements for those packaging aspects, as a structure on top of a purely Web-based distribution. The complete list of requirements is also collected in a separate table in an appendix.

Terminology

Web Standards

Open Web Platform

Web Publications should be able to make use of all features offered by the Open Web Platform (OWP).

A remarkable range of tools and frameworks has been built on top of OWP, making it possible to develop powerful interactive layers. These include, for example, data visualization systems (e.g., d3, built on top of SVG), possibilities to access external services like Wolfram Alpha, or tools to create and store (possibly as part of the publication) annotations. These tools have traditionally been developed around browsers and provide possibilities that publications should also benefit from. That requires that Web Publications become first-class citizens on the Web platform.

Horizontal Dependencies

A Web Publication should conform to the requirements of all horizontal dependencies: accessibility, internationalization, device independence, security, and privacy.

Web content has to be consumed under different circumstances: it must be available to the largest possible audience in a secure manner, providing the necessary protection of the reader’s privacy. Publication content must be able to answer to a number of principles like accessibility, internationalization, device independence, security, and privacy. (These are usually referred to, in the W3C context, as “horizontal” dependencies.) These principles are, in general terms:

Accessibility:
People with disabilities should be able to access the content of a publication. They should be able to perceive, understand, navigate, and interact with it, as well as contribute to it. Accessibility encompasses all disabilities that affect access to the content, including visual, auditory, physical, speech, cognitive, and neurological disabilities.
Internationalization:
Publications should be well adapted to any language, writing system, region, or culture. This includes the usage, when appropriate, of left-to-right, right-to-left, horizontal, or vertical writing; item numbering or interactive forms specific to local cultures; and usage of the right character sets and of local typographic conventions.
Device Independence:
The content in a publication should be usable on a large number of devices with very different device characteristics: different screen types and sizes, various input modalities, varying levels of processing power, etc. These different affordances should be automatic with no, or very little, user intervention.
Security:
Publications should be presented by a User Agent using a security model that is at least as secure as the standard Web security model. Doing this will prevent publications involved in malicious attacks, data theft, and other security incidents from impacting users by jeopardizing the integrity of the underlying data or machine operations.
Privacy:
The content in a publication should maintain and support user privacy, in spite of the fact that the evolution of online technologies has increased the possibility for the collection and processing of personal, and possibly sensitive, data. However, since a publication may use any part of the OWP, it may choose to use functionality such as the ability to track a user's activity within the publication.

These principles correspond to technical requirements on the underlying technologies (i.e., OWP, and its possible extension to Web Publications) insofar as the technologies must empower the authors (writers, editors, publishers, etc.) to produce content that follows them. Whether authors use the possibilities of these technologies or not is not addressed in this document.

All these constraints are formalized in the context of the usage on the Web and by extension Web Publications. This means that they are valid for publications in general. In some cases, for example due to legislative reasons, the demands on publications may be more stringent than for generic Web sites. The use cases below provide some examples for the publication-specific situations. Note also that some aspects of horizontal dependencies (e.g., accessibility or security) are also the subject of further use cases and requirements elsewhere in this document.

Escalating Trust

User agents may provide a method for escalating trust for a specific publication.

Some publications may require additional capabilities (for example, access to camera or geolocation) that a user agent might normally not enable. Today, some platform and UA vendors offer methods for otherwise untrusted local scripts to become trusted and regain API privileges; a similar ability needs to exist for publications as well.

Document Composition

Identification

A Web Publication, as a collection of resources, must be identified by either a single URL or a unique handle that can resolve to a single URL.

The unique identification of a specific Web Publication is essential. If not expressed as a URL, there should be a way to map this unique identification onto a Web Address. The Web Publication must be identifiable as a single logical resource with its own URL beyond the references to its constituent resources.

All constituent resources, and their contents, should be identified by either a URL or a unique handle that can resolve to a URL.

The requirement that a Web Publication be uniquely identifiable can be easily extended to the constituents of a Web Publication, as well as the fragments, parts, sections, etc., of those resources. Those identifications should be stable and resilient to changes and new iterations of the publication.
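As an illustration only, the following Python sketch shows one way a unique handle could be mapped onto a Web address, and how a constituent resource (and a fragment within it) could then be addressed relative to the publication URL. The handle syntax, the registry, and all URLs are hypothetical; this document does not define a resolution mechanism.

```python
from urllib.parse import urljoin

# Hypothetical registry mapping unique handles to publication URLs.
HANDLE_REGISTRY = {
    "urn:example:pub:moby-dick": "https://publisher.example/pubs/moby-dick/",
}

def resolve_publication(identifier: str) -> str:
    """Return the single URL of a publication, given either a URL or a handle."""
    if identifier.startswith(("http://", "https://")):
        return identifier
    return HANDLE_REGISTRY[identifier]

def resolve_resource(identifier: str, path: str, fragment: str = "") -> str:
    """Return the URL of a constituent resource, optionally with a fragment."""
    url = urljoin(resolve_publication(identifier), path)
    return f"{url}#{fragment}" if fragment else url
```

In this sketch the publication keeps its own URL, distinct from the URLs of its constituent resources, which are derived from it.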

Metadata

Web Publications should include technical metadata and descriptive metadata, including accessibility metadata, as well as any additional characteristics of the constituent resources.

A user agent may require information about the publication and its components in order to process it. For example, performance and memory requirements may prevent a user agent from parsing a large number of content documents in order to discover the necessary components and their relationships. A user agent may need to make some decisions about how to present content before displaying it.

A Web Publication should be able to include additional information that the user agent can use, such as:

Resources

The information regarding the constituent resources of a Web Publication must be easily discovered and there should be a way to differentiate between essential and non-essential resources.

A Web Publication will likely be composed of multiple Web documents and their resources. A more complicated Web Publication may have many resources, some of which are essential and some of which are not. Because of this complexity, extracting in advance all the references to some or all constituent resources may be prohibitive. It is therefore necessary for the user agent to have easy access to the list of constituent resources and some of their characteristics, such as media type, size, and whether they are essential.

In a publication, some content is essential to the user being able to consume it while other content could be either absent or have a provided fallback for situations such as limited connectivity or storage. This information, provided by the author or publisher of the Web Publication, would enable a user agent to provide a better experience to the user. For example, the user agent can ensure that essential resources are made available when offline (see ).
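The following Python sketch illustrates how a user agent might use such a resource list when preparing a publication for offline use. The `Resource` record, its field names, and the storage budget are assumptions made for this example; the source only requires that media type, size, and essential status be discoverable.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Resource:
    url: str
    media_type: str
    size_bytes: int
    essential: bool

def resources_to_cache(resources, budget_bytes):
    """Select every essential resource, then add non-essential ones while they fit."""
    selected = [r for r in resources if r.essential]
    remaining = budget_bytes - sum(r.size_bytes for r in selected)
    for r in resources:
        if not r.essential and r.size_bytes <= remaining:
            selected.append(r)
            remaining -= r.size_bytes
    return selected
```

Essential resources are always selected here, even when they exceed the budget; a real user agent might instead warn the user in that situation.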

Default Reading Order

There should be a means to indicate the author’s preferred navigation structure among the resources of a Web Publication, and User Agents should provide an accessible way of navigating the same.

Navigation

A user agent should be able to reveal the navigable structure of a Web Publication as a table of contents that is accessible to users, including those with disabilities.

The table of contents must include a link to at least one resource, and all links should refer to resources within the publication bounds. The user agent presents an accessible table of contents, which allows the user to access the links without navigating away from the current resource.
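The two constraints on the table of contents can be checked mechanically. The Python sketch below is one possible check, assuming the publication bounds are given as a list of resource paths relative to a hypothetical publication URL; the function name and its inputs are illustrative.

```python
from urllib.parse import urldefrag, urljoin

def check_toc(publication_url, resource_paths, toc_links):
    """Return the ToC links that point outside the publication bounds.

    Raises ValueError if the table of contents has no links at all.
    """
    if not toc_links:
        raise ValueError("table of contents must link to at least one resource")
    bounds = {urljoin(publication_url, path) for path in resource_paths}
    outside = []
    for link in toc_links:
        # Fragments address a location inside a resource, so they are ignored here.
        target, _fragment = urldefrag(urljoin(publication_url, link))
        if target not in bounds:
            outside.append(link)
    return outside
```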

For content that requires a player interface for time-based media, the Web Publication should provide the User Agent a way to navigate to a specific position in the content.

Random Access to Content

Authors of a Web Publication should be able to provide the user agent with information to access random parts of the publication.

It should be possible for the author to convey several potential reading orders that may go beyond the “default” for the content of the publication. This alternative reading order may only include specific parts of the publication rather than the full content of the publication.

A user agent should be able to access the resources of the publication in whatever order it chooses—beyond the order provided by the publication itself.

If there is a physical book version of the Web Publication, the user must have the ability to quickly browse to a corresponding pointer as identified in the physical book.

Alternative Modalities

A Web Publication should encompass publications such as audiobooks, graphic books, mixed media, and interactive media.

All concepts and structures related to a Web Publication should enable the creation and/or production of alternative renderings for visual and auditory content.

Synchronized Time-based Media

A Web Publication needs to support synchronization between text and time-based media.

A Web Publication needs to support time-based media, such as synchronized video, audio, captions or transcript, or sign language interpretation. A Web Publication must also be able to enable a synchronized media experience while navigating through the publication, with sufficient level of granularity.
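One way to picture synchronization is as a lookup between media time and text fragments. The Python sketch below assumes a hypothetical synchronization map of start times and fragment identifiers; the granularity (here, paragraphs) and the data layout are assumptions, not something this document prescribes.

```python
from bisect import bisect_right

# Hypothetical sync map: (start time in seconds, fragment id), sorted by start time.
SYNC_MAP = [(0.0, "p1"), (4.2, "p2"), (9.8, "p3")]

def fragment_at(seconds, sync_map=SYNC_MAP):
    """Return the id of the text fragment that corresponds to a media time."""
    starts = [start for start, _ in sync_map]
    index = bisect_right(starts, seconds) - 1
    return sync_map[max(index, 0)][1]

def start_of(fragment_id, sync_map=SYNC_MAP):
    """Return the media time at which a text fragment starts."""
    for start, fid in sync_map:
        if fid == fragment_id:
            return start
    raise KeyError(fragment_id)
```

The first function supports highlighting text during playback; the second supports starting playback from the reader's current position in the text.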

Data

Web Publications should be able to include data as resources, just as they do with text, images, etc.

Protection

A Web Publication should allow for application of access control and write protections of the publication.

Packaging

It should be possible to create and distribute a Web Publication as a single unit over different protocols or physical media.

This can be done through the usage of Packaged Web Publications.

In order to allow a Web Publication to be packaged without any changes to the content, it may be necessary to provide a mapping from the (absolute) URLs present in the publication to URLs that point to the constituent resources inside the package.
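A minimal Python sketch of such a mapping follows. It assumes that every constituent resource lives under a single hypothetical publication URL and that package paths simply mirror the URL paths; real packaging formats may organize content differently.

```python
def build_package_map(publication_url, resource_urls):
    """Map each absolute resource URL to a path relative to the package root."""
    mapping = {}
    for url in resource_urls:
        if not url.startswith(publication_url):
            raise ValueError(f"{url} is outside the publication")
        mapping[url] = url[len(publication_url):]
    return mapping

def to_package_path(mapping, url):
    """Resolve an absolute URL found in the content to its location in the package."""
    base, _, fragment = url.partition("#")
    path = mapping[base]
    return f"{path}#{fragment}" if fragment else path
```

Because the mapping is kept outside the content, the content documents themselves do not need to be rewritten when the publication is packaged.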

The publisher should be able to provide information in a Packaged Web Publication that can be used to check the origin of the publication and its authenticity.

The publisher should be able to provide information in a Packaged Web Publication proving that the publication has not been tampered with during delivery.
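This document does not say how such proof would be provided. One common technique, shown here purely as an assumption, is for the publisher to ship a cryptographic digest for each resource so that the recipient can recompute and compare them. The Python sketch below uses SHA-256; the function names and the shape of the digest list are illustrative.

```python
import hashlib

def digest(data: bytes) -> str:
    """Return the SHA-256 digest of a resource as a hexadecimal string."""
    return hashlib.sha256(data).hexdigest()

def tampered_resources(expected_digests, package_contents):
    """Return the names of resources that are missing or whose content changed."""
    return sorted(
        name
        for name, expected in expected_digests.items()
        if name not in package_contents
        or digest(package_contents[name]) != expected
    )
```

Digests alone only detect changes to the content. To also establish the origin of the publication, the digest list itself would need to be authenticated, for example with a digital signature from the publisher.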

User Agent Operation

Time-based Media

If a Web Publication contains time-based media, a user agent should provide a player interface that is accessible.

The player interface should allow for the following use cases:

In time-based media in a Web Publication, it should be possible to navigate not only by chapter/section but by short segments of time.

If a Web Publication contains time-based media, a user should be able to understand the duration of the media, both in its entirety and of its constituent parts.
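These two use cases can be sketched in Python as follows, assuming an audiobook modeled as a hypothetical list of tracks with known durations. The `Track` record and the function names are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Track:
    title: str
    duration: float  # seconds

def total_duration(tracks):
    """Duration of the media in its entirety."""
    return sum(track.duration for track in tracks)

def skip(tracks, position, delta):
    """Move the overall playback position by delta seconds, clamped to the media bounds."""
    return min(max(position + delta, 0.0), total_duration(tracks))

def locate(tracks, position):
    """Translate an overall position into (track index, offset within that track)."""
    for index, track in enumerate(tracks):
        if position < track.duration:
            return index, position
        position -= track.duration
    return len(tracks) - 1, tracks[-1].duration
```

A player interface could call `skip` for a "forward 30 seconds" control and use `locate` to show which constituent part is playing, alongside each track's own duration.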

Progression

User agents should provide the option for the user to save their progression in the publication and return the user to the last location they saved the next time they open the publication.

Reading State

The user must be able to leave the Web Publication and return to it at the last position they left from. The User Agent must retain the reading position, based on the last known position of the reader in the Web Publication. The position should be based on the reader's position in the file within the reading order.

The user agent may retain reading state if the Web Publication is revised. If the user agent includes a player interface, that interface should allow the user to leave and return to the content at the position where they left off.
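As a rough illustration, reading state can be stored as a resource in the reading order plus an offset within it. The Python sketch below assumes a JSON record with hypothetical field names, and assumes that a user agent falls back to the start of the publication when a revision has removed the saved resource; neither choice is mandated by this document.

```python
import json

def save_state(reading_order, resource_url, offset):
    """Serialize the reader's position as a resource URL plus an offset within it."""
    if resource_url not in reading_order:
        raise ValueError("resource is not in the reading order")
    return json.dumps({"resource": resource_url, "offset": offset})

def restore_state(reading_order, saved):
    """Return (index in reading order, offset), or the start if the resource is gone."""
    state = json.loads(saved)
    if state["resource"] in reading_order:
        return reading_order.index(state["resource"]), state["offset"]
    return 0, 0
```

Storing the resource URL rather than its index lets the saved position survive a revision that inserts or reorders resources. For time-based media, the offset could be a time value.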

Movement

It should be possible to see the Web Publication in a “paginated” view. When a user agent renders a Web Publication in a paginated layout, it must lay out each document in the default reading order sequentially, with the last page of a resource being followed by the first page of the subsequent one.

Whereas a “scrolling” view is the dominant approach on the Web in browsers, a user or author may wish to view their publications in a paginated view. As such, it should be possible for an individual publication or user agent to provide the ability to switch to a paginated view. This pagination may automatically adapt page sizes to the device’s or the browser’s viewport and may contain separate headers, footers, and/or page numbers.
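The sequential layout rule can be illustrated with a small Python sketch. It assumes the user agent has already computed, for the current viewport, how many pages each resource in the default reading order occupies; the function name and inputs are hypothetical.

```python
def locate_page(pages_per_resource, global_page):
    """Map a 1-based page number across the whole publication to
    (resource index in reading order, 1-based page within that resource)."""
    if global_page < 1:
        raise ValueError("page numbers start at 1")
    remaining = global_page
    for index, count in enumerate(pages_per_resource):
        if remaining <= count:
            return index, remaining
        remaining -= count
    raise ValueError("page is beyond the end of the publication")
```

With page counts of 3, 5, and 2, page 3 is the last page of the first resource and page 4 is the first page of the second, which is the ordering the requirement describes. These computed page numbers change with the viewport.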

This is distinct from the need to retain original page numbering (often from the print edition) which must be available on demand and must be usable to discover specific locations in the publication.

For more detailed requirements on pagination, see here.

Time-based media, especially a Web Publication consisting solely of time-based media, such as an audiobook, may be presented as a single page with a player module presenting the content metadata. This player may automatically adapt size and features according to the device or browser's viewport. This view may not have page numbering, but reading position would correspond to a time value.

For navigation within time-based media such as audio and video, refer to Time-based Media.

Offline

A Web Publication should also be available offline.

The same content of the Web Publication should be accessible offline, if circumstances so dictate, without the necessity for the user to take any particular, technical actions.

A user agent needs to know the information required to allow the user to access content offline or actively streaming, based on the size and nature of the content, and conditions imposed by the user.

Personalization

The user must have the possibility of personalizing their reading experience. This may include, for example, controlling such features as font size, choice of fonts, background and foreground color, tone of audio, etc.

Non-WP User Agents

A non-WP user agent should be able to access the content of a Web Publication.

Since Web Publications are based on the Open Web Platform, a Web Publication's constituent HTML pages, video, audio, images, interactive components, and other media, should be accessible to a non-WP user agent. Creators of Web Publications should allow for the user to be able to access this content.

Special consideration should be given to Web Publications where time-based media is the main or only component, such as an audiobook. To allow the user to access this content, the Web Publication should provide the user with the ability to:

Packaging

The distribution of a Packaged Web Publication should not affect its iterations.

Simply distributing or sharing a Packaged Web Publication to multiple destinations and devices should not result in (technically) different iterations of the Web Publication unless they contain modifications that make them different Web Publications.

The distribution of Packaged Web Publications should respect the existing processes and expectations of professional publishing channels as well as ad-hoc methods of distribution (e.g., email).

Archiving

We take for granted the relative durability of print artifacts, many of which have survived with little more than benign neglect. In contrast, digital documents are unlikely to persist without more active interventions, such as making copies, monitoring software dependencies, and validating integrity. Since future consumers of publications represent the most open-ended user group, it is desirable that digital documents be instilled with more of the inherent durability that characterizes print artifacts. Packaged Web Publications offer this potential by making it easier for archiving services to locate, harvest, update, and describe digital publications. Long-term preservation of digital publications ensures that they may continue to be accessible beyond the tenure of individual authors, file formats, publishers, or publishing platforms.

Fundamental use cases and requirements already help address our archiving requirements (e.g., ). However, archiving raises additional requirements:

There should be a way to indicate whether one or more Packaged Web Publication components contain (embedded) descriptive metadata.

An archiving service needs a reliable way to determine which, if any, Web Publication components contain descriptive metadata, such as those described in metadata and resources. Without such a mechanism, the archiving service will have to develop and maintain publisher- and/or platform-specific heuristics for locating or parsing out descriptive metadata, making archiving more expensive and decreasing the reliability of reporting.

There should be a way to discover that one or more new components have been added to or deleted from a Web Publication.

An archiving service needs a reliable way to learn that one or more Packaged Web Publication components have been added to or removed from a Packaged Web Publication in order to be able to update the associated archive of the publication.
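If an archiving service can obtain the list of components for two iterations of a publication, detecting additions and removals reduces to a set comparison. The Python sketch below assumes components are identified by stable names or URLs; how the service obtains the lists is left open, as it is in this document.

```python
def component_changes(archived, current):
    """Return (added, removed) component names between the archived and current lists."""
    archived_set, current_set = set(archived), set(current)
    added = sorted(current_set - archived_set)
    removed = sorted(archived_set - current_set)
    return added, removed
```

This comparison does not detect components whose content changed while their name stayed the same; that would require comparing the content itself.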

List of Requirements

All Requirements

Req. # | Requirement

Minimal Requirements

A user agent conforming at a minimal level must address all of the following requirements:

Req. # | Requirement

Use Cases by Category

Accessibility

People with disabilities should be able to access the content of a publication. They should be able to perceive, understand, navigate, and interact with it, as well as contribute to it. Accessibility encompasses all disabilities that affect access to the content, including visual, auditory, physical, speech, cognitive, and neurological disabilities.

Req. # | UC # | Use Case

Internationalization

Publications should be well-adapted to any language, writing system, region, or culture. This includes the usage, when appropriate, of left-to-right, right-to-left, horizontal, or vertical writing; item numbering; interactive forms specific to local cultures; usage of the right character sets; and local typographic conventions.

Req. # | UC # | Use Case

Device Independence

The content in a Web Publication should be usable on a large number of devices with very different device characteristics: different screen types and sizes, various input modalities, varying levels of processing power, etc. These different affordances should be automatic with no, or very little, user intervention.

Req. # | UC # | Use Case

Security

Publications should be presented by a User Agent using a security model that is at least as secure as the standard Web security model. Doing this will prevent publications involved in malicious attacks, data theft, and other security incidents from impacting users by jeopardizing the integrity of the underlying data or machine operations.

Req. # | UC # | Use Case

Privacy

The content in a publication should maintain and support user privacy, in spite of the fact that the evolution of online technologies has increased the possibility for the collection and processing of personal, and possibly sensitive, data. However, since a publication may use any part of the OWP, it may choose to use functionality such as the ability to track a user's activity within the publication.

Req. # | UC # | Use Case