This specification defines a collection of information that describes the structure of Web Publications so that user agents can provide user experiences specially tailored to reading publications, such as sequential navigation and offline reading. This information includes the default reading order, a list of resources, and publication-wide metadata.

This document is a draft of the Web Publications specification. Many details are under active consideration within the Publishing Working Group and are subject to change. The most prominent known issues have been identified in this document and links provided to comment on them.

Work over the past few months has focused on the sections that define a manifest making use of terms from schema.org, as well as on the Lifecycle and WebIDL sections.

Significant to this draft is the removal of the Information Set (infoset) — the final data model produced from the processing of manifest properties. This draft instead relies on the canonical manifest to express this model, as it already encompasses the final JSON-compliant data set that user agents are expected to produce.

Introduction

What is a Web Publication

A Web Publication is a discoverable and identifiable collection of resources. Information about the Web Publication is expressed in a machine-readable document called a manifest, which is what enables user agents to understand the bounds of the Web Publication and the connection between its resources.

The manifest includes metadata that describe the Web Publication, as a publication has an identity and nature beyond its constituent resources. The manifest also provides a list of all the resources that belong to the Web Publication and a default reading order, which is how it connects resources into a single contiguous work.

A Web Publication is discoverable in one of two ways: resources either include a link to the manifest (via an HTTP Link header or an HTML link element [[html]]), or the manifest can be loaded directly by a compatible user agent.

With the establishment of Web Publications, user agents can build new experiences tailored specifically for their unique reading needs.

Figure: Simplified Diagram of the Structure of Web Publications. The flowchart depicts the resources of a Web Publication and their attachment to a manifest. A description of the structure diagram is available in the Appendix.

Scope

This specification only defines requirements for the production and rendering of valid Web Publications. As much as possible, it leverages existing Open Web Platform technologies to achieve its goal—that being to allow for a measure of boundedness on the Web without changing the way that the Web itself operates.

Moreover, the specification is designed to adapt automatically to updates to Open Web Platform technologies in order to ensure that Web Publications continue to interoperate seamlessly as the Web evolves (e.g., by referencing the latest published versions instead of specific dated versions).

Further, this specification does not attempt to constrain the nature of a Web Publication: any type of work that can be represented on the Web constitutes a potential Web Publication.

The specification is also intended to facilitate different user agent architectures for the consumption of Web Publications. While a primary goal is that traditional Web user agents (browsers) will be able to consume Web Publications, this should not limit the capabilities of any other possible type of user agent (e.g., applications, whether standalone or running within a user agent, or even Web Publications that include their own user interface). As a result, the specification does not attempt to architect required solutions for situations whose expected outcome will vary depending on the nature of the user agent and the expectations of the user (e.g., how to prompt to initiate a Web Publication, or at what point or how much of a Web Publication to cache for offline use).

Terminology

This document uses terminology defined by the W3C Note "Publishing and Linking on the Web" [[publishing-linking]], including, in particular, user, user agent, browser, and address.

Identifier

An identifier is metadata that can be used to refer to Web Content in a persistent and unambiguous manner. URLs, URNs, DOIs, ISBNs, and PURLs are all examples of persistent identifiers frequently used in publishing.

Manifest

A manifest represents structured information about a Web Publication, such as informative metadata, a list of all resources, and a default reading order.

Non-empty

For the purposes of this specification, non-empty is used to refer to an element, attribute or property whose text content or value consists of one or more characters after whitespace normalization, where whitespace normalization rules are defined per the host format.

URL

The general term URL is defined by the URL Standard [[!url]]. It is used as in other W3C specifications, like HTML [[!html]]. In particular, a URL allows for the usage of characters from Unicode following [[!rfc3987]]. See the note in the HTML5 specification for further details.

Web Publication

A Web Publication is a collection of one or more resources, organized together through a manifest into a single logical work with a default reading order. The Web Publication is uniquely identifiable and presentable using Open Web Platform technologies.

Conformance Classes

This specification defines two conformance classes: one for Web Publications and one for user agents that process them.

A Web Publication conforms to this specification if it meets the following criteria:

A user agent conforms to this specification if it meets the following criteria:

Web Publication Construction

Manifest

Authored and Canonical Manifests

A Web Publication is described by its manifest, which provides a set of properties expressed using the JSON-LD [[json-ld]] format (a variant of JSON [[ecma-404]] for linked data).

The manifest is expressed in one of two forms depending on the state of the Web Publication:

Authored Manifest

The Authored Web Publication Manifest, as its name suggests, is the serialization of the manifest that the author provides with the Web Publication (note that the author does not have to be human).

Canonical Manifest

The Canonical Web Publication Manifest is a version of the Web Publication Manifest created by user agents when they obtain the authored manifest and remove all possible ambiguities and incorporate any missing values that can be inferred from another source.

It is possible that an authored manifest is the equivalent of the canonical manifest if there are no ambiguities or missing information, but a canonical manifest only exists after a user agent has inspected the authored manifest as part of the process of obtaining it.

This specification describes the requirements for creating both authored and canonical manifests. This section, in particular, details how to create the authored manifest, while Web Publication Properties provides the various property definitions. These definitions include the rules user agents use to supplement the canonical manifest. The algorithm for transforming an authored manifest into a canonical manifest is described in a separate section.

WebIDL

Explanation

Although a Web Publication manifest is authored as [[json-ld]], a user agent processes this information into an internal data structure in order to utilize the properties. The exact manner in which this processing occurs, and how the data is used internally, is user agent-dependent.

To ensure interoperability when exposing the items, this specification defines an abstract representation of the data structures using the Web Interface Definition Language (WebIDL) [[webidl-1]]. The WebIDL definitions express the expected names, datatypes, and possible restrictions for each member of the manifest. (A WebIDL representation can be mapped onto ECMAScript, C, or other programming languages.)

Authors of Web Publications are encouraged to review these definitions, but understanding them is not necessary to author a Web Publication.

The WebPublicationManifest Dictionary
dictionary WebPublicationManifest {
	
};

The WebPublicationManifest dictionary is the [[!webidl-1]] representation of the collection of Web Publication manifest properties. WebIDL definitions are also provided at the end of each property that belongs to the dictionary — these represent the members of the WebPublicationManifest dictionary.

A complete listing of the WebPublicationManifest dictionary is provided elsewhere in this specification.

Manifest Contexts

A Web Publication Manifest MUST start by setting the JSON-LD context [[!json-ld]]. The context has the following two major components:

  • the [[!schema.org]] context: https://schema.org
  • the Web Publication context: https://www.w3.org/ns/wp-context
{
    "@context" : ["https://schema.org", "https://www.w3.org/ns/wp-context"],
    …
}

The Web Publication context document adds features to the properties defined in Schema.org (e.g., the requirement for the creator property to be order preserving).

As a result of ongoing contact with Schema.org, the additional features defined in the Web Publication context file could migrate to the core Schema.org vocabulary.

Although Schema.org is often referenced using the http URI scheme, the vocabulary is being migrated to use the secure https scheme as its default. This specification requires the use of https when referencing Schema.org in the manifest.
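
The context requirement can be checked mechanically. The following is a minimal sketch, not a normative algorithm: the function name has_required_contexts is hypothetical, and the decision to accept the two context URLs anywhere in the array is an assumption of this example.

```python
import json

SCHEMA_ORG = "https://schema.org"
WP_CONTEXT = "https://www.w3.org/ns/wp-context"


def has_required_contexts(manifest: dict) -> bool:
    """Return True if "@context" lists both context URLs named above.

    Hypothetical helper: the specification does not define this function.
    """
    context = manifest.get("@context")
    # A lone string can name only one context, so treat it as a one-item list.
    if isinstance(context, str):
        context = [context]
    if not isinstance(context, list):
        return False
    return SCHEMA_ORG in context and WP_CONTEXT in context


manifest = json.loads(
    '{"@context": ["https://schema.org", "https://www.w3.org/ns/wp-context"]}'
)
print(has_required_contexts(manifest))
```

A manifest that references Schema.org with the http scheme fails this check, which matches the requirement to use https.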

Values

Arrays and Single Values

Various manifest properties can have one or more values. As a general rule, these values can be expressed as [[!json]] arrays. When the property value is an array with a single element, however, the array syntax MAY be omitted.
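
A processor that accepts both forms will typically normalize them before further use. The sketch below shows one way to do so; the helper name as_list is hypothetical, and treating a missing value as an empty list is an assumption of this example.

```python
def as_list(value):
    """Normalize a manifest value to a list.

    A single value written without array syntax becomes a one-element
    list; an absent value (None) becomes an empty list (an assumption).
    """
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Both authored forms yield the same normalized value.
print(as_list("textual") == as_list(["textual"]))
```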

Text Values or Objects

Various manifest properties are expected to be expressed as [[!json]] objects. Although the use of objects is usually RECOMMENDED, it is also acceptable to use string values that are interpreted as objects depending on the context. The exact mapping of text values to objects is part of the property or object definitions.

Publication Types

The Web Publication Manifest MUST include a Publication Type using the type term [[!json-ld]]. The type MAY be mapped onto CreativeWork [[!schema.org]].

{
    "@context" : ["https://schema.org", "https://www.w3.org/ns/wp-context"],
    "type"     : "CreativeWork",
    …
}

Schema.org also includes a number of more specific subtypes of CreativeWork, such as Article, Book, TechArticle, and Course. These MAY be used instead of, or in addition to, CreativeWork.

{
    "@context" : ["https://schema.org", "https://www.w3.org/ns/wp-context"],
    "type"     : "Book",
    …
}

Each Schema.org type defines a set of properties that are valid for use with it. To ensure that the manifest can be validated and processed by Schema.org aware processors, the manifest SHOULD contain only the properties associated with the selected type.

If properties from more than one type are needed, the manifest MAY include multiple type declarations.

{
    "@context" : ["https://schema.org", "https://www.w3.org/ns/wp-context"],
    "type"     : ["Book", "VisualArtwork"],
    …
}

User agents SHOULD NOT fail to process manifests that are not valid against their declared Schema.org type(s).

Refer to the Schema.org site for the complete list of CreativeWork subtypes.

partial dictionary WebPublicationManifest {
    required sequence<DOMString> type;
};
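
Because the WebIDL member is a sequence while the authored manifest may use a single string, a user agent has to reconcile the two. The following sketch illustrates one possible approach; the function name publication_types and the use of ValueError to signal a missing type are assumptions of this example, not requirements of the specification.

```python
def publication_types(manifest: dict) -> list:
    """Return the publication type(s) of a manifest as a list of strings.

    Raises ValueError when the required "type" term is absent or is not
    made up of non-empty strings (error handling is an assumption).
    """
    value = manifest.get("type")
    if value is None:
        raise ValueError("manifest does not declare a publication type")
    types = value if isinstance(value, list) else [value]
    if not types or not all(isinstance(t, str) and t for t in types):
        raise ValueError("publication type must be one or more non-empty strings")
    return types


print(publication_types({"type": "Book"}))
print(publication_types({"type": ["Book", "VisualArtwork"]}))
```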

Properties

The naming, syntax, and requirements for manifest properties are defined in Web Publication Properties.

Although authors only have to understand the serialization requirements for manifest terms, they are encouraged to read through the full definitions for each property. The definitions describe, in some cases, how items are compiled into the Canonical Manifest in the absence of explicit information.

Relative URLs

Relative URL strings MAY be used in the manifest. These URLs are resolved to absolute URL strings using a base URL [[!url]].

The base URL for relative URLs is determined as follows:

As a consequence, relative URLs in embedded manifests are resolved against the URL of the primary entry page unless the page declares a base URL (i.e., in a <base> element in its header).

The usage (or not) of the <base> element for embedded manifests is currently the subject of several issues in the JSON-LD Working Group: JSON-LD #22, JSON-LD #57, and, ultimately, TAG #312.
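
For an embedded manifest, the resolution described above can be sketched as follows. This is an illustration only: the function name and parameters are hypothetical, Python's urllib.parse.urljoin stands in for the URL Standard's resolution algorithm, and honoring the <base> element reflects the current text rather than the outcome of the open issues.

```python
from urllib.parse import urljoin


def resolve_manifest_url(relative: str, page_url: str, base_href: str = None) -> str:
    """Resolve a relative URL found in an embedded manifest.

    page_url  -- URL of the primary entry page that embeds the manifest
    base_href -- value of the page's <base href>, if it declares one
    """
    # A <base> value may itself be relative, so resolve it against the page first.
    base = urljoin(page_url, base_href) if base_href else page_url
    return urljoin(base, relative)


page = "https://example.com/webpub/index.html"
print(resolve_manifest_url("chapter1.html", page))
print(resolve_manifest_url("chapter1.html", page, "https://cdn.example.com/book/"))
```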

Embedding

A manifest MAY be embedded only in the primary entry page. In this case, the manifest MUST be included in a script element [[!html]] whose type attribute is set to application/ld+json.

Additionally, the script element MUST include a unique identifier in an id attribute [[!html]]. This identifier ensures that the manifest can be referenced.

<script id="example_manifest" type="application/ld+json">
   {
      …
   }
</script>

Linking To a Manifest

With the exception of the primary entry page, linking a resource to its Web Publication manifest is OPTIONAL. Including a link is encouraged whenever possible, however, as it allows user agents to immediately ascertain that a resource belongs to a Web Publication regardless of how the user reaches the resource.

Links to a Web Publication manifest MUST take one or both of the following forms:

  • An HTTP Link header field [[!rfc5988]] with its rel parameter set to the value "publication".

    Link: <https://example.com/webpub/manifest>; rel=publication
  • A link element [[!html]] with its rel attribute set to the value "publication".

    <link href="https://example.com/webpub/manifest" rel="publication"/>

When a manifest is embedded within an HTML document, the link MUST include a fragment identifier that references the script element that contains the manifest (see Embedding).

	<link href="#example_manifest" rel="publication">
	…
	<script id="example_manifest" type="application/ld+json">
	{
        "@context" : ["https://schema.org", "https://www.w3.org/ns/wp-context"],
        …
	}
	</script>
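
A user agent, or an authoring tool checking its output, could locate such links in an HTML document along these lines. The sketch uses Python's standard html.parser module; the class and function names are hypothetical, and HTTP Link header fields are not covered.

```python
from html.parser import HTMLParser


class PublicationLinkFinder(HTMLParser):
    """Collect href values of link elements whose rel includes "publication"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        # rel is a space-separated token list; attribute values may be None.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "publication" in rel_tokens and attributes.get("href"):
            self.hrefs.append(attributes["href"])


def find_manifest_links(html_text: str) -> list:
    finder = PublicationLinkFinder()
    finder.feed(html_text)
    return finder.hrefs


# An href that starts with "#" points at a manifest embedded in the same page.
print(find_manifest_links('<link href="#example_manifest" rel="publication">'))
```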

The exact value of rel is still to be agreed upon and should be registered by IANA.

The following details might be moved to the lifecycle section in a future draft.

When a resource links to multiple manifests, a user agent MAY choose to present one or more alternatives to the end user, or choose a single alternative on its own. The user agent MAY choose to present any manifest based upon information that it possesses, even one that is not explicitly listed as a parent (e.g., based upon information it calculates or acquires out of band). In the absence of a preference by user agent implementers, selection of the first manifest listed is suggested as a default.

Web Publication Bounds

A Web Publication consists of a finite set of resources that represent its content. This extent is known as its bounds and is defined within its manifest — it is obtained from the union of resources listed in the default reading order and resource list.

To determine whether a resource is within the bounds of a Web Publication, user agents MUST compare the absolute URL of a resource to the absolute URLs of the resources obtained from the union. If the resource is identified in the enumeration, it is within the bounds of the Web Publication. All other resources are external to the Web Publication.

Resources within the bounds of a Web Publication do not have to share the same domain.
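
The bounds test can be illustrated with a short sketch. Several details are assumptions of this example rather than statements of the specification: entries in readingOrder and resources are taken to be either URL strings or objects with a url member, the function names are hypothetical, and URLs are compared as plain strings after resolution, with no special handling of fragments.

```python
from urllib.parse import urljoin


def publication_bounds(manifest: dict, base_url: str) -> set:
    """Return the absolute URLs in the union of reading order and resource list."""
    urls = set()
    for key in ("readingOrder", "resources"):
        entries = manifest.get(key, [])
        if not isinstance(entries, list):
            entries = [entries]
        for entry in entries:
            # Assumed shapes: a URL string, or an object with a "url" member.
            href = entry if isinstance(entry, str) else entry.get("url")
            if href:
                urls.add(urljoin(base_url, href))
    return urls


def is_within_bounds(resource_url: str, manifest: dict, base_url: str) -> bool:
    return urljoin(base_url, resource_url) in publication_bounds(manifest, base_url)


manifest = {
    "readingOrder": ["c1.html", {"url": "c2.html"}],
    "resources": ["https://cdn.example.org/style.css"],
}
base = "https://example.com/webpub/manifest.json"
print(is_within_bounds("https://example.com/webpub/c1.html", manifest, base))
print(is_within_bounds("https://example.com/other.html", manifest, base))
```

The resource hosted on cdn.example.org is within bounds even though it is served from a different domain than the manifest.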

Resources

A Web Publication MUST include at least one HTML document [[!html]]—the primary entry page.

There are no restrictions on a Web Publication beyond this requirement. The Web Publication MAY include references to resources of any media type, both in the default reading order and as dependencies of other resources.

When adding resources to a Web Publication, consider support in user agents. The use of progressive enhancement techniques and the provision of fallback content, as appropriate, will ensure a more consistent reading experience for users regardless of their preferred user agent.

Primary Entry Page

The primary entry page is a key [[!HTML]] document required of every Web Publication. It represents the preferred starting resource for discovery of the Web Publication and enables discovery of the manifest.

Although any resource can link to the Web Publication manifest, the primary entry page typically introduces the publication and provides access to the content. It might contain all the content, in the case of a single-page Web Publication, or provide navigational aids to begin reading a multi-document Web Publication. To ease consumption for users, the primary entry page SHOULD contain the table of contents.

It is not required that the primary entry page be included in the default reading order, nor that it be the first document listed when it is included. This specification leaves the exact nature of the document intentionally underspecified to provide flexibility for different approaches (e.g., the primary entry page could be a marketing document for the Web Publication instead of a specific page of content). If a default reading order is not provided, however, the primary entry page will be used as the default entry.

The primary entry page is the only resource in which a manifest can be embedded. To ensure discovery of the manifest, the primary entry page MUST provide a link to the manifest, regardless of whether the manifest is embedded within the page or external to it.

The address of the primary entry page is also the canonical identifier for the Web Publication (i.e., it serves as the unique identifier for the Web Publication).

In certain cases where information has been omitted from the manifest, user agents will sometimes use the primary entry page as a fallback source of information (see language and base direction and title).

Table of Contents

The table of contents provides a hierarchical list of links that reflects the structural outline of the major sections of the Web Publication.

The table of contents is expressed via an [[!html]] element (typically a nav element) in one of the resources. This element MUST be identified by the role attribute [[!html]] value "doc-toc" [[!dpub-aria-1.0]], and MUST be the first element in the document — in document tree order [[!dom]] — with that role value.

If the table of contents is not located in the primary entry page, the manifest SHOULD identify the resource that contains the structure.

When specified, the table of contents MUST include a link to at least one resource, and all links SHOULD refer to resources within publication bounds.

Refer to the table of contents property definition for more information on how to identify which resource contains the table of contents.
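
Locating the table of contents in a resource can be sketched as follows. This is an illustrative approximation using Python's standard html.parser module, not a conforming HTML parser: the class and function names are hypothetical, and the end of the element is found by counting nested tags of the same name, which assumes the element's own end tag is present.

```python
from html.parser import HTMLParser


class TocFinder(HTMLParser):
    """Collect links from the first element whose role includes "doc-toc"."""

    def __init__(self):
        super().__init__()
        self.toc_tag = None   # tag name of the TOC element once found
        self.depth = 0        # open elements with that tag name inside the TOC
        self.done = False     # True once the first TOC element has closed
        self.links = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return  # later doc-toc elements are ignored
        attributes = dict(attrs)
        if self.toc_tag is None:
            if "doc-toc" in (attributes.get("role") or "").split():
                self.toc_tag = tag
                self.depth = 1
            return
        if tag == self.toc_tag:
            self.depth += 1
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])

    def handle_endtag(self, tag):
        if self.done or self.toc_tag is None or tag != self.toc_tag:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True


def find_toc_links(html_text: str) -> list:
    finder = TocFinder()
    finder.feed(html_text)
    return finder.links


page = (
    '<nav role="doc-toc"><ol><li><a href="c1.html">One</a>'
    '<li><a href="c2.html">Two</a></ol></nav>'
    '<nav role="doc-toc"><a href="x.html">Ignored</a></nav>'
)
print(find_toc_links(page))
```

An empty result means either that no table of contents was found or that it contains no links; in the latter case it does not meet the requirement to link to at least one resource.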

Do we need a more detailed definition for the HTML TOC format?

Page List

The page list is a list of links that provides navigation to static page demarcation points within the content. These locations allow users, for example, to coordinate access into the content. The exact nature of these locations is left to content creators to define. They usually correspond to the pages of a print document that is the source of the digital publication, but might be a purely digital creation added to ease navigation.

The page list is expressed via an [[!html]] element (typically a nav element) in one of the resources. This element MUST be identified by the role attribute [[!html]] value "doc-pagelist" [[!dpub-aria-1.0]], and MUST be the first element in the document — in document tree order [[!dom]] — with that role value.

If the page list is not located in the primary entry page, the manifest SHOULD identify the resource that contains the structure.

There are no requirements on the page list itself, except that, when specified, it MUST include a link to at least one resource.

Refer to the pagelist property definition for more information on how to identify which resource contains the page list.

Web Publication Properties

Introduction

The Web Publication manifest is defined by a set of properties that describe the basic information a user agent requires to process and render a Web Publication. These properties are categorized as follows:

descriptive properties

Descriptive properties describe aspects of a Web Publication, such as its title, creator, and language. These properties are primarily drawn from Schema.org and its hosted extensions [[schema.org]], so they map to one or several Schema.org properties and inherit their syntax and semantics. (The following property categories typically do not have Schema.org equivalents, so are defined specifically for Web Publications.)

resource categorization

Resource categorization properties describe or identify common sets of resources, such as the resource list and default reading order. These properties refer to one or more resources, such as HTML documents, images, script files, and separate metadata files.

informative properties

Informative properties identify resources that contain additional information about the Web Publication, such as its privacy policy or accessibility report.

structural properties

Structural properties identify key meta structures of the Web Publication, such as the cover image, table of contents, and page list.

The categorization of properties is done to simplify comprehension of their purpose; the groupings have no relevance outside this specification (i.e., the groupings do not exist in the manifest).

Each manifest item drawn from schema.org identifies the property it maps to and includes its defining type in parentheses. Properties are often available in many types, however, as a result of the schema.org inheritance model. Refer to each property definition for more detailed information about where it is valid to use.

Schema.org additionally includes a large number of properties that, though relevant for publishing, are not mentioned in this specification — Web Publication authors can use any of these properties. This document defines only the minimal set of manifest items.

There is discussion on whether a best practices document that refers to more schema.org terms should be created. If so, it should be linked from here.

Requirements

The requirements for the expression of Web Publication properties are defined as follows:

REQUIRED:
RECOMMENDED:

These properties do not all have to be serialized in the authored manifest. Refer to each property's definition to determine whether it is required in the manifest or can be compiled into the canonical manifest from other information.

Quick Reference

The way that properties are expressed in the manifest often differs from how they are referred to using natural language. The following table provides a mapping between property names and the sections where they are explained to help clarify the differing nomenclature:

Property Name | Defined In
accessMode | Accessibility
accessModeSufficient | Accessibility
accessibilityAPI | Accessibility
accessibilityControl | Accessibility
accessibilityFeature | Accessibility
accessibilityHazard | Accessibility
accessibilitySummary | Accessibility
artist | Creators
author | Creators
contents | Table of Contents
contributor | Creators
creator | Creators
dateModified | Last Modification Date
datePublished | Publication Date
editor | Creators
https://www.w3.org/ns/wp#accessibility-report | Accessibility Report
https://www.w3.org/ns/wp#cover | Cover
https://www.w3.org/ns/wp#pagelist | Pagelist
id | Canonical Identifier
illustrator | Creators
inDirection | Direction
inker | Creators
inLanguage | Language
letterer | Creators
link | Links
name | Title
penciler | Creators
privacy-policy | Privacy Policy
publisher | Creators
readBy | Creators
readingOrder | Default Reading Order
readingProgression | Reading Progression Direction
resources | Resource List
translator | Creators
url | Address

Descriptive Properties

Accessibility

The accessibility properties provide information about the suitability of a Web Publication for consumption by users with varying preferred reading modalities. These properties typically supplement an evaluation against established accessibility criteria, such as those provided in [[WCAG20]]. (For linking to a detailed accessibility report, see Accessibility Report.)

The following properties are categorized as accessibility properties:

Term | Description | Required Value | [[!schema.org]] Mapping
accessMode | The human sensory perceptual system or cognitive faculty through which a person may process or perceive information. | One or more text(s). Expected values. | accessMode (CreativeWork)
accessModeSufficient | A list of single or combined accessModes that are sufficient to understand all the intellectual content of a resource. | One or more ItemList. Expected values. | accessModeSufficient (CreativeWork)
accessibilityAPI | Indicates that the resource is compatible with the referenced accessibility APIs. | One or more text(s). Expected values. | accessibilityAPI (CreativeWork)
accessibilityControl | Identifies input methods that are sufficient to fully control the described resource. | One or more text(s). Expected values. | accessibilityControl (CreativeWork)
accessibilityFeature | Content features of the resource, such as accessible media, alternatives and supported enhancements for accessibility. | One or more text(s). Expected values. | accessibilityFeature (CreativeWork)
accessibilityHazard | A characteristic of the described resource that is physiologically dangerous to some users. | One or more text(s). Expected values. | accessibilityHazard (CreativeWork)
accessibilitySummary | A human-readable summary of specific accessibility features or deficiencies, consistent with the other accessibility metadata but expressing subtleties such as “short descriptions are present but long descriptions will be needed for non-visual users” or “short descriptions are present and no long descriptions are needed.” | Text. | accessibilitySummary (CreativeWork)

Detailed descriptions of these properties are available on the WebSchemas Wiki site.

Values SHOULD be drawn from the preferred vocabulary for each accessibility property, but user agents MUST NOT omit values that are not included in the lists when generating the canonical manifest.

The author can also provide a reference to a detailed Accessibility Report if more information is needed than can be expressed by these properties.

partial dictionary WebPublicationManifest {
    sequence<DOMString> accessMode;
    sequence<DOMString> accessModeSufficient;
    sequence<DOMString> accessibilityAPI;
    sequence<DOMString> accessibilityControl;
    sequence<DOMString> accessibilityFeature;
    sequence<DOMString> accessibilityHazard;
    LocalizableString      accessibilitySummary;
};
{
    "@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"     : "CreativeWork",
    …
    "accessibilityAPI"      : "ARIA",
    "accessMode"            : ["textual", "visual"],
    "accessModeSufficient"  : [
        {
            "type"           : "ItemList",
            "itemListElement": ["textual", "visual"]
        },
        {
            "type"           : "ItemList",
            "itemListElement": ["textual"]
        }
    ],
    …
}

Address

A Web Publication's address is a URL [[!url]] that represents the primary entry page for the Web Publication. It is expressed using the url property.

Term | Description | Required Value | [[!schema.org]] Mapping
url | URL of the primary entry page. | A URL [[!url]]. | url (Thing)

If the address does not resolve to an HTML document [[!html]], user agents SHOULD NOT provide access to the resource to users. A Web Publication MAY have more than one address, but all the addresses MUST resolve to the same document.

The referenced document SHOULD be a resource of the Web Publication. It can be any resource, including one that is not listed in the default reading order. This document MUST include a link to the manifest to ensure a bidirectional linking relationship (i.e., that user agents can also locate the manifest from the document at the address).

If the document is not a Web Publication resource, user agents SHOULD load the first document in the default reading order when initiating the Web Publication.

To improve the usability of Web Publications, particularly in user agents that do not support Web Publications, authors are encouraged to include navigation aids in the referenced document that facilitate consumption of the content (e.g., provide a table of contents or a link to one).

The Web Publication's address can also be used as the value for an identifier link relation [[link-relation]].
partial dictionary WebPublicationManifest {
    required DOMString url;
};
{
    "@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"     : "Book",
    …
    "url"      : "https://publisher.example.org/mobydick",
    …
}

Canonical Identifier

A Web Publication's canonical identifier is a unique identifier that resolves to the preferred version of the Web Publication. It is expressed using the id property.

Term | Description | Required Value | [[!schema.org]] Mapping
id | Preferred version of the Web Publication. | A URL [[!url]]. | (None)

Ensuring uniqueness of canonical identifiers is outside the scope of this specification. The actual achievable uniqueness depends on such factors as the conventions of the identifier scheme used and the degree of control over assignment of identifiers.

The canonical identifier is intended to provide a measure of permanence above and beyond the Web Publication's address(es). If a Web Publication is permanently relocated to a new URL, for example, the canonical identifier provides a way of discovering the new location (e.g., a DOI registry could be updated with the new URL, or a redirect could be added to the URL of the canonical identifier). It is also intended to provide a means of identifying instances of the same Web Publication hosted at different URLs.

If a URL is not provided in the manifest, or the value is an invalid URL, the Web Publication does not have a canonical identifier. User agents MUST NOT attempt to construct a canonical identifier from any other identifiers provided in the manifest for the canonical manifest.
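
The rule above can be sketched as follows. The function name canonical_identifier is hypothetical, and the validity test is an assumption of this example: the specification defers to the URL Standard, whereas the sketch accepts any string that Python's urllib.parse.urlsplit parses with a scheme.

```python
from urllib.parse import urlsplit


def canonical_identifier(manifest: dict):
    """Return the manifest's canonical identifier, or None if it has none.

    Only the "id" property is consulted. Other identifiers in the
    manifest (for example "isbn" or "url") are deliberately not used to
    construct a replacement value.
    """
    value = manifest.get("id")
    if not isinstance(value, str) or not value.strip():
        return None
    try:
        parts = urlsplit(value)
    except ValueError:
        return None
    # Assumption: a usable identifier is an absolute URL, i.e. it has a scheme.
    return value if parts.scheme else None


print(canonical_identifier({"id": "http://www.w3.org/TR/tabular-data-model/"}))
print(canonical_identifier({"isbn": "9780123456789",
                            "url": "https://publisher.example.org/mobydick"}))
```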

Is a canonical identifier necessary to call out explicitly, or can it be handled by other metadata?

The specification of the canonical identifier MAY be complemented by the inclusion of additional types of identifiers for the Web Publication using the identifier property [[!schema.org]] and/or its subtypes.

partial dictionary WebPublicationManifest {
    DOMString id;
};
{
    "@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"     : "TechArticle",
    …
    "id"       : "http://www.w3.org/TR/tabular-data-model/",
    "url"      : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
    …
}
{
    "@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"     : "Book",
    …
    "isbn"     : "9780123456789",
    "url"      : "https://publisher.example.org/mobydick",
    …
}

Creators

A creator is an individual or entity responsible for the creation of the Web Publication.

The following properties are categorized as creators:

Term | Description | Required Value | [[!schema.org]] Mapping
artist | The primary artist for the publication, in a medium other than pencils or digital line art. | One or more Person. | artist (VisualArtwork)
author | The author of the publication. | One or more Person and/or Organization. | author (CreativeWork)
colorist | The individual who adds color to inked drawings. | One or more Person. | colorist (VisualArtwork)
contributor | Contributor whose role does not fit one of the other roles in this table. | One or more Person and/or Organization. | contributor (CreativeWork)
creator | The creator of the publication. | One or more Person and/or Organization. | creator (CreativeWork)
editor | The editor of the publication. | One or more Person. | editor (CreativeWork)
illustrator | The illustrator of the publication. | One or more Person. | illustrator (Book)
inker | The individual who traces over the pencil drawings in ink. | One or more Person. | inker (VisualArtwork)
letterer | The individual who adds lettering, including speech balloons and sound effects, to artwork. | One or more Person. | letterer (VisualArtwork)
penciler | The individual who draws the primary narrative artwork. | One or more Person. | penciler (VisualArtwork)
publisher | The publisher of the publication. | One or more Person and/or Organization. | publisher (CreativeWork)
readBy | A person who reads (performs) the publication (for audiobooks). | One or more Person. | readBy (Audiobook)
translator | The translator of the publication. | One or more Person and/or Organization. | translator (CreativeWork)

Creators are represented in one of the following two ways:

  1. as a string encoding the name of a Person [[!schema.org]]; or
  2. as an instance of a Person or Organization [[!schema.org]].

In other words, a single string value is a shorthand for a [[!schema.org]] Person whose name property is set to that string value. (See also .)

When compiling each set of creator information from a [[!schema.org]] Person or Organization type, user agents MUST retain the following information when available:

type
One or more strings that identify the type of creator. This sequence SHOULD include "Person" or "Organization".
name
One or more localizable strings for the name of the creator.
id
A canonical identifier of the creator as a URL. [[!url]]
url
An address for the creator in the form of a URL. [[!url]]

Note that user agents MAY interpret a wider range of creator properties defined by Schema.org than the ones in the preceding list.

The manifest MAY include more than one of each type of creator.
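
The string shorthand and the retained properties described above can be illustrated with a minimal sketch. The function names normalize_creators and as_list are hypothetical and are not defined by this specification; the sketch assumes that a single value is treated the same as a one-item list, and that properties outside the retained set are simply dropped.

```python
# Properties that user agents are required to retain, per the list above.
RETAINED = ("type", "name", "id", "url")


def as_list(value):
    """Wrap a single value in a list; leave lists unchanged."""
    return value if isinstance(value, list) else [value]


def normalize_creators(value):
    """Turn an authored creator value into a list of creator records.

    `value` may be a string, an object, or a list mixing both.
    (Hypothetical helper, not part of the specification.)
    """
    creators = []
    for item in as_list(value):
        if isinstance(item, str):
            # A bare string is shorthand for a Person with that name.
            creators.append({"type": ["Person"], "name": [item]})
        elif isinstance(item, dict):
            creator = {k: item[k] for k in RETAINED if k in item}
            # `type` and `name` are sequences in the CreatorInfo dictionary.
            for key in ("type", "name"):
                if key in creator:
                    creator[key] = as_list(creator[key])
            creators.append(creator)
    return creators
```

A user agent that interprets a wider range of Schema.org properties would extend the retained set rather than discard the extra values.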

partial dictionary WebPublicationManifest {
    sequence<CreatorInfo> artist;
    sequence<CreatorInfo> author;
    sequence<CreatorInfo> colorist;
    sequence<CreatorInfo> contributor;
    sequence<CreatorInfo> creator;
    sequence<CreatorInfo> editor;
    sequence<CreatorInfo> illustrator;
    sequence<CreatorInfo> inker;
    sequence<CreatorInfo> letterer;
    sequence<CreatorInfo> penciler;
    sequence<CreatorInfo> publisher;
    sequence<CreatorInfo> readBy;
    sequence<CreatorInfo> translator;
};


dictionary CreatorInfo {
             sequence<DOMString>         type;
    required sequence<LocalizableString> name;
             DOMString                   id;
             DOMString                   url;
};
{
    "@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"     : "Book",
    …
    "url"      : "https://publisher.example.org/mobydick",
    "author"   : {
        "type"  : "Person",
        "name"  : "Herman Melville"
    }
}
{
    "@context"   : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"       : "TechArticle",
    …
    "id"         : "http://www.w3.org/TR/tabular-data-model/",
    "url"        : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
    "author"     : [
        "Jeni Tennison",
        {
            "type"  : "Person",
            "name" : "Gregg Kellogg"
        },{
            "type"  : "Person",
            "name" : "Ivan Herman",
            "id"   : "https://www.w3.org/People/Ivan/"
        }
    ],
    "editor"    : [
        "Jeni Tennison",
        {
            "type"  : "Person",
            "name" : "Gregg Kellogg"
        }
    ],
    "publisher" : {
        "type"  : "Organization",
        "name" : "World Wide Web Consortium",
        "id"   : "https://www.w3.org/"
    }
    …
}

Language and Base Direction

The Web Publication has a natural language value (e.g., English, French, Chinese), as well as a natural base writing direction (the display direction, either left-to-right or right-to-left). The manifest has entries to set these values, which can influence, for example, the behavior of a user agent (e.g., it might place a pop-up for a table of contents on the right hand side for publications whose natural base direction is right-to-left).

Similarly, each natural language property value in the Web Publication's manifest (e.g., title, creators) is localizable [[string-meta]], meaning that a language and a base direction can also be associated with each such value.

As a result, the manifest has entries to set:

  • the natural language, and
  • the base direction

of both the Web Publication and the natural language property values of the manifest.

If a user agent requires the language and one is not available in the authored manifest (either globally or specifically for that property), or the obtained value is invalid, the user agent MAY attempt to determine the language when generating the canonical manifest. This specification does not mandate how such a language tag is created. The user agent might:

  • use the non-empty language declaration of the manifest;
  • use the first non-empty language declaration found in a resource in the default reading order;
  • calculate the language using its own algorithm.

No default values are specified for the language or the default base direction.
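
As an illustration only, the fallbacks listed above could be chained as in the following sketch. The function name pick_language, its parameters, and the order in which the candidates are tried are assumptions; this specification does not mandate any of them, and the sketch does not validate the resulting tag against [[!bcp47]].

```python
def pick_language(property_lang, manifest_lang, resource_langs, guess=None):
    """Return the first usable language tag, or None.

    property_lang  -- language set on the individual property, if any
    manifest_lang  -- global language declaration of the manifest, if any
    resource_langs -- language declarations found in reading-order resources
    guess          -- optional callable standing in for a user agent's own
                      detection algorithm (hypothetical)
    """
    for tag in [property_lang, manifest_lang, *resource_langs]:
        # Skip missing and empty declarations.
        if isinstance(tag, str) and tag.strip():
            return tag.strip()
    return guess() if guess else None
```

Returning None reflects that no default language is specified; a user agent that does not require a language can leave it unset.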

Proposal for handling localizable texts (writeup of the F2F discussions)

Global Language and Direction

The manifest MAY include global language and base direction declarations for the Web Publication using the following properties.

Term | Description | Required Value | [[!schema.org]] Mapping
inLanguage | Default language for the Web Publication as well as the textual manifest values. | A language code as defined in [[!bcp47]]. | inLanguage (Property)
inDirection | Default base direction for the Web Publication as well as the textual manifest values. | ltr, rtl, or auto. | (None)

The natural language MUST be a tag that conforms to [[!bcp47]], while the base direction MUST have one of the following values:

  • ltr: indicates that the textual values are explicitly directionally set to left-to-right text;
  • rtl: indicates that the textual values are explicitly directionally set to right-to-left text;
  • auto: indicates that the textual values are explicitly directionally set to the direction of the first character with a strong directionality.

When specified, these properties are also used as defaults for textual values in the manifest.

It is important to differentiate the language of the publication from the language and the base direction of the individual resources that compose it. If such resources are, for example, in HTML, the language and direction need to be set in those resources, too. The language and base direction of the publication are not inherited.

The global language information MAY be overridden by individual values.

If the manifest is embedded in the primary entry page via a script element, and the manifest does not set the global language and/or the base direction (see ), the lang and the dir attributes of the script element are used as the global language and base direction, respectively (see the details on handling the lang and dir attributes in [[!html]]).

It is to be discussed whether this last paragraph, i.e., inheriting values from script, should be kept.

If authors intend to use a manifest, or a manifest template, both as embedded manifest and as a separate resource, they are strongly encouraged to set these properties explicitly to avoid interference of the containing script element in case of embedding.

partial dictionary WebPublicationManifest {
    DOMString     inLanguage;
    TextDirection inDirection;
};

enum TextDirection {
    "ltr",
    "rtl",
    "auto"
};

Item-specific Language

It is possible to set the language for any textual value in the manifest. This information MUST be set as a localizable string, i.e., using the value and language terms (instead of a simple string) [[!json-ld]]:

{
    "@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"     : "Book",
    …
    "author" : {
        "type"  : "Person",
        "name" : {
            "value"    : "Marcel Proust",
            "language" : "fr"
        }
    }
}

The value of the language MUST be set to a language code as defined in [[!bcp47]].

When used in a context of localizable texts, a simple string value is a shorthand for a localizable string, with the value set to the string value, and the language set to the value of the inLanguage property, if applicable, and unset otherwise. In other words, the previous example is equivalent to:

{
    "@context"   : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"       : "Book",
    "inLanguage" : "fr",
    …
    "author"     : "Marcel Proust"
}

(See also .)
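
The shorthand expansion can be sketched as follows. The function name to_localizable is hypothetical. The sketch also assumes that an object without its own language term falls back to the global inLanguage value; the text above states this fallback explicitly only for simple strings.

```python
def to_localizable(value, in_language=None):
    """Expand a textual manifest value into a localizable string.

    `in_language` is the manifest's global inLanguage value, if any.
    (Hypothetical helper, not part of the specification.)
    """
    if isinstance(value, str):
        result = {"value": value}
        language = in_language
    elif isinstance(value, dict) and "value" in value:
        result = {"value": value["value"]}
        # Assumption: an item-specific language overrides the global one.
        language = value.get("language", in_language)
    else:
        raise ValueError("expected a string or an object with a 'value' term")
    if language:
        result["language"] = language
    return result
```

With in_language set to "fr", both forms of the Marcel Proust example produce the same localizable string.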

It is not possible to set the direction explicitly for a value.

Setting the direction for a natural text value is currently not possible in JSON-LD [[json-ld]]. In case the JSON-LD community, as well as the schema.org community, introduces such a feature, future versions of this specification may extend the ability of Web Publication Manifests to include this.

When using Web Publication manifests with bidirectional text, user agents SHOULD identify the base direction of any given natural language value by scanning the text for the first strong directional character. Once the base direction has been identified, user agents MUST determine the appropriate rendering and display of natural language values according to the Unicode Bidirectional Algorithm [[!bidi]]. This could require wrapping additional control characters or markup around the string prior to display, in order to apply the base direction. (See .)
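
A first-strong scan can be sketched with Python's standard unicodedata module. This is a simplification: the function name base_direction is hypothetical, and the sketch does not skip characters inside directional isolates as the full paragraph-level rules of [[!bidi]] require.

```python
import unicodedata


def base_direction(text):
    """Return "ltr" or "rtl" from the first strong directional character.

    Returns None when the text contains no strong character
    (e.g., only digits and punctuation).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":            # strong left-to-right
            return "ltr"
        if bidi_class in ("R", "AL"):    # strong right-to-left (incl. Arabic letters)
            return "rtl"
    return None
```

A user agent would then apply the detected direction when displaying the value, for instance by wrapping it in markup that sets the base direction.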

dictionary LocalizableString {
    required DOMString value;
             DOMString language;
};

Last Modification Date

The last modification date is the date when the Web Publication was last updated (i.e., whenever changes were last made to any of the resources of the Web Publication, including the manifest). It is expressed using the dateModified property.

Term | Description | Required Value | [[!schema.org]] Mapping
dateModified | Last modification date of the publication. | A Date or DateTime value [[!schema.org]], expressed in ISO 8601 date or date-time format, respectively [[iso8601]]. | dateModified (CreativeWork)

The last modification date does not necessarily reflect all changes to the Web Publication (e.g., third-party content could change without the author being aware). User agents SHOULD check the last modification date of individual resources to determine if they have changed and need updating.

partial dictionary WebPublicationManifest {
    DOMString dateModified;
};
{
    "@context"     : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"         : "TechArticle",
    …
    "id"           : "http://www.w3.org/TR/tabular-data-model/",
    "url"          : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
    "dateModified" : "2015-12-17",
    …
}

Publication Date

The publication date is the date on which the Web Publication was originally published. It represents a static event in the lifecycle of a Web Publication and allows subsequent revisions to be identified and compared. It is expressed using the datePublished property.

Term | Description | Required Value | [[!schema.org]] Mapping
datePublished | Creation date of the publication. | A Date or DateTime value [[!schema.org]], expressed in ISO 8601 date or date-time format, respectively [[!iso8601]]. | datePublished (CreativeWork)

The exact moment of publication is intentionally left open to interpretation: it could be when the Web Publication is first made available online or could be a point in time before publication when the Web Publication is considered final.

partial dictionary WebPublicationManifest {
    DOMString datePublished;
};
{
    "@context"      : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"          : "TechArticle",
    …
    "id"            : "http://www.w3.org/TR/tabular-data-model/",
    "url"           : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
    "datePublished" : "2015-12-17",
    "dateModified"  : "2016-01-30",
    …
}
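
As a rough sketch of how the two dates might be compared to detect a revision, the following uses Python's standard datetime module. The function name revised_since_publication is hypothetical, and the comparison deliberately considers calendar dates only, ignoring any time or offset component; this specification does not define such a comparison.

```python
from datetime import datetime


def revised_since_publication(manifest):
    """Return True if dateModified falls on a later day than datePublished.

    Returns None when either date is missing from the manifest.
    """
    published = manifest.get("datePublished")
    modified = manifest.get("dateModified")
    if not published or not modified:
        return None
    # fromisoformat accepts both "YYYY-MM-DD" and date-time strings;
    # .date() drops the time part so the two values are always comparable.
    return datetime.fromisoformat(modified).date() > datetime.fromisoformat(published).date()
```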

Reading Progression Direction

The reading progression establishes the reading direction from one resource to the next within a Web Publication. It is expressed using the readingProgression property.

Term | Description | Required Value | [[!schema.org]] Mapping
readingProgression | Reading direction from one resource to the other. | ltr or rtl | (None)

The value of this property MUST be either:

  • ltr: left-to-right;
  • rtl: right-to-left.

The default value is ltr.

This property has no effect on the rendering of the individual primary resources; it is only relevant for the progression direction from one resource to the other.

The reading progression of a Web Publication is used to adapt publication-level interactions such as menu position, swipe direction, the tap zones that lead the user to the next and previous pages, and touch gestures.

If the readingProgression is not set, user agents MUST use the default value ltr when generating the canonical manifest.
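
A minimal sketch of applying this default follows. The function name reading_progression is hypothetical, and rejecting an unrecognized value with an exception is an assumption; this section does not say how user agents handle invalid values.

```python
VALID_PROGRESSIONS = ("ltr", "rtl")


def reading_progression(manifest):
    """Return the reading progression for the canonical manifest."""
    # An absent property takes the default value "ltr".
    value = manifest.get("readingProgression", "ltr")
    if value not in VALID_PROGRESSIONS:
        # Assumption: treat anything else as an authoring error.
        raise ValueError(f"invalid readingProgression: {value!r}")
    return value
```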

partial dictionary WebPublicationManifest {
    ProgressionDirection readingProgression = "ltr";
};

enum ProgressionDirection {
    "ltr",
    "rtl"
};
{
    "@context"           : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"               : "Book",
    …
    "url"                : "https://publisher.example.org/mobydick",
    "readingProgression" : "ltr"
}

Title

The title provides the human-readable name of the Web Publication. It is expressed using the name property.

Term | Description | Required Value | [[!schema.org]] Mapping
name | Human-readable title of the Web Publication. | One or more text items for the title. | name (Thing)

When present, the title is specified in the manifest using the name property. If it is not included in the authored manifest, the user agent MUST use the value of the title element [[!html]] of the Web Publication’s primary entry page, if present, when generating the canonical manifest.

If the title is not available either in the authored manifest or as a non-empty title element in the primary entry page, the user agent MUST create one. This specification does not specify what heuristics the user agent should use; it can, for example, use a language-specific placeholder title, use the URL of the manifest or the primary entry page, or use the value of the address in the authored manifest.
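
The fallback chain for the title can be sketched as below. The function name resolve_title and the placeholder text are hypothetical; the caller is assumed to have already extracted the text of the title element from the primary entry page, and the final fallback is only one of the heuristics a user agent might choose.

```python
def resolve_title(manifest, entry_page_title=None, placeholder="Untitled publication"):
    """Return the title as a list of values for the canonical manifest.

    entry_page_title -- text content of the primary entry page's title
                        element, or None if the element is absent
    placeholder      -- user-agent-chosen fallback (an assumption here)
    """
    name = manifest.get("name")
    if name:
        # 1. The authored manifest takes precedence.
        return name if isinstance(name, list) else [name]
    if entry_page_title and entry_page_title.strip():
        # 2. Otherwise use a non-empty title element.
        return [entry_page_title.strip()]
    # 3. Otherwise the user agent creates a title itself.
    return [placeholder]
```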

Relying on the title element could be semantically problematic if the Web Publication consists of several HTML resources (e.g., one per chapter of a book), because the HTML definition defines this element as "metadata" for the enclosing HTML document, not for a collection of resources. Using this element is, on the other hand, preferred in the case of a publication consisting of a single HTML document (e.g., a scholarly journal article).

A user agent is not expected to produce a meaningful title [[wcag20]] for a Web Publication when one is not specified.

partial dictionary WebPublicationManifest {
    required sequence<LocalizableString> name;
};
{
    "@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"     : "Book",
    …
    "url"      : "https://publisher.example.org/mobydick",
    "name"     : "Moby Dick"
}

Resource Categorization Properties

Web Publication resources are specified via the default reading order, the resource list, and the links, as defined in this section. These lists contain references to informative properties like the privacy policy, and structural properties like the table of contents.

Note that a particular resource's URL MUST NOT appear in more than one of these lists, and a URL MUST NOT be repeated within a list.

The manifest itself MUST NOT include a reference to itself, i.e., the reference to the manifest MUST NOT appear within these lists.
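
A check for repeated URLs across the three lists could look like the following sketch. The function name duplicate_urls is hypothetical, and the sketch compares URLs exactly as authored; a real implementation would presumably resolve relative URLs against the manifest's base before comparing.

```python
def duplicate_urls(manifest):
    """Return URLs that appear more than once across the resource lists."""
    seen = set()
    duplicates = []
    for key in ("readingOrder", "resources", "links"):
        for entry in manifest.get(key, []):
            # Entries are either URL strings or LinkedResource objects.
            url = entry if isinstance(entry, str) else entry.get("url")
            if url is None:
                continue
            if url in seen and url not in duplicates:
                duplicates.append(url)
            seen.add(url)
    return duplicates
```

An empty result means the manifest satisfies the uniqueness constraints for the URLs as written.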

Default Reading Order

The default reading order is a specific progression through a set of Web Publication resources. A user might follow alternative pathways through the content, but in the absence of such interaction the default reading order defines the expected progression from one resource to the next.

The default reading order is expressed using the readingOrder property.

Term Description Required Value [[!schema.org]] Mapping
readingOrder

An array of:

  • a string, representing the URL [[url]] of the resource; or
  • an instance of a LinkedResource object

The order in the array is significant. The URLs MUST NOT include fragment identifiers. Non-HTML resources SHOULD be expressed as LinkedResource objects with their encodingFormat values set.

(None)

The default reading order MUST include at least one resource.

The default reading order is specified directly in the manifest, but MAY be omitted when it only consists of the primary entry page. When the default reading order is absent, user agents MUST include an entry for the primary entry page when compiling the canonical manifest.

If present in the Web Publication Manifest, this item MUST be mapped to the readingOrder term, defined specifically for Web Publications.
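
The handling of an absent default reading order can be sketched as follows. The function name canonical_reading_order is hypothetical, and expanding URL strings into LinkedResource objects at this point is an assumption made so that the result matches the sequence of LinkedResource objects used for readingOrder.

```python
def canonical_reading_order(manifest, primary_entry_page_url):
    """Return the reading order as a list of LinkedResource objects."""
    order = manifest.get("readingOrder")
    if not order:
        # Absent reading order: it consists of the primary entry page only.
        order = [primary_entry_page_url]
    return [
        {"type": "LinkedResource", "url": entry} if isinstance(entry, str) else entry
        for entry in order
    ]
```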

partial dictionary WebPublicationManifest {
    required sequence<LinkedResource> readingOrder;
};
{
    "@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"     : "Book",
    …
    "url"      : "https://publisher.example.org/mobydick",
    "name"     : "Moby Dick",
    "readingOrder" : [
        "html/title.html",
        "html/copyright.html",
        "html/introduction.html",
        "html/epigraph.html",
        "html/c001.html",
        …
    ]
}
{
    "@context" : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"     : "Book",
    …
    "url"      : "https://publisher.example.org/mobydick",
    "name"     : "Moby Dick",
    "readingOrder" : [{
        "type"           : "LinkedResource",
        "url"            : "html/title.html",
        "encodingFormat" : "text/html",
        "name"           : "Title page"
    },{
        "type"           : "LinkedResource",
        "url"            : "html/copyright.html",
        "encodingFormat" : "text/html",
        "name"           : "Copyright page"
    },{
        …
    }]
}

Resource List

The resource list enumerates any additional resources used in the processing and rendering of a Web Publication that are not already listed in the default reading order. It is expressed using the resources property.

Term Description Required Value [[!schema.org]] Mapping
resources

An array of:

  • a string, representing the URL [[url]] of the resource; or
  • an instance of a LinkedResource object

The order in the array is not significant. The URLs MUST NOT include fragment identifiers. It is RECOMMENDED to use LinkedResource objects with their encodingFormat values set.

(None)

The completeness of the resource list will affect the usability of the Web Publication in certain reading scenarios (e.g., the ability to read the Web Publication offline). For this reason, it is strongly RECOMMENDED to provide a comprehensive list of all of the Web Publication's constituent resources beyond those listed in the default reading order.

In some cases, a comprehensive list of these resources might not be easily achieved (e.g., third-party scripts that reference resources from deep within their source), but a user agent SHOULD still be able to render a Web Publication even if some of these resources are not identified as belonging to the Web Publication (e.g., when it is taken offline without them).

partial dictionary WebPublicationManifest {
    sequence<LinkedResource> resources = [];
};
{
    "@context"   : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"       : "TechArticle",
    …
    "id"         : "http://www.w3.org/TR/tabular-data-model/",
    "url"        : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
    …
    "resources"  : [
        "datatypes.html",
        "datatypes.svg",
        "datatypes.png",
        "diff.html",
        {
            "type"              : "LinkedResource",
            "url"               : "test-utf8.csv",
            "encodingFormat"    : "text/csv"
        },{
            "type"              : "LinkedResource",
            "url"               : "test-utf8-bom.csv",
            "encodingFormat"    : "text/csv"
        },{
            …
        }
    ],
    …
}

Informative Properties

Accessibility Report

An accessibility report provides information about the suitability of a Web Publication for consumption by users with varying preferred reading modalities. These reports typically identify the result of an evaluation against established accessibility criteria, such as those provided in [[WCAG21]], and are an important source of information in determining the usability of a Web Publication.

An accessibility report is identified using the https://www.w3.org/ns/wp#accessibility-report link relationship.

The manifest SHOULD include a link to an accessibility report when one is available for a Web Publication. It is RECOMMENDED that the report be included as a resource of the Web Publication.

It is also RECOMMENDED that the accessibility report be provided in a human-readable format, such as [[!html]]. Augmenting these reports with machine-processable metadata, such as provided in Schema.org [[!schema.org]], is also RECOMMENDED.

If present in the manifest, the accessibility report MUST be expressed as a LinkedResource. The rel value of the LinkedResource MUST include the https://www.w3.org/ns/wp#accessibility-report identifier.
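
Locating such a resource by its link relationship can be sketched as below. The function name find_by_rel and the choice of lists to search are assumptions; the sketch accepts rel as either a single string or a list of strings.

```python
ACCESSIBILITY_REPORT = "https://www.w3.org/ns/wp#accessibility-report"


def find_by_rel(manifest, rel, lists=("links", "resources", "readingOrder")):
    """Return the LinkedResource objects whose rel includes `rel`."""
    matches = []
    for key in lists:
        for entry in manifest.get(key, []):
            if not isinstance(entry, dict):
                continue  # plain URL strings carry no rel value
            rels = entry.get("rel", [])
            if isinstance(rels, str):
                rels = [rels]
            if rel in rels:
                matches.append(entry)
    return matches
```

The same lookup would apply to other relationships used in this specification, such as privacy-policy.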

The Working Group will attempt to define the accessibility-report term with IANA, to avoid using a URL.

partial dictionary WebPublicationManifest {
    LinkedResource accessibilityReport;
};
{
    "@context"   : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"       : "Book",
    …
    "url"        : "https://publisher.example.org/mobydick",
    "name"       : "Moby Dick",
    "links"  : [{
        "type"        : "LinkedResource",
        "url"         : "https://www.publisher.example.org/mobydick-accessibility.html",
        "rel"         : "https://www.w3.org/ns/wp#accessibility-report"
    },{
        …
    }],
    …
}

Privacy Policy

Users often have the legal right to know and control what information is collected about them, how such information is stored and for how long, whether it is personally identifiable, and how it can be expunged. Including a statement that addresses such privacy concerns is consequently an important part of publishing Web Publications. Even if no information is collected, such a declaration increases the trust users have in the content.

A privacy policy is identified using the privacy-policy link relationship.

A link to a privacy policy can be included in the manifest. It is RECOMMENDED that the privacy policy be included as a resource of the Web Publication.

It is RECOMMENDED that the privacy policy be provided in a human-readable format, such as HTML [[html]].

Refer to for more information about privacy considerations in Web Publications.

If present in the manifest, the privacy policy MUST be expressed as a LinkedResource. The rel value of the LinkedResource MUST include the privacy-policy identifier [[!iana-link-relations]].

partial dictionary WebPublicationManifest {
    LinkedResource privacyPolicy;
};
{
    "@context"   : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"       : "TechArticle",
    …
    "id"         : "http://www.w3.org/TR/tabular-data-model/",
    "url"        : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
    …
    "links"  : [{
        "type"           : "LinkedResource",
        "url"            : "https://www.w3.org/Consortium/Legal/privacy-statement-20140324",
        "encodingFormat" : "text/html",
        "rel"            : "privacy-policy"
    },{
        …
    }],
    …
}

Structural Properties

Cover

The cover is a resource that user agents can use to present the Web Publication (e.g., in a library or bookshelf, or when initially loading the Web Publication).

The cover is identified by the https://www.w3.org/ns/wp#cover link relationship.

The working group has not reached consensus on whether the cover should be any resource or should be limited to images.

The manifest SHOULD include a reference to a cover.

More than one cover MAY be referenced from the manifest (e.g., to provide alternative formats and sizes for different device screens). If multiple covers are specified, each instance MUST define at least one unique property to allow user agents to determine its usability (e.g., a different format, height, width or relationship).

If present in the manifest, the cover MUST be expressed as a LinkedResource. The URL expressed in the url term MUST NOT include a fragment identifier.

The rel value of the LinkedResource MUST include the https://www.w3.org/ns/wp#cover identifier.

If the cover is in an image format, a title and description SHOULD be provided. User agents can use these properties to provide alternative text and descriptions when necessary for accessibility.

The Working Group will attempt to define the cover term with IANA, to avoid using a URL.

partial dictionary WebPublicationManifest {
    sequence<LinkedResource> cover;
};
{
    "@context"   : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"       : "Book",
    …
    "url"        : "https://publisher.example.org/donquixote",
    "name"       : "Don Quixote",
    "resources"  : [{
        "type"           : "LinkedResource",
        "url"            : "cover.html",
        "encodingFormat" : "text/html",
        "rel"            : "https://www.w3.org/ns/wp#cover"
    },{
        …
    }],
    …
}
{
    "@context"   : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"       : "Book",
    …
    "url"        : "https://publisher.example.org/mobydick",
    "name"       : "Moby Dick",
    "resources"  : [{
        "type"           : "LinkedResource",
        "url"            : "whale-image.jpg",
        "encodingFormat" : "image/jpeg",
        "rel"            : "https://www.w3.org/ns/wp#cover",
        "name"           : "Moby Dick attacking hunters",
        "description"    : "A white whale is seen surfacing from the water to attack a small whaling boat"
    },{
        …
    }],
    …
}
{
    "@context"   : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"       : "Book",
    …
    "url"        : "https://publisher.example.org/gulliverstravels",
    "name"       : "Gulliver's Travels",
    "resources"  : [{
        "type"           : "LinkedResource",
        "url"            : "lilliput.jpg",
        "encodingFormat" : "image/jpeg",
        "rel"            : "https://www.w3.org/ns/wp#cover"
    },{
        "type"           : "LinkedResource",
        "url"            : "lilliput.svg",
        "encodingFormat" : "image/svg+xml",
        "rel"            : "https://www.w3.org/ns/wp#cover"
    },{
        …
    }],
    …
}

Page List

The pagelist property identifies the resource that contains the Web Publication's page list.

The page list is identified by the https://www.w3.org/ns/wp#pagelist link relationship.

User agents MUST compute the pagelist as follows:

  1. Identify the page list resource:
  2. If the page list resource contains an HTML element with the role [[!html]] value doc-pagelist [[!dpub-aria-1.0]], the user agent MUST use that element as the page list. If there are several such HTML elements the user agent MUST use the first in document tree order [[!dom]].

If this process does not result in a link to the page list, the Web Publication does not have a page list and this property MUST NOT be included in the canonical manifest.

The Working Group will attempt to define the pagelist term with IANA, to avoid using a URL.

If present in the manifest, the page list MUST be expressed as a LinkedResource. The URL expressed in the url term MUST NOT include a fragment identifier.

The rel value of the LinkedResource MUST include the https://www.w3.org/ns/wp#pagelist identifier.

The link to the page list MAY be specified in either the default reading order or the resource list, but MUST NOT be specified in both.

partial dictionary WebPublicationManifest {
    HTMLElement pagelist;
};
{
    "@context"   : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"       : "Book",
    …
    "url"        : "https://publisher.example.org/mobydick",
    "name"       : "Moby Dick",
    "resources"  : [{
        "type"       : "LinkedResource",
        "url"        : "toc_file.html",
        "rel"        : "https://www.w3.org/ns/wp#pagelist"
    },{
        …
    }],
    …
}

Table of Contents

The table of contents property identifies the resource that contains the Web Publication's table of contents.

The table of contents is identified by the contents link relationship.

User agents MUST compute the toc as follows:

  1. Identify the table of contents resource:
  2. If the table of contents resource contains an HTML element with the role [[!html]] value doc-toc [[!dpub-aria-1.0]], the user agent MUST use that element as the table of contents. If there are several such HTML elements the user agent MUST use the first in document tree order [[!dom]].

If this process does not result in a link to the table of contents, the Web Publication does not have a table of contents and this property MUST NOT be included in the canonical manifest.
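
The second step, selecting the first element with the doc-toc role, can be sketched with Python's standard html.parser module. The names RoleFinder and find_toc_element are hypothetical. The sketch does not build a DOM tree; it assumes that the order of start tags in the source matches document tree order, which holds for well-formed markup.

```python
from html.parser import HTMLParser


class RoleFinder(HTMLParser):
    """Record the first start tag whose role attribute includes `role`."""

    def __init__(self, role):
        super().__init__()
        self.role = role
        self.found = None  # (tag name, attribute dict) of the first match

    def handle_starttag(self, tag, attrs):
        if self.found is None:
            attributes = dict(attrs)
            # The role attribute is a space-separated list of tokens.
            if self.role in (attributes.get("role") or "").split():
                self.found = (tag, attributes)


def find_toc_element(html):
    """Return (tag, attributes) for the first doc-toc element, or None."""
    finder = RoleFinder("doc-toc")
    finder.feed(html)
    return finder.found
```

A None result corresponds to the case where the Web Publication has no table of contents; the same approach would apply to doc-pagelist for the page list.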

Depending on the resolution to this issue, the manifest might contain a separate entry for a machine-processable table of contents, restrictions could be placed on the HTML structure of the referenced table of contents, or parsing rules for extracting a table of contents could be added.

If present in the manifest, the table of contents MUST be expressed as a LinkedResource. The URL expressed in the url term MUST NOT include a fragment identifier.

The rel value of the LinkedResource MUST include the contents identifier [[!iana-link-relations]].

The link to the table of contents MAY be specified in either the default reading order or the resource list, but MUST NOT be specified in both.

partial dictionary WebPublicationManifest {
    HTMLElement toc;
};
{
    "@context"   : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"       : "Book",
    …
    "url"        : "https://publisher.example.org/mobydick",
    "name"       : "Moby Dick",
    "resources"  : [{
        "type"       : "LinkedResource",
        "url"        : "toc_file.html",
        "rel"        : "contents"
    },{
        …
    }],
    …
}
<head>
    …
    <script type="application/ld+json">
    {
        "@context"        : ["https://schema.org","https://www.w3.org/ns/wp-context"],
        "type"            : "TechArticle",
        …
        "id"              : "http://www.w3.org/TR/tabular-data-model/",
        "url"             : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
        …
    }
    </script>
    …
</head>
<body>
    …
    <section role="doc-toc">
        …
    </section>
    …
</body>

Extensibility

The manifest is designed to provide a basic set of properties for use by user agents in presenting and rendering a Web Publication, but MAY be extended in the following ways:

  1. by the provision of linked metadata records; or
  2. through the inclusion of additional properties in the manifest.

Although both methods are valid, the use of linked records is RECOMMENDED.

This specification does not define how such additional properties are compiled, stored or exposed by user agents in their internal representation of the manifest. A user agent MAY ignore some or all extended properties.

Linked records

Extending the manifest through links to a record, such as an ONIX [[onix]] or BibTeX [[bibtex]] file, MUST be expressed using a LinkedResource object, where:

  • the rel value of the LinkedResource SHOULD include a relevant identifier defined by IANA or by other organizations; if the linked record contains descriptive metadata it MUST include the describedby (IANA) identifier;
  • the value of the encodingFormat in the link MUST use the MIME media type [[!rfc2046]] defined for that particular type of record, if applicable.

Linked records MUST be included in the resource list when they are part of the Web Publication (i.e., are needed for more than just manifest extensibility). Otherwise, they MUST be included in the links list.

{
    "@context"   : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"       : "Book",
    …
    "url"        : "https://publisher.example.org/mobydick",
    "name"       : "Moby Dick",
    "links"      : [{
        "type"            : "LinkedResource",
        "url"             : "https://www.publisher.example.org/mobydick-onix.xml",
        "encodingFormat"  : "application/onix+xml",
        "rel"             : "describedby"
    },{
        …
    }],
    …
}

The application/onix+xml MIME type has not yet been registered by IANA at the time of writing this document, and is included in the example for illustrative purposes only.
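
The constraints on linked records lend themselves to a mechanical check. The following is a minimal sketch in Python, assuming the manifest has already been parsed into dictionaries; the function name check_linked_record and its descriptive flag are illustrative only and are not defined by this specification.

```python
def check_linked_record(link, descriptive=False):
    """Return a list of problems found in a linked record (empty if none)."""
    problems = []

    # A linked record has to be expressed as a LinkedResource object.
    types = link.get("type", [])
    if isinstance(types, str):
        types = [types]
    if "LinkedResource" not in types:
        problems.append("linked record is not a LinkedResource")

    # rel may be a space-separated string or a list of identifiers.
    rel = link.get("rel", [])
    if isinstance(rel, str):
        rel = rel.split()
    if not rel:
        problems.append("rel SHOULD include a relevant identifier")
    if descriptive and "describedby" not in rel:
        problems.append("descriptive metadata MUST use rel 'describedby'")

    return problems
```

Applied to the ONIX link in the example above, with descriptive set to True, the sketch reports no problems.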

Additional Properties in the Manifest

Additional properties can be included directly in the manifest. It is RECOMMENDED that these properties be taken from public schemes like [[schema.org]] or [[dcterms]] and use values from controlled vocabularies whenever possible. Proprietary terms MAY be used, but it is RECOMMENDED that such terms be included using Compact IRIs [[!json-ld]], with prefixes defined as part of the context.

{
    "@context"        : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"            : "TechArticle",
    …
    "id"              : "http://www.w3.org/TR/tabular-data-model/",
    "url"             : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
    "copyrightYear"   : "2015",
    "copyrightHolder" : "World Wide Web Consortium",
    …
}

{
    "@context"   : ["https://schema.org","https://www.w3.org/ns/wp-context"],
    "type"       : "CreativeWork",
    …
    "id"         : "http://www.w3.org/TR/tabular-data-model/",
    "url"        : "http://www.w3.org/TR/2015/REC-tabular-data-model-20151217/",
    "dc:subject" : ["Web data description languages","Data integration","Data Exchange"],
    …
}

A prefix definition dc for [[dcterms]] is included in the context file of [[schema.org]]. This means that it is not necessary to add the prefix explicitly. The same is true for a number of other public vocabularies; see the schema.org context file for further details.

Web Publication Lifecycle

See the diagrams in the appendix for a visual representation of the lifecycle algorithm.

Obtaining a manifest

The steps for obtaining a manifest, starting from the primary entry page, are given by the following algorithm. The algorithm, if successful, returns a processed manifest; otherwise, it terminates prematurely and returns nothing. In the case of nothing being returned, the user agent MUST ignore the manifest declaration.

  1. From the Document of the top-level browsing context of the primary entry page, let origin be the Document's origin, and manifest link be the first link element in tree order in Document whose rel attribute contains the publication token.
  2. If origin is an [[!html]] opaque origin, terminate this algorithm.
  3. If manifest link is null, terminate this algorithm.
  4. If manifest link's href attribute's value is the empty string, terminate this algorithm.
  5. If manifest link's href attribute's value is a relative URL, i.e., it points to origin and it has a non-null fragment identifying an identifier id in Document:
    1. Let embedded manifest script be the first script element in tree order, whose id attribute is equal to id and whose type attribute is equal to application/ld+json.
    2. If embedded manifest script is null, terminate this algorithm.
    3. Let text be the child text content of embedded manifest script.
    4. Let base be the value of baseURI of the script element.
  6. Otherwise:
    1. Let manifest URL be the result of parsing the value of the href attribute, relative to the element's base URL. If parsing fails, terminate this algorithm.
    2. Let request be a new [[!fetch]] request, whose URL is manifest URL, and whose context is the same as the browsing context of the Document.
    3. If the manifest link's crossOrigin attribute's value is 'use-credentials', then set request's credentials to 'include'.
    4. Await the result of performing a fetch with request, letting response be the result.
    5. If response is a network error, terminate this algorithm.
    6. Let text be the result of UTF-8 decoding response's body.
    7. Let base be the value of manifest URL.
  7. Let json be the result of parsing text. If parsing throws an error, terminate this algorithm.
  8. If Type(json) is not Object, terminate this algorithm.
  9. Let canonical manifest be the canonical manifest derived from json, using the values of json, base, and Document as input to the algorithm for generating a canonical manifest.
  10. Check whether the canonical manifest fulfills the minimal requirements for a Web Publication Manifest, namely:
    • the JSON-LD context is set;
    • the Publication type is set;
    • the Publication address is set.
    If any of these requirements is not fulfilled, terminate this algorithm.
  11. Let processed manifest be the result of running the steps for processing a manifest, given canonical manifest.
  12. Return processed manifest.

The algorithm does not describe how error and warning messages should be reported. This is implementation dependent.
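
Steps 7, 8, and 10 of the algorithm above can be sketched as follows. This is an illustrative Python sketch, not a normative implementation: the function names are hypothetical, and it assumes that the context, publication type, and address correspond to the "@context", "type", and "url" terms used in the examples of this document.

```python
import json

# Assumption: these three terms carry the context, type, and address.
REQUIRED_TERMS = ("@context", "type", "url")


def parse_manifest_text(text):
    """Steps 7-8: parse text; return the JSON object, or None to terminate."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None  # parsing threw an error
    # Only a JSON object (a dict in Python) is acceptable.
    return data if isinstance(data, dict) else None


def meets_minimal_requirements(canonical_manifest):
    """Step 10: every required term has to be present and non-empty."""
    return all(canonical_manifest.get(term) for term in REQUIRED_TERMS)
```

A caller would stop and ignore the manifest declaration as soon as parse_manifest_text returns None or meets_minimal_requirements returns False.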

Generating a Canonical Manifest

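The steps to convert a Web Publication Manifest into a Canonical Manifest are given by the following algorithm. The algorithm takes the following arguments:

  • manifest: the JSON object that represents the manifest;
  • base: the base URL used to resolve relative URLs in the manifest;
  • document: the Document of the primary entry page.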

The steps of the algorithm are described below. As an abuse of notation, P["term"] refers to the value in the object P for the label "term", where P is either manifest, or an object appearing within manifest (e.g., a Person). The algorithm replaces or adds some terms to manifest; the replacement terms are expressed in JSON syntax as {"term":"value"}.

  1. let lang be a string that represents the default language, set to:
  2. let dir be a string that represents the base direction, set to:
  3. if manifest["name"] is undefined, then locate the title HTML element using document. If that element exists and is non-empty, let t be its text content, and add to manifest:
    • if the language of title is explicitly set to the value of l:
      "name": [{"value": t, "language": l}]
    • otherwise:
      "name": [t]
  4. if manifest["inLanguage"] is undefined and the value of lang is not undefined, add
    "inLanguage": lang
    to manifest
  5. if manifest["inDirection"] is undefined and the value of dir is not undefined, add
    "inDirection": dir
    to manifest
  6. if manifest["readingOrder"] is undefined, let u be the value of document.URL, and add
    "readingOrder": [{"type": ["LinkedResource"], "url": u}]
    to manifest
  7. consider P["term"], where P is any object in manifest (including itself) and term is a term whose value is expected to be an array. If P["term"] is a single string or object v, then change the relevant term/value to
    "term": [v]
    Repeat this step for all possible values of P["term"].
  8. if the value v in a manifest["term"] array, where term is one of the creator terms, is a simple string or localizable string, exchange that element in the array to
    {"type": ["Person"], "name": [v]}
    Repeat this step for all possible values of v.
  9. if the value v in a manifest["term"] array, where term is one of the resource categorization properties, is a simple string, exchange that element in the array to
    {"type": ["LinkedResource"], "url": v}
    Repeat this step for all possible values of v.
  10. let v be the value, or one of the values in case of an array, of P["term"], where P is any object in manifest (including itself) and term is:
    • accessibilitySummary; or
    • name; or
    • description.
    If v is a single string, then change the relevant term/value to:
    • if manifest["inLanguage"] is set to the value of l then
      "term": {"value": v,"language": l}
    • otherwise
      "term": {"value": v}
    Repeat this step for all possible values of v.
  11. if the value of P["term"], where P is any object in manifest (including itself) and term is:
    • url; or
    • id
    is a single string u which is not an absolute URL string, then resolve this value (considered to be a relative URL) using the value of base, yielding the value of au, and replace the term/value pair by
    "term": au
  12. Return the (transformed) manifest.
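
To make the transformation more concrete, the following Python sketch covers a subset of the steps above: the default reading order (step 6), wrapping single values in arrays (step 7), expanding strings into Person and LinkedResource objects (steps 8 and 9), and resolving relative URLs (step 11). It is a simplified illustration under stated assumptions: the function names are hypothetical, "author" and "editor" stand in for the full list of creator terms, localizable strings (step 10) and the title, lang, and dir defaults are not handled, and the document URL is passed in directly instead of a Document.

```python
from urllib.parse import urljoin

CREATOR_TERMS = ("author", "editor")  # assumption: subset of the creator terms
RESOURCE_TERMS = ("readingOrder", "resources", "links")


def resolve_urls(obj, base):
    """Step 11: make string-valued "url" and "id" entries absolute."""
    for term in ("url", "id"):
        if isinstance(obj.get(term), str):
            obj[term] = urljoin(base, obj[term])


def canonicalize(manifest, base, document_url):
    m = dict(manifest)  # shallow copy; nested objects are copied below

    # Step 6: fall back to the primary entry page as the reading order.
    if "readingOrder" not in m:
        m["readingOrder"] = [{"type": ["LinkedResource"], "url": document_url}]

    for term in CREATOR_TERMS + RESOURCE_TERMS:
        if term not in m:
            continue
        # Step 7: a single value becomes a one-element array.
        values = m[term] if isinstance(m[term], list) else [m[term]]
        expanded = []
        for v in values:
            if isinstance(v, str) and term in CREATOR_TERMS:
                v = {"type": ["Person"], "name": [v]}        # step 8
            elif isinstance(v, str):
                v = {"type": ["LinkedResource"], "url": v}   # step 9
            else:
                v = dict(v)  # copy so the input is left untouched
            resolve_urls(v, base)
            expanded.append(v)
        m[term] = expanded

    resolve_urls(m, base)
    return m
```

For instance, a manifest containing "author": "Herman Melville" would come out with "author": [{"type": ["Person"], "name": ["Herman Melville"]}], and a relative resource URL such as "toc_file.html" would be resolved against base.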

See the diagram in the appendix for a visual representation of the algorithm. Also, to help in understanding the result of the algorithm, links to the corresponding canonical manifests are provided for all the examples in the Manifest Examples section.

Some open issues, either in this working group or in the JSON-LD Working Group, may modify some of the details above. These are:
  • The exact value of base (step (5.4) of the algorithm for obtaining a manifest) and the usage of the embedded values of lang and dir (steps (1) and (2) in this section) depend on JSON-LD #22, JSON-LD #57, and, ultimately, TAG #312.

Processing the manifest

The steps for processing a manifest are given by the following algorithm. The algorithm takes a json object representing a canonical manifest and returns a processed manifest. The goal of the algorithm is to ensure that the data represented in json abides by the minimal requirements on the data, removing non-conformant data where applicable.

  1. Let manifest object be the result of converting json to a WebPublicationManifest dictionary.
  2. Extension point: process any proprietary and/or other supported members at this point in the algorithm.
  3. Perform data cleanup operations on manifest object, possibly removing data, as well as raising warnings.
    1. Check whether the value of manifest object["url"] is a valid URL [[!url]]. If not, issue a warning.
    2. For all the accessibility terms, except for accessModeSufficient and accessibilitySummary, check whether all tokens listed in manifest object[term] are defined in the preferred vocabulary (see the list of expected values for each). Issue a warning for each unrecognized value.
    3. For all values in manifest object["accessModeSufficient"], check whether each token in each ItemList [[!schema.org]] is defined in the preferred vocabulary (see the list of expected values). Issue a warning for each unrecognized value.
    4. For all the creator terms, check whether every object Obj in manifest object[term] has Obj["name"] set. If not, remove Obj from the manifest object[term] array and issue a warning.
    5. Check whether the value of manifest object["name"] is not empty. If it is, generate a value (see the separate note for details) and issue a warning.
    6. For all the resource categorization properties, check whether every object Obj in manifest object[term] has Obj["url"] set. If not, remove Obj from the manifest object[term] array and issue a warning. If yes, check whether Obj["url"] is a valid URL [[!url]] and, if not, issue a warning.
    7. Check whether manifest object["datePublished"] is a valid date or date-time, per [[iso8601]]. If the check fails, issue a warning.
    8. Check whether manifest object["dateModified"] is a valid date or date-time, per [[iso8601]]. If the check fails, issue a warning.
  4. Return manifest object.
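
The cleanup in step 3 can be illustrated with a short Python sketch. It is not a complete implementation: it only covers the removal of resource objects without a url and the date checks, the function names are hypothetical, and date.fromisoformat and datetime.fromisoformat are used as an approximation of [[iso8601]] validation (they accept only a subset of the formats, depending on the Python version).

```python
from datetime import date, datetime


def is_iso_date(value):
    """Approximate check for an ISO 8601 date or date-time string."""
    for parse in (date.fromisoformat, datetime.fromisoformat):
        try:
            parse(value)
            return True
        except (TypeError, ValueError):
            pass
    return False


def clean_up(manifest, resource_terms=("readingOrder", "resources", "links")):
    """Drop resource objects without a url and collect warnings."""
    warnings = []

    for term in resource_terms:
        if term not in manifest:
            continue
        kept = []
        for obj in manifest[term]:
            if obj.get("url"):
                kept.append(obj)
            else:
                warnings.append(f"{term}: removed an entry without a url")
        manifest[term] = kept

    # Invalid dates only trigger a warning; the value is left in place.
    for term in ("datePublished", "dateModified"):
        if term in manifest and not is_iso_date(manifest[term]):
            warnings.append(f"{term}: not a valid date or date-time")

    return manifest, warnings
```

How the collected warnings are surfaced is left to the implementation, in line with the note on error reporting earlier in this section.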

User Agent Features

This section contains placeholders for possible reading enhancements/features the user agent may/should/must provide. The list is subject to addition, modification and removal as the enhancements get discussed in more detail.

Switch to publication mode

When a user agent obtains a manifest it SHOULD provide the option to switch the display to publication mode.

This feature has the following requirements:

  1. It MUST inform the user that the current resource is part of a Web Publication.
  2. It SHOULD display the title of the Web Publication.
  3. It MAY display additional metadata from the manifest.

Publication mode is a display mode implemented by the user agent that follows the conventions listed in the Presentation and Navigation sections.

Presentation

Layout

The layout and rendering of Web Publications is governed by the same rules that apply to all Web content: HTML documents are styled and laid out according to the rules of CSS, SVG documents are rendered as defined by that format, etc. This specification requires no particular profile or subset of CSS, HTML, or SVG to be supported, other than the expectations set for these technologies by their respective specifications.

This specification intentionally avoids introducing any new layout features. Any shortcoming of the Web platform in terms of layout needs to be addressed for the whole Web platform, which means via CSS.

This working group will work with other relevant groups of the W3C to address platform-wide limitations that negatively impact Web Publications.

For the purposes of layout, each resource of a Web Publication is treated as a separate document. User agents MUST NOT mix content from multiple resources in the same rendering (e.g., CSS floats or absolutely positioned elements from one resource cannot intrude or overlap with content from another resource).

Despite this general requirement that each resource should be treated as a separate document for the purpose of layout, there are some places where CSS specifications should be amended to be able to deal more intelligently with collections of resources like Web Publications.

One instance is the definition of cross-references, which are currently restricted to work only within a single document. This restriction should be relaxed to allow for cross-references between separate resources of a single Web Publication.

Another related change would be to allow counters to accumulate across multiple resources of a single Web Publication (e.g., so that figures in multiple sections may be numbered in a single sequence).

User Settings

When a user agent renders a Web Publication, it SHOULD provide user settings to customize the experience.

User settings MAY include:

  • text size;
  • font family;
  • display mode (night, high contrast, etc.);
  • playback speed (for audio and video resources).

This specification does not cover how user agents override author styles to offer user settings.

To provide user settings in their reader mode, browsers usually get rid of most of the author styles. There is always a tension in reading environments between author styles and the user's preference, which is very hard to balance.

Scrolling or Paginating

Publications have historically been presented via paged media, whereas Web pages almost always scroll. As the preferences of individual readers vary, and as different types of publications are better suited for one or the other, this specification encourages user agents to support both, and to offer a choice to their users.

It might be useful for authors to be able to specify a preference between scrolling and pagination, even if a strict requirement is not possible. This should most likely be addressed through an extension of @viewport or of the viewport meta tag (see [[css-device-adapt]]), or possibly through an extension of @page (see [[css-page-3]]). This should be discussed with the relevant working groups (CSSWG, WebPlatformWG, WHATWG).

Paginated Layout

When a user agent renders a Web Publication in a paginated layout, it MUST lay out each document in the default reading order sequentially, with the last page of a resource being followed by the first page of the subsequent one.

To avoid blank pages, if a resource ends on a left page (resp. right page), the subsequent one should start on a right page (resp. left page) even if the page progression (see [[css-page-3]]) would otherwise lead to it starting on the opposite page. It should also be possible to use the break-before property (see [[!css-break-3]]) to force the content to resume on the opposite side if that was desired by the author.
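
The effect of this rule on page sides can be sketched in a few lines of Python. The sketch assumes that no blank pages are inserted and that no forced breaks are used; the function name and the idea of describing each resource by its page count are illustrative only.

```python
def starting_sides(page_counts, first_side="right"):
    """Return the side on which each resource starts, with no blank pages."""
    sides = []
    side = first_side
    for count in page_counts:
        sides.append(side)
        # An odd number of pages ends on the side the resource started on,
        # so the next resource starts on the opposite side.
        if count % 2 == 1:
            side = "left" if side == "right" else "right"
    return sides
```

For resources of 3, 2, and 1 pages, with the first page on the right, the second and third resources would both start on a left page.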

[[css-page-3]] needs to be amended to describe this exception to the general behavior when dealing with collections of documents instead of individual documents.

How is pagination supposed to work when subsequent resources have opposite page progression directions (see [[css-page-3]]), for example, due to a different writing mode? This is not necessarily a problem from a layout point of view, as each page is independent, but it is from a UI point of view. If swiping left means next page until the end of one chapter, and starts meaning previous page in the next chapter because the language switched from English to Hebrew, this is going to be confusing.

[[css-page-3]] needs to be amended so that page counters are not automatically reset at the beginning of each new resource belonging to the same Web Publication.

Navigation

Reading Order

Hyperlinks are the means by which multiple resources are linked together on the Web. When users reach the end of one resource, they have to activate a hyperlink to move to the next resource in the sequence. While this model of navigation is effective, it is also disruptive for immersive reading — it forces users to disengage from the content and perform the actions necessary to activate the links. It is also limited to media types that support hyperlinks.

The default reading order provides an enhancement to the hyperlink model, allowing the user agent to automatically move the user to the next resource when a more natural action occurs, like a swipe across the screen. It is similar conceptually and functionally to the link element's next and prev relationships [[!html]].

User agents MUST provide the ability to move forward and backward in the default reading order of a Web Publication.
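
As an illustration, moving forward and backward reduces to a lookup in the reading order. The Python sketch below assumes a canonical readingOrder array whose entries each carry a "url"; the function name adjacent_resource is hypothetical.

```python
def adjacent_resource(reading_order, current_url, step):
    """Return the URL `step` positions away from current_url, or None."""
    urls = [item["url"] for item in reading_order]
    if current_url not in urls:
        return None  # the current resource is not in the reading order
    index = urls.index(current_url) + step
    return urls[index] if 0 <= index < len(urls) else None
```

A step of +1 gives the next resource and -1 the previous one; None signals that the user is at either end of the publication or outside the reading order.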

Progression

While reading a Web Publication, the user follows a natural progression within a resource as well as between resources (following the default reading order).

User agents SHOULD provide the option to save this progression in the publication and return the user to their last location the next time they open the publication.

When the user agent obtains a manifest for the first time, it MAY also prompt the user whether they would like to:

  • continue reading the publication from their current location; or
  • start reading the publication from the first resource in the default reading order.
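
One possible way to model saved progression is sketched below in Python. The class ProgressionStore, the use of an in-memory dictionary, and the representation of a location as a resource URL plus an offset are all assumptions made for illustration; this specification does not define how progression is stored.

```python
class ProgressionStore:
    """In-memory stand-in for a user agent's persistent storage."""

    def __init__(self):
        self._positions = {}

    def save(self, publication_url, resource_url, offset):
        """Remember the last location within a publication."""
        self._positions[publication_url] = (resource_url, offset)

    def restore(self, publication_url, reading_order):
        """Return the saved location, or the start of the reading order."""
        first = (reading_order[0]["url"], 0)
        saved = self._positions.get(publication_url)
        if saved is None:
            return first
        # Fall back to the start if the saved resource is no longer listed.
        known = {item["url"] for item in reading_order}
        return saved if saved[0] in known else first
```

Falling back to the first resource when the saved one has disappeared is a design choice of this sketch, not a requirement.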

Table of Contents

Short description

The user agent should provide access to the table of contents from anywhere in the publication, without leaving the current resource.

For accessibility reasons, it is RECOMMENDED for User Agents to use a table of contents to allow multiple ways for users to access content.

Affordances

The table of contents is listed as a structural property in the Web Publication Manifest and is expressed using an HTML element.

If a table of contents is not explicitly specified, user agents MAY use the default reading order to create one.
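
A fallback table of contents derived from the default reading order could look like the following Python sketch. It assumes each entry has a "url" and, optionally, a plain-string "name"; the function name and the shape of the returned entries are illustrative only.

```python
def fallback_toc(reading_order):
    """Build a flat table of contents from the default reading order."""
    return [
        # Use the resource's name as the label when there is one,
        # and fall back to its URL otherwise.
        {"label": item.get("name") or item["url"], "url": item["url"]}
        for item in reading_order
    ]
```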

Use Case References
Req. 12
“There should be a means to indicate the author’s preferred navigation structure among the resources of a Web Publication. A user agent needs to know the sequence in which to present components of a Web Publication to the user, including the starting point.” (See [[pwp-ucr]])
Req. 13
“Authors of a Web Publication should be able to provide the user agent with information to access random parts of the publication” (See [[pwp-ucr]])

Offline Access

Reading State

Short description

The user must be able to leave the Web Publication and return to it at the last position they left from. The User Agent must retain the reading position, based on the last known position of the reader in the web publication. The position should be based on the reader's position in the file, within the reading order.

The user agent may retain reading state if the web publication is revised.

Affordances

The navigation of the web publication should be defined in the required Default Reading Order.

User Agents should not have to set the reading state in the following types of resources:

  • External links (e.g., a link to google.com)
  • Data references (e.g., a linked CSV file)
  • Multimedia content (e.g., a video)

Reading state should only apply to content documents listed as being within the bounds of the Web Publication.

Examples

Example 1:
Sarah is reading a long article on her way to work. She arrives before she has finished, but wants to continue from the place she left off. The user agent should remember her reading state for the next time she opens the publication.

Testing

If a tester opens a web publication in a WP-aware UA, moves ahead in the publication, closes the reader, then reopens it, they should be returned to the last known reading state.

Web Publication Locators

The document referred to from this section, i.e., Web Annotation Extensions for Web Publications [[wpub-ann]], has recently been renamed. Its previous title was "Locators for Web Publication". The terminology used in this section has to be realigned with the name change.

Locators are used to identify, locate, retrieve, and/or reference locations and content fragments within Web Publications (e.g., for address(es), bookmarks, and annotations). Locators traditionally take the form of fragment identifiers [[rfc3986]], where the portion of a URL preceded by a number sign character (#) identifies a specific position within the referenced resource.

For some use cases, it is essential to identify and reference a Web Publication resource—or a location in or a segment of a resource—in the scope or context of the Web Publication to which it belongs. A traditional fragment identifier cannot satisfy this requirement, since only the URL of the constituent resource containing the location or content fragment of interest is expressed. The Web Annotation Extensions for Web Publications [[wpub-ann]] document, based on the Web Annotation Model [[annotation-model]], addresses this issue by providing the means to express both the URL of the resource and the URL of the Web Publication.

Web Publication Locators also address the problem of referencing into a resource that was not authored with such a need in mind. A fragment identifier can only reference elements with explicit identifiers and locations with explicit anchor points. Web Publication Locators include a variety of selectors that work with the general structures and content of a resource (e.g., text selectors, CSS selectors).

As Web Publication Locators currently rely on a JSON-based expression syntax, it is not yet clear how much of this syntax can be translated to a fragment identifier. This may limit the usefulness beyond expressions that are also JSON-based (e.g., outside of annotations or bookmarks).

Illustrate with example of an easy to understand Web Publication Locator, such as might be used in annotating a simple Web Publication.

The semantics of Web Publication Locators are a mapping and extension of the Web Annotation Data Model [[annotation-model]] and Vocabulary [[annotation-vocab]] for describing and referencing a segment of a Web resource. As a result, Web Publication Locators provide the expressiveness needed for a broad range of annotation and bookmarking use cases. Additionally, Web Publication Locators provide a way to identify and reference a location within a Web Publication (i.e., as distinct from identifying and referencing a content fragment consisting of a span of characters or bytes). A Web Publication Locator can be used to identify, retrieve and/or reference a fragment of a Web Publication that spans multiple resources.

In composing a Web Publication Locator, use the canonical identifier of the Web Publication in preference to any alternative addresses. Such use facilitates the collation of Web Publication Locators associated with a particular Web Publication. URLs of Web Publication resources appearing in a Web Publication Locator should match the URL of the resource provided in the manifest.

Security

Placeholder for security issues.

Privacy

Placeholder for privacy issues.

Manifest Examples

Simple Book

A manifest for a simple book. The canonical version of this manifest is also available.


			

Single-Document Publication

Example of an embedded manifest. The canonical version of the manifest, as well as a more elaborate version for the same document, are also available.


			

Audiobook

A manifest for an audiobook. The canonical version of this manifest is also available.


			

Examples for bidirectional texts

(These examples were originally published in the Activity Streams Recommendation [[activitystreams-core]].)

Character order in memory | Direction | Method | Expected display
פעילות הבינאום, W3C | rtl | First strong directional character | פעילות הבינאום, W3C
The document is titled, "&#x2067;פעילות הבינאום, W3C&#x2069;" | ltr | First strong directional character | The document is titled, "פעילות הבינאום, W3C"
&#x200F;HTML היא שפת סימון | rtl | Bidi Control Character | HTML היא שפת סימון
&#x200E;'سلام' is hello in Persian | ltr | Bidi Control Character | 'سلام' is hello in Persian
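
The "first strong directional character" method used in the examples above can be sketched in Python with the standard unicodedata module. This is a simplified illustration: the function name is hypothetical, and the sketch does not skip over isolated runs as the full Unicode Bidirectional Algorithm does. Note that the RLM (U+200F) and LRM (U+200E) control characters are themselves strong characters, which is why prefixing a string with one of them fixes its direction.

```python
import unicodedata


def first_strong_direction(text):
    """Return "ltr" or "rtl" from the first strong character, else None."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "ltr"
        if bidi_class in ("R", "AL"):
            return "rtl"
    return None  # no strong character found
```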

Lifecycle diagrams

These diagrams provide a visual view of the lifecycle steps, as specified in the Web Publication Lifecycle section.

Overview of the lifecycle algorithm

Overview of the lifecycle algorithm, depicting the main building blocks.
See the normative description of the algorithm in the Web Publication Lifecycle section. Image available in SVG and PNG formats.

Finding the manifest

First major block in the lifecycle algorithm: find the manifest, either through an HTTP request or as part of a script element.
See the normative description of the algorithm in the Obtaining a manifest section. Image available in SVG and PNG formats.

Manifest canonicalization

Second major block in the lifecycle algorithm: create a canonical manifest.
See the normative description of the algorithm in the Generating a Canonical Manifest section. Image available in SVG and PNG formats.

Converting the manifest into a data structure

Third major block in the lifecycle algorithm: convert the manifest into a programming language dependent data structure that implements the WebIDL specification of the manifest.
See the normative description of the algorithm in the Processing the manifest section. Image available in SVG and PNG formats.

Cleaning up the data

Fourth major block in the lifecycle algorithm: check and clean up data by possibly removing data that cannot be interpreted.
See the normative description of the algorithm in the Processing the manifest section. Image available in SVG and PNG formats.

Image Descriptions

Description for the "Structure of Web Publications" diagram:
A simplified diagram of the structure of a Web Publication. The Web Publication is broken down into two elements. The first element is the actual contents (all the real things listed in the manifest). This element is broken down into the CSS, the actual "things" such as the HTML documents, audio, etc, and the images, fonts etc. The actual "things" have an additional subset of items that includes the entry page to the publication and all of the other documents. The second element is the Manifest (JSON). The manifest is used to generate the canonical manifest, which consists of a list of all the "things" in the publication, the publication metadata, and the default reading order of content. It is noted in the diagram that the entry page has to link to the manifest. (Return to the diagram of Web Publication.)