This document defines a profile of the W3C Web Annotation Data Model by specifying a subset of the properties allowed in this model, and adding properties deemed useful to satisfy the Use Cases and Requirements for EPUB Annotations.

Subset of the W3C Annotation Data Model

This section defines the profile of the W3C Annotation Data Model used for EPUB annotations.

In the W3C Annotation Data Model, the core structure is the Annotation object, which contains properties defining the annotation's target and body.

W3C Annotations have 0 or more Bodies, whereas this document only defines a single Body per annotation. A W3C Body can be remote or embedded, whereas this document defines only embedded Bodies, and only for textual comments (the inclusion mechanism for audiovisual comments is still to be defined).

W3C Annotations have 1 or more Targets, whereas this document only defines a single Target per annotation. The Target of an EPUB Annotation is the content being annotated, which is a specific segment in a document within the EPUB package, defined by its relative Source URL and a Selector.

W3C Annotations can have multiple motivations, whereas this document only defines a single motivation per annotation.

W3C Annotations can have multiple creators, whereas this document only defines a single creator per annotation.

W3C Annotations can have additional properties not defined in this document, like a specific generator application and generated date per Annotation, whereas this document only defines these properties at the Annotation Set level. W3C Annotations also handle more property values and types than defined in this document.

W3C Annotations define an AnnotationCollection structure to handle paginated requests, whereas this document defines an AnnotationSet structure to group multiple annotations for sharing purposes. Implementers MUST support AnnotationSet and MAY support both structures for maximum compatibility.

Note: This document does not define how annotations are created, stored, or synchronized in a reading system.

Annotation

Annotation Object

This document retains the following annotation properties from the W3C Annotation Data Model:

Name	Description	Format	Required?
`@context`	The context that determines the meaning of the JSON as an Annotation. It MUST be "http://www.w3.org/ns/anno.jsonld".	string	Yes
`id`	The identity of the annotation. A uuid formatted as a URN is recommended.	URI	Yes
`type`	The RDF structure type. It MUST be "Annotation".	string	Yes
`motivation`	The motivation for the annotation's creation.	"bookmarking" \| "commenting" \| "highlighting"	No
`created`	The time when the annotation was created.	ISO 8601 datetime	Yes
`modified`	The time the annotation was modified after creation.	ISO 8601 datetime	No
`creator`	The creator of the annotation. This may be either a human or an organization.	Creator	No
`target`	The target content of the annotation.	Target	Yes
`body`	The annotation body.	Body	No
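
As an illustration of the table above, the following Python sketch checks an annotation, held as a plain dictionary, against the required properties and fixed values. The function name `validate_annotation` and the wording of the messages are assumptions made for this example; they are not defined by this document.

```python
# Illustrative sketch only: the function name and messages are hypothetical.
ANNO_CONTEXT = "http://www.w3.org/ns/anno.jsonld"
REQUIRED = ("@context", "id", "type", "created", "target")
MOTIVATIONS = {"bookmarking", "commenting", "highlighting"}


def validate_annotation(anno: dict) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    problems = ["missing required property: " + key
                for key in REQUIRED if key not in anno]
    if "@context" in anno and anno["@context"] != ANNO_CONTEXT:
        problems.append("@context must be " + ANNO_CONTEXT)
    if "type" in anno and anno["type"] != "Annotation":
        problems.append('type must be "Annotation"')
    if "motivation" in anno and anno["motivation"] not in MOTIVATIONS:
        problems.append("unknown motivation: " + str(anno["motivation"]))
    return problems
```

The sketch checks only the presence of properties and the fixed string values; it does not validate date formats or the structure of the target and body.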

Note: The type of annotation should be considered when determining the value of the motivation property. An annotation with a Body structure corresponds to a "comment". An annotation without Body structure corresponds to a "highlight" if its Selector defines a range of characters, a space in an image or a time period, and a "bookmark" if it does not.
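
The rule in the note above can be sketched as a small Python function. It assumes that "comment", "highlight" and "bookmark" correspond to the motivation values "commenting", "highlighting" and "bookmarking", and it takes the answer to "does the Selector define a range of characters, a space in an image or a time period?" as a boolean argument, since that test depends on the selector types in use. The names `infer_motivation` and `selector_marks_extent` are hypothetical.

```python
def infer_motivation(annotation: dict, selector_marks_extent: bool) -> str:
    """Derive a motivation value from the shape of an annotation.

    selector_marks_extent is supplied by the caller (an assumption of this
    sketch): True when the Selector covers a range of characters, a space
    in an image or a time period.
    """
    if annotation.get("body") is not None:
        return "commenting"
    if selector_marks_extent:
        return "highlighting"
    return "bookmarking"
```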

Question: should we add a "replying" motivation for annotations that are replies to other annotations?

Creator

Name	Description	Format	Required?
`id`	The identity of the creator.	URI	Yes
`type`	The RDF structure type. It MUST be "Person" or "Organization".	string	Yes
`name`	The name of the creator.	string	No

Target

The target of an annotation associates the annotation with a specific segment of a resource in the current publication.

A Target with no Selector indicates that the annotation is targeting the entire target resource.

Name	Description	Format	Required?
`source`	The identity of the target EPUB resource.	URI	Yes
`selector`	The segment of the target EPUB resource that is annotated.	An array of Selector objects	No
`meta`	Indications that help locate the segment in the resource.	Meta	No

Source

The target resource MUST be identified by the URL of an existing resource in the EPUB package. It MUST be one of the item/@href values of the manifest element.
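
A minimal Python sketch of this check follows. It assumes that the package document is available as an XML string and that the `source` value can be compared literally with the manifest `href` values once any fragment is removed; how relative URLs are resolved is not covered here. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

# Namespace of the EPUB package document.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def manifest_hrefs(opf_xml: str) -> set:
    """Collect the item/@href values of the manifest element."""
    root = ET.fromstring(opf_xml)
    items = root.findall("opf:manifest/opf:item", OPF_NS)
    return {item.get("href") for item in items if item.get("href")}


def source_in_manifest(source: str, opf_xml: str) -> bool:
    """True when the target source matches a manifest href.

    Assumption of this sketch: both values are relative to the same base,
    so a literal comparison is enough after dropping the fragment.
    """
    path, _fragment = urldefrag(source)
    return path in manifest_hrefs(opf_xml)
```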

Sample 2: the source of the annotation is the relative URL identifying an HTML document in an EPUB.

Selector

An annotation refers to a segment of a resource, which is identified by one or more Selectors. The nature of the Selectors and methods to describe segments depend on the resource type. Providing more than one Selector allows an annotation software to choose the most accurate selector from those it can handle and helps accommodate evolutions on the annotated resource.

Annotation selectors are specified in the Selectors section of the W3C Annotation Data Model. This specification retains the selectors deemed useful for annotating publications and details their use.

Note: New selectors will undoubtedly be defined in the coming months after discussion with members of the W3C Publishing Maintenance Working Group.

Body

Note: Read “Best practices for Reading Systems” about using a keyword in an annotation.

Name	Description	Format	Required?
`type`	The body type. It MUST be "TextualBody".	string	Yes
`value`	The textual content of the annotation.	string	Yes
`format`	The media-type of the annotation value; "text/plain" by default; "text/markdown" is recommended.	rfc6838, rfc7763	No
`color`	The colour of the annotation; yellow by default.	"pink" \| "orange" \| "yellow" \| "green" \| "blue" \| "purple"	No
`highlight`	The style of the annotation; solid background by default.	"solid" \| "underline" \| "strikethrough" \| "outline"	No
`language`	The language of the annotation.	BCP47	No
`textDirection`	The direction of the text; left-to-right by default.	"ltr" \| "rtl"	No
`keyword`	Free text categorising the annotation.	string	No
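
The defaults listed in the table above can be applied when reading a Body, as in the following Python sketch. The helper name `with_body_defaults` is an assumption of this example; properties without a stated default (`language`, `keyword`) are left absent.

```python
# Defaults taken from the Body table: plain text, yellow, solid, left-to-right.
BODY_DEFAULTS = {
    "format": "text/plain",
    "color": "yellow",
    "highlight": "solid",
    "textDirection": "ltr",
}


def with_body_defaults(body: dict) -> dict:
    """Return a copy of the body with missing optional values filled in."""
    return {**BODY_DEFAULTS, **body}
```

Values present in the Body always take precedence over the defaults, and the original dictionary is not modified.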

Annotation Set

An Annotation does not contain information about its associated publication. If a set of annotations is shared as a detached file, it is mandatory to export along with them information that helps find the associated publication, even if the publication itself is not adequately identified.

Note: the AnnotationCollection defined by the W3C does not provide an adequate structure for sharing annotations either as a detached file or as a file embedded in a Zip package. The AnnotationCollection is intrinsically paginated and provides a way to retrieve annotations via a REST API.

The AnnotationSet object contains:

Name	Description	Format	Required?
`@context`	The context that determines the meaning of the JSON as an annotation set. It MUST be "http://www.w3.org/ns/anno.jsonld".	string	Yes
`id`	The identity of the annotation set. A uuid formatted as a URN is recommended.	URI	Yes
`type`	The RDF structure type. It MUST be "AnnotationSet".	string	Yes
`generator`	The agent responsible for the generation of the object serialisation.	Generator	No
`about`	Information relative to the publication.	About object	Yes
`generated`	The time when the set was generated.	ISO 8601 datetime	No
`title`	A title that helps identify the set.	string	No
`items`	The annotations of the set.	Array of Annotation objects	Yes

Generator

The Generator object contains information relative to the software from which the serialized annotation has been produced.

Name	Description	Format	Required?
`id`	The identity of the generator software. The recommended value is the GitHub URL of the application source code.	URI	Yes
`type`	The RDF structure type. It MUST be "Software".	string	Yes
`name`	The name of the generator software.	string	Yes
`homepage`	The home page presenting the generator software.	URL	No

About

The About object contains information relative to the publication. Such metadata is intended to help associate an annotation set with a publication:

Name	Description	Format	Required?
`dc:identifier`	Publication identifiers. An ISBN is preferred.	Array of strings	No
`dc:title`	The title of the publication.	string	No
`dc:format`	The media type of the publication.	string	No
`dc:publisher`	The name of the publisher.	string	No
`dc:creator`	The author(s) of the publication.	array of strings	No
`dc:date`	The release year.	calendar year using four digits	No

Note: all properties defined above are from the Dublin Core vocabulary, referenced in the Web Annotation Data Model.

Sample 8: An AnnotationSet containing one annotation.

				
					{
					  "@context": "http://www.w3.org/ns/anno.jsonld",
					  "id": "urn:uuid:123-123-123-123",
					  "type": "AnnotationSet",
					  "generator": "https://github.com/edrlab/thorium-reader/releases/tag/v3.1.0",
					  "generated": "2023-09-01T10:00:00Z",
					  "title": "Annotations Mme Prof, La Peste, cours 1ere B",
					  "about": {
					     "dc:identifier": [
					        "urn:isbn:1234567890"
					     ],
					     "dc:format": "application/epub+zip",
					     "dc:title": "Alice in Wonderland",
					     "dc:publisher": "Example Publisher",
					     "dc:creator": ["Anne O'Tater"],
					     "dc:date": "1865"
					  },
					  "items": [
					    {
					      "@context": "http://www.w3.org/ns/anno.jsonld",
					      "id": "urn:uuid:234-234-234-234",
					      "type": "Annotation",
					      "target": {
					      },
					      "body": {
					      }
					    }
					  ]
					}

Serialization

The examples throughout the document are serialized as [JSON-LD], the preferred serialization format, using the Context given in Appendix A of the Annotation Vocabulary [annotation-vocab]. The media type of this format is application/ld+json; profile="http://www.w3.org/ns/anno.jsonld".

This specification introduces a dedicated file extension for serialized AnnotationSets: .annotation.
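
As a minimal sketch, an AnnotationSet held as a Python dictionary could be written to a file with this extension as follows. The function name and the choice of UTF-8 with two-space indentation are assumptions of this example, not requirements of this document.

```python
import json
from pathlib import Path


def write_annotation_set(annotation_set: dict, directory: str, basename: str) -> Path:
    """Serialize an AnnotationSet as JSON into <directory>/<basename>.annotation."""
    path = Path(directory) / (basename + ".annotation")
    text = json.dumps(annotation_set, ensure_ascii=False, indent=2)
    path.write_text(text, encoding="utf-8")
    return path
```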

Embedding annotations in EPUB

The OPTIONAL my.annotation file in the META-INF directory holds an AnnotationSet.
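
Since an EPUB is a Zip package, a Reading System could read the embedded AnnotationSet along the lines of the following Python sketch. The function name is hypothetical, and the sketch assumes the file is UTF-8 encoded JSON; it returns `None` when the OPTIONAL file is absent.

```python
import json
import zipfile

# Location given by this section.
EMBEDDED_PATH = "META-INF/my.annotation"


def read_embedded_annotation_set(epub_file):
    """Return the embedded AnnotationSet as a dict, or None if there is none.

    epub_file may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(epub_file) as epub:
        if EMBEDDED_PATH not in epub.namelist():
            return None
        return json.loads(epub.read(EMBEDDED_PATH).decode("utf-8"))
```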

Best Practices for Reading Systems

This section is non-normative.

Displaying filtered annotations

Reading systems should enable filtering by motivation, colour, highlight mode, keyword and creator. For instance, a user can display "blue" annotations only or “teacher” annotations only. Filtering on multiple criteria is a plus.
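
Filtering on multiple criteria can be sketched in Python as below. Following the tables in this document, colour, highlight style and keyword are read from the Body, and a missing colour or highlight style is treated as the default ("yellow", "solid"). The function and parameter names are assumptions of this example.

```python
def filter_annotations(annotations, motivation=None, color=None,
                       highlight=None, keyword=None, creator_id=None):
    """Keep the annotations that match every criterion that is not None."""
    def matches(anno):
        body = anno.get("body") or {}
        creator = anno.get("creator") or {}
        checks = [
            (motivation, anno.get("motivation")),
            (color, body.get("color", "yellow")),
            (highlight, body.get("highlight", "solid")),
            (keyword, body.get("keyword")),
            (creator_id, creator.get("id")),
        ]
        return all(wanted is None or wanted == actual
                   for wanted, actual in checks)

    return [anno for anno in annotations if matches(anno)]
```

For instance, `filter_annotations(items, color="blue")` keeps blue annotations only, and `filter_annotations(items, keyword="teacher", motivation="commenting")` combines two criteria.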

Using multiple selectors

It is recommended that Reading Systems export multiple selectors, including at least one precise selector (e.g. CssSelector + TextPositionSelector) and one selector resistant to content modifications (e.g. ProgressionSelector).

When displaying an annotation, a Reading System is free to use the most precise Selector available. It will select an alternative Selector as a fallback in case the preferred one does not return a correct position in the publication: this can happen if the publication has been modified after the annotation has been created.
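
The fallback behaviour described above can be sketched in Python as follows. The sketch assumes that the Reading System supplies one resolver function per selector type it supports, each returning a position or `None` when the selector no longer matches the content. The names `locate`, `resolvers` and `preference` are hypothetical.

```python
def locate(selectors, resolvers, preference):
    """Try selectors in order of preference and return the first position found.

    selectors:  list of selector dicts, each with a "type" property
    resolvers:  dict mapping a selector type to a function that returns a
                position, or None if the selector cannot be resolved
    preference: selector types ordered from most to least precise
    """
    for selector_type in preference:
        resolver = resolvers.get(selector_type)
        if resolver is None:
            continue  # this Reading System does not support the type
        for selector in selectors:
            if selector.get("type") != selector_type:
                continue
            position = resolver(selector)
            if position is not None:
                return position
    return None
```

A Reading System that prefers precision might pass a preference list such as `["CssSelector", "TextPositionSelector", "ProgressionSelector"]`, so that the selector most resistant to content modifications is only used as a last resort.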

Not all selectors are equally easy to implement. Reading Systems MAY choose to support only a subset of the selectors defined in this specification.

The W3C Publishing Maintenance Working Group is expected to define one or more selectors reading systems are required to implement, as a lingua franca.

Exporting annotations as a detached file

When a user decides to export an annotation set from a reading system, they SHOULD be offered the option to filter the annotations by keyword (multiple choice). “Annotations with no keyword” and “All annotations” SHOULD be offered as options. The advantage of this practice is that, for instance, a user can export personal annotations (usually with no keyword) and leave “teacher” annotations unexported.

They MAY enter a title for the annotation set (empty by default). Such a title SHOULD become the exported filename.

They MUST be able to choose the directory in which the annotation set will be stored.

The file extension MUST be .annotation.
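
The keyword selection and file naming practices above can be sketched in Python as follows. The `NO_KEYWORD` sentinel, the fallback basename "annotations" for an empty title, and the set of characters replaced in the filename are all assumptions of this example; this document does not define them.

```python
import re

# Sentinel standing for the "Annotations with no keyword" option (hypothetical).
NO_KEYWORD = object()


def select_for_export(annotations, chosen_keywords=None):
    """Return the annotations to export.

    chosen_keywords is None for "All annotations", otherwise a set of
    keyword strings that may also contain NO_KEYWORD.
    """
    if chosen_keywords is None:
        return list(annotations)
    selected = []
    for anno in annotations:
        keyword = (anno.get("body") or {}).get("keyword") or NO_KEYWORD
        if keyword in chosen_keywords:
            selected.append(anno)
    return selected


def export_filename(title: str) -> str:
    """Derive a filename from the set title, with the .annotation extension."""
    cleaned = re.sub(r'[\\/:*?"<>|]+', "_", title).strip()
    return (cleaned or "annotations") + ".annotation"
```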

The application may propose alternative formats at export time: an HTML or markdown format with human-friendly references to the location of each annotation may be handy.

Exporting annotations in a publication

When a user decides to export a publication from the Reading System, they SHOULD be offered the option to embed the annotations associated with the publication.

If the user decides to embed annotations in a publication, they SHOULD be offered the option to filter the annotations by keyword (multiple choice).

Importing annotations

To simplify the association of annotations with a publication, a Reading System MUST offer a way to select a publication before selecting an annotation set. The drag and drop of an annotation set into a Reading System MAY also be proposed, but identifying the proper publication from the metadata in the annotation set is more complicated.

When importing an annotation set, a Reading System SHOULD display a message with the title of the annotation set and the number of annotations in the set. The Reading System MUST offer the user the choice to abort the import.

Each annotation is uniquely identified. If, during the import of an annotation set, one or more annotations are re-imported, the Reading System MUST offer the user the choice to overwrite the existing annotations or abort the import of the annotation set.
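
The handling of re-imported annotations can be sketched in Python as below, using the annotation `id` to detect conflicts. The function names are hypothetical, and the `overwrite` flag stands for the choice made by the user; how that choice is presented is left to the Reading System.

```python
def plan_import(existing, incoming):
    """Split incoming annotations into new ones and re-imported ones."""
    existing_ids = {anno["id"] for anno in existing}
    new = [a for a in incoming if a["id"] not in existing_ids]
    conflicts = [a for a in incoming if a["id"] in existing_ids]
    return new, conflicts


def apply_import(existing, incoming, overwrite):
    """Return the resulting annotation list.

    When conflicts exist and the user declined to overwrite, the import is
    aborted and the existing annotations are returned unchanged.
    """
    new, conflicts = plan_import(existing, incoming)
    if conflicts and not overwrite:
        return list(existing)
    replacements = {a["id"]: a for a in conflicts}
    merged = [replacements.get(a["id"], a) for a in existing]
    return merged + new
```

Calling `plan_import` first lets the Reading System show the number of conflicting annotations before asking the user to decide.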

Dealing with colours

This document specifies a closed set of six colours chosen because of their extensive support in well-known reading systems. However, most existing reading apps offer a smaller set to their users.

If an application imports annotations with a colour it does not support, it should display them with a neutral colour. The recommended neutral colour is grey.

Some applications may support colours not in the set defined by this specification (e.g. brown). In this case, a 1-to-1 substitution at export time is required (e.g. brown to orange).
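
Both directions of colour handling can be sketched in Python as follows. The function names are assumptions of this example, and the substitution table passed to `export_colour` is application-specific; the brown-to-orange mapping is the example given above.

```python
# Closed set of colours defined by this specification.
SPEC_COLOURS = ("pink", "orange", "yellow", "green", "blue", "purple")
NEUTRAL_COLOUR = "grey"  # recommended neutral colour


def display_colour(colour, supported):
    """Colour to use on import: the annotation colour, or grey if unsupported."""
    colour = colour or "yellow"  # yellow is the default colour
    return colour if colour in supported else NEUTRAL_COLOUR


def export_colour(app_colour, substitutions):
    """Colour to write on export, after 1-to-1 substitution if needed.

    Raises KeyError when an application colour has no substitution defined.
    """
    if app_colour in SPEC_COLOURS:
        return app_colour
    return substitutions[app_colour]
```

For example, `export_colour("brown", {"brown": "orange"})` returns "orange", while `display_colour("purple", {"yellow", "blue"})` returns "grey".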

Note: we didn't spot applications with more than six annotation colours.