W3C

Publishing Maintenance Working Group Telco

21 May 2026

Attendees

Present
Abe Jellinek, Charles LaPierre, Dale Rogers, Grigorily Manucharian, Hadrien Gardeur, Ivan Herman, Laurent Le Meur, Masakazu Kitahara, Matt Garrish, Shinya Takami, Susan Neuhaus
Regrets
Brady Duga, Gautier Chomel, Gregorio Pellegrino, Toshiaki Koike, Wendy Reid
Chair
Shinya Takami, Susan Neuhaus
Scribe
Susan Neuhaus

Meeting minutes

Do we need the concept of Annotation Sets? https://github.com/w3c/epub-specs/issues/3000

Laurent Le Meur: do we replace the annotation structure with a zip that could contain any annotation type plus any information about the book?

Laurent Le Meur: it makes sense to have information about the book outside of the structure
… what shape should this take

Hadrien Gardeur: even if we have a zip we want a way of grouping the annotations together
… the concept of the set will continue to exist. we want to have a well known location
… we've agreed that there is a usefulness but a limited one for metadata
… mostly for services that need the metadata and for matching but this might not be so useful
… only when we have to present the metadata outside of the book
… so we should have few pieces of metadata
… citation styles would be a useful influence on the kinds of metadata we present

Ivan Herman: we are talking about standardizing the interchange format
… at that level, the set creates one single file for import and export
… when we accepted images, etc. we decided that zip is the format
… then later we decided that zip is the only accepted format
… we do not need the grouping function, one file that lists all the annotations is not needed
… if the annotations sets role is to have collective metadata, there are two things there,
… information about the generator and the publication

<Laurent Le Meur> current ebook metadata relative to the book: https://w3c.github.io/epub-specs/epub34/annotations/#about

Ivan Herman: the generator isn't a necessary item in the interchange format
… the metadata for the publication, is mostly copies of data in the package document
… so we are repeating those values.
… which seems superfluous
… my conclusion is that the simplest way to do this is to make the metadata package include a copy of the metadata in the publication

Laurent Le Meur: there will be import and export of files plus push and pull of sets of annotations in the REST API
… I don't think there is an issue choosing zip as a container
… it is also possible that our format should support this
… about the generator, many files include this kind of information, though I don't suggest including the date
… this can be useful to differentiate annotations that don't conform to the specifications
… I would be worried about totally tying our format to EPUB by using the OPF in annotations
… and it mixes XML and JSON which is never great
… we just need descriptive information about the book, and no list of annotations
… just title, author, etc or something that links the annotation to the book

Hadrien Gardeur: we recently released Thorium on iOS and we have academic users
… almost 40% of everything that is read is in PDF
… so when we implement the highlight feature we will be asked to do it for PDF as well.
… if we want to implement this feature, it has to work for more than epub
… we need something as format agnostic as we can
… when we look at what we have for interchange, de-paged annotations, we define that we use a zip, that we have an extension and profile, and a list in .json
… so we have one document in which we have a list of all annotations even if we have a set
… by definition we need a set if we have multiple files
… now I think we have too much data, do we need dc date, dc creator?
… at a minimum we need something to identify annotations when they are separate from the book
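
As an illustration of the lightweight shape described above (one zip, one JSON document that carries a small amount of book metadata next to the list of annotations), here is a minimal Python sketch. The file name `annotations.json`, the keys `about` and `items`, and both function names are assumptions made for this example; they are not taken from the specification.

```python
import io
import json
import zipfile


def build_annotation_package(about, annotations):
    """Return the bytes of a zip holding a single JSON document.

    `about` is a small dict of descriptive book metadata and `annotations`
    is a list of annotation dicts. The "about"/"items" keys and the
    "annotations.json" file name are hypothetical, chosen for this sketch.
    """
    document = {"about": about, "items": annotations}
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("annotations.json", json.dumps(document, indent=2))
    return buffer.getvalue()


def read_annotation_package(data):
    """Read back the JSON document written by build_annotation_package."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return json.loads(archive.read("annotations.json"))
```

With this shape, an importer that receives the package without the book still has enough in `about` (for example a title and an identifier) to attempt a match, which is the minimum requirement raised in the discussion.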

Ivan Herman: we have currently a json structure for the set. I feel an itemized list isn't necessary
… we can have a sub directory for organizing the files
… coming back to metadata, we have never formalized that this should work outside of epub
… assuming this is a requirement now, it becomes a different discussion
… we should have a new issue about which metadata is necessary
… we must make clear how the identifiers are related to each other
… we cannot be silent about how this is related to the epub version
… I propose a PR that removes the itemized data from the annotation set/metadata
… we can discuss the exact name later, and we can further discuss it in the PR

Laurent Le Meur: you will open a PR on the metadata? and we will leave the generator for now

Hadrien Gardeur: I think having one file with all the information is more efficient
… I think there is a utility in having it all in one place
… that means if you look at annotation sets, we have metadata including items
… which is necessary because you need names in JSON
… I'm not sure we need type, generator, etc
… we need a minimal set of metadata
… I would go for a lightweight approach but keep the set

Ivan Herman: we have a clear difference in opinion; then I will hold off on the PR for now

Laurent Le Meur: it would be useful to have the advice of other reading systems. I will try to get some feedback

Proposing a text on merging [annotations] w3c/epub-specs#3001

Ivan Herman: I have a question about when annotations are merged. What can we describe normatively?
… it turns out there is not much consistency in book metadata so we have to rely on the heuristics in the reading system
… so we need a paragraph to make clear to readers what they can expect on import
… if I am a user and not an implementor, I may never see it
… I don't want to hide the text from reading systems either
… if we are able to separate the information into implementer and user sections, then this text should go into the user part

Laurent Le Meur: most of section six is about what a user can expect
… there are sentences that are aimed at developers
… we can change the language to make clear what implementers should pay attention to
… rather than duplicating information

Dale Rogers: I notice section 6 is called "best practices for reading systems"; in each section we could say which audience it is aimed at

Laurent Le Meur: the risk is duplication and being unclear
… developers know how to deal with a use case, if we reorient this section more like use cases we can avoid splitting and duplicating

Ivan Herman: perhaps we could rewrite the section as use cases, and then have other sections or notes for implementors
… most of the sections would be use case oriented
… editorially, the simplest thing is to merge the PR and then we will move it during rewriting

Laurent Le Meur: I can take a run at rewriting it. I'll show you the branch before the PR

Susan Neuhaus: Do users look at the reading system spec? I expect not, so rewriting it with a use case framing makes sense

Laurent Le Meur: I think users won't read the spec but there is an intermediary party, people who write articles and usage notes, who would need this information
… and they will pick up on this vocabulary

Add some context to the use of the term Segment in the Target section w3c/epub-specs#2990

Laurent Le Meur: we know we can create bookmarks and annotations. A bookmark is a placeholder in a text or image but not so precise for a user
… an annotation is a highlight of text or other media, and has some range in the content
… the selectors we define can isolate a range even for a bookmark, since there is no specific marker for a bookmark
… will we accept that the selector of a bookmark can be a range, or must it be a single point?

Hadrien Gardeur: reading systems usually have a specific affordance for bookmarks
… like an icon on the corner of the page, we look at what is currently displayed to the users
… and check if there is a bookmark there
… if there is a superlong range, the text could be displayed across many screens and cause problematic behavior
… I suspect reading systems will want to define a bookmark that won't cross the boundary of multiple screens

Ivan Herman: this is related to the previous discussion because the choice of selectors is done behind the scenes by the reading system
… so this is a matter of the affordance that Laurent Le Meur was talking about. So the situation you describe won't happen
… so reading systems should do this properly. We should say that if a reader's intent is to set a bookmark, we should use a single selector

Laurent Le Meur: a text position selector has a beginning and end, and is a range
… do you agree that a bookmark should be a single location? we should add that to the spec

Laurent Le Meur: if reading system A generates a bookmark, it will still work on reading system A, but if that annotation is exported to reading system B and B handles bookmarks differently, we could have problems
… even if we include a specification, reading system B will have a sanitisation process
… that will then decide if the bookmark would be at the beginning or end of a range
… we can try to force whatever and people will do what they want
… I don't think we will be able to control what people will do
… it might be good to have some recommendations but reading systems shouldn't expect it
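
The sanitisation step mentioned above can be sketched as follows: an importing reading system receives a bookmark whose selector is a range and collapses it to a single point, choosing either the beginning or the end. The `start` and `end` fields follow the TextPositionSelector of the W3C Web Annotation Data Model; the function name `collapse_to_point`, the `anchor` parameter, and the choice to represent a point as a zero-length range are assumptions for illustration, not something the group decided.

```python
def collapse_to_point(selector, anchor="start"):
    """Collapse a TextPositionSelector range to a single position.

    `anchor` selects which edge of the range is kept ("start" or "end").
    Selectors of any other type are returned as an unchanged copy.
    This is a hypothetical import-time policy, not specified behaviour.
    """
    if anchor not in ("start", "end"):
        raise ValueError("anchor must be 'start' or 'end'")
    if selector.get("type") != "TextPositionSelector":
        return dict(selector)
    point = selector[anchor]
    # A point is modelled here as a zero-length range (start == end).
    return {**selector, "start": point, "end": point}


bookmark = {"type": "TextPositionSelector", "start": 412, "end": 795}
as_point = collapse_to_point(bookmark)           # keeps the beginning
as_end = collapse_to_point(bookmark, "end")      # keeps the end
```

Because each reading system is free to choose its own `anchor`, the same exported bookmark may land at different positions in different systems, which is the interoperability concern raised in the discussion.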

Dale Rogers: a reading system can do what it wants with our declarations, and we can make those declarations

Minutes manually created (not a transcript), formatted by scribe.perl version 248 (Mon Oct 27 20:04:16 2025 UTC).