W3C

Publishing Maintenance Working Group Telco

15 January 2026

Attendees

Present
Charles LaPierre, Dale Rogers, Brady Duga, Elizabeth Kraler, Gautier Chomel, GeorgeK, Grigorily Manucharian, Gregorio Pellegrino, Hadrien Gardeur, Ivan Herman, Daniel Kimberg, Masakazu Kitahara, Matt Garrish, Romain Deltour, Shinya Takami, Susan Neuhaus, Toshiaki Koike, Wendy Reid
Regrets
-
Chair
Wendy Reid
Scribe
Susan Neuhaus

Meeting minutes

EPUB Blog Post

<Wendy Reid> https://www.w3.org/blog/2026/epub-and-html-survey-results-and-next-steps/

Wendy Reid: the blog post about HTML survey has been posted

<Wendy Reid> https://www.w3.org/zh-hans/blog/2026/epub-and-html-survey-results-and-next-steps/

Wendy Reid: feel free to share it. There is a translation into Chinese, and a Japanese translation is coming

Annotations Vocabulary

Wendy Reid: first up the annotations vocabulary

Ivan Herman: this is a technical thing we must do
… the annotation work relies on the W3C annotation work, which is done in JSON-LD
… for any change we make to the vocabulary, we have to publish a proper vocabulary
… and define terms in a formal way
… I have a tool to do that
… we have put the vocabulary and tools in the repository
… at some point we need to decide whether this is published as a W3C Note or left as a document on GitHub
… publishing it as a note indicates stability and I am in favor of it

<Ivan Herman> current version of the vocabulary

Wendy Reid: any thoughts?

GeorgeK: with the previous vocabulary from annotations, are we extending it?

Ivan Herman: we are not changing the semantics, but we are creating some restrictions
… when it comes to EPUB only certain values are permitted as sub properties
… there is one new concept, the annotation set, that we have defined for ourselves

Ivan Herman: is there anyone who opposes publishing this as a note when the time comes?

Brady Duga: just to clarify, what is the timing of the note?

Ivan Herman: when we publish the first working draft, and both documents will evolve in parallel

Ivan Herman: the official name will be a draft note

Grigorily Manucharian: there are several segments of the text where it is explicitly stated that they are placeholders

Ivan Herman: those are there because in the current iteration of the spec they have not been designed yet

<img> in SMIL - w3c/epub-specs#2883

Hadrien Gardeur: SMIL allows references to text, audio, and video
… currently in EPUB we only use text. It references IDs. We also use the audio element without a time fragment

<Ivan Herman> strictly speaking, 'img' and the others are defined as "alias" of 'ref'

Hadrien Gardeur: a number of specialized libraries have experimented with narrated comics using pre-recorded audio
… the current spec doesn't provide a good way to do this, so the current experiments are in proprietary formats
… I have been investigating how much work it would be to support this in EPUB
… we would need a way to specify a fragment of an image
… we can use either regions or spatial fragments, which I think is much easier
… we already use this in a few other pages
… there is interest in this, and there is no format to do this in either DAISY or EPUB
… it is a pretty small gap to allow this feature in EPUB
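
A minimal sketch of how a reading system might interpret a spatial fragment on an image reference, assuming the `xywh` syntax defined in Media Fragments URI 1.0 (pixel units by default, or an explicit `pixel:` or `percent:` prefix). The file names and the `parse_xywh` helper are hypothetical and are not taken from the proposal discussed here.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class Region(NamedTuple):
    unit: str  # "pixel" or "percent"
    x: float
    y: float
    w: float
    h: float


def parse_xywh(ref: str) -> Optional[Region]:
    """Return the region named by an xywh media fragment, or None if absent or malformed."""
    fragment = urlsplit(ref).fragment
    for part in fragment.split("&"):
        name, _, value = part.partition("=")
        if name != "xywh":
            continue
        unit = "pixel"  # default unit when no prefix is given
        if ":" in value:
            unit, _, value = value.partition(":")
        if unit not in ("pixel", "percent"):
            return None
        numbers = value.split(",")
        if len(numbers) != 4:
            return None
        try:
            x, y, w, h = (float(n) for n in numbers)
        except ValueError:
            return None
        return Region(unit, x, y, w, h)
    return None


# Hypothetical references, as they might appear in a media overlay document.
panel = parse_xywh("images/page1.jpg#xywh=160,120,320,240")
relative = parse_xywh("images/page1.jpg#xywh=percent:25,25,50,50")
text_only = parse_xywh("chapter1.xhtml#para1")
```

A reference with a plain ID fragment, such as the text references used today, yields `None` in this sketch, so existing content would be unaffected by the parsing step itself.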

Ivan Herman: for the record, the SMIL specification predates the specification for media fragments
… I agree that in 2025 we should use media fragments
… are there any prospects of real EPUB readers that would implement this?
… we need two independent implementations for a new feature

Brady Duga: I think there were RS that were offering implementations

Hadrien Gardeur: there is backing in Readium for this; once it is in the Readium toolkit it is pretty easy to support
… Thorium will support this
… I have also asked the maintainer of Storyteller to support this
… it is mostly used by people who do this on their own, who offer this to their library patrons
… I cannot commit on the timeline for Storyteller but Thorium will support it this year

Ivan Herman: Thorium is clear. As for Storyteller, from the official point of view, the AC would ask for an EPUB reader like Thorium
… to really implement and use this
… Colibrio would be fine, it doesn't need to be a big player

Hadrien Gardeur: I expect specialized libraries to use this

Ivan Herman: if I have an EPUB that uses this, I should be able to read it in another system

Brady Duga: what are the implications for existing implementations for Read Aloud?
… could a publisher just reference portions of an image and not have the audio overlay? The image would then be the synced media?
… is that a potential use here? Are we worried about it?

Hadrien Gardeur: I don't think it would be a good thing to overload text
… wrapping an image in text is a bad thing for accessibility
… the benefit of having image is that it opens the door to image-based content
… it also opens the door to something highly visual in nature, plus audio, and a textual element
… it opens the door to more things without negatively impacting the current system
… media support in EPUB remains more limited than we wish
… even with the current take on media overlays things could be better with RS support

GeorgeK: read aloud means to me taking the text and playing it into the system, which is different from audio overlays
… if we have a fixed image, and there are four audio clips in parallel with that image
… the media fragment shifts focus on the image, would you have something like
… two people in an image, a focus on one person, then a shift to the other person like a conversation

Hadrien Gardeur: that is one way you could implement it
… currently they can zoom into a panel and play the audio from that panel

Charles LaPierre: about implementation concerns, I don't think we need explicitly two reading system implementations
… just two implementations of the technology so a tool would be fine
… if you had a screen reader enabled, and you have the textual part overlaid on audio, I'm trying to understand

GeorgeK: I think the SMIL implementation would play and the screen reader would be silent
… if you pause the screen reader

Brady Duga: when I said Read Aloud I meant reading the included audio
… to Hadrien's point, we have a flow for children's books
… the publisher adds audio to existing images/media
… the reading system then takes that and plays it
… is it expected now that Reading Systems must synchronize the audio?
… are we breaking children's books by enabling this?

Hadrien Gardeur: I don't think this is very likely. Producing an audio layer is much more work than including text
… the kind of org that will produce this media overlay is one that cares deeply about accessibility
… the cost is much higher to produce them

Brady Duga: these are not text to speech books, but with media overlays

Dale Rogers: I wonder if there would be a smooth transition from one image fragment to another?

Hadrien Gardeur: this is not about controlling visual presentation. A lot of movement can be hard for people with certain sensitivities

Wendy Reid: for example, a lot of RS have settings for page turn behavior. I think this would fall in that same class
… especially on mobile, where you can have multiple page turn actions

Ivan Herman: we may be getting into areas that are not covered by this proposal
… the original SMIL spec refers only to the ref element
… the spec doesn't say anything about transitions
… it is always a tricky discussion about the AC and how they accept implementation reports
… I think beyond the legal terminology for the process, they ask
… if I create a book with this feature, am I stuck with only one reading system?
… if we decide to put this in the spec, we mark it as an at risk feature
… so if we don't get enough implementation we can easily remove it at the end without triggering reviews

Matt Garrish: you want to see two similar implementations, like two Reading Systems and two authoring systems
… is this incubation material? Where are we going with this?
… is it at the point we want to implement this?
… what are the knock-on implications? I'd like to be sure of this before we add this to the implementation at the end of the process
… maybe the at risk implementation or a CG note would be the way to do this

GeorgeK: what is the relationship of this technique to other ways to make comics?
… it seems like an alternative to other developments for comics

Hadrien Gardeur: this is a direct response to what a number of specialized libraries have implemented, especially in Northern Europe

<Grigorily Manucharian> Hadrien, are the proprietary use cases in Scandinavia comics only?

Hadrien Gardeur: they have produced files that do this, and everyone does it slightly differently
… I look at what people are using, and what people need
… this is quite specialized, mostly about adaptive content, but it is missing
… this is different than adding something no one asked for

GeorgeK: I have seen the US Library of Congress dabbling with this too

Hadrien Gardeur: this is one way we can support this

Ivan Herman: I don't see this as an incubation matter because we don't have to create new vocabulary
… we are just allowing a tiny bit of what is already in SMIL in EPUB
… it is defined already, we just allow it here

Charles LaPierre: I get your point about two implementations on one side or another, and the
… Readium implementation could go out to multiple readers
… in a comic, could you have it read each panel in the correct order?
… if you click a Read button in the RS, would it do that? Is this a potential use case?

Hadrien Gardeur: yes, you could have a panel with a textual and/or an audio equivalent

Grigorily Manucharian: another potential use case is language acquisition/application. Having two audio setups in a Manga

Grigorily Manucharian: you could switch between the translations
… this kind of use case could be studied a bit further

Hadrien Gardeur: I think this is different. You are talking about having two images in two languages?

Grigorily Manucharian: SMIL doesn't support multiple tracks?

Hadrien Gardeur: not as we use it, no

Wendy Reid: that gets into something we are also discussing, parallelized content
… we may have an open issue about that where we do want to talk about language acquisition

<Gautier Chomel> Parallel content is under discussion at https://github.com/w3c/epub-specs/discussions/2829

Wendy Reid: this is where text to speech might be better than SMIL so they can choose their narration voice
… SMIL is fixed, referencing specific content
… it sounds to me like we want to keep exploring this
… it might not be a huge change to the spec
… the major question is whether we have the right number of implementations
… I'm concerned about industry uptake, if it will be supported by most major reading systems
… we need to make sure that there is more than one reading platform available
… just for the sake of user choice

Brady Duga: I'm not convinced, I'm still worried about breaking existing workflows.
… there is nothing tying this to specific content, and I would need more reassurance that people
… won't start making children's books that would break on current reading systems

Wendy Reid: we have a green light to keep studying this. Can we get sample files to study?

Hadrien Gardeur: Nota has public examples. We might get the files in JSON

Ivan Herman: I'd like to explore what it means editorially. Can Hadrien do a draft PR for the author and RS specs?

Hadrien Gardeur: I can do this, but not immediately, maybe in February

<Grigorily Manucharian> Hadrien on one of these sleepless nights

Wendy Reid: we will carry over our last topic, selectors, next week

Minutes manually created (not a transcript), formatted by scribe.perl version 248 (Mon Oct 27 20:04:16 2025 UTC).