Publishing Maintenance Working Group Telco

Meeting minutes

AI and its impact on our work

Susan Neuhaus: Today our discussion is about AI and its impact on our work
… to sum it up, there is a cross-W3C effort to address AI; they are looking for places to collaborate and work together
… we've been asked how current AI technologies may impact our work
… [reading from issue]
… given that we only have an hour, we need to stay focused
… I thought we could start by throwing out ideas
… by typing them directly into IRC
… people can contribute in different ways, type your own comment, then we can make sure everyone is heard
… then we can look at them, pick some of our favourites, and discuss those to share in the issue

Susan Neuhaus: I'll start: mention the trend and give a short description of its impact

<Ivan Herman> ivan: automatic on-line translations

<Susan Neuhaus> @sueneu: improved artificial voices, may need more markup

<Ivan Herman> ivan: read-aloud may need additional markup

<Dale Rogers> DaleRogers: AI-assisted EPUB coding that creates validation errors

Wendy Reid: More metadata properties or values to communicate use of AI within publications so consumers/retailers can make informed decisions

<Susan Neuhaus> Toshiaki Koike: content for AI training, prevent unauthorized use

<Charles LaPierre> CharlesL: AI generated alt text for images and long descriptions

<Ivan Herman> ivan: metadata to assess whether the author is AI

<Toshiaki Koike> 1. Challenges for the publishing industry (copyright and content protection)

<Toshiaki Koike> A major ongoing concern in the publishing industry is how to handle or prevent unauthorized use of content for AI training. Legal and ethical concerns around AI-generated content, including potential infringement, remain significant and continue to affect the overall ecosystem.

<Toshiaki Koike> 2. Opportunities for tools and reading systems (accessibility and efficiency)

<Toshiaki Koike> From the standpoint of authoring tool developers and Reading System (RS) vendors, AI also offers clear benefits that may influence technical priorities:

<Toshiaki Koike> Authoring tools: Assisting creators by generating or suggesting alternative text for images.

<Toshiaki Koike> Reading systems: Improving accessibility by enabling text-to-speech or descriptive output even when alt text or metadata is missing.

<Toshiaki Koike> Development: Accelerating implementation of publishing tools and standards through AI-assisted development.

Wendy Reid: Facilitating dynamic or personalized layouts in EPUB reading systems through generative UI

Susan Neuhaus: Anyone else? Let's sum these up
… we have improved artificial voices in read aloud, validation errors caught with AI-assisted coding, metadata to identify AI, AI generated alt text, copyright and content protection, AI training
… I'm not sure what is meant by opportunities for tools and reading systems; do we mean new capabilities within reading systems?

Ivan Herman: For the working group, I think the interesting question is how these different things affect, or potentially affect, the EPUB specification per se
… some things happen that we have no influence over, but for a number of entries here we may want to put effort into features that help improve what AI gives us

<Avneesh Singh> One main thing is metadata to indicate that parts of the content are AI-generated. I think this is clearly in the EPUB domain

Ivan Herman: For example, translations: imagine a future where the RS automatically translates the text, say an English book into French
… are there additional things we need to put in EPUB that would improve translations?
… there is already a specification called ITS that we allow in EPUB 3.4 to help translation, but that may not be good enough; it may make the input unwieldy
… do you need to flag what should and should not be translated, such as acronyms or idioms?

Susan Neuhaus: What form do you think would be most helpful to the IG?

Ivan Herman: The question is for each WG, if all the AI tools were already widespread, what would our specification look like?
… there are things we may not care about because they are solved, and others where we need to offer assistance to improve the experience

Wendy Reid: An example might be: how do we structure documents for AI content?
… for instance in FXL text and images have to be properly specified and structured
… to support on the fly generative content for accessibility

Susan Neuhaus: Can I ask another question, thinking about W3C at large?
… specifically for AI generated voices and artificial speech, not just EPUB but others
… we should include it in our comments even if it isn't our problem

Ivan Herman: Exactly

Avneesh Singh: APA already has a task force for TTS and markup for pronunciation, and there are others conscious of the increasing use of TTS engines
… of course in DAISY we're using these engines because the quality is so much better
… at TPAC we'll be having a discussion with them, especially for the publishing industry and markup

Ivan Herman: That raises a question: we have the PLS; is it in line with such an approach, i.e., creating a lexicon for translations?
… well known acronyms or terms that should not be translated for a specific publication

Avneesh Singh: I think so, it would help, like "read" (reed) or "read" (red)
… any technology that reduces ambiguity, so that AI needs to do less sense-making
… anything to assist

Susan Neuhaus: That sounds amazing, let's put that aside for other issues
… we talked about read aloud, document structure, metadata
… AI coding errors
… we'll vote on the most popular topics, discuss those, and if we go over time we can find additional ways to continue the discussion

- AI coding errors

- Metadata to identify AI content

<Wendy Reid> +1

<Ivan Herman> +1

<Avneesh Singh> +1

<Charles LaPierre> +1

<Toshiaki Koike> +1

<Masakazu Kitahara> +1

<Dale Rogers> +1

- AI generated alt text for images

<Ivan Herman> +1

<Charles LaPierre> +1

<Masakazu Kitahara> +1

Avneesh Singh: Is that separate?

<Avneesh Singh> +1

Ivan Herman: I believe so

<Wendy Reid> +1

<Dale Rogers> +1

- copyright infringement/ai training

<Ivan Herman> +1

<Susan Neuhaus> +1

<Avneesh Singh> +1

<Dale Rogers> +1

<Toshiaki Koike> +1

<Masakazu Kitahara> +1

- realtime translations

Dale Rogers: Metadata has 7 votes, AI-generated alt text 6, and copyright 6

Metadata for AI content

<Ivan Herman> See also this mail sent to the CG recently: https://lists.w3.org/Archives/Group/group-pm-wg-chairs/2026May/0001.html

Wendy Reid: We already have an issue in our repo about this, but it's important to
… crosswalk EPUB metadata with vendors to identify AI content
… because there are a lot of skeptical views of, and hostility toward, AI
… so people can make informed decisions about how they engage with AI content
… right now it is opaque; you can't tell what AI might have generated in a book

Ivan Herman: Great points, an additional area where this becomes more critical is scientific publishing
… where submissions to publications may come entirely from AI
… this contradicts the ethics and practices of scientific publishing, but on the other hand, using AI to improve your publication might be acceptable, especially when facing linguistic challenges
… most publications are in English, for example
… making clear to reviewers which parts use AI is
… absolutely necessary
… that needs metadata added in one way or another
… one more thing: this is really mostly a publishing problem, not a web problem
… the web might be much less sensitive to these issues

Dale Rogers: That can get really complicated. For example, I can write 8 pages of content that is my own work, ask NotebookLM to look only at my resources, and ask whether it can find the patterns in my work; that is AI-assisted writing
… but it only works on my content, so is it just a tool like a grammar checker?

Wendy Reid: This could be important. For example, social media platforms are using AI to find AI content, and people say they are being accused of using AI when it was their own content
… adding metadata for AI would let us add more nuance about how it is used

Ivan Herman: Coming back to what Dale said, the "write my summary" example is a perfect one; it's something scientific writers may need to do, and might do
… we are not in a position to judge the merit of using these tools; eventually the scientific community will need to do that
… they are having those conversations now.
… our job is to provide the mechanism for this information being provided to the community
… right now there is no way for an author to make clear what tools they used and how
… the community can decide what to do with that, we can provide a means to help them disclose that

Avneesh Singh: One thing there: the Business Group is an important entity to involve; technical people alone cannot decide on this
… we see how Copilot is integrated in every Microsoft product; people need to decide what the threshold is for being "AI-generated"
… 70-80% of content will be assisted in some way

Dale Rogers: I initially wanted to respond to what Avneesh said about how AI is in everything, like Photoshop, where it's now integrated: what if Photoshop created something, or an illustrator made something with AI in it?
… now, if they were directing both human-made and AI-made elements, the nuance needed to classify the result is very complicated

copyright infringement/ai training

Susan Neuhaus: Let's discuss copyright next
… people in publishing seem to be very interested in this

Toshiaki Koike: I shared my thoughts in IRC:
… A major ongoing concern in the publishing industry is how to handle or prevent unauthorized use of content for AI training. Legal and ethical concerns around AI-generated content, including potential infringement, remain significant and continue to affect the overall ecosystem.

Susan Neuhaus: Do we need to solve this or provide tools for it?

Ivan Herman: It's a pity that Laurent is not here; the TDM work is evolving towards markup that signals to LLMs whether content should be used for training or not
… there was work on search engines and crawlers, and TDM started from that approach
… in the meantime we have LLMs now too
… for our work, we need clear ways to include metadata saying whether a publication, or parts of a publication, can be exposed to LLM training
… adding global metadata is relatively easy via the package.opf document, but more granular metadata is more complex, do we want to add that granularity?
… per chapter or image or video
… we need a level of granularity that our current metadata structure is not perfectly aligned with

Susan Neuhaus: With my publishing hat on, this could be particularly important for publishers; sometimes books are made up of content from different places, such as photographs and stock photos, so we'd need that granularity on a book-by-book basis

Ivan Herman: Or resource by resource

Wendy Reid: This is an important issue for us to highlight, because there are other groups working on the twin problems of ai content and fair use, and we need to be in alignment with them

AI generated alt text for images

Susan Neuhaus: AI alt text

Ivan Herman: I think Charles raised this. There are 2 things here: one is that I am required to create alt text for an image in my publication, and I can use AI to do this
… the other possible scenario is where I publish a book or webpage without alt text, and AI tools can come in to create the alt text for the user on the fly
… which scenario are we looking at?

Charles LaPierre: On that point specifically, this is something I brought up a long time ago: someone made a tool that generated alt text descriptions and added them to webpages
… I was concerned, because how do you know whether the image is informative or decorative?
… I went down a rabbit hole of ARIA roles and `alt=""`, the standard markers for decorative images
… having `role="presentation"` together with `alt=""` is a clear signal
… though you only need `alt=""`, adding the role is a sure sign
… that was my main point
… knowing when images get described by AI is important too, for clarity
… FYI, we're creating an ISO standard for remediated content, adding accessibility metadata, about to publish something for review, we talk about quality and AI there too

Avneesh Singh: There are two parts. First, the content side: image descriptions created by AI and added to a publication; we have a document at DAISY on it
… if a human is in the generation process, they take responsibility
… there is a difference between frontlist and backlist as well; the use case Charles mentions is broader than publishing, but it applies to us too
… Readium has this as a feature, as does Narrator, it's a very broad discussion we should have

Susan Neuhaus: Avneesh, you bring up an interesting point: the copyright issue of on-the-fly image descriptions, which could be inaccurate enough not to represent what the author intended

Ivan Herman: Let me be intentionally a bit provocative: let's say the problem Susan mentioned doesn't exist, on-the-fly alt text is perfect, and metadata added to the image directs the AI properly, something like that
… on the other hand, we've had many discussions on images: we are very strong on requiring accessibility data for books, and we discussed, for example, whether we want images in the spine (which creates accessibility problems). If AI tools improve, does that make this discussion moot?
… i.e., an image in the spine would be automatically accessible because there are tools that describe it automatically
… does that change what we say, or the emphasis, in the EPUB specification?

Dale Rogers: I create comics, one image per page, lots going on
… to decrease the amount of time to make it accessible, I will use AI to help me describe it, and sometimes I tweak it
… but if I have gone through the work to make sure it is adequate and the intent is expressed, I don't want AI to ignore what is already there because I've done that work
… if we could have something to prioritize existing content
… I don't know how we can fight with AI on this; it always gets back to author intent and reading system behaviour with AI on top

Wendy Reid: A big thing to consider is the potential power of AI to make things more accessible; there might be a future where this works
… but right now, this isn't a consistent experience. One person gets well-written alt text, the next person doesn't, and there is no control over it
… the only control we have is if publishers supply the alt text, even if it's written by AI, as long as a human has reviewed it
… automated tools may someday be good enough but right now we have to push for human involvement

Susan Neuhaus: What is the next step?

Ivan Herman: I will go through the minutes and add that to the issue, I'll clean it up a bit
… the question for me is: do we want to pick this up another time, with more people present, to get more perspectives?
… I will contact Dom about this once the minutes are on the issue
… I definitely think we should talk about this again

Susan Neuhaus: Thanks everyone!

– DRAFT –
Publishing Maintenance Working Group Telco

14 May 2026
