Meeting minutes
EPUB to ISO (cont.)
wendyreid: yesterday we finished the day with an objection on ISO, so we have to discuss it again. We want to update the ISO version to the up-to-date EPUB. Various countries around the world rely on ISO.
wendyreid: we must consider the changes in the relation of EPUB with WCAG that happen with EPUB 3.3 / EPUB Accessibility 1.1.
wendyreid: is there any question or comment before we open this resolution again?
<MURATA> -1
Shiestyl: it's a complicated situation; the transition to ISO is a kind of risk, for example in Japan. EPUB Accessibility 1.0 was made normative in Japan, but we haven't done anything about accessibility. We plan to.
ivan: if we move the current Rec to ISO, it should be our intention as W3C that if in the future we come up with a later version, we'll move it to ISO as well...
… what happened with IDPF (they lost contact with ISO) should not happen here...
… I suggest we add a commitment to submit any new version to ISO...
… in IDPF times, the move to ISO was a long and hard process of rewriting everything. At W3C we have a different way / agreement with ISO: they just take our document, and the content stays the same. Would you accept that as an additional commitment?
shiestyl_: yes
<MURATA> +1
brady: as it seems that the contradiction has been removed, I suggest we move to the end of the discussion and proceed.
<gpellegrino> 1-
<MURATA> Ivan is the liaison officer from W3C to SC34
leonardr: I support ivan's proposal.
<MURATA> I am the one in the opposite direction.
<leonardr> FYI; info on the PAS process. https://
wendyreid: the first proposal keeps the text as yesterday, and a second proposal is added to state a commitment for the future.
<MURATA> Not in Japan.
<MURATA> Not in Korea.
cristina: I still have some doubt about the automatic choice. I don't understand why we would have to go to ISO every time. In Europe, ISO is not always accepted as a European standard; we might have to make European ones at times.
cristina: I don't think we have to oblige W3C now, forever.
ivan: iso does not have right to change our standard. they only refer to our standard.
<MURATA> -1
<AvneeshSingh> aq+
<MURATA> -100000000
leonardr: the ISO committee can make comments, and we accept the changes or not. I suggest any organisation around the table enlist as an ISO member.
wendyreid: it is also possible nothing happens at iso
AvneeshSingh: having our recs as iso is very important.
wendyreid: we do have to recognize that some countries treat ISO as valid, even if not all countries do; we have to take that into account. I agree the situation of this group may change. I'm happy to rewrite my proposal so it's not "forever" but still keeps a commitment about future procedures.
<wendyreid> Proposed: WG agrees to explore publishing to ISO when we have a new version of EPUB, EPUB Reading Systems, or EPUB Accessibility
<leonardr> +1
<MURATA> +1
<wendyreid> +1
+1
<toshiakikoike> +1
<CharlesL> +1
<shiestyl_> +1
<laurent_> +1
<ivan> +1
<MasakazuKitahara> +1
<AvneeshSingh> +1
<duga> +1
<Daihei> +1
<gpellegrino> 0
RESOLUTION: WG agrees to explore publishing to ISO when we have a new version of EPUB, EPUB Reading Systems, or EPUB Accessibility
<wendyreid> Proposed: WG agrees to publish the 3 recs as ISO standards under PAS process once the European Process is completed
<ivan> +1
<wendyreid> +1
<CharlesL> +1
+1
<MURATA> +1
<laurent_> +1
<duga> +1
<leonardr> +1
<shiestyl_> 0
<toshiakikoike> +1
<MasakazuKitahara> +1
<Daihei> +1
<romain> +1
<gpellegrino> 0
RESOLUTION: WG agrees to publish the 3 recs as ISO standards under PAS process once the European Process is completed
<MURATA> JWG7 will have a virtual meeting tomorrow night.
ivan: it could be good to have someone to go to iso meetings.
<MURATA> +1
Webtoons Standardization
<wendyreid> w3c/
See slides online
shiestyl_: (sharescreen)...
… the current situation in Japan is that we use EPUB 3.3 for webtoons, but there is no appropriate feature. The discussion at TPAC 2022 was not resolved...
… some RS have already adopted rendition:flow in the OPF. Both pre-paginated and rendition:flow are specified...
duga: I remember last year we decided to push the discussion to the community group, but that did not happen. We are just resuming the 2022 discussion at the same point.
shiestyl_: (demo): with no rendition tag, pages move left to right; with rendition:flow, content automatically flows top to bottom
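As a rough illustration of the behaviour shown in the demo, the sketch below reads `rendition:layout` and `rendition:flow` from OPF package metadata and picks a presentation mode. The sample OPF fragment, the function names, and the fallback to paginated reading are assumptions made for the example; they do not describe what any particular reading system actually does.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# Hypothetical package document fragment for a webtoon-style publication.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0">
  <metadata>
    <meta property="rendition:layout">pre-paginated</meta>
    <meta property="rendition:flow">scrolled-continuous</meta>
  </metadata>
</package>"""


def rendition_properties(opf_xml: str) -> dict:
    """Collect every rendition:* meta property found in the package document."""
    root = ET.fromstring(opf_xml)
    props = {}
    for meta in root.iter(f"{{{OPF_NS}}}meta"):
        name = meta.get("property", "")
        if name.startswith("rendition:"):
            props[name] = (meta.text or "").strip()
    return props


def reading_mode(opf_xml: str) -> str:
    """Pick a presentation mode; the mode names are illustrative only."""
    flow = rendition_properties(opf_xml).get("rendition:flow")
    if flow in ("scrolled-continuous", "scrolled-doc"):
        return "scroll-top-to-bottom"
    # Assumption: without a scrolling flow, fall back to page-by-page reading.
    return "paginated"
```

A reading system that offers a user override, as requested in the discussion, would treat the returned mode as a default rather than a fixed choice.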
gautier: As a user, I want to have a choice of the way I read
… I don't want to be obliged to read a way
shiestyl_: Some reading systems can change, Apple does not, but others do
… depends on the system
laurent_:
shiestyl_: some reading systems don't allow for change
laurent_: I appreciate the prototype, but I dislike it when a company imposes a choice on other RS without speaking with the group. That's what is happening now, and I think it's a bad move.
duga: I believe it's up to the RS to offer an override. I agree with Laurent; I would have preferred discussion and time to address it. But I would also be annoyed to take no decision and remain stuck.
wendyreid: looking at the spec, it is a very minor change. Webtoon is very much a format that is here to stay; as a user I'm not sure EPUB is much used, and I'm concerned about the accessibility. Yes, it should go through a better incubation process, and we should certainly recommend letting users switch.
Hadrien: I think it's misaligned with how we and EPUB work. A webtoon is a single strip of images, divided into files for optimization. You could end up with a broken display; it's a terrible reading experience in the many RS that don't support scrolling. RS end up looking at what is in the file and ignoring the OPF.
ivan: my understanding is that there is currently a restriction in the spec. The change would be to lift this restriction and extend the semantics. From the start my problem is with the terminology: is this a class 4 change (the line is thin between class 3 and 4)...
… even if the change is editorially simple, it means that EPUBCheck has to change, new features are requested from RS, etc.
ivan: we are not allowed to do class 4 changes.
shiestyl_: considering the situation of Apple and Amazon, if we have a new feature for webtoons, I wonder (not sure) whether they will follow. It's not a good situation to have a new feature. I understand it's not suitable, but the market situation has to be considered.
laurent_: I agree with Ivan, it requires all RS to do heavy development to adapt. If there is a commercial problem and the Japanese industry really wants this, I see a solution: a single HTML file with a sequence of images would work everywhere. It's a naive implementation that could be complex with a lot of images...
… this will work today.
duga: I agree it's a real change. I don't think RS have to evolve, as they can just ignore the metadata; it's a business decision more than a technical discussion. I agree that Laurent's solution would work today, out of the box. It would be great to have the people who implemented this in the room. It's dangerous to agree to align with a solution that was not discussed here and that breaks the spec.
wendyreid: the class 3 version would be that we add "RS may ignore"; class 4 would be to completely rewrite and specify rendition:flow in FXL. This is a challenge: there is a technical side to a business discussion. Webtoons are more and more popular, and we are probably late for this discussion. If we don't address it we'll see more and more RS implementing it the way they want. We also have to respect users...
… we also are overtime, i don't think we can take a decision today
Hadrien: I feel that's a bad decision for a good business reason...
… we need metadata that says it's a webtoon. It should be a specialized one, not a tweak.
ivan: I agree the proper way would be to have dedicated metadata. If we move forward with this discussion, we have to reopen the charter. But for some business reason we had this decision a year ago.
<leonardr_> It's 4am here - you don't want me scribing ;)
FXL Accessibility
https://
Wendy: This session was going to be about what's been going on in the TF, current draft posted, google link is working doc.
Wendy: Starting with what we've been up to, secondly what we learned we should not be doing.
Wendy: Writing doc on guidance for content creators on FXL accessibility.
Wendy: How do we give advice for common features in FXL books, and best practices in content development.
Wendy: many things we can do, there are still barriers that fxl cannot overcome. Biggest barriers: low vision and dyslexia accessibility issues. Users need layout adjustments that are currently not possible
Wendy: Interesting discussion around potential to change fixed layout, or potentially to introduce new concepts into layouts or improve underspecified epub concepts.
Wendy: technically possible in EPUB, but not explained well, which is likely why it has low implementation
Wendy: Discussed possibility of visual textual, visual setting for changing from fixed layout to reflow with different style sheet
Wendy: extended discussion to what reading systems should do with fixed layout, we only give advice to content creators at the moment.
Wendy: what features should FXL readers make accessible?
Lately, been exploring different ways to fix fixed layout content to be more accessible. Looking into SVG implementations. Want to discuss vision based content as well.
Wendy: Want to do more exploration with CSS possibilities for things that are currently handled with JavaScript. CSS has improved.
Wendy: As you can tell, we have been all over the place, rooted in what's happening but also in potentials of format.
Wendy: As discussed yesterday, what we need in the short term is a document that addresses the needs of content creation as it relates to fixed layout content.
Wendy: Pivoting to; what content needs to be in this document to address needs of Euro commission in short term (next three months), what else do we need to be comfortable to publish as a note.
<gpellegrino> we can ask ChatGPT to fill in the text!
Charles: One thing, I was assigned to outline for reading systems, setting up headers for different levels. That section needs advice from reading systems on how/what to recommend.
gpellegrino: It seems to me that this work requires more R&D, it is incubation work, and I'm not sure in the short term we can come up with something valuable.
gpellegrino: Better to focus on what we have now, and then we can start again with more exciting R&D, but that may take years.
gautier: I agree, the EU legislation obliges us to act as quickly as possible. It is important to note that in France there is a lot of pressure. On R&D, given the long list of what Wendy mentioned, we should prioritize what we can do now vs. the future.
Wendy: Logistical note; Thank you Charles. Agreed, gpellegrino, we need to split this into two work efforts. CG could publish note, would look better coming from WG. Would like to continue incubation work, as FXL is not enough.
Wendy: People need advice in the short term; in the long term we need to improve. In the short term: what are the concrete things that we know will be helpful today? The question is, do we want to do content and reading systems, and how far do we take the scope?
Shadi: Continuing off of last question, I agree with suggestion, our strategy overall, w/r/t EU Commission, we don't want them to be writing tech requirements, we want them to point back to this group where tech expertise is.
Shadi: What is needed in the short term is what is the problem description, what are limitations for certain kinds of publications.
Shadi: Question of getting closer to accessibility is a separate question. Certain layouts cannot be enlarged, doesn't mean other aspects cannot be made accessible.
Shadi: We don't want to end up in a situation where we apply EPUB accessibility across the bench.
Hadrien: By allowing this, we've opened a pandora's box. Difficult to close. On short term basis, difficult to do much, if anything.
Hadrien: The more we explore, the more we realize that this is a mix of potential best practices and things that will require us to return to spec and change things.
Hadrien: We do not know the shape of it, in the end. Short term is a good request.
Avneesh: I agree with what Hadrien said. There are issues with fixed layout accessibility. If we want a strategic move, we don't want to suggest that reflow is better.
Avneesh: If we can achieve this, it is not an issue to not have an immediate solution. I think it would be a good thing to aim for the short term solution.
<gpellegrino> https://
gpellegrino: One thing that we can do is update the mapping that we did between the Accessibility Act and EAA. We made an assumption that it was for reflow. We may add some notes on the limitations of fixed layouts, with what can be achieved now vs. what cannot be.
shadi: Agreed, first it is an existing document; easier for us, easier for those that consume.
Wendy: queue closed because we are at break time. Sounds like we have a scope; paint a picture on current state and provide information on what we can now. Think about future state later.
Wendy: We will probably need to discuss logistics, can do so at end of session.
Wendy: Adjourned for 20 minutes.
Thank you wolfgang for your numerous spell checks!
Wendy: Enough people have returned that we can restart proceedings.
TDMRep and EPUB
See slides on-line.
Wendy: We have Laurent, who has asked to do a presentation on TDMRep and AI opt-out
Laurent: I will take 15 minutes to discuss TDMRep opt-out. First, the context: the EU CDSM Directive states that anyone who has legal access to a resource can freely download and use it for TDM.
It says that publishers can opt out through machine-readable means, but the opt-out is void for scientific research.
Laurent: AIE (Italy) and EDRLab created the TDM Reservation Protocol CG in 2021
Laurent: 45 people joined, mostly publishers; the final report was released in February 2022. A POC was developed by cairn.info and seraphine.legal (both French).
Laurent: Objective = blocking a class of robots, not one specific robot, and being very simple to implement. One boolean property: opt-out or not. An optional URL points to an ODRL 2.2 JSON resource which contains the publisher's contact and conditions for obtaining mining rights.
Laurent: Properties 'tdm-reservation' and 'tdm-policy', expressible in the HTTP header of each resource, in a well-known file (tdmrep.json) or as HTML metadata
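To make the HTTP-header variant concrete, here is a minimal sketch of how a crawler might read the two properties from a response. The header names follow the description above; treating the value "1" as "rights reserved" is an assumption about the encoding, and the function name `read_tdm_headers` is invented for the example.

```python
from typing import Mapping, Optional, Tuple


def read_tdm_headers(headers: Mapping[str, str]) -> Tuple[bool, Optional[str]]:
    """Return (rights_reserved, policy_url) from HTTP response headers."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {name.lower(): value.strip() for name, value in headers.items()}
    # Assumption: the value "1" signals that TDM rights are reserved.
    reserved = lowered.get("tdm-reservation") == "1"
    # The policy URL is optional and only meaningful when rights are reserved.
    policy = lowered.get("tdm-policy") if reserved else None
    return reserved, policy
```

A crawler honouring the opt-out would call this on each fetched resource and skip, or go and negotiate via the policy URL, whenever the first element is true.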
Laurent: New interest triggered by growth of gen-AI, the FEP supports the solution. French and North European Publishers as well.
Laurent: Urgent need: reach global consensus that "TDM" covers "AI"
Laurent: Some publishers would like to embed this information inside of EPUBs they publish. There would be no impact on doing so outside of W3C, but it would be better. Nobody seems to care about embedding this in ONIX records, which should make sense.
Laurent: Proposal: use OPF metadata and specify that the directives cover every resource in the EPUB. Define a new namespace with prefix 'tdm' (or 'tdmai') and simply add two properties, tdm:reservation (boolean) and tdm:policy (URL). The TDMRep CG can write the specification, OK?
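Purely as an illustration of what this proposal could look like in practice, the sketch below parses a package document carrying the two properties. Nothing here is specified yet: the prefix URL `http://example.org/tdm#`, the use of "1" for a reserved state, and the function name `tdm_directives` are all placeholders.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"

# Hypothetical OPF: the prefix URL and the "1" value are placeholders only.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="3.0"
         prefix="tdm: http://example.org/tdm#">
  <metadata>
    <meta property="tdm:reservation">1</meta>
    <meta property="tdm:policy">https://example.org/policy.json</meta>
  </metadata>
</package>"""


def tdm_directives(opf_xml: str) -> dict:
    """Extract the publication-wide TDM directives from OPF metadata."""
    root = ET.fromstring(opf_xml)
    found = {}
    for meta in root.iter(f"{{{OPF_NS}}}meta"):
        name = meta.get("property")
        if name in ("tdm:reservation", "tdm:policy"):
            found[name] = (meta.text or "").strip()
    return {
        "reserved": found.get("tdm:reservation") == "1",
        "policy": found.get("tdm:policy"),
    }
```

Because the directives would cover every resource in the EPUB, a single lookup in the package document is enough; no per-resource check is sketched here.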
Laurent: Some publishers really want to put this in the EPUB itself, possibly because they want to prevent users from downloading EPUBs and then inputting them into AI training.
Wendy: You already answered my initial question of where we put it in the EPUB. To your question of why this vs. ONIX, I understand because ONIX is easily divorced from the book.
Wendy: A recent controversy revealed that, for a gen-AI company, a group of researchers grabbed a large number of EPUBs from smashboard through a backend exploit. Technically that violates the website's policy, but it would help to have the policy explicit in the EPUB as well.
Wendy: May be good to have both in ONIX and EPUB.
<Zakim> tzviya, you wanted to recommend publishing tdm in tr (somewhere)
tzviya: All LLMs have used the Books2 or 3 model, which include copyrighted books. My question is a little off topic. Laurent, we should discuss publishing the report more publicly, on a proper track.
tzviya: I really support this. Second, I was loosely involved with this; you have not discussed scholarly publishing. Has there been input from that world?
Laurent: We did not have much input from scholarly publishing.
gendler: Couple things, I primarily come from news publishing
… and book publishing, we're looking at this actively
… other questions of its usefulness, there is a breakout session tomorrow
… we might take this on here, but there will be a lot of adjacent discussions
duga: I'm a little surprised how eager everyone is for this. A couple of concerns: the first, almost trivial, is whether TDMRep is the right solution. Other groups are working on similar things.
duga: Would like to do more research before jumping on TDMRep as the right solution. Bigger concern, feels like we are adding DRM to EPUB metadata, and that worries me.
duga: we do not do DRM, and this is some sort of putting rights in the EPUB. We use contracts, that should handle the use of the EPUBs, sounds like the real concern is just for downloaded EPUBs.
+1
duga: not excited to add this without understanding further ramifications of this proposal.
laurent: the contract, at least in Europe, is overridden by the law, where the permissions structure is that it is allowed unless the publisher opts out.
duga: the contract says I can do whatever I want, but your TDMRep says no I opt-out, would that be illegal?
Laurent: I believe so, because the law says so.
<leonardr> @duga - it depends on where you are in the world...(and the laws are still in flux)
duga: Well then I have a concern that I may have a contract with you, but then I'm somehow breaking the law. So I am even more concerned about using this directly in EPUB
leonard: A follow-up to what Brady said: I have an alternative proposal, not sure whether to present it now or save it for later.
Wendy: Let's hold for later.
Ivan: Several things, I wasn't here at the very beginning, my understanding is that the TDM is based on a legal requirement. Now, did the EU give its blessing that the TDM Rep is one of the answers?
<wolfgang> s/begging/beginning/
laurent: The EU have not blessed this as THE answer, but as a possible answer. The EU commission suggests that any machine readable solution will be good enough.
laurent: they will not bless any specific solution.
ivan: More specifically to our case, is the formulation of the EU Directive and the creation of the TDM Rep, if inserted into EPUB also going to be considered a valid machine readable solution. This is fundamentally different from what we started with.
laurent: they don't specify whether the data should be inside the content or outside of the content. As shown, we could put it inside or outside of the html. I know that initially, I, wanted it to be outside of the content, but not required.
Ivan: different level, provided the answer is to go ahead, at the moment the TDM is a CG report. It would probably give much more weight if it was in PR space. It would not be appropriate to push ahead without doing so. It would require a separate interest group or something.
https://
laurent: I am a proponent of prototyping and pushing ahead with what we have.
Ivan: Just highlighting potential pitfalls.
ivan: This raises a slightly more general question, it is adding metadata to OPF, which is mechanically fine, and standard. Is it possible that this is the kind of action that may come from other sources as well. Do we as a WG push back on this specific question vs. how is similar metadata from other WG incorporated into our work. Do we need a registry mechanism?
ivan: possible that other WG would register metadata that they want added to the OPF.
<leonardr> +1 to @tzviya cringing!
laurent: I agree with this possibility.
Ivan: this would be well within our charter to set up such a registry. We may need to discuss whether we want that or not.
gpellegrino: I have seen a lot of EPUBs with internal metadata, we have a mechanism by which to add metadata into OPF. We do not need a standard approach, if the industry wants to add it, they can do so directly.
Ivan: but we should as a WG give a blessing or not, which can give it weight. I don't want to get into a debate on decentralization, because we will be here all week. Simply, do we want to give our blessing to things, or no?
wendy: I don't know if we want a registry, I also see Brady's concerns, and they are incredibly valid. This is another example of something the industry is concerned about, and we need to handle that. I don't want to bless a specific solution, TDM may be what we settle on, but leonard has other proposals, and max has mentioned others as well. A proposal for EPUB may be to have a namespace for general solutions so that we can allow it to be open, [CUT]
wendy: specific. Because we do want to be open.
ivan: what is the next step as far as we are concerned?
laurent: we need to discuss that what we have done with TDM covers AI. We need more communication on these topics as well, from prototype to something that is industry wide.
laurent: for the EPUB inclusion, the TDMRep CG will work on a draft and come back to the WG with proposal.
https://
duga: Is there a potential for moving CG to WG?
laurent: we are discussing with other groups, those that create APIs, we want to have traction and stabilize the work before chartering a WG to build a standard.
<duga> +1
leonard: it seems premature to discuss implementing TDMRep before we have discussed whether TDMRep is how we will go forward.
wendy: leonard, would you care to show your alternative proposal.
Leonard's presentation slides.
leonard: I am presenting this on behalf of the C2PA. Mentioned as part of the anti-counterfeit work. This started in 2019 around misinformation, adapted work that had already been out there. This led us to the area of TDM
leonard: C2PA is the standard body under Linux Foundation, they build standard. Content Authenticity Initiative is a group that is implementing the standard, and is educating users/legislators.
C2PA has approx. 100 members. We have liaisons from ISO, IPTC, ETSI, PDF Assoc.
leonard: CAI is a much larger group. C2PA Spec 1.3 is available online, and there is a CAI open source implementation as well. Hardware implementations exist as well. We have a content credential. It uses a W3C verifiable credential.
leonard: includes assertions on provenance, and a digital signature as well. Includes action assertions, which cover when gen-AI has been used in the creation of a piece of content. Has been chosen by Google to identify AI content.
leonard: has a Do Not Train assertion, modeled on top of the TDMRep work. We separate out training, gen training, inference, and mining. Not just a boolean, allowed, not allowed or constrained.
leonard: we use digital signatures for a trust model. The same approach is used for PDFs and the Web. C2PA manifests can already be embedded into a number of formats. EPUB does not work with v1.3; we intend to add it to v1.4
leonard: we also have members of our org that are storing these in separate file systems, cloud, blockchain, etc.
leonard: a lot of questions coming up about legislation. C2PA has been working actively with U.S. Recently spoke with many legislative bodies to discuss how C2PA addresses these needs. This is a top priority for most governments in the world.
leonard: C2PA is ahead because it is already in use in the field, and supports many file formats.
leonard: as a publisher, if you're still delivering PDF, you can use the same format for both PDF and EPUB. You can apply it to parts of the content rather than the entirety if you'd like.
laurent: Just wanted to add that we are in discussions with leonard. I have some issues with C2PA, the adoption on provenance is good, but do not train adoption is not good. This for me is an issue that needs to be addressed.
laurent: we feel that it is overly complex for permissioning; most users are not technically savvy enough. It gives a very narrow definition of what TDM is, much narrower than what the EU Commission gives.
laurent: different definitions of what is covered will create a large set of issues. But, we are ready to work together!
leonard: I do not disagree with some of your points, our solution is broader than just TDM, as it was intended to. Agreed details on AI need to continue to be looked at. My concern with your implementation, is you can download assets with DNT assertions already. It is implemented on that side, but not enough AI systems are reading that information.
leonard: there is at least one open source implementation that is doing so today.
wendy: is there a link somewhere to documents that you can put into the IRC for us to look at?
<leonardr> https://
charles: I'm just trying to understand, would the whole publication, when you create a digital signature to say it's authentic, how do you put that into the metadata, which would in turn change the signature?
leonard: this is a problem that has been around for 20 years, and has actually been solved. When you hash the data, you do not hash all of the bytes from start to finish; you leave a gap in the middle in which to put the signature. Because that space is excluded from the hash, filling it does not affect the final hash.
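The gap technique leonard describes can be sketched in a few lines. This is a generic illustration, not C2PA's actual data structures: the 16-byte gap, the byte layout, and the function name `hash_excluding_range` are assumptions, and a real scheme would also have to protect the offset and length of the excluded range.

```python
import hashlib


def hash_excluding_range(data: bytes, start: int, length: int) -> str:
    """SHA-256 over data, skipping the bytes in [start, start + length)."""
    digest = hashlib.sha256()
    digest.update(data[:start])            # everything before the gap
    digest.update(data[start + length:])   # everything after the gap
    return digest.hexdigest()


# Reserve a fixed-size gap, hash around it, then fill the gap afterwards.
GAP = 16
unsigned = b"HEADER" + b"\x00" * GAP + b"BODY"
hash_before = hash_excluding_range(unsigned, 6, GAP)

signed = b"HEADER" + b"S" * GAP + b"BODY"  # stand-in for a real signature
hash_after = hash_excluding_range(signed, 6, GAP)
```

Both hashes are computed over the same bytes outside the gap, so writing the signature into the reserved space leaves the hashed content unchanged.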
Wendy: five minute break, and then we will head into our last session for what we haven't discussed. I would like to start with annotations.
EPUB Annotations
See slides online.
https://
https://
laurent_: in our RS we are integrating annotations; we decided to use W3C annotations and to allow detached annotations and embedded annotations. We want to make sure they are shareable and interoperable. We will also keep an eye on Hypothesis.
laurent_: we find things missing: created and modified properties; body only plain text; text direction, language and color properties
laurent_: we have a problem with locations inside the file. We want to point to a resource in several ways (CFI, TextQuoteSelector and specific Readium selectors including progression and domRange)...
… sets / collections: there is a collection model, but with burdens. partOf is cumbersome; first page, last page: we find no use for those.
… the big issue is identifying the source publication; we are thinking of adding a new property "about" to log the dc id, title, publisher and source, to have different options for identifying it correctly. With that we'll define an import model with user choice, to make sure one collection doesn't erase another.
laurent_: we also add a generator property to identify where the annotations come from.
laurent_: i will share this presentation for more precise indication (and links to references). And hope to discuss in the CG.
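For readers unfamiliar with the selectors mentioned above, the following sketch shows one of them, a TextQuoteSelector, and a naive way to resolve it against plain text. The annotation is shaped after the W3C Web Annotation Data Model; the source URL, the body text, and the resolver itself are made up for the example and are not taken from Laurent's slides.

```python
from typing import Optional, Tuple

# Example annotation; the source URL and the body value are invented.
annotation = {
    "type": "Annotation",
    "body": {"type": "TextualBody", "value": "Check this claim", "format": "text/plain"},
    "target": {
        "source": "https://example.org/book/chapter1.xhtml",
        "selector": {
            "type": "TextQuoteSelector",
            "exact": "fixed layout",
            "prefix": "accessible ",
            "suffix": " content",
        },
    },
}


def resolve_text_quote(text: str, selector: dict) -> Optional[Tuple[int, int]]:
    """Return (start, end) offsets of the quoted text, or None if not found."""
    prefix = selector.get("prefix", "")
    suffix = selector.get("suffix", "")
    # Naive strategy: require prefix + exact + suffix to match contiguously.
    position = text.find(prefix + selector["exact"] + suffix)
    if position == -1:
        return None
    start = position + len(prefix)
    return start, start + len(selector["exact"])
```

Combining several selectors for the same target, as described above, lets a reading system fall back to another one when a match like this fails, for instance after the content has been edited.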
leonardr: it's nice to see that spec used and updated. One big thing: not using jsonld, should be investigated properly.
CharlesL: you mentioned headers for accessibility. I wonder how to deal with content headings and annotations headings.
laurent_: it must be thought of as "petit poucet" (Hop-o'-My-Thumb)
<leonardr> @CharlesL - but then you'd violate the proper semantics of the documents. It would be better to get WCAG APIs to add annots as a first class citizen
ivan_: count me in if you start working on this in the CG. Specifics: regarding your reaction to paging, we added that because the model is related to servers. Later, if you work with a server, it could surface again...
<leonardr> @wendyreid - was just a comment on the mapping of annots->H6. (implementation detail)
ivan_: we had a discussion a long time ago about properly finalizing the CFI spec. If this becomes serious, we might have pressure to do so.
Another thing: I understand you make a selection of selectors; begin and end points might be a burden.
Years ago we started some work that was never finished (in the Annotation WG), a document on selectors only. There is a note there to be found about specific selectors.
gautier: Just to mention the structure and pagination of elements, it's important for me as a user
… main use case for annotation is usage outside of the book
… exporting is an essential function
wendyreid: we have only a little time to wrap up.
* On annotations, absolutely a good topic for the CG.
* On CFI, it will have to be on a future agenda.
For other topics that came up in this F2F meeting, we have a lot to do.