Meeting minutes
Lars: I have no formal presentation. I have been experimenting with AI in Colibrio for a long time now. I am particularly interested in having a conversation with a book, because I am a fan of fiction.
Lars: I have been using OpenAI; experimenting with their API is easy, as it runs in the browser, client-side. Last year, they released an Assistants API that takes care of the boring stuff.
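[Scribe note: a minimal sketch of the client-side setup Lars describes, using the official openai JavaScript SDK; how the key is stored is an assumption for illustration.]

```ts
import OpenAI from "openai";

// The user supplies their own key; it stays in their browser and is only
// sent in calls to api.openai.com.
const userProvidedKey = localStorage.getItem("openai_key") ?? "";

const client = new OpenAI({
  apiKey: userProvidedKey,
  dangerouslyAllowBrowser: true, // required to run the SDK client-side
});
```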
Lars: to use LLMs we need tools; those are the APIs. Without tools, an LLM is just a huge, dumb base of knowledge. To be more precise, we feed it very contextual information. You can give the context in a prompt: prompt engineering is about you yourself being the tool. The more precise you are, the more accurate the answer you get. To go further with LLMs, we use context inputs from other databases. That's the role of the API.
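[Scribe note: a sketch of "giving the context in a prompt", assuming a chat completions call; the passage and question are hypothetical stand-ins for real book content.]

```ts
import OpenAI from "openai";

const client = new OpenAI({ apiKey: "...", dangerouslyAllowBrowser: true });

// Without context the model can only guess; with a passage pasted into the
// prompt, the answer is grounded in the book.
const passage =
  "Chapter 3: Ishmael signs on to the Pequod under Captain Ahab...";

const completion = await client.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [
    {
      role: "system",
      content:
        "Answer only from the provided passage. If the passage does not contain the answer, say so.",
    },
    {
      role: "user",
      content: `Passage:\n${passage}\n\nQuestion: Who commands the ship?`,
    },
  ],
});
console.log(completion.choices[0].message.content);
```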
Lars: adding context is costly and time-consuming. It needs to be structured and expressed in a way the LLM can understand, and this complexity needs to be managed.
Lars: [showing screen] This is the vanilla reader, available online. You need an OpenAI key to make it work. It can be costly when you work with images, less so with text. When the book is opened, I strip away unnecessary markup and keep only the semantic HTML, cleaning down to the bare minimum of code. That is pushed to the LLM as embeddings. Think of it as a computer edition of the book, an edition made for computers: a numerical representation of the book. It feeds a vector database. You can then ask questions, queries, against that database.
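[Scribe note: a sketch of this indexing step: strip a chapter's HTML to plain text, chunk it, embed each chunk. The in-memory structure stands in for a real vector database; function names are illustrative, not Colibrio's actual API.]

```ts
import OpenAI from "openai";

const client = new OpenAI({ apiKey: "...", dangerouslyAllowBrowser: true });

interface IndexedChunk {
  text: string;
  href: string;        // where in the book this chunk lives
  embedding: number[]; // the numerical representation of the chunk
}

// Strip a chapter's markup down to bare text (DOMParser is available in
// the browser, where this reader runs).
function stripToText(html: string): string {
  const doc = new DOMParser().parseFromString(html, "text/html");
  doc.querySelectorAll("script, style").forEach((el) => el.remove());
  return (doc.body.textContent ?? "").replace(/\s+/g, " ").trim();
}

// Split into ~500-character chunks, the size mentioned later in the demo.
function chunk(text: string, size = 500): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) chunks.push(text.slice(i, i + size));
  return chunks;
}

export async function indexChapter(html: string, href: string): Promise<IndexedChunk[]> {
  const pieces = chunk(stripToText(html));
  const res = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: pieces,
  });
  return res.data.map((d, i) => ({ text: pieces[i], href, embedding: d.embedding }));
}
```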
Lars: a note: this embedded version should, in my opinion, be built and sold by the publisher.
Lars: anyway, that's an important part, because this is the step that allows us to get contextualised answers.
Lars: next, I open the dialog, a chat box built into the app, and start asking questions.
Lars: the question goes to the vector database, which performs a semantic search and provides chunks of 500 characters to the LLM, which formulates the answer displayed to me.
Lars: to make sure the responses come from the book, the app performs a search and provides link references for each part of the answer, so you can activate a link and go to the part of the book that states it.
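[Scribe note: putting the preceding steps together, a sketch of the query side: embed the question, rank stored chunks by cosine similarity, send the best ones to the model, and keep each chunk's href so the answer can link back into the book. The in-memory array again stands in for the vector database.]

```ts
import OpenAI from "openai";

const client = new OpenAI({ apiKey: "...", dangerouslyAllowBrowser: true });

interface IndexedChunk { text: string; href: string; embedding: number[] }

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

export async function ask(question: string, store: IndexedChunk[]) {
  const q = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: question,
  });
  const qVec = q.data[0].embedding;

  // Semantic search: take the chunks closest to the question.
  const top = [...store]
    .sort((a, b) => cosine(b.embedding, qVec) - cosine(a.embedding, qVec))
    .slice(0, 5);

  const context = top.map((c, i) => `[${i + 1}] (${c.href}) ${c.text}`).join("\n");
  const completion = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      { role: "system", content: "Answer only from the numbered excerpts and cite them like [1]." },
      { role: "user", content: `${context}\n\nQuestion: ${question}` },
    ],
  });

  // Each citation [n] can be rendered as a link to top[n - 1].href,
  // letting the reader jump to the part of the book that states it.
  return { answer: completion.choices[0].message.content, sources: top.map((c) => c.href) };
}
```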
Lars: the models are not smart; it is the context and the apparatus deployed by the app developers that make them useful. As a consequence, the better the quality of the book, the better the answers. Metadata are important too; we extract and use them to feed the database.
Lars: metadata, semantics, the table of contents: all the ebook apparatus is used here. It is our best chance of getting good results.
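[Scribe note: one generic way to pull such metadata: EPUB stores Dublin Core fields in the package (OPF) document, which the browser can parse with DOMParser. A sketch, not Colibrio's extraction code; the input is assumed to have been read from the ebook archive.]

```ts
// Extract Dublin Core metadata from an EPUB package (OPF) document.
export function extractMetadata(opfXml: string): Record<string, string[]> {
  const doc = new DOMParser().parseFromString(opfXml, "application/xml");
  const out: Record<string, string[]> = {};
  // Dublin Core elements live in the http://purl.org/dc/elements/1.1/ namespace.
  const dc = "http://purl.org/dc/elements/1.1/";
  for (const el of Array.from(doc.getElementsByTagNameNS(dc, "*"))) {
    const name = el.localName; // e.g. "title", "creator", "subject"
    (out[name] ??= []).push(el.textContent?.trim() ?? "");
  }
  return out;
}
```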
Gautier: publishers rely on AI systems; what are the risks involved for customers?
Gautier: there is a risk of a loop: AI analysing data created by AI.
Lars: yes, that's actually a major problem with all digital content.
Gautier: so it would probably be useful to have a refines property to indicate that "this metadata or content was AI-produced".
Lars: for sure! We could then alert the user and indicate a level of risk.
Lars: the LLM hype is excessive, but still, the results are good. Let's see with images. Here I send the image plus context, including the content visible alongside the image (we call it the visible range), always in the context of the book thanks to the embedded version stored in the database. I get good results, trying with a contemporary art photo and a world map with data represented on it. This would be complex to achieve in a production pipeline; it is easier in the reading system because we have the complete numerical representation of the book stored in a database.
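[Scribe note: a sketch of the image query demonstrated, assuming a vision-capable OpenAI chat model: the image goes in as a data URL together with the visible-range text, so the description stays anchored to the book. Function and parameter names are illustrative.]

```ts
import OpenAI from "openai";

const client = new OpenAI({ apiKey: "...", dangerouslyAllowBrowser: true });

// imageDataUrl: the image extracted from the book as a data: URL.
// visibleRangeText: the text visible alongside the image in the reader.
export async function describeImage(imageDataUrl: string, visibleRangeText: string) {
  const completion = await client.chat.completions.create({
    model: "gpt-4o-mini",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: `Describe this image for a non-visual reader. Surrounding text from the book:\n${visibleRangeText}`,
          },
          { type: "image_url", image_url: { url: imageDataUrl } },
        ],
      },
    ],
  });
  return completion.choices[0].message.content;
}
```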
Jonas: what is included in the visible range?
Lars: text that is available on the visual page. It is risky to expand it too much; it could interpolate topics from other parts of the book. We could experiment with adding the title structure, for example.
Wolfgang: I feel that, for science content, the chapter level can be the context.
Lars: this is something to experiment with; fortunately, there are many different kinds of books! The solution will differ largely depending on this diversity. The more granular the information you give (semantics, metadata, structure), the better the result you'll get. A schema attribute would help a lot, for example. Be smart when you build your ebook, and you'll get strong results in return.
Lars: I am also adding semantic search and translation. Everything we add is meant for non-visual readers; they have a stronger need.
Lars: it also works with local models, so you are not obliged to send your content out to feed the LLM. It is slower, but it works.
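[Scribe note: because the OpenAI SDK only needs a base URL, the same code can talk to a local, OpenAI-compatible server; Ollama is one example that exposes such an endpoint. The model name below is illustrative.]

```ts
import OpenAI from "openai";

// Point the same SDK at a local OpenAI-compatible server instead of
// api.openai.com; the book content then never leaves the machine.
const local = new OpenAI({
  baseURL: "http://localhost:11434/v1", // Ollama's OpenAI-compatible endpoint
  apiKey: "not-needed-locally",
  dangerouslyAllowBrowser: true,
});

const res = await local.chat.completions.create({
  model: "llama3.1", // whichever model is installed locally
  messages: [{ role: "user", content: "Summarise chapter one." }],
});
console.log(res.choices[0].message.content);
```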
Jonas: what happens with copyrighted material?
Lars: never use free services. I pay for OpenAI; the contract says they don't use my content for training. That's why we just provide a way to enter your own API key; then you are responsible. I don't want to take that responsibility.
Lars: also, publishers should build and sell rights to the embedded version, meaning licensing your content in a form ready for machine usage.
Jonas: for libraries it's tricky; we usually don't own the copyright.
Lars: you would need to buy two licences, one for public reading and one for machine usage.
Wolfgang: in fact, all the knowledge used in your system comes from the book. The LLM is only a vehicle here.
Lars: yes, the LLM is a conversational interface, good at language, but we need to give it the knowledge by running other code alongside it.
Lars: and adding control checks to make the answers accurate and verifiable. That's part of the agreement with computers: we want to be able to check, because they don't always tell the truth.
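[Scribe note: a sketch of one possible control check in the spirit of what Lars describes: before presenting a cited sentence, confirm the cited source chunk is actually semantically close to it, and flag it otherwise. The threshold is an arbitrary illustration, not a value from the demo.]

```ts
import OpenAI from "openai";

const client = new OpenAI({ apiKey: "...", dangerouslyAllowBrowser: true });

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Flag answer sentences whose cited source chunk is not semantically close
// to them; those are candidates for "the model made this up".
export async function checkAnswer(sentence: string, sourceChunk: string): Promise<boolean> {
  const res = await client.embeddings.create({
    model: "text-embedding-3-small",
    input: [sentence, sourceChunk],
  });
  const [s, c] = res.data.map((d) => d.embedding);
  return cosine(s, c) > 0.5; // arbitrary threshold for illustration
}
```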