Meeting minutes
Gautier: This session is part of a series on AI in reading systems (RS).
… today's speaker is Daniel Weck, lead developer of Thorium (the reading system from EDRLab)
Daniel Weck: AI-generated content descriptions in Thorium - unreleased experiment, thus work in progress - lays the foundations for the user experience we want to follow - demo: two-page spread with an image of Jules Verne with description - inspect the HTML
… image has empty @alt and empty title - link takes me to an appendix where the image is displayed - but here the @alt is not empty - it says "linked image"
… would be great if we had some help from AI - Thorium has a zoom feature - this leaves room for a textual description - can choose an LLM to generate the description
… decide between "short" or "extended" description - I can edit the system prompt, but there is a default system prompt - Gemini is very good at identifying people in images - extended description would have two paragraphs - advanced view of the system prompt in JSON format - additional information is added to the prompt in this format
… select text from the answer - run a search on the Internet to get more information
… new work from W3C WG - complex image (bar chart) - link to extended description - rich text that is not part of a short description - plan to create a modal interface where you might consult AI
… (1) user sees descriptions (2) chat with AI (3) do further research on the web - familiar chat UI - modal overlay - default system prompt which sets useful boundaries - we also feed in metadata
… request short or extended descriptions easily - just "one shot" - we need to inform the user that an AI will hallucinate
… MCP (Model Context Protocol) for tool calls is out of scope - RAG is also not implemented, beyond basic embedding - local LLMs are also not used: response times are OK, but not the quality - Gemini is better for image descriptions
… you may give metadata as embedded context for the prompt - advanced users may edit the system prompt and might remove blatantly irrelevant metadata
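A minimal sketch of how the short/extended choice, the default system prompt, and the embedded metadata could be combined into one request. This is not Thorium's code: the type names, the `buildDescriptionMessages` helper, the wording of the default prompt, and the message shape are all assumptions made for illustration. Attaching the image itself is provider-specific and left out.

```typescript
// Hypothetical types and names; nothing here is taken from Thorium's source.
type DescriptionMode = "short" | "extended";

interface PublicationMetadata {
  title?: string;
  author?: string;
  language?: string;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

// Assumed wording; the real default prompt was not quoted in the session.
const DEFAULT_SYSTEM_PROMPT =
  "You describe images from a publication for readers who cannot see them. " +
  "Describe only what is visible and say so when you are unsure.";

// Builds a one-shot request: system prompt with metadata embedded as JSON,
// followed by a user instruction that depends on the requested mode.
function buildDescriptionMessages(
  mode: DescriptionMode,
  metadata: PublicationMetadata,
  systemPrompt: string = DEFAULT_SYSTEM_PROMPT,
): ChatMessage[] {
  const context = JSON.stringify({ publication: metadata }, null, 2);
  const instruction =
    mode === "short"
      ? "Describe the image in one sentence."
      : "Describe the image in two paragraphs.";
  return [
    {
      role: "system",
      content: `${systemPrompt}\n\nPublication metadata (JSON):\n${context}`,
    },
    { role: "user", content: instruction },
  ];
}
```

An advanced user who edits the system prompt or drops irrelevant metadata would, in this sketch, simply pass a different `systemPrompt` or a smaller `metadata` object.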
George: publishers are not happy with AI being trained on their copyrighted materials. Any protections?
Daniel: All conversations in the chat with the AI are used for training if I don't pay for using the LLM - if I were to pay for the service, the data would remain private - it always depends on the terms and conditions of a particular model - for publishers, the TDM Reservation Protocol allows them to opt in or out - Thorium would respect this
… any ideas how that could be solved?
George: if image is not used for training, publishers are OK with that.
Daniel: Thorium would have to police the use of data by an LLM - Would Thorium have to blacklist some models?
James: Publishers are very twitchy about copyrighted material - with an EPUB you can mark the TDM reservation or place a couple of meta tags - 6 or 7 different ways to signal that training is not accepted - training is an issue
… on-device LLMs would be helpful
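A minimal sketch of one of the signals mentioned above: detecting a TDM reservation in an EPUB package document. It assumes the reservation is expressed as a `tdm:reservation` meta property with the value `1`, which is how the TDM Reservation Protocol is understood here; the `hasTdmReservation` name is hypothetical, and a real reading system would use an XML parser and also handle the other signalling methods.

```typescript
// Hypothetical helper. Assumes the OPF carries <meta property="tdm:reservation">1</meta>
// when the publisher reserves text and data mining rights.
function hasTdmReservation(opfXml: string): boolean {
  const pattern =
    /<meta\b[^>]*\bproperty\s*=\s*["']tdm:reservation["'][^>]*>\s*([^<]*?)\s*<\/meta>/i;
  const match = pattern.exec(opfXml);
  // Only the value "1" counts as a reservation; "0" or a missing element does not.
  return match?.[1]?.trim() === "1";
}
```

What the reading system does with the result (warn the user, disable certain models, or nothing) is a policy question that this sketch does not answer.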
Daniel: Publishers don't want RS to create friction - copying images and scanning text is so easy to do (e.g. on a Mac) - we have to send the image to the AI, but can't control what the LLM will be doing with it
James: could a publisher embed a token?
Daniel: agreement with Mistral - access token for EDRLab - could run on a Thorium server - but Thorium doesn't transport the key itself - it uses the key when accessing the LLM to answer users' requests
Ori: if you are using the user's API key, you can't know what the AI does with it - Gemini says they don't use it for training, no idea what OpenAI does - using another key is problematic
… Gemini doesn't feed requests for image descriptions to humans
Daniel: main stumbling block: potential for legal issues - we could enable it in nightly builds, but not in production builds
George: JPEG has metadata in it - is that transmitted?
Daniel: in FB Messenger or Signal I check that GPS data is erased before I share pictures - with AI, once the image payload is transmitted, it is readable by the AI
Ori: guess it will not ingest geographical data
Daniel: most LLMs have restrictions - in Thorium we don't create requests for LLMs manually - we feed image data into an abstraction interface
… abstraction layer is fully client-side - it allows us to speak JavaScript
Ori: had to reduce the size of the image - don't send EXIF or geographical data
Daniel: images processed before sending them on the wire - reduction in size before sending
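A minimal sketch of the metadata part of that preprocessing, assuming the image is a baseline JPEG held in memory. It drops APP1 segments, which is where Exif (including GPS) and XMP data usually live, and copies everything else unchanged. The `stripApp1Segments` name is hypothetical and this is not how Thorium is known to do it; resizing is not covered here.

```typescript
// Hypothetical helper: removes APP1 (Exif/XMP) segments from a JPEG byte stream.
function stripApp1Segments(jpeg: Uint8Array): Uint8Array {
  const view = new DataView(jpeg.buffer, jpeg.byteOffset, jpeg.byteLength);
  if (jpeg.length < 4 || view.getUint16(0) !== 0xffd8) {
    throw new Error("not a JPEG: missing SOI marker");
  }
  const kept: Uint8Array[] = [jpeg.subarray(0, 2)]; // keep the SOI marker
  let pos = 2;
  while (pos + 4 <= jpeg.length) {
    if (view.getUint8(pos) !== 0xff) {
      throw new Error(`expected a marker at offset ${pos}`);
    }
    const marker = view.getUint8(pos + 1);
    if (marker === 0xda) break; // SOS: the rest is scan data, copied verbatim below
    const length = view.getUint16(pos + 2); // big-endian, includes the two length bytes
    const end = pos + 2 + length;
    if (length < 2 || end > jpeg.length) {
      throw new Error("truncated or malformed segment");
    }
    if (marker !== 0xe1) kept.push(jpeg.subarray(pos, end)); // skip APP1 only
    pos = end;
  }
  kept.push(jpeg.subarray(pos)); // SOS header, entropy-coded data, EOI

  // Concatenate the kept parts into one new buffer.
  const total = kept.reduce((sum, part) => sum + part.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const part of kept) {
    out.set(part, offset);
    offset += part.length;
  }
  return out;
}
```

The sketch does not handle fill bytes between segments or standalone markers before the scan; a production path would more likely rely on a tested image library.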
Gautier: WCAG criteria: the description must offer the same service as the image - this is a way to fulfil that - focus on the authored description (if available) - a real success for the WCAG requirement