Publishing Community Group

Meeting minutes

Wolfgang: Hello, let’s start by hearing from Anna.

Anna: I’m sharing my screen. 1D345 is a French company based in the French Alps. Our core expertise is automated language processing. We provide plagiarism detection, AI-generated content detection, content reuse analysis, and document comparison.

Anna: All data is hosted in France, ensuring GDPR compliance and strict confidentiality.

Anna: We offer simultaneous, in-depth document analysis.

Anna: We use machine learning and have developed our own models to identify similarities and detect suspicious text, enabling deeper analysis.

Anna: Our database includes indexed content built through partnerships with publishers worldwide.

Anna: To detect AI-generated content, we analyze tone, context, and cultural language patterns. We achieve 98% reliability with less than 2% false positives.

Anna: We also identify paper mill markers, such as tortured phrases and specific keywords, to support scientific integrity.

Anna: Our solutions can be used at every stage—submission, peer review (including revisions), and publication.

Anna: Upcoming features include reference and bibliography checking.

Wolfgang: Now let’s hear from Titusz.

Titusz: I’m speaking today in two roles: ISCC and Amlet.

Titusz: The core issue we address is fraud patterns on online book platforms.

Titusz: ISCC is an ISO-standardized system that is open, vendor-neutral, and transparent.
… We are also developing the ISCC Discovery Protocol, a neutral interconnection protocol for cross-registry exchange of authoritative metadata.
… ISCC is a code made up of letters and numbers, generated from multiple dimensions: metadata similarity, semantic similarity, syntactic similarity, data similarity, and integrity.
… Each component of the code is independent, and ISCC combines them into a unified identifier.
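
As a rough illustration of the idea described above (independent components combined into one identifier, then compared component by component), here is a minimal Python sketch. It is not the ISCC algorithm and does not use any ISCC library: the functions `simhash`, `integrity_hash`, `make_code`, `to_string`, and `compare`, the three component names, and the 64-bit sizes are all assumptions chosen for illustration.

```python
import hashlib

BITS = 64  # assumed component size, for illustration only


def simhash(tokens):
    """Toy similarity hash: overlapping token sets tend to give nearby bit patterns."""
    counts = [0] * BITS
    for tok in tokens:
        h = int.from_bytes(hashlib.sha256(tok.encode("utf-8")).digest()[:8], "big")
        for i in range(BITS):
            counts[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if counts[i] > 0)


def integrity_hash(data: bytes) -> int:
    """Exact-match component: any change to the bytes changes the value."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def make_code(title: str, text: str, raw: bytes) -> dict:
    """Compute each component independently (hypothetical component names)."""
    return {
        "meta": simhash(title.lower().split()),
        "content": simhash(text.lower().split()),
        "instance": integrity_hash(raw),
    }


def to_string(code: dict) -> str:
    """Combine the independent components into one printable identifier."""
    return "-".join(f"{code[k]:016x}" for k in ("meta", "content", "instance"))


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two component values."""
    return bin(a ^ b).count("1")


def compare(a: dict, b: dict) -> dict:
    """Per-component bit distance; 0 means the components are identical."""
    return {k: hamming(a[k], b[k]) for k in a}


if __name__ == "__main__":
    original = make_code("Sample Title", "some example text of a book", b"file-one")
    suspect = make_code("Sample Title", "other unrelated words entirely", b"file-two")
    print(to_string(original))
    print(compare(original, suspect))
```

Because each component is computed on its own, two items can match on one dimension (here, the same title metadata) while differing on the others, which is the property the per-component comparison exposes.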

Titusz: A demo is available at covers.iscc.io, sponsored by Amlet.
… It indexes 3 million book covers. The initiative is designed to be cross-sector, so it remains lightweight. You can search for a title, find a cover, look it up on Google, and drop it into the tool to generate an ISCC. It then matches and explains the results.

Titusz: It’s an open standard, so you can implement it yourself or work with a provider. Please help us spread the word.

Titusz: Amlet, for its part, aims to address the TDM reservation protocol. It uses ISCC to let companies identify whether a title is available for text and data mining. It can identify books, excerpts, and translations.

Titusz: As a demonstration, I translated part of a book into Japanese—and it still matches.

Titusz: Amlet is currently in a waiting-list phase. Please join to have your catalog registered.

Wolfgang: Now, Sebastian.

Sebastian: Back in 2023, we started a task force related to anti-counterfeiting. I’ll try to summarize where we left off so we can resume.

Sebastian: I have been involved in ISCC, and I am the CEO of Liccium.

Sebastian: In 2023, we began receiving reports from publishing markets about illegitimate versions of their books appearing on platforms, both in print and digital. The seller accounts often did not belong to the original rightsholder.
… The problem has probably increased significantly because of the capabilities of generative AI.
… We identified full republications, partial republications (original plus fake content), imitative new content presented as associated with a brand or author, metadata used to promote fake titles, and altered content.
… We need publishers to engage with this problem so we can identify authoritative works and track fraudulent ones.
… In some cases, the same fake ebook is offered by several accounts under different seller names.

Sebastian: This is a challenge for publishers, who currently bear the cost of identification and monitoring. There is a need for standardized, interoperable methods to make this work easier.

Sebastian: The original work needs to be clearly identified, with verifiable declarations attached to it.

Sebastian: This is why the anti-counterfeiting task force was set up. Its mandate is to document patterns, develop shared terminology, identify technical approaches, exchange best practices, explore collaborations, and assess possible standardization opportunities.

Laurentlm: There are technologies that can detect identical content and metadata. One problem seems to be visibility: sellers and platforms are not always able to know all genuine sellers of a book.

Sebastian: First, the original publisher must assert their rights and put pressure on retail platforms to have those rights recognized.

Sebastian: The technical solutions exist; today, the problem is the will to use them.

Titusz: Technology can surface signals, but interpreting those signals depends on having exhaustive data and sufficient investment from stakeholders. Publishers need to act together to effectively track counterfeits.

Sebastian: Today, the burden is on publishers’ teams to search for infringements and file claims.

Anna: It’s difficult to get platforms to invest in this, because in the end they also earn money from fakes. This raises the question: how do we engage them?

Laurentlm: “Empty” books can damage their reputation—from the platforms’ point of view as well.

Sebastian: This also happens with print books, and it does not seem to be considered important by platforms.

Titusz: It is easy to scrape large amounts of metadata from these platforms.

Laurentlm: So the technology exists; what’s missing is the will.

Anna: I’m happy to see the complementarity in our approaches: verification at the publishing level and at the distribution level. Both approaches are necessary.

Anna: There are also ways to identify writing styles when two authors both claim to be the original creator of the same text.

Publishing Community Group - Protect and Defend Your Content in the Emerging AI Era
