W3C Workshop on Smart Voice Agents

This is a Call-for-Participation for the W3C Workshop on Smart Voice Agents.

The workshop is free. To help us prepare and run the event effectively, please email a brief proposal when registering. Proposals help the Program Committee organize sessions and topics, assess interest and expected attendance, coordinate scheduling (including time-zone coverage), and identify any accessibility or technical support needs. They also help ensure talks align with the workshop goals and provide useful context for discussion. If you are interested in contributing a talk, please see the information for speakers for details.

General background

Applications with speech capability, even simple ones, are becoming increasingly popular because they ease user interaction and enrich the user experience. Some of them are already available in our homes.
Voice agents are essential applications on various devices, especially mobile phones, tablet devices, eBook readers, and gaming platforms. Moreover, the integration of speech capabilities into traditional platforms such as TVs, audio systems, and automobiles has been a recent major technical advancement.

Focus

Firstly, the workshop is designed to review the current status of voice-enabled smart platforms integrating multi-vendor services/applications/devices:
- Voice interaction with smart devices, including those in the Web of Things
- Control from Web browsers
- Interoperability and access to controls for accessibility/usability, e.g., smart cities
Secondly, we will discuss the current state of voice interaction technology for global deployment across all languages, including better integration of large language models (LLMs) to enhance natural language understanding and generation.
As a result, we will identify and discuss pain points and technological gaps, and clarify their potential impact on Web standards.

Possible topics

The following list of possible topics is deliberately broad as a starting point and will be refined according to the interests of participants.

Clarification of use cases for smart voice agents and their requirements
Summary of the current status:
- Overview of existing browser support and platforms, for example smart speakers and mobile phones
- Integration of LLMs for voice interaction to enhance browser capabilities and platforms
- Common interoperability issues for smart voice agents among browsers and platforms
- State-of-the-art accuracy
- Implications of voice agents for compliance with regulatory requirements, including potential gaps for standards, developers, and regulators
Needs of the users and developers of smart voice agents:
- User interfaces for smart voice agents, including accessibility/usability issues raised by smart agent technology
- Internationalization and compatibility with region-specific technologies
- Enhanced voice interaction with LLMs in the context of smart voice agents for improved usability, addressing issues such as:
  - Hallucinations: LLMs may generate outputs that seem plausible but are factually incorrect.
  - Ambiguity in outputs: Inconsistent or vague responses can cause confusion in automated workflows.
  - Lack of accountability: Identifying the root cause of errors in an LLM’s predictions can be challenging.
- Accuracy of input recognition and resulting actions, for example product identification and descriptions for e-commerce, choices of supplier, platform neutrality, etc.
- Managing input entities (sensors/applications) and output entities (actuators/devices/digital twins) from various vendors and their coordination
- Addressing presentation issues such as how, what, and when to transfer necessary information from input entities (users, devices, or applications) to output entities (users, devices, or applications)
- Integrating multiple interchangeable modalities (typing, handwriting, voice, etc.)
Horizontal platform considerations:
- Discovery of resources
- Trust, privacy, and security, for example tools such as encryption
- Business aspects and the future of personal agents
Demo of existing voice agents

Examples of related use cases

The related technology area is broad, including:

Voice agents and chatbots in various environments like smart homes, smart factories, smart cars, and smart cities
Smart speakers and smartphones as portals/user devices

For example, Hybrid TV services (Web+TV integration based on HTML5, Second Screen, TTML, etc.) and smart home devices and services, possibly incorporating proprietary technology like MiniApps, can offer the following use cases:

Asking the voice agent on the TV in the living room to order takeaway, e.g., "I want to order a pizza."
Using voice commands to choose the food and saying "checkout" to the smartwatch to process the payment.

Another example is searching for podcast or video content. The user can say, "Play [topic of a podcast or video]," and the voice assistant will respond with, "Here's what I found," while showing search results on the smartphone screen. A useful user requirement may be the ability to request congruent feedback, i.e., feedback in the same modality as the input (if voice is used for input, then speech is used for feedback).
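
The congruent-feedback requirement above can be illustrated with a minimal sketch. It is not taken from any existing specification or product: the names `Modality`, `Response`, and `congruent_feedback`, the `prefer_congruent` flag, and the fallback to on-screen text are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Modality(Enum):
    """Hypothetical interaction channels; a real agent may support more."""
    VOICE = "voice"
    TEXT = "text"


@dataclass
class Response:
    modality: Modality  # channel used to deliver the feedback
    content: str        # what is spoken or shown


def congruent_feedback(input_modality: Modality,
                       message: str,
                       prefer_congruent: bool = True) -> Response:
    """Return feedback in the same modality as the input when requested."""
    if prefer_congruent:
        # Voice in -> speech out; text in -> text out.
        return Response(input_modality, message)
    # Assumed fallback: show the message as on-screen text.
    return Response(Modality.TEXT, message)


# The user spoke the request, so the reply is delivered as speech.
reply = congruent_feedback(Modality.VOICE, "Here's what I found")
print(reply.modality.value, "->", reply.content)
```

In a real deployment the decision would likely also depend on device capabilities and user preferences, which is one of the requirements the workshop could help clarify.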

NOTE: The above are just a few examples of possible use cases, and the development of use cases and their requirements will be one topic of the workshop.

Who should attend?

Many possible stakeholders, including:
- Service providers/System implementers
- Government
- Users and developers from various countries/communities
- Standards organizations
- Researchers from academia and industry

Program Committee

You can email the workshop Program Committee at group-voiceagents-ws-pc@w3.org.

Chairs

Deborah Dahl, Conversational Technologies
Dirk Schnelle-Walka, Switch Consulting

Committee

Deborah Dahl, Conversational Technologies
Dirk Schnelle-Walka, Switch Consulting
Gérard Chollet, CNRS
Michael Koster, Dogtiger Labs
Song Xu, China Mobile
Brian Kardell, Igalia
Phil Archer, GS1
Leonie Watson, TetraLogical
Bev Corwin, Consultant
Markku Hakkinen, Educational Testing Service
Kimberly Patch, Invited Expert
Janina Sajka, W3C Invited Expert
Kaz Ashimura, W3C Invited Expert, Nagasaki University