WebEvolve
2024 Annual Conference

Machine Learning, WebGPU and Media Technologies

28-29 May 2024 · Shanghai

Event summary

Overview

The WebEvolve 2024 Annual Conference was held on 28-29 May 2024. The event focused on the themes of Machine Learning, WebGPU and media technologies, with particular emphasis on multimedia, media transport, the metaverse, the immersive Web, and related standardization activities on the Web.

The event was hosted by W3C China (Beihang University). Many thanks to bilibili, Khronos Group and Microsoft Reactor for co-organizing this event.

The event took a hybrid format combining online and in-person interactions. Over 90 attendees participated in onsite discussions, with more than 15,000 views on the live webcast platforms of W3C China and Microsoft Reactor, including bilibili, WeChat channels, CSDN and SegmentFault. Onsite and online attendees actively engaged in real-time interactions with the presenters. We extend our gratitude to all the speakers for their insightful presentations and to all the attendees for their support and participation.

Talks summary

Topics discussed during the event are summarized below, including publicly visible slides and recordings authorized by the speakers.

Media and Entertainment on the Web: François Daoust (W3C Media Specialist) shared W3C's standardization explorations in the areas of real-time media, web games, AI and the metaverse, and the latest status of that work (See slides, video on bilibili, YouTube).

WebCodecs opens a new chapter in Web audio and video: Jun Liu (senior engineer at bilibili) highlighted WebCodecs, discussing what it can do, its applications, and its advantages and limitations (See slides, video on bilibili, YouTube).
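
As a flavor of the API the talk covered, here is a minimal hedged sketch of setting up a WebCodecs video decoder (not code from the talk; it assumes a browser with WebCodecs, and the codec string "avc1.42E01E" is just an illustrative H.264 Baseline profile):

```javascript
// Minimal WebCodecs decoder setup (sketch). Returns null where the
// API is unavailable, e.g. outside a browser.
function createH264Decoder(onFrame) {
  if (typeof VideoDecoder === "undefined") return null;
  const decoder = new VideoDecoder({
    // Called once per decoded frame; close() releases the frame's memory.
    output: (frame) => { onFrame(frame); frame.close(); },
    error: (e) => console.error("decode error:", e),
  });
  decoder.configure({ codec: "avc1.42E01E" });
  // Demuxed packets are then fed in as EncodedVideoChunk objects:
  // decoder.decode(new EncodedVideoChunk({ type: "key", timestamp: 0, data }));
  return decoder;
}
```

Because WebCodecs exposes the browser's hardware codecs directly, applications keep control of demuxing and rendering, which is the trade-off the talk's "advantages and limitations" discussion refers to.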

Vulkan Synchronization for WebGPU: Nathan Li (Senior Manager of Developer Ecosystem at ARM) compared the relationship between WebGPU, WebGL, and Vulkan, introduced the Dawn project (Chromium's open-source WebGPU implementation), and shared relevant developer resources (See slides, video on bilibili, YouTube).

Update on WebGPU and Web AI: Yang Gu (responsible for Web Graphics and AI at Intel) shared core updates to the WebGPU API and WebGPU Shading Language specifications, their implementation status in mainstream browsers, and upcoming new features under development (See slides, video on bilibili, YouTube).

WebRTC media transport explorations and signaling standardization applications: Cheng Chen (senior engineer of WebRTC transport at ByteDance) shared insights into WebRTC media transport (including WebRTC media transport flow, WebRTC Insertable Streams, Unbundling WebRTC), and WebRTC signaling standardization applications (including WHIP/WHEP protocols and application scenarios) (See slides, video on bilibili, YouTube).
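
The WHIP protocol mentioned above boils the publish-side signaling down to a single HTTP exchange. A hedged sketch of that handshake (assuming a WebRTC-capable runtime; the endpoint URL is whatever the ingest service provides):

```javascript
// WHIP publish handshake (sketch): POST the local SDP offer as
// application/sdp, then apply the SDP answer returned by the server.
// Returns null where WebRTC is unavailable, e.g. outside a browser.
async function whipPublish(pc, endpointUrl) {
  if (typeof RTCPeerConnection === "undefined") return null;
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  const res = await fetch(endpointUrl, {
    method: "POST",
    headers: { "Content-Type": "application/sdp" },
    body: offer.sdp,
  });
  if (res.status !== 201) throw new Error(`WHIP POST failed: ${res.status}`);
  await pc.setRemoteDescription({ type: "answer", sdp: await res.text() });
  // The Location header names the session resource; DELETE it to stop.
  return res.headers.get("Location");
}
```

Replacing bespoke signaling with one POST is what makes WHIP (and its playback counterpart WHEP) attractive for the streaming scenarios the talk discussed.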

Overview and update on WebNN: Ningxin Hu (Principal Engineer at Intel Web Platform Engineering team) showcased the CPU, GPU, and NPU hardware engines in AI PCs, explained how WebNN brings a unified abstraction of neural networks to the Web, giving access to AI hardware acceleration through native OS ML APIs with near-native performance for next-generation use cases, and shared the development progress and implementation status of related specifications (See slides, video on bilibili, YouTube).
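
To illustrate the "unified abstraction" idea, here is a hedged sketch of WebNN graph construction (the names follow the WebNN specification draft and may change as the API evolves; not code from the talk):

```javascript
// Build a tiny WebNN graph computing relu(x + y) (sketch). The backend
// compiles it for whichever engine the context targets (CPU/GPU/NPU).
// Returns null where navigator.ml is unavailable, e.g. outside a browser.
async function buildAddRelu() {
  if (typeof navigator === "undefined" || !navigator.ml) return null;
  const context = await navigator.ml.createContext();
  const builder = new MLGraphBuilder(context);
  const desc = { dataType: "float32", dimensions: [2, 2] };
  const x = builder.input("x", desc);
  const y = builder.input("y", desc);
  const out = builder.relu(builder.add(x, y));
  return builder.build({ out }); // compiled graph, ready to run
}
```

The same graph description runs on any of the hardware engines, which is how WebNN delegates acceleration to the native OS ML APIs underneath.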

ncnn Vulkan Machine Learning Update: Hui Ni (researcher at Tencent Youtu Lab) outlined the ncnn neural network inference framework and discussed the progress in machine learning based on Vulkan (See slides, video on bilibili, YouTube).

WebXR practices in 3D engine: Qianwei Xu (Galacean 3D Interactive Engine team lead at Ant Group) presented business demands related to XR, shared the Web 3D engine (Galacean Engine), reasons for choosing WebXR, XR framework design within the Web 3D engine, and ongoing work on client-side WebXR infrastructure and XR editor (See slides, video on bilibili, YouTube).

Practices of Web Media Processing and Real-time Communication Standards: Chun Gao (senior architect at Shengwang) shared new trends in the RTC industry, and discussed cases including end-to-end encryption, digital rights management, H.265 support for RTC, and alpha video transmission (See slides, video on bilibili, YouTube).

Matroska unpacking principles and practices: Yanjun Wang (senior engineer at bilibili) shared the background, principles, solutions, and applications of Matroska unpacking, and discussed how to further optimize the parsing process and enhance parsing capabilities (See slides, video on bilibili, YouTube).
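
At the heart of Matroska parsing is EBML's variable-length integer encoding, used for element IDs and sizes. A minimal hedged sketch of reading one (not code from the talk):

```javascript
// Parse an EBML variable-length integer (sketch). The number of leading
// zero bits in the first byte gives the total width in bytes; for data
// sizes the length-marker bit is masked off the value.
function readVint(bytes, offset = 0) {
  const first = bytes[offset];
  if (first === 0) throw new Error("invalid vint");
  let width = 1;
  let mask = 0x80;
  // Count leading zeros: each one adds a byte to the total width.
  while ((first & mask) === 0) { width++; mask >>= 1; }
  let value = first & (mask - 1); // strip the length-marker bit
  for (let i = 1; i < width; i++) {
    value = value * 256 + bytes[offset + i];
  }
  return { value, width };
}
```

For example, `0x81` decodes to the value 1 in one byte, and `0x40 0x02` to the value 2 in two bytes. (Element IDs conventionally keep the marker bit; the masking shown here applies to sizes.)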

Exploration and practices of Secure Camera: Wu Zhang (senior technical expert at IIFAA Alliance) explored how to enhance the credibility and security of the entire image stream, through the use of Secure Camera, when facing risks of real-time image injection attacks like DeepFake (See slides, video on bilibili, YouTube).

Exploration on next generation of Internet - Interplanetary Network: Song Xu (Technical Director of China Mobile Migu) shared humanity's efforts to explore the space internet, the challenges of interstellar communication, and considerations for building the next-generation space internet.

Exploration and practice of Immersive Web API: Yifei Yu (technical expert at ByteDance) shared the ByteDance cross-platform team's exploration, implementation, and practical applications of immersive Web APIs (See slides, video on bilibili, YouTube).

Building a brand new ecosystem with OpenXR: Shuai Liu (XRRuntime tech lead for PICO at ByteDance) focused on introducing the concept and applications of OpenXR, an overview of the OpenXR API, recent updates to the OpenXR 1.1 specification, and future extension plans (See slides, video on bilibili, YouTube).

Vulkan standard and its progress: Kangying Cai (senior researcher of graphics standards at Huawei) discussed the evolution of graphics APIs, compared traditional and modern graphics APIs, and highlighted the Vulkan API and its development roadmap for 2024.

HDR Vivid and Audio Vivid technical standards and related applications: Yun Zhou (senior engineer at the Academy of Broadcasting Science, National Radio and Television Administration (NRTA)) shared the architecture, key technologies, features, applications, and standardization challenges related to HDR Vivid and Audio Vivid technologies, and raised further thoughts on the application of HDR and spatial audio experiences on the Web.

Embeddable WebGL + WebXR binocular implementation: Yazhong Liu (technical expert at Rokid) introduced the spatial mini app, its unified rendering, the implementation path of WebXR unified rendering, and future plans (browsers in space).

AI enables intelligent production of accessibility films: Wei Wang (senior engineer at Zhejiang University) explained accessible movies, accessible live commentary, and their production processes, discussing directions to optimize movie subtitle recognition capabilities, how AI can further empower accessible movie production processes, and showcasing their current achievements (See slides, video on bilibili, YouTube).

Web-based 3D digital human development and practice: Lei Zhao (Frontend Development Director at China Mobile Migu) shared the concept of the digital intelligent human, its technical development and related practices, and looked ahead at the prospects of combining WebGPU with 3D rendering and generative AI (See slides, video on bilibili, YouTube).

Application practices of front-end in digital human authoring tools: Bean Deng (senior engineer at bilibili) demonstrated the "必剪Studio" intelligent editing tool based on voice cloning and digital avatar technologies, explaining the principles, processes, technical challenges, and solutions of green screen clipping, as well as applications such as audio-visual synthesis, audio waveform visualization, online audio transcoding, and SSML visual editor (See slides, video on bilibili, YouTube).