DRAFT Minutes of the W3C Workshop on Web & Virtual Reality
- Sean White's keynote
- WebVR Intro
- VR user interactions in browsers
- Accessibility of VR experiences
- Multi-user VR
- Authoring VR experiences on the Web
- High Performance VR on the Web
- 360° video on the Web
- Immersive Audio
- Breakouts
- Standardization Landscape
Sean White, keynote.
Slides: The Long Now of VR
Video: The Long Now of VR [video]
Reminiscences from the Placeholder Project (1993). It was immersive but very expensive. VRML 1994. A-Frame, 2016.
- The Long Now: how do you enable VR experiences that are unique to the Web?
- Account for the range of future mixed reality experiences?
- Use VR to make life better in the real world?
- Move Fast!
WebVR Intro
implementation status, obstacles, future plans
Megan: we'll start with an intro from Brandon on WebVR status and then go into implementation status from implementers
Brandon Jones, Google
This is a quick intro on what WebVR is and where the different browsers are at.
A brief history: this whole WebVR thing literally started in a Tweet exchange, with Vlad suggesting how to plug a browser into the first Oculus kit. It seems so quintessentially Web to me: no funding, no managerial discussion - just two guys doing it in their free time. And look where we're at - that's the kind of innovation the Web enables.
The initial version of the spec was very different from what we see today - it was based on the hardware available then, built around DK1 (DK2 was not available yet).
The API has been changing a lot, with lots of cooperation between vendors. It's getting better and better. What we've called 1.0 has been revised to get up to par with the latest hardware. Microsoft has jumped in to contribute to the spec, Oculus announced WebVR support.
The state of the art is what we call WebVR 1.1 - which is likely the last version to be numbered - it will just be WebVR after that. This new version is based on great feedback from Microsoft to make sure WebVR can work with their platform, removing guesswork for developers and tightening up behaviors.
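For reference, the core of the WebVR 1.1 flow being described looks roughly like the sketch below; `canvas`, `enterVrButton` and `drawEye` are placeholders for an application's own WebGL setup and rendering code.

```js
// Minimal sketch of the WebVR 1.1 presentation flow; `canvas`, `enterVrButton`
// and `drawEye` stand in for the application's own WebGL code.
const frameData = new VRFrameData();

navigator.getVRDisplays().then((displays) => {
  if (!displays.length) return;              // no headset connected
  const vrDisplay = displays[0];

  // Presenting must start from a user gesture, e.g. an "Enter VR" button.
  enterVrButton.addEventListener('click', () => {
    vrDisplay.requestPresent([{ source: canvas }]).then(() => {
      vrDisplay.requestAnimationFrame(onVRFrame);
    });
  });

  function onVRFrame() {
    vrDisplay.requestAnimationFrame(onVRFrame);
    vrDisplay.getFrameData(frameData);       // headset pose + per-eye matrices

    drawEye(frameData.leftViewMatrix, frameData.leftProjectionMatrix);
    drawEye(frameData.rightViewMatrix, frameData.rightProjectionMatrix);

    vrDisplay.submitFrame();                 // hand the frame to the VR compositor
  }
});
```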
There are more non-backwards compatible changes that are upcoming - but we have to do them now. We're getting feedback from lots of people doing VR, feedback to make sure we have the right set of capabilities for the long term.
We are working on bringing Web Worker compatibility, which is quite important - the main thread in JavaScript is a big mess, and workers allow us to remove some of that constraint, even at the cost of backwards compatibility. We're also getting feedback from a Web-platform perspective, making sure we get the right model.
What's next? At a mundane level, we want to keep extending the API - VR layers; we want to allow more WebGL extensions targeted at VR (e.g. multiview). WebGL 2 is going to come out around the same time as WebVR. And likewise, we want to get ready for WebGL Next.
We also need to keep track of evolutions in the VR hardware.
We also need to focus on tooling - there were great announcements from Oculus recently. Some people were asking about this as being a competition to what I've been working on - but it's a non-issue, it's actually great - nobody is trying to get control or take over the ecosystem. We need more tools, more browsers, more support for people who want to build content. There is no bad side to doing that.
We have decades of experience in the browser that we have been ignoring - we need to bring the Web and VR closer together.
We need to look into supporting new media formats that are VR-native - glTF (not specific to VR, but better support in browsers would be fantastic), and new media formats for video and audio that are very spatially aware with lots of info on the environment - we don't want every single developer to reinvent how to get that on the screen.
We need to make sure audio is a first class citizen in VR - there is lots of interesting innovation on the audio side. Good VR without audio is missing half the point - it should not be an afterthought.
We want to make sure the Web enables AR scenarios. [screenshot of HoloLens user]. We want to enable full use of HoloLens or Tango capabilities.
In the end, we really need to think about how to move past the current VR experience of a modal app - VR is a very exclusive thing where you launch an app, you use it for a while, and then you move to something else. If we want VR to be a general computing platform, we need to move beyond that. It's extremely important, especially for the Web.
Most people here would agree we're not where we want to be for VR yet.
How do we enable composition of activities, allow overlap and still keep it functional? This will require lots of experimentation and failure, but we will get there.
Browser updates
Megan Lindsay - update on Chrome. We are building support for WebVR 1.1 in Chrome for Android, including the gamepad extensions. We're targeting M56 for Daydream, should be in beta in December, and stable release in January. We will first release as an origin trial - web sites need to request a (free) token to get access to the API. Origin trials:
The API is still changing quite a bit, so this enables us to get feedback from developers before a stable release.
There will also be cardboard support coming soon.
We're also working on WebVR in Chrome for Windows desktop, with plans to support both Rift and Vive - not sure if we're targeting M56 or 57, as origin trial.
Beyond WebVR, we're working on a version of Chrome that supports browsing in VR for regular 2D sites. Timeline towards first half of 2017 for daydream, with later support for desktop.
We're also interested in standardization on 360° videos, and we also want to look beyond WebGL with declarative VR. We want to enable existing web sites to add VR features without a complete rewrite.
We also want to look at AR but we're not sure what this would entail yet.
Frank Olivier, Microsoft
WebVR is in development for Microsoft Edge, it will be coming up in an insider release at some point. It's built on Windows holographic APIs. We are also very interested in advancing the specifications - we think there will be needs in this space for years to come.
If you maintain WebVR content, we're definitely interested in running it in our browser, to test it beyond our test suite.
Justin Rogers, Oculus
Carmel Developer Preview based on Chromium M55 should come pretty soon, with support for WebVR 1.1, and limited support for Gamepad. Targets Gear VR and the Touch controller. No 3D motion integration.
In Nov 2016, developer release of Carmel, and of React VR.
We'll do another update with M56 in Jan/Feb 2017
We will work with other partners to also make this work in other browsers.
Chris Van Wiemeersch, Mozilla
We started in 2014 as an experimental project based on Brandon and Vlad's work; Vlad is also involved in WebGL. We had an API and wanted to see if we could render content in a browser.
The workflow is: you go to a Web site, you click on a VR button and you switch to a stereoscopic view.
This was a demonstration that this was possible.
It wasn't until Justin Rogers did a hackathon with a deep dive on the PoC API, and made lots of great suggestions to improve performance, developer ergonomics and the API. http://www.justrog.com/search/label/WebVR
We started a WebVR Community Group - it has 100+ participants, it's free, chaired by Brandon and myself.
As we introduced more hardware support, we had to update a number of our assumptions in the APIs.
WebVR is getting a lot of legitimacy with "native" VR developers.
Casey and @@@ worked on a prototype of what a VR browser would look like, demonstrating e.g. how to click on a link, transitioning to a new view.
This is built around iframes - we don't have hooks for handling navigation transitions.
We created a backwards-compatible VR browser to find deficiencies in the WebVR API.
In the demo, we navigate in regular 2D pages, controlled by an Xbox controller; when moving to a 3D page, it switches to an immersive 3D view. This made us realize some deficiencies in the tooling - that's where A-Frame came in: it catalyzes the creation of content and also helped us assess that the API is viable.
[these examples show VR doesn't have to be gaming or action oriented - it can be meditative]
[this demo illustrates the performance and interactivity you can get in Web browsers for high end VR]
https://iswebvrready.org/ shows interested browsers and support for the various APIs - it's collaboratively maintained, including by other browser vendors.
Casey Yee, Mozilla
It's crazy to see all the work that has been happening for past couple of years. The platform work is headed up by Kip @@@ who couldn't make it today - I'm speaking on his behalf.
A lot of the focus has been around getting the API implemented, making it work in our browser. The APIs are available in Nightly, with support for HTC Vive, Oculus Rift, and working on OSVR(?) support.
As we continue with implementation, we are trying to get a pipeline and process to bring these APIs to production form. We have 1.0 API support, and we're planning to have these APIs available in the stable browser for Firefox 51, available in Q1 2017.
It will be available in a production browser, so available to lots of developers - which is very exciting.
It's going through our beta API program.
Besides performance, we have been working with the other teams needed for VR: audio, gamepad, WebRTC for co-presence - lots of exciting things coming up.
Laszlo Gombos, Samsung
I'm going to talk mostly about mobile VR - as enabled by Gear VR.
The Gear VR has an external touch pad, and an external USB port which makes it possible to plug in more input devices.
Gear VR has a pretty large field of view and improved tracking - when docked, it doesn't use the mobile device's sensors but its own IMU.
We built a VR-first browser - it not only supports WebVR, but it also enables browsing the regular Web in VR. For instance, to consume existing media in VR - kind of the TV use case.
This is a good way to get people hooked on VR content.
We try to take advantage of the space around you for interactions.
This shipped in Dec 2015, 10 months ago. We got pretty positive feedback.
First stable release in March 2016, with WebVR support based on M44 Chromium. The WebVR support has to be enabled specifically though.
We are 1.0 since April, and our latest release was in August.
This is a VR-only product that users have to download - we reached over 1M downloads!
If you are in mobile and in the samsung browser, watching content/video, and dock the mobile in the headset, we will switch you seamlessly to the VR browser.
We support voice input for text. We have curated content.
We have an extension of HTML video support.
We allow the Web page itself to change the environment (e.g. show a background scene image around the browser window in immersive view) - this is already shipping, the skybox JavaScript API. “Progressive enhancement” from the 2D Web.
[showing demo of Skybox API]
What's next: we want to enable WebVR by default, which requires improving performance; we want to bring more control to the VR environment, e.g. with the Skybox API.
Discussing how to bring the latest improvements of the Web Platform (e.g. Progressive Web Apps) to VR is also quite exciting to me.
VR user interactions in browsers
Designing the browser (Josh Carpenter, Google)
Slides: Designing the Browser
Video: Designing the Browser [video]
Josh: been working on figuring out how we want the experience of surfing the web with a VR headset. As designer, want to create possibilities - create the conditions so others can make experiences.
By the way - that (demoed on screen) is all CSS.
“Cambrian explosion” of browsers. For years we’ve been trying to cram browsers in those tiny screens. In comparison - VR is an infinite canvas! We are about to see an explosion in browsers.
What are the things we think all browsers should do consistently? Somewhere between pure freedom and a few guard rails:
- Avoid dead zones - give developers freedom to design
- Facilitate speed - this is where the web can compete (not necessarily graphics but speed, especially speed of loading). Show demos of transitions.
- Big paws and oil tankers - obligation to learn from nimble VR-first projects
Link traversal in WebVR (Chris Van Wiemeersch, Casey Yee, Mozilla)
Slides: Hyperlinks
Video: Hyperlinks [video]
Casey and Chris. Colleagues at Mozilla. Always wanted to move between content in VR. Hyperlinks are a fundamental part of what we think the web should be; we want to bring that to WebVR.
Web evolution. Move from Page to World.
Our responsibility, what we have to do to preserve future of the web as we know it:
- User Control - choice of where to go and how to get there
- Security - no surprises. Protect the user from bad surprises, cross-site scripting, etc.
- Openness - enforces interop. Make sure these aspects of navigation remain
Demo (maybe??). OK, no demo. Videos.
- Representation of the link as some kind of orb in space
Note this is not something discussed in depth in the webVR community. This is an invitation to come and discuss the how - this was only the why.
Hand tracking and gesture for WebVR (Ningxin Hu, Intel)
Slides: https://huningxin.github.io/webvr-hand/#/
Video: Hand tracking and gesture for WebVR [video]
Want to talk about hand tracking and interaction.
Quote Brandon Jones: “They watch some demos or play a game and walk away saying how impressive it is, but almost everyone makes a remark about how they wish they had hands.”
Today’s techniques generally use a depth camera to create a depth map.
Two kinds of hand tracking: skeleton tracking and cursor tracking
Demo of skeleton hand tracking - for avatar hand tracking. More computing intensive, thus challenging for mobile devices as it impacts the battery life and may introduce latency. No standards for this today, typically using websockets. Area for standards?
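Since there is no standard yet, a page typically receives this data from a local tracking service; a hedged sketch follows, in which the WebSocket endpoint and the JSON message shape are entirely hypothetical.

```js
// Hypothetical sketch: consuming skeleton-tracking data from a local tracking service.
// The endpoint URL and the JSON message shape are made up; there is no standard today.
const socket = new WebSocket('ws://localhost:9000/hands');   // hypothetical local daemon

socket.onmessage = (event) => {
  const frame = JSON.parse(event.data);
  // Hypothetical shape: { hands: [{ side: 'left', joints: [{ x, y, z }, ...] }, ...] }
  for (const hand of frame.hands) {
    updateAvatarHand(hand.side, hand.joints);                // placeholder, application-defined
  }
};
```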
Cursor tracking models the hand as one pointer in space. Lighter weight, requires less computing, and can be more stable and accurate which makes it better for accurate UI control.
Another aspect is gesture recognition. This delivers high level events. Recent trend to use hardware acceleration to handle this. This area is quite heavily patented, could be problematic.
Quote “So my prediction is that in five years we'll see good avatar hand tracking and gesture-based simple interface control”- Michael Abrash
We need to get open web ready for that.
Coordinate Systems, Spatial Perception in Windows Holographic (Nell Waliczek, Microsoft)
Nell: one of the developers on the Windows Holographic dev platform. Now enabled in mainstream PCs.
Want to talk about spatial perception.
Need to know where the user is. Hololens does that without external HW. But we also need persistence of the world around us. “Go anywhere” with hololens (photo in ISS).
What’s the catch? No such thing as a fixed, global coordinate system. Landmarks may not be where they’re expected to be. Other issues such as low lighting, obstruction, etc.
Solution? Spatial perception.
- Attached frame of reference
- Stationary frame of reference
- Spatial Anchor
Best practices (lots of bullets!) @@ need link to slide
Ray Input: default WebVR interaction (Boris Smus, Google)
Slides: https://docs.google.com/presentation/d/1k1dAEKB7axIOum5OiwNTcq4vVpHjSlbQdkjeEDdVKgw/edit#slide=id.p
Video: Ray Input: Defaults for WebVR input [video]
Want to talk about Ray Input. But first, go back to 1998. The web had Pages, scrollbars, blue links etc. Imagine if we only had a mouse and no interaction standards to start with?
2016 VR world? Yeah, that. What should the input patterns be? Currently a lot of experimentation, which is both good and frustrating for end users.
Proposing laser-pointer style defaults, and fallbacks for non-VR platforms.
[Video] shows all the interaction modes, depending on platforms:
- Mouse interaction. Look around with mouse lock, ...
- Touch interaction. Touch panning, tapping
- Cardboard mode. Tap anywhere and interact
- Daydream - ray-based interaction
Fairly simple API, open source.
Questions: where do you position the daydream controller. Arm model? Give it orientation + position. Built open source simulator. Feedback welcome on github. @@ link?
Samsung VR Browser for GearVR learnings (Laszlo Gombos, Samsung)
Laszlo continues from earlier talk. Want to talk about how to solve issue of tab management in VR. Currently feedback is along the lines of “too complicated”, or “have to move head too much”.
Came up with another model, but then people started asking for more windows.
Mike Blix takes the floor to talk about Skybox API. Goal: easy enhancement for VR. Change skybox environment in VR web browsing environment. Discussions on how to declare.
API is fairly simple. Use case for preview of webVR content, for example.
Group Discussion / Joint Summary
Opening the floor. Potch reads questions from slack.
How do you bridge the gap [didn’t get details]
Nell: tell me more about what you’d like to see?
Question on raycasting. How do you have arm model without extra input?
Boris: heuristics based on fixed offset between head and elbow.
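One way to picture that heuristic: derive a controller position from the head position plus fixed offsets swung by the controller's orientation. The sketch below is purely illustrative - the offset values and the simplified math are assumptions, not the arm model actually used in the simulator.

```js
// Illustrative only: approximate a 3DOF controller's position from the head position
// plus fixed head->elbow and elbow->wrist offsets (the offset values are made up).
const HEAD_TO_ELBOW = [0.175, -0.3, -0.03];   // metres, assuming a right-handed user
const ELBOW_TO_WRIST = [0.0, 0.0, -0.25];

const add = (a, b) => [a[0] + b[0], a[1] + b[1], a[2] + b[2]];
const cross = (a, b) => [
  a[1] * b[2] - a[2] * b[1],
  a[2] * b[0] - a[0] * b[2],
  a[0] * b[1] - a[1] * b[0],
];

// Rotate vector v by unit quaternion [x, y, z, w]: v' = v + w*t + (u × t), with t = 2(u × v).
function rotateByQuaternion(v, [qx, qy, qz, qw]) {
  const u = [qx, qy, qz];
  const t = cross(u, v).map((c) => 2 * c);
  const ut = cross(u, t);
  return [v[0] + qw * t[0] + ut[0], v[1] + qw * t[1] + ut[1], v[2] + qw * t[2] + ut[2]];
}

function estimateControllerPosition(headPosition, controllerOrientation) {
  const elbow = add(headPosition, HEAD_TO_ELBOW);            // elbow pinned near the body
  const forearm = rotateByQuaternion(ELBOW_TO_WRIST, controllerOrientation);
  return add(elbow, forearm);                                // swing the forearm with the controller
}
```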
Also on raycasting. How do we borrow more interactions from 3D game developers. Look to huge library of 3d games and borrow beyond interaction with basic pointers.
Boris: still very new
Philip S: great start, but there are so many other interaction patterns we want to bring in and make reusable. Maybe topic for breakout tomorrow.
Brandon: there’s a slack channel we just created
Jim Bricker: question on gesture control: anyone thought of using device everyone already knows (smartphone)
Ningxin: depends on smartphone vendors adding the right sensors.
Potch: how do we find basic standard set of gesture given cultural variety
Ningxin: [missed response]
Philip S: interesting thing we’ve done: hand facing out interacts with the world, looking at palm of your hand shows menu
Sean W: worth thinking on IEEE VR, not to copy them but worth adopting some of their work on e.g. click
Question for Samsung folks on skybox: can you do things like gradients.
Mike: you can generate color image dynamically, etc. Sharing details on slack.
Chris vW: how do you imagine implementation of [??] (skybox, scribe assumes?)
Laszlo: you have basically HTTP headers, meta tags or js. Discussion on which one you want. Our intent would be to standardise.
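The three delivery options Laszlo mentions could look roughly like the sketch below; every name in it (the header, the meta tag and the JavaScript entry point) is a hypothetical stand-in, not Samsung's shipping API.

```js
// Hypothetical sketches of the three delivery mechanisms discussed; none of these names
// are the real Samsung Internet API, they only illustrate the options on the table.

// 1. HTTP response header set by the server:
//      VR-Skybox: https://example.com/assets/skybox.jpg

// 2. Meta tag in the page:
//      <meta name="vr-skybox" content="https://example.com/assets/skybox.jpg">

// 3. JavaScript, letting the page change the environment at runtime:
if (window.vrSkybox) {                                        // hypothetical feature check
  window.vrSkybox.set('https://example.com/assets/skybox.jpg');
}
```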
Nat Brown (Valve): not enough thinking about room scale. Transition important in larger areas.
Potch: leads us to goals of this session - identifying need for basic primitives. Maybe better primitives for location and room scale.
Brandon: reiterates needs for common primitives, using the idea of scrollbar as example. This is like native apps where they all control differently. There is value to that, to having things purpose built. But we want to give the web primitives that work in more predictable way.
Philip S: we also should think about AR at the same time, because it is different - there are objects in the world you want to interact with. Need to think about that.
LucC: unifying by thinking of AR as different only in transparency?
Brandon: main difference is environmental knowledge.
Nell: if you’re not trying to bound user to where they are in environment, none of this is hard. Transparency is only a piece of that. Simultaneous connection to reality of place too.
Chris vW: webVR is the more mature of the immersive APIs. But we’re going towards 3D browsers, while developers are used to 2D. It’s really about getting the web into 3rd dimension, not just about VR, AR, MR. 3D web.
Philip S: want to add to the AR discussion. In a VR world everything is in that world. Different interactive distributed simulated environment.
Diego: 1 to 1 mapping of space to movement for room scale environments. Ability to integrate real objects into the VR world. We can explore this.
Dom: tries to summarise things needing attention from standardisation perspective:
- Gesture recognition (but culture)
- Perception API
- User input model
- Skybox API
- Keep AR in mind for interaction model
- transitions / flash of light
- preload next page/experience (header?)
- flow / simple layout (via library?) for 3D
- collision detection / event management
- priority zone (avoid getting lost)
- how to navigate VR experiences / design navigation mechanisms
- guiding the user: what is the standard model to get oriented
Jason Marsh: (pre)loading/whitespace transitions. How do we know how that feels/should work?
Chris vW: ongoing work on transitions. Going to help 2D but will also help VR immensely. (@@link?)
James: simple layout, flow. Like in CSS. Also need something for collision detection.
Jeff Rosenberger: need for a priority zone so user does not get lost. I.e., a place where new apps appear “in front”
Casey: navigating VR environments. Need to design wayfinding mechanisms.
Potch: good segue into the accessibility set of lightning talks.
Accessibility of VR experiences
Accessibility Not An After Thought (Charles LaPierre, Benetech)
Slides: https://benetech.app.box.com/s/0vgnivm0pawfti0izzs11srppp9dfn1o
Video: Accessibility Not An After Thought [video]
Overview of accessibility at W3C: crosses a number of domains and disabilities. Disabilities to consider include auditory, cognitive, neurological, physical, speech, visual.
Needs to be part of all aspects.
Multimodal: accessibility is inherently about providing information from one modality in another.
Both inputs and outputs need to be accessible.
Example, an accessible shopping experience. If one user uses gloves to interact with video, another visually impaired user might have self-voicing option. Proximity. More detail. Speech input,
Consider multiple disabilities, e.g. deaf-blind. May need another input or one that hasn’t yet been invented.
Consider all the inputs in a shopping experience. Sight, touch, description; how would we present the “shop for a dress” experience to all the senses, including for someone who lacks some of those senses.
Make accessibility a first-class citizen. Work with accessibility groups.
Built accessibility use cases, look for ways general interfaces support accessibility.
VR to help the disabled, e.g. to help train with prosthetics…
Consider lessons learned from making videos and images accessible.
Browser UX in VR (Justin Rogers, Oculus VR)
Video: Browser UX in VR [video]
I’m talking about making Web content accessible to all. Is 2d web accessible in VR? The web still has lots of 2d content. Input is limited.
What are the most used applications? Virtual desktops and browser VR. They’re sort of uncomfortable: it’s an accessibility problem for all of us.
4 key areas to look at:
- High quality reprojection.
- Fix text. Cylinder surfaces.
- Good UI, positioned for head comfort.
- Nail input.
- Most inputs can be resolved with gaze
- Big hit-targets, Hit-target attraction
- Link disambiguation UX
- Voice commands, simplified ux
- Limited context
- Design for security and trust
- anti-spoofing
- Redesign for the medium.
Mixed reality (Rob Manson, awe.media)
Slides: https://docs.google.com/presentation/d/1FHbh0XrOcsPNs5pjJs9oFPU-D5p2bTZ2wVtkGFVz5wk/
Video: Web-based Mixed Reality [video]
I’m from Awe Media. Pitching development of AR in the web browser, this is a critical time.
Mixed reality on the web. Real-world input into virtual space. The geometry of tech shapes how we use it and changes innovation. Consider the evolution of the camera: from the camera obscura, to the Brownie with a viewfinder, the SLR, LCD screens, selfies.
Geometry of spatial apps in HTML5. Objects in the real world that exist beyond “far”. How can we bring depth in?
Orientation and position.
Discussion
Question: what about color-blindness?
Charles LaPierre: We want to be able to allow shifting of color spectrum, manipulated by APIs or by user. That’s possible now. We can use VR to demo what a person with color-blindness sees in different scenarios, and how to make it more accessible.
Brandon: I am red-green deficient, so this is important to me. In VR today, there’s already some post-processing, so we could add color-shifting. Also, volumetric data.
Shannon: I worked on the ARIA spec, declarative attributes on HTML DOM tags for purposes and intents. We should look at how that can be adapted to WebVR.
On-click element, declarative.
LucC: Look at solutions from “real life”, diversity of accessibility solutions, we should make it possible to adapt these in VR.
JamesBaicoianu: Re post-processing, look at CSS, users can specify styles that override the web author.
Anssi: Declarative, learn from CSS. How do the existing accessibility tools for the web fit into WebVR? What abstraction layers?
Charles LaPierre: We need low-level APIs that can interact with existing assistive tech. That exists, e.g. braille display, tongue sensors for visualizing imagery. Where APIs exist, you should be able to interact with those in VR. Give developers hooks to those APIs. Personalization is a big part of accessibility: not every blind person can read braille, so let them work through the modes they do use.
Philip: HDR displays will introduce new issues. Extended color gamut makes RGB look dull. Contrast ranges need to be mapped to displays.
Casey: Lots of accessibility already built into the browser. Use it.
There’s also machine accessibility. How search and links work.
Potch: I’m looking forward to VR alt-text.
Is there a need for primitives to separate the visual from the spatial? How do we take the interactive and spatial components and not make assumptions about users’ characteristics?
Chris: A-Frame is meant to be the jQuery for VR on the Web. jQuery's selectors proved the value, then got standardized and implemented as document.querySelector.
Big q: Should every entity be represented in the DOM?
Q from slack: klausw: accessibility: Roomscale/6DOF has its own challenges, think about wheelchair-bound users, people with only one arm, very short or tall people. Many of the same accessibility/ergonomics issues as for real-world locations. I remember having to lift up my daughter to reach the virtual pot in Job Simulator.
Brandon: Cool utility @@.
Possibility for extension utilities to help with common tasks, e.g. “make me taller, to reach items”. As patterns emerge, maybe they become part of the platform.
Klaus: do we have the same expectations for all sites? Difference between action games and shopping?
Charles: For an image, we can have a tactile representation, but feeling it doesn’t make sense without a tour to describe what you’re feeling.
3D audio is great for people with visual impairments. You can create amazing 3D audio games; so games can be made a good experience for people with disabilities.
@@: MSR game with no visuals.
Justin: some preferences we’ve exposed in WebVR, e.g sitting to standing pose. These keys can be valuable to improve the experience. Be careful about privacy considerations. As we tailor experience to a user in a wheelchair, are we exposing to fingerprinting/tracking?
Q regarding text.
Justin: we’re looking at SDF fonts, we don’t have a canned solution to improving 2d text.
Brian Chirls: top-down vs responsive design. If you take accessibility into account, you get lots of great stuff for free. Even a person who has all the VR equipment for roomscale sometimes wants to sit on a couch. Can we build features you don’t have to “turn on”?
Anssi: list of topics. [on screen]
@@: API that faces up, and the API that faces down to the hardware. Accessibility works well when device devs can experiment with serving user needs.
Multi-user VR experiences
Internet-scale shared VR (Philip Rosedale, High Fidelity)
Slides: Internet-scale shared VR
Video: Internet-scale shared VR [video]
Philip: We’re working on open source software that enables you to do face-to-face interaction. I’ll present some assertions and discuss what we’re doing.
[Assertions slide]
There are two reasons that this is disruptive. The first is the order of magnitude of interactions, the 2nd one is face-to-face communications. Allowing hands and heads to move naturally is the general use.
Client-server model, enabling interactions, rather than app-store centralized
[New Technology slide]
There is a need for low latency - milliseconds needed
3D audio and avatar motion
Compressed scene description
Real-time scene / object
Distributed servers.
[Areas for Standards and Cooperation slide]
Identity needed - bring your avatar, name and appearance
Content portability - interactive need
Authenticity for assets - not DRM but verify that things are what they say they are
Server discovery - something beyond DNS
From clickable pages to walkable spaces (Luc Courchesne, Society for Arts and Technology)
Slides: From Clickable Pages to Walkable Spaces
Video: From Clickable Pages to Walkable Spaces [video]
Luc: There have been many things done on this topic including @@@
[picture of @@@]
Transitions slide
How do we move between experiences
Most people will not care - they will go together without thinking about it
[slides of sharing experiences]
Now with HMD immersion things can be experienced more easily
WebVR slide
This is going to transform the Web into multiple spaces
We understand the need for Avatars to invite people in
The limitations are a problem
Slide of topics
Virtual “situation room”
“On site” design review
Goal is to build virtual teleportation platform
I’m going to show you what we did at SAT over the past few years on that
Video of test work
In 2012 we moved to Kinect
We could extract the foreground from the background
You feel the presence of someone from their natural body language
2013 The Drawing Room
By using 3 cameras we could port the system in real time
What we’re working on now is ### experiments
We need help - find us if interested.
Multimedia & multi-user VR (Simon Gunkel, TNO)
Slides: Multimedia & Multi-User VR
Video: Multimedia & Multi-User VR [video]
Simon: I work for TNO in the Netherlands. I’m glad to be here and am very impressed that many of the topics we’re struggling with are being addressed.
We are looking for attached experiences vs. detached experiences.
At TNO we do applied research so today I want to talk about my first experience and some of the problems along the way.
Say you’re sitting on the couch and want to have an experience.
This is a mock up as it doesn’t show a headset, etc.
This shows people looking at each other and feeling their presence.
People slide
How to engage people while letting them interact in 360/3d
How to position people
Interaction with environment
Multimedia Objects slide
Synchronization
DASH streaming and tiling
Adaptation of spatial audio
Browser slide
Different browser support for WebVR
Different browser support for spatial audio
Performance of JavaScript and WebGL
Conclusion slide
You can make a natural interaction but you have performance concerns
Mixed reality service (Tony Parisi, Wevr, on behalf of Mark Pesce)
Slides: Mixed Reality Service
Video: Mixed Reality Service [video]
Tony: Mark is in Sydney. I’m going to talk to you about his new service.
About 22.5 years ago we pulled together enough money to send Mark to the first WWW conference in Geneva
He built the spinning banana running on a server at CERN
We’ve kind of gone full circle. Pokemon Go delights millions but causes odd behavior
Mixed reality service provides this meta layer, binding the real and virtual worlds
Very simple protocol showing ‘add’ operation
Delete is the reverse of that
And the search operation is going to let you find a set of service URIs
[demo]
We think the world needs this type of markup.
Copresence in WebVR (Boris Smus, Google)
Slides: https://docs.google.com/presentation/d/1k1dAEKB7axIOum5OiwNTcq4vVpHjSlbQdkjeEDdVKgw/edit#slide=id.g184a962562_0_71 Co-presence in WebVR
Video: Copresence in WebVR [video]
Boris: As we all agreed we are building an isolating technology
List of components used for demo
You enter and have an avatar you control
You can see other people
You can shrink or grow and sound scales accordingly
3DOF Pose Audio Stream slide
Shows p2p connection
O(n²) will not scale
[slide showing streams]
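A minimal sketch of the peer-to-peer pose streaming shown on the slide, using a WebRTC data channel; signalling is application-specific and omitted, and `updateRemoteAvatar` is a placeholder.

```js
// Streaming a 3DOF pose to a peer over an unreliable, unordered WebRTC data channel.
// Signalling (offer/answer/ICE exchange) is application-specific and omitted here.
const peer = new RTCPeerConnection();
const poseChannel = peer.createDataChannel('pose', { ordered: false, maxRetransmits: 0 });

function sendPose(orientationQuaternion) {
  if (poseChannel.readyState === 'open') {
    // One channel per remote peer: with n participants this is the O(n²) mesh noted above.
    poseChannel.send(JSON.stringify({ t: performance.now(), q: orientationQuaternion }));
  }
}

peer.ondatachannel = (event) => {
  event.channel.onmessage = (msg) => {
    const { q } = JSON.parse(msg.data);
    updateRemoteAvatar(q);                       // placeholder, application-defined
  };
};
```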
Some fun features
Mouth moves
$$$$$
This all doable, but it’s just a demo
There is a lot to be figured out
How do we do this, what is the path to identity, avatars and payments
Discussion:
%%%%: A couple of the talks spoke of users and finding them in a space. This requires things working together. Has anyone started working on this standardization? Feels like everyone’s working on their own implementations.
Tony: We’re addressing some of that and it seems like a service layer that’s waiting to happen.
Potch: How do you position people in space where Avatars aren’t near each other. There is balancing of the needs from many perspectives.
#####: When we’re talking about social interactions we’re talking smaller numbers. We have differences in real world spaces. It doesn’t make sense to have everyone in the world, just the 2 - 3 of your friends that want to talk.
Potch: A more practical question. Where do we see the authoritative database being hosted.
Tony: Good question, who’s the authority over a certain space. We hope as the problems become evident we work on them together to solve them.
^^^^: There are a lot of projects to make data more distributed and less centralized. We’ll have to see how this evolves.
Nicoli: There was a session on latency. What’s the real need, is it 100 ms or less? If someone speaks to me and I don’t hear it, was it too long?
@@@: I don’t know if it’s any easier in games, most have a minimal latency, I think we have a lot of work to do on this. We should look at game engines to learn about low latency and figure out how to scale that.
Potch: question about having a private UI and a public avatar. Is there a way to have me see things you can’t. It gets to questions of privacy.
^^^^: I think people are going to go towards everyone seeing the same thing. I think several solutions will be as literal as they can be.
Wendy: That conversation raises the question to me of how much do we expect the environments to be end-user customizable. How do I bring my own private annotations, accessibility tools, maybe I have things I want to share with friends.
Brad: If you want to record someone’s phone conversation you have to tell them. So if you were to bring something into a VR environment and record someone else’s VR would that be legal?
%%%: If I try to do something in AR do I need to worry about copyright or do I just get pixels.
Dom puts up list of topics that were discussed in the session.
Discussion on the use of avatars and the security around them
Privacy/anonymity: for some users and situations, it’s dangerous to be identified, for others, it’s unwanted. VR needs to accommodate multiple identities and pseudonyms/anonymity.
The Internet is composable. It’s not uniform. Will VR be a metaverse or an Internet?
Charles: drawing on that, if you are paralyzed you may not want to show that in VR. You may want to have an alternative presentation.
#####: People react to your avatar and in VR we need to see if people are more respectful to each other.
@@@@: A lot of stuff being discussed here was covered in Virtual World discussions so it will be interesting to see how that’s changed over time.
Potch: To what extent is the user agent going to impose restrictions on the character model.
%%%: The idea of smooth degradation of avatars needs to be considered as well.
Authoring VR experiences on the Web
HTML & CSS, (Josh Carpenter, Google)
Slides: HTML & CSS
Video: HTML & CSS [video]
(Demo of 2D website viewed in 3D)
WebGL - steep learning curve, no consistency of user experience. “Recall the Web of the 90s where we each had to implement our own scrollbars.” Z-depth is the drop-shadow of VR.
Permissions model for depth, akin to full-screen, permission to come closer.
WebVR with Three.js (Ricardo Cabello, Three.js)
Video: WebVR with Three.js [video]
JS 3D rendering library, open source
threejs.org
(Chrome) Browser extension: WebVR emulator, interact with 3D environment without HMD
(Slides showing JS code)
A-Frame (Kevin Ngo, Mozilla)
Slides: A-Frame
Video: A-Frame [video]
aframe.io
Web framework for building VR experiences
Based on three.js
Custom HTML elements: a-scene, a-sphere etc
Demo: Hello Metaverse
Works well with most other JS-based frameworks
Entity-Component-System under the hood
A number of built-in components, easy to expand with custom components (see the sketch after these notes)
Registry: Curated collection of components
Inspector: Inspect and modify components
A-painter: Paint in VR in the browser
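A minimal sketch of the custom-component pattern mentioned in the notes above: an A-Frame component that spins its entity, usable in markup as `<a-box spin="speed: 45"></a-box>`.

```js
// Minimal A-Frame component using the entity-component pattern described above.
// Usage in markup: <a-box spin="speed: 45"></a-box>
AFRAME.registerComponent('spin', {
  schema: {
    speed: { type: 'number', default: 90 }      // degrees per second
  },
  tick: function (time, timeDelta) {
    const rotation = this.el.getAttribute('rotation');
    rotation.y += this.data.speed * (timeDelta / 1000);
    this.el.setAttribute('rotation', rotation);
  }
});
```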
React VR (Amber Roy, Oculus VR)
ReactVR: Bridge needs of web developers and VR developers
React: JS library for building UI for the web (Facebook)
ReactVR, based on three.js, WebGL, WebVR
Key features: Based on React code - diffing and layout engines, code combined w/declarative UI, behavior/rendering in one place, optimized for one-page web apps
(Code example; see the sketch after these notes)
Code transpiled to JS
Demo: developer.oculus.com/webvr (Hotel Tour)
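A hedged sketch of what the React VR code shown in the talk looked like, along the lines of the public samples of the time; the component names and asset are taken from those samples and may differ from later versions of the framework.

```js
// Along the lines of the React VR samples of the time; names and the asset may differ
// from later versions of the framework.
import React from 'react';
import { AppRegistry, asset, Pano, Text, View } from 'react-vr';

class WelcomeToVR extends React.Component {
  render() {
    return (
      <View>
        <Pano source={asset('chess-world.jpg')} />                 {/* 360° background */}
        <Text style={{ transform: [{ translate: [0, 0, -3] }] }}>
          hello
        </Text>
      </View>
    );
  }
}

AppRegistry.registerComponent('WelcomeToVR', () => WelcomeToVR);
```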
XML3D (Philipp Slusallek, DFKI)
Slides: XML3D
Video: XML3D [video]
XML3D: declarative 3D framework extending HTML5
Generic data types + data flow programming (Xflow)
Programmable shaders
Renderer-independent: WebGL + shade.js
XML3D-NG
Uses Web Components, WebVR
Smaller core library, domain-specific components
Shareable web components from remote library
Core data model: Data tables and entries
Code example: Replicate <a-sphere> using attribute binding and core elements
Demo: WebVR plugin: King’s Cross Station (did not work)
Webizing VR content (Sangchul Ahn, LetSee)
Slides: Webizing VR content
Video: Webizing VR content [video]
Some experiences with current VR/AR approaches
Issues: Limited functionality, no way to refer to real world, separated rendering context
Expectations: Mashable, dynamic, responsive to both virtual and real worlds
Demo: A mobile AR web browser
Requirements and opportunities for standardization:
- Evolution of HTML for VR/AR
- “Physical things” as resource
- New media type for VR/AR
AR demos: AR book inspector, MAR.IO AR game prototype, LetseeBeer
glTF (Tony Parisi, Wevr / Amanda Watson, Oculus VR)
Video: glTF [video]
Real-time 3D asset delivery
No file formats specified in WebGL
Or for 3D in general
.gltf : JSON node hierarchy, materials, cameras
.bin : Geometry, …..
(and more)
Oculus: A need for a standards format for 3D scenes
Spec and issues discussion on github
“The JPG for 3D” !
Vizor - visual authoring of VR in the browser (Jaakko Manninen, Pixelface)
Slides: Vizor - visual authoring of VR in the browser
Video: Vizor - visual authoring of VR in the browser [video]
Publishing platform for VR content on the web (creating, publishing, discovering)
Visual 3D compositing editor
Visual programming language / Node graph for programming
One-click publishing to the web
vizor.io : Discovery system on the web
Discussion:
Q: Relationship between gLTF and a-frame?
A: a-frame intended for hand-coding, gLTF for exporting from tools. A-frame is a JS framework, gLTF a file format.
Samsung research: Encourage browser vendors to support/optimize for higher level formats. Browser should evolve into high-performance game engines
Also, browsers should natively support gLTF
Josh: Would like to reimplement the browser on top of WebGL, then solve backwards compatibility
Q: Already lots of standards, do we need another one (gLTF)? (ref. XKCD)
Neil Trevett: Yes, we need one that addresses this use case.
Q: Is glTF positioning itself as the baseline spec for WebGL/WebVR?
A: “Firewall” between WebGL and gLTF spec work, no intention of mandating gLTF for WebGL.
High Performance VR on the Web
Chrome Android learnings & pitfalls to avoid
Slides: Chrome Android learning & pitfalls to avoid
Video: Chrome Android learnings & pitfalls to avoid [video]
Justin Rogers, Gear VR Performance Tweaks and Pitfalls
Video: Gear VR Performance Tweaks and Pitfalls [video]
"Android & Mobile is really hard"
Building a WebVR Content Pipeline
Video: Building a WebVR Content Pipeline [video]
WebVR Next with more layers
Potch: reading from Slack - jank caused by GPU uploads - what can we do?
Brandon: no great answer, but for a lot of these things (e.g. texture uploads), there are specific mechanisms that let you separate out pieces of the texture upload. Right now, you use an image tag to load an image, and once done, you ask the browser to upload it; all of that can be blocking. What you really need is having compressed textures all the time throughout the pipeline, and we should expose that as a primitive in the browser.
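A sketch of what "compressed textures throughout the pipeline" can look like with today's primitives, assuming the GPU exposes the S3TC extension and the compressed payload has already been fetched as an ArrayBuffer.

```js
// Uploading an already-compressed texture so the browser never decodes or recompresses an
// image on the main thread. Assumes the payload was fetched as an ArrayBuffer and that the
// S3TC extension is available on this GPU.
const gl = canvas.getContext('webgl');
const s3tc = gl.getExtension('WEBGL_compressed_texture_s3tc');

function uploadCompressedTexture(arrayBuffer, width, height) {
  const texture = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, texture);
  gl.compressedTexImage2D(gl.TEXTURE_2D, 0, s3tc.COMPRESSED_RGBA_S3TC_DXT5_EXT,
                          width, height, 0, new Uint8Array(arrayBuffer));
  // No mipmaps in this sketch, so avoid the default mipmap-based minification filter.
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.LINEAR);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.LINEAR);
  return texture;
}
```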
@@@. What optimizations are available for 360° videos?
JustinR: lots of big problems for 360° videos - e.g. how to stream it. Can we do a better job of sending information to optimize some of these streaming concepts? The server could optimize which slice to prepare for the viewer.
Paola: how do you measure performance? What are the metrics? What are the tools?
Brandon: in Chrome, we have really cool dev tools, although somewhat undocumented and a bit hard to discover. The timelines give you beautiful graphs with very nice clear ideas of what takes time in your apps - we've some screenshots of that throughout the day.
Klaus: back to the room - to what extent do developers want this exposed to JS? What should it look like? What metrics would be most useful?
@@@JanusVR: is it now possible to do @@@ with renderTexture?
Justin: it's one of the bugs we hit in Chromium - I think that has been fixed in drivers
Brandon: in WebGL 1, if you want to use renderToTexture, you can't do multi-sample @@@. Won't be fixed in WebGL1, but WebGL2 will fix this - but you shouldn't deal with that type of approach for VR. For mobile devices, you want to use well-tested techniques from the 90's, multi-path is not there yet.
Justin: for your main rendering process, use a simple WebGL context, especially on mobile
ChrisVW: mirroring can be important for demos, esp. on desktop; what's the alternative to preserveDrawingBuffer in that context?
Justin: for demos, that's probably OK, although you'll find devices for which that will fail. But don't keep it in production. It's useful for demos or development, even debugging.
ChrisVW: have you investigated asm.js or WASM?
Justin: no
Brandon: mirroring is good on desktop, but never do it on mobile. You can Chromecast content from the headset to the TV - this would be the better mechanism, with hardware acceleration. Don't turn preserveDrawingBuffer on on mobile.
Anssi: what are the changes needed in Web facing APIs? What needs fixing?
Dom: https://hillbrad.github.io/sri-addressable-caching/sri-addressable-caching.html
@@@: WebVR doesn't reflect any performance back to the app - e.g. on texture size, resolution scale.
Brandon: we have performance extensions for WebGL, although there is some additional cost coming from the compositor (which we hope to be able to turn off, e.g. on mobile immersive). The concept behind the requestIdleCallback API - a time budget - could usefully be reapplied to requestAnimationFrame.
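For comparison, the existing requestIdleCallback budget pattern, followed by a purely speculative sketch of the same idea applied to a VR animation frame (not a real API); `workQueue` and the callbacks are placeholders.

```js
// Today: requestIdleCallback hands the callback a deadline it can query.
const workQueue = [];                             // placeholder work items

requestIdleCallback(function processWork(deadline) {
  while (deadline.timeRemaining() > 0 && workQueue.length) {
    doSomeWork(workQueue.shift());                // placeholder, application-defined
  }
  if (workQueue.length) requestIdleCallback(processWork);
});

// Purely speculative: a similar per-frame budget handed to a VR animation frame.
// This is NOT a real API today - it only illustrates the idea being discussed.
// vrDisplay.requestAnimationFrame((timestamp, budget) => {
//   if (budget.timeRemaining() > 2 /* ms */) renderOptionalEffects();
// });
```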
@@@: we need to know how much GPU time you have left, not CPU time
Ricardo: what about controllers? Will they be integrated in the gamepad API or a different one?
Brandon: for one, we want to expose some really basic fundamental interaction models, regardless of what you're working with; on cardboard, the one-button interaction should be surfaced as a touch event, which can then be replicated in more advanced devices.
For gamepad, we should poll all the information at the same time.
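A sketch of per-frame controller polling with the Gamepad API plus the pose data from the Gamepad Extensions that WebVR-era browsers expose; `updateControllerModel` is a placeholder for application code.

```js
// Poll all controllers once per frame, combining the Gamepad API with the pose data
// from the Gamepad Extensions.
function pollControllers() {
  const gamepads = Array.from(navigator.getGamepads());
  for (const gamepad of gamepads) {
    if (!gamepad || !gamepad.pose) continue;             // empty slot or non-VR gamepad
    const { orientation, position } = gamepad.pose;      // position is null on 3DOF controllers
    const anyPressed = gamepad.buttons.some((b) => b.pressed);
    updateControllerModel(gamepad.index, orientation, position, anyPressed);
  }
}
```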
Justin: Another crazy idea is to keep GPU cache across navigation, assuming we have the right conditions from a security perspective - this is critical to enable smooth navigation.
Ricardo: I'm wondering if at some point the browsers will give us the glTF model for controllers; right now we have to manually change this.
Brandon: OpenVR has this; I've been wondering if we should do this, not sure how to do it in a Webby way. It would be nice to defer to the browser and just ask for a representation of the controller. But I'm not sure what that API would look like - we could just resurface the OpenVR API, but we probably want something more useful.
Potch: We have ServiceWorker that can do request caching; we could have a way to get the SW to get back the fully decoded ready-to-use asset as a response. Sounds like a logical extension to an existing API.
@@@: caching GPU resources across navigation - we have very limited GPU resources, and have no indication of back pressure from the GPU. WebGL doesn't easily let us recover from exhausted resources. Right now we have to manage this blindly. Regarding dynamic resolution, there is nothing in WebVR that tells us the target framerate.
Casey: how about non-graphics optimizations: physics, spatialized audio?
Brandon: I would love to see more tooling around audio and making that an easier thing for developers to get their head around. We have great engineers in the Chrome team who do amazing things e.g. the WebVR audio sample demonstrating spatialized audio. I would like to see more tools that handle audio in a more user grokable way, with the best performance outcome. We have the right primitives available, but I find them hard to use.
@@@: MultiView was mentioned as an improvement for WebGL - we have a proposed breakout session for tomorrow. Are there other acceleration extensions we would need for VR?
360° video on the Web
Louay Bassbouss, Fraunhofer, 360° video cloud streaming and HTMLVideoElement extensions
https://docs.google.com/presentation/d/1-stapAqJr5ICQcXEC7z-PYnTYY1T3fA3E4lgdBmy9Lc/edit?usp=sharing
[Louay via remote participation, audio+slides]
Slide2. 360-video on HbbTV devices. One potential tech. No capability yet to render 360 video, so render in the cloud software and send to TVs, controlled by remote. Client needs only to be capable of playing livestreams. No API to control buffering.
Slide3. 3 different options. Option 1: all in the client; advantage = no dependency on network latency. Option 2: our HbbTV solution; video buffering is a challenge, and each client needs its own server session - a scaling challenge. => Option 3a: the pre-processing step happens once, then user input. Option 3b gets user input later.
Use MSE to render 360 video in the browser.
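A minimal sketch of the MSE plumbing behind this approach: append the media segment that matches the current field of view to a SourceBuffer. The codec string and the segment URL scheme are placeholders.

```js
// Minimal MSE plumbing: append the media segment that matches the current view.
// The codec string and the segment URL scheme are placeholders.
const video = document.querySelector('video');
const mediaSource = new MediaSource();
video.src = URL.createObjectURL(mediaSource);

mediaSource.addEventListener('sourceopen', () => {
  const sourceBuffer = mediaSource.addSourceBuffer('video/mp4; codecs="avc1.42E01E"');

  async function appendSegmentForView(viewId) {
    const response = await fetch(`/segments/view-${viewId}.m4s`);   // placeholder URLs
    // A real player must wait for the SourceBuffer's 'updateend' before the next append.
    sourceBuffer.appendBuffer(await response.arrayBuffer());
  }

  appendSegmentForView(0);   // re-invoked whenever the head orientation selects a new view
});
```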
Slide 5. advantages/disadvantages.
[video demo] requesting different segments for different fields of view.
We think native video players, getting URLs, can show 360 video.
Slide9,10. HTML VideoElement Extensions. Shows buffering switch between two views.
Laszlo Gombos, Samsung, Encode, tag, control 360° video
https://pres.webvr.io/video.html
VR browser, Samsung Internet for Gear VR.
You navigate to web page with video, play inline, and option to switch mode
That’s not a great user experience, so we have an extension for the HTML5 video tag to describe how video is encoded. If browser detects that tag, it can go directly to the right immersive playback
[slide showing tags supported= what we found out there on the web]
This is how we’ve extended the web, let’s discuss.
David Dorwin, Google, Path to native 360° video support
Slides: https://docs.google.com/presentation/d/1FYbOKq_CyUjoCztLPvzk49oRc9P6zsjGDgyo1aTrXHw/
Video: Path to native spherical video [video]
Currently, non-rectangular media formats are not standardized, including projections.
Current solution: video->webGL->webVR. App determines and applies projection. Libraries in the near-term.
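The library approach in one sketch, here with three.js: draw the video onto the inside of a sphere and let WebVR render it per eye; it assumes an equirectangular source and an existing three.js `scene` and render loop.

```js
// Minimal three.js sketch: map an equirectangular video onto the inside of a sphere.
// `scene` and the render loop come from the application's existing three.js / WebVR setup.
const video = document.querySelector('video');
const texture = new THREE.VideoTexture(video);

const geometry = new THREE.SphereGeometry(500, 60, 40);
geometry.scale(-1, 1, 1);                                  // view the sphere from the inside

const material = new THREE.MeshBasicMaterial({ map: texture });
scene.add(new THREE.Mesh(geometry, material));
video.play();
```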
[slide3] Prereq: standardized media metadata.
Google has a proposal for ISO BMFF and WebM; it includes custom projection.
Recommended: Developing VR Media Technologies talk at Demuxed 2016
[slide5][slide6] Simplest approach, browser UX for spherical rendering, limited video-only experience
[slide7] More complete, make the DOM spherical.
[slide9] Encrypted media makes it more difficult.
[slide10] Media capabilities
Media Capabilities API on github, in WICG
Q&A
Kfarr: Another question, there have been a number of creative (yet competing) concepts for mixing compression and projection optimization to reduce bandwidth. Facebook appears to be the most advanced in this realm so far, at least in terms of publishing their implementations and providing proofs of concept, however they have not open sourced any of this nor are they intended for use in WebVR / browser environments. While some mechanisms are too advanced at this point (such as foveated rendering based on gaze position), there are some that are clearly effective and relatively simple, such as alternate projection styles and tile / block compression alternatives that offer significant bandwidth savings already. Instead of Brightcove re-inventing the wheel to make yet another proprietary implementation thereof, can we work together to tackle the “known good” implementations as a standard?
David: we’re stuck with codecs today, experimenting with different projections.
Kfarr: Another question. If the “new” proposed “standard” were to include projection information in the MP4 container itself, there is significant existing content that would not have this tag. How would we deal with this transition period?
David: you can keep adding boxes to MP4, or you can inject it in JS. Ideally, we can get it standardized. With the container, you don’t need to re-encode, just repackage.
natb@valvesoftware.com: in the interim, Samsung’s attribute proposal is good. Mesh extensibility is important.
Potch: FB’s pyramid approach and the approach we saw here use MSE and vary on field of view. Is that well specified in MSE today or do we need more?
David: for WebVR, you need sync between what the app thinks is the FOV and the stream. A bug link I skipped over, if you want to change the projection during playback.
Brightcove: How should we put the projection into the video tag? What are the literal strings that represent it? Underscores versus dashes - how can we converge on one representation?
EricCarlson: Once we figure out what the strings are, they need to be added to the RFC defining MIME type extensions. That’s not hard, we just need a proposal.
Klaus: precedence between media and tag?
@@: multiview support in VP9 similar to multiview HEVC?
Spherical DOM: what needs to change when leaving Cartesian space?
David: Important for W3C to figure out. New dimensions, CSS?
Klaus: Spherical DOM, per-element?
Potch: Circular DOM, CSS round
Casey: early in VR browser, tried to extend CSS Spec. Positioning and multiple orientations becomes confusing. Maybe there’s another place to do that.
BrianC: example video of Bill Clinton, camera in front of his desk, background was static. Rendering only from video file would make that more complicated. Also, matching of color between video and JPEG complicated
Rob: Camera capture, misleading to users to label 4k, effective viewport. Also are these sizes relevant if we have to scale to powers of 2?
cvan: eye tracking like with the FOVE, is progressive enhancement with http/2 or hinting something being addressed? also, is it a concern that the server could fingerprint with eye tracking?
natb@valvesoftware: prediction is complicated. It would be best to let people experiment. Decoding is complicated, mobile/laptop/desktop different capabilities. We’re ready for high-quality 360 video.
Kieran: where do we go from here? What’s the next step? Some complicated problems, some easy.
Dom: that’s the landscape session at the end of the day. We’ll have a workshop report shortly with standardization next steps.
Devin: propose a video representation breakout this afternoon.
Kieran: we want a standard plugin for videoJS. How do I get gaze position? Interactive video overlays+projection.
David: it’s not so easy to solve short-term
Rob: Gaze position is really WebVR’s pose - seems strange to replicate this at native level.
Wendy: W3C is actively looking for what’s ready for standards, what needs more incubation. We’ll have CG/mailing-list follow-up to keep you engaged.
David: apps can do today, jumping to a native solution may not be right before we understand the full range of the issues.
Justin: projection mesh means a very specific thing. Are people also interested in rendering state? E.g. blending, projections where pixels aren’t directly rectangles we’re transmitting.
David: metadata proposal not just for the web, cameras, etc.
Rob: If we’re adding metadata related to cameras would be good to add camera intrinsics/extrinsics too (ala XDM)
David: custom controls vs app/default controls. Appeal of specifying in the dom is opportunity to overlay
Potch: custom controls different if I’m using video as a texture
A: we should be able to have controls on the second one
@@: WebVR with layer for video?
Justin: closed captioning and other accessibility features.
In Summary:
- Media type parameter for 360 videos (@IETF?)
- Container metadata for projection with mesh; include rendering state?
- Spherical DOM?
- MSE adapted to 360° video
- Expose Media Capabilities
- WebVR layer for video?
- Where to include closed captioning & accessibility features?
- Ad XML (video ad placement)
Immersive Audio
Moderator, Olivier Théreaux, BBC; Panelists, Hongchan Choi, Google; Mike Assenti, Dolby; Raymond Toy, Google
Olivier: I work at BBC. I'm here today to make sure we're not missing half of the point of VR, as Brandon put it yesterday about audio. We have a panel here to look at the status of audio & VR. We'll first get an update from Raymond, a lead developer on the Chrome Team and an editor of the Web Audio API at W3C, who will give us an update on the Web Audio Working Group, including the Web Audio spec which does synthesis and processing. The Working Group will soon recharter, so it is a good time to bring input on the next version of the specification. We'll have short presentations from each panelist, each followed by a short Q&A, but we'll also get more time after the presentations.
Web Audio API Overview
Raymond: This is a brief overview of the Web Audio API. In the beginning, the Web was silent; you needed Flash or a plugin to play audio. HTML5 added the <audio> tag, which allowed streaming and some limited functionality such as start, stop, seek. But you couldn't do much beyond playing and it had limited timing precision. 6 years ago, the Web Audio API was started to address these limitations and extend what you could do in the browser. It gives you an audio graph with sample-accurate timing and low latency. It allows you to create e.g. a synthesizer in the browser. Web MIDI allows you to plug your keyboard into that processing.
The Web Audio API gives you a way to define precisely when a source starts; in the audio graph, you get to use lots of different processing nodes (e.g. gain, audio effects, filters, spatial effects). Recently we have introduced the Audio Worklet that allows you to use JavaScript to do custom processing.
Where are we going? We have a draft spec that we are working very hard to bring to Candidate Recommendation, hopefully by the end of the year. We left lots of stuff on the table to be able to move forward. We want to make the API extensible; for VR support, what do we need beyond what the API already provides with the panner? It seems to work pretty well already based on the demo I saw from Brandon, but we want to hear from you to make Web Audio useful for real immersive audio experiences.
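For context, the object-based spatialization path referred to here uses a PannerNode per source, roughly as sketched below; `decodedAudioBuffer` and the per-frame pose values are placeholders for application code.

```js
// Object-based spatialization with a PannerNode per source.
const audioCtx = new AudioContext();

const panner = audioCtx.createPanner();
panner.panningModel = 'HRTF';                  // binaural rendering for headphone listening
panner.setPosition(1, 0, -2);                  // source 2 m ahead and 1 m to the right

const source = audioCtx.createBufferSource();
source.buffer = decodedAudioBuffer;            // decoded elsewhere via decodeAudioData()
source.loop = true;
source.connect(panner);
panner.connect(audioCtx.destination);
source.start();

// Each frame: keep the listener in sync with the headset pose.
function updateListener(position, forward, up) {
  audioCtx.listener.setPosition(position[0], position[1], position[2]);
  audioCtx.listener.setOrientation(forward[0], forward[1], forward[2], up[0], up[1], up[2]);
}
```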
Q&A:
Brandon: indeed, the room demo works really well. One issue we've found is that if you turn your head fast, it creates a discontinuity in the sound. What are your thoughts on how to improve this?
Raymond: that was an effect of the original panner node, but with the new API the problem should no longer exist.
Don Brutzman: in X3D, we have some experience in spatial audio from a content definition perspective. An ellipsoidal front/back and linear attenuation dropoff model was intentionally simplistic because most players were implementing this in software. Suggestion: consider setting the bar for an initial spatial-audio specification based on current hardware capabilities, and consider defining an even-higher-fidelity spatialization for hardware folks to drive towards. Potential win-win.
Raymond: it turns out a lot of the nodes are based on OpenAL, but I'm not sure how to do more than that.
JamesB: question about codec support - Opus is supported in <audio> but not in Web Audio.
Raymond: that's something I'm working on fixing.
Spatial Audio Renderer
Olivier: next is Hongchan Choi, also part of the Chrome Team and a member of the Web Audio Working Group, who describes himself as a Web Music evangelist. You wanted to talk about your spatial audio renderer, Omnitone.
Slides: https://docs.google.com/presentation/d/1Rrpv9kw18eIBUcIlev84dKu94AQ9D4EeGusDEIEbg4Y/edit#slide=id.p
Video: Omnitone: Spatial Audio on the Web [video]
Hongchan Choi: this talk tries to look at what you can do today with the Web Audio API for spatial audio. When we looked at this in Chrome, there were 2 patterns. Object-based spatialization - we have a basic system for this in Web Audio with the Panner node
https://toji.github.io/webvr-samples/06-vr-audio.html?polyfill=1
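As an illustration of this object-based pattern (not code from the sample above), a panner per source can be driven from the head pose each frame; the shape of the `pose` object below is an assumption for illustration.

```js
// Object-based spatialization sketch: one PannerNode per sound object,
// with the AudioListener updated from the head pose every frame.
// `ctx`, `source` and the layout of `pose` are assumed for illustration.
const panner = ctx.createPanner();
panner.panningModel = 'HRTF';
panner.distanceModel = 'inverse';
panner.setPosition(0, 1.5, -3);         // fix the object in the room

source.connect(panner);
panner.connect(ctx.destination);

function onFrame(pose) {
  const l = ctx.listener;
  l.setPosition(pose.position[0], pose.position[1], pose.position[2]);
  // Forward/up vectors derived from the head orientation (deriving them
  // from a quaternion is omitted here).
  l.setOrientation(
    pose.forward[0], pose.forward[1], pose.forward[2],
    pose.up[0], pose.up[1], pose.up[2]);
}
```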
The next proposal is to use ambisonics - a fairly portable audio format that can be played on any speaker setup. We project this into binaural rendering. But this was not part of the Web Audio feature set, so we looked at how to make it possible.
One logical idea was to extend the <audio> & <video> element with a magical blackbox - but that seemed too risky.
Another approach was to use a MediaStreamSpatialAudioNode, but this had quirks as well. We'll probably go there at some point, e.g. in v2.
In the meantime, I started looking at doing this using the (now deprecated) ScriptProcessorNode - it worked but with issues with latency and glitches.
But I took a step back and realized I could do this using native audio nodes - and the result, which works in any Web Audio-enabled browser, got a great review from TechCrunch.
See googlechrome.github.io/omnitone
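A rough sketch of the idea (not Omnitone's actual code): decode a 4-channel first-order-ambisonics (FOA) stream to binaural using only built-in nodes. A real renderer mixes the X/Y/Z channels through a 3×3 rotation matrix of gains; a single gain per channel is shown here only to keep the sketch short, and the HRIR buffers are assumed to be preloaded.

```js
// FOA-to-binaural with native Web Audio nodes (simplified sketch).
const splitter = ctx.createChannelSplitter(4);   // W, X, Y, Z channels
const merger = ctx.createChannelMerger(2);       // stereo (binaural) output

foaSource.connect(splitter);                     // assumed 4-channel source

for (let ch = 0; ch < 4; ch++) {
  const rotation = ctx.createGain();             // stands in for the rotator
  const convolverL = ctx.createConvolver();
  const convolverR = ctx.createConvolver();
  convolverL.buffer = hrirBuffers[ch].left;      // assumed preloaded HRIRs
  convolverR.buffer = hrirBuffers[ch].right;

  splitter.connect(rotation, ch);
  rotation.connect(convolverL);
  rotation.connect(convolverR);
  convolverL.connect(merger, 0, 0);
  convolverR.connect(merger, 0, 1);
}
merger.connect(ctx.destination);
```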
Some issues we discovered in the process:
- We're missing a compressed audio format for FOA/HOA multichannel audio stream; I think Opus solves this
- There are alternative spatialization techniques grounded in native implementations
- Right now we have a separate rendering path for audio and video, which means there are synchronization issues that we need to address.
Olivier: we want to hear from the audience on what is needed for VR (whether the abstraction of the native audio nodes is right for the kinds of needs VR has)
Q&A
Casey: are you looking at other spatialization techniques e.g. ray tracing?
Hongchan: not in this project; but it's a different matter for the spec
Chris: is this a worker friendly spec?
Hongchan: right now the Web Audio API is not accessible from a Worker, but we are developing this Web Audio Worklet
Chris: any high level library to wrap the API?
Hongchan: tone.js is pretty popular
Thomas: a lot of 360° video comes with 2D sound - any way to make it sound better nevertheless?
Hongchan: I haven't thought about that
Philip: you mentioned orientation, position; I would have also expected velocity, also the room environment converted as a filter.
Hongchan: I was talking about the other approach - what you're talking about is object-based parameterized spatialization. Ambisonics is not for that
Potch: any support for Doppler effect?
Hongchan: we had to remove it due to issues
Jason: I've done professional projects with Web Audio - great stuff. Being able to set the drop off rate would be useful.
Object-based Audio for VR
Olivier: Mike Assenti from Dolby Labs will look at a different angle: how to use object-based audio for linear VR experiences
Slides: Immersive Audio for VR
Video: Immersive Audio for VR [video]
Mike: I joined the Web Audio WG recently, but have been at Dolby for the past 8 years. I'll offer some perspective on audio content creation for linear VR experiences.
At Dolby, we distinguish interactive vs. linear VR experiences. Linear is storytelling or experiential (e.g. a live sports event).
A soundfield is a description of sound at a particular point in space, whereas an object-based representation describes semantically which sound (represented as a channel) comes from where.
There is dedicated hardware for soundfield capture, but object-based audio requires a more onerous yet more flexible artistic mix.
[illustrating with the diner scene of Pulp Fiction]
“Reality is not really what we want with virtual reality, but an artistic experience.”
Live events - one of my favorite use case, you also need to mix the audio to make it pleasant. You would capture this with different mic points. Doing this live is pretty complex (which companies including Dolby are trying to help address), but it gives you more artistic opportunities.
In a live music show, again, you want a curated mix, not a soundfield capture.
For post-produced VR, you get all these sources that you turn into channels with metadata describing the objects (at least their position), which can then be rendered to various speaker setups. This can be exported in audio formats that support spatialization - one possible export is to render this as a simple soundfield (although then again you lose flexibility).
We talked about the difference between reality & VR. Rendering distance is not just a matter of attenuation - reverb also plays a role. You need metadata to determine e.g. if the object is supposed to be far or near.
Professional mixers don’t want us to do the attenuation for them, but to get control of the primitives.
You can also do head-tracked vs. non-head-tracked audio (e.g. a commentator should stay fixed relative to the user). You can also increase the gain based on the gaze direction.
In a live event, you might want to do automated panning based on the position of the artist on stage.
You could also mix VR in VR.
Speaker-rendering is also interesting to avoid the isolation aspect of VR.
What about the Web? We need two components to play this back: decode the content and its time-synchronized metadata, then pass it to a renderer along with the orientation & position of the user and the playback configuration (e.g. binaural).
The options to do this in Web Audio: we pass the bitstream to a native decoder tightly bound to the renderer, to which we pass orientation & position - but that latter part you can't do today.
Another option is to do the rendering in the Web Audio API (e.g. a series of panner nodes with reverb) - it gives more flexibility, but it's heavy and might introduce inconsistencies.
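A sketch of that second option, assuming the decoded object channels and their metadata are already available to the page (fields like `obj.reverbLevel` are invented for illustration):

```js
// Render one audio object from a decoded channel plus positional metadata,
// with a reverb send; a non-head-tracked element bypasses spatialization.
function renderObject(ctx, objectTrack, obj, roomImpulseResponse) {
  const panner = ctx.createPanner();
  panner.panningModel = 'HRTF';
  panner.setPosition(obj.x, obj.y, obj.z);        // time-synced metadata

  const reverbSend = ctx.createGain();
  reverbSend.gain.value = obj.reverbLevel || 0.2; // hypothetical field
  const reverb = ctx.createConvolver();
  reverb.buffer = roomImpulseResponse;            // assumed room IR buffer

  objectTrack.connect(panner);
  panner.connect(ctx.destination);
  objectTrack.connect(reverbSend);
  reverbSend.connect(reverb);
  reverb.connect(ctx.destination);
}

// Non-head-tracked element (e.g. a commentator): mix straight to output.
commentaryTrack.connect(ctx.destination);
```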
VR audio is more than head-tracked spatialization over headphones. Audio production is also an art, which needs to be combined with ambisonics for great VR experiences.
Web Audio v1 brings us very close to linear VR experiences, but not quite all the way - this is a feature request for v2!
Q&A
Philip: you talked about the linear stuff; what about the interactive stuff?
Mike: from a rendering standpoint, there are a lot of similarities. But at Dolby, we look more at linear content in general. If we get it right for linear, it should apply similarly to interactive.
Don: a useful direction for future work would be for the 3D and audio to inform each other in VR experiences. For example, high-fidelity audio rendering can likely be accomplished for this room with on the order of 100 polygons. Adding audio properties for reflection/absorption etc. is also analogous to visual materials/appearance. The SIGGRAPH conference is a good resource for such work, e.g. "RESound: Interactive Sound Rendering for Dynamic Virtual Environments" by Manocha et al., UNC, 2009.
Shannon: My dream is to have a way to add audio-reflective properties to the scene (e.g. to distinguish wood from carpet), and have the browser coalesce this.
Potch: from Slack: what does it mean to duck a sound field?
Hongchan: we have a compressor node that would enable ducking with side-chaining.
Potch: from Slack, have there been experiments in configuring the surrounding audio environment (e.g. getting more or less sound from your friends)?
Mike: it's very much an artistic choice; it's still very early days in that regard.
Potch: is there an audio analog to photogrammetry?
Mike: there was a great paper on capturing multiple sound fields@@@
@@@ NFB: from an A11Y perspective, how do we describe that audio?
Issues / summary:
- Compressed audio format for FOA/HOA multichannel audio stream
- Alternative spatialisation techniques and native implementation
- Tight synchronisation between audio and video frames (between all kinds of streams, audio/video, video/video)
- worker-friendliness and audioworklets
- Object-based audio pipeline - need to pass time-sync'd metadata ; sending orientation/position down to the decoder
- Capability reporting for rendering (important for graceful degradation)
- Speech synthesis
- Speech recognition
- Describing spatial audio for A11Y
Breakouts
Making a HMD WebVR ready
Link traversal, are we there yet?
Proposer(s): Fabien Benetou
Summary: Identify what is missing to enable the Metaverse
Type of session: Open discussion (with the #linktraversal channel on Slack, follow-up on cvan/casey’s presentation, see also #spoofing_phishing regarding security)
Goals: Check existing implementations (cf. Andy Martin’s VRambling browser A-Frame tests, cvan’s webvr-link-traversal) against current specs (navigation reason, transitions cf. Casey/Josh’s exploration), define limitations and new needs (deep-linking equivalents, avatar persistence across experiences, avoiding orientation reset, preview, pre-loading, 360 as waiting previews, security, current WebVR diameter… of 3 links)
What is the minimal set of features that HAVE to be in the browser in order to be able to navigate from one VR experience to another VR experience hosted by potentially different content providers without leaving VR?
Link traversal, are we there yet?
Presence Firefox, Carmel, Chromium, Samsung, Edge
What is out of scope
Transition
Visual parts
Not UX
Room to talk about UX, could be done in the declarative session
What is missing to make navigation rock solid
What gives the browser enough information to make the transition
Analogy with single-page apps
Spoofing gives importance to transitions; have to ensure trust
Current status of the specs
requestPresent restricted to requiring a user gesture (sketched below)
Alt: VRDisplayActivate using the proximity sensor, treated as equivalent to a user gesture
Future alt: VRDisplayNavigate, for going from one VR context to another
Close to the onload event? Treated as a user gesture? (has to be called in the context of that callback)
Requires a delay, so requires a splash screen or some kind of visual transition
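A sketch of how the user-gesture requirement looks in WebVR 1.1 code; the button, canvas and render-loop names are assumptions:

```js
// requestPresent() must be called from a user gesture (or from the
// vrdisplayactivate event, e.g. the proximity sensor on mobile headsets).
navigator.getVRDisplays().then((displays) => {
  if (!displays.length) return;
  const vrDisplay = displays[0];

  enterVrButton.addEventListener('click', () => {
    vrDisplay.requestPresent([{ source: webglCanvas }])
      .then(() => vrDisplay.requestAnimationFrame(onVrFrame))
      .catch((err) => console.error('requestPresent failed', err));
  });

  window.addEventListener('vrdisplayactivate', () => {
    vrDisplay.requestPresent([{ source: webglCanvas }]);
  });
});
```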
<meta> for whether site is VR capable
Carmel specific, VR only player, different restrictions
Usage of layers as preview/loading content
Difficulty of predictability for content fetching, loading, rendering
requestPresent/meta tag vs. HTTP-equiv vs. misbehaving pages
Mobile as model (meta viewport)
Viewmode eq for VR
Heuristic on timing
Problem of misbehaving pages, e.g. an existing <meta> tag but no VR content provided, giving 10 sec. of white
Alt providing anything
e.g., 1 frame with timewarp
Forces page to take ownership
Carmel
favicon.gltf as rotating
ServiceWorker as a means to delegate before content
Heavy weight, other process, expensive
Esp. when consider pages that are expected to load subsecond
Meta tag still as comment, not in spec
Web App Manifest format
`href` properties/favicon loading/… should be stored there?
Solved for 2nd+ navigations but not 1st time
Security: equivalent of phishing prevention with a daily blacklist
Cost of gaze for preloading… too costly
Reminder of Josh’s transition
Representation of omnibar for what is loading w/ progress bar as trusted UI
fades in/out at the appropriate time
Spoofable
Remarks on security for Chromium for non https with its persistent overlay
Impossible to overlay
Handled by the compositor (treating the WebGL-generated content)
Ignoring the red padlock equivalent?
Job Simulator metaphor: not having to pop back up to the selection level
Proposing Site A - Site B on same domain, same-origin policy, etc.
Is the omnibar equivalent to popping back up?
Content NOT provided by the author as distracting
Link traversal is changing content, not just single page app equivalent
Question on `<iframe>` equivalent
Possibility to jump back to a safe place
Minimizing/docking/closing abilities… blur/focus events
Fallback to 2D
404 pages, etc.
In-app to browser navigation
Summary
- <meta> tag http-equiv
- Event
- VRDisplayNavigate (for single-device scenario)
- Transition later on with browser-specific backup at first at least
------
Any visual of the event timeframe?
There are many ways to render VR.
Could be some future CSS VR…
Think about the WebGL assumption
Is a transition between pages a full reload?
It does not have to be. Example of transitions in JanusVR.
Missing concept of seamless portal, `<iframe>`s.
Halving the draw calls with WEBGL_multiview
Proposer(s): Olli Etuaho
Summary: Accelerating stereo rendering with a WebGL multiview extension
Type of session: Open discussion
Goals: Resolve open questions about what a WebGL version of OVR_multiview should look like and how to display a framebuffer generated using such an extension in an efficient way in WebVR.
Slides https://drive.google.com/file/d/0B8RMuOp5lAYSYzdRNnBMcU9KUzA/view?usp=sharing
Strawman proposal: https://github.com/Oletus/WebGL/tree/multiview-strawman (Prettier version)
Notes
- Proposed WebGL extension based on the native OVR_multiview extension. Draw calls are issued once in JS and both views get rendered (a sketch follows the conclusions below).
- Cost of rendering cut in half? Yes, most of the browser rendering will also get reduced by roughly half
- What sort of reduction in processing? 40%? Hard to deduce.
- Directed to different layers of the texture
- Open questions? Slide 4
- A couple of extensions exist on the native side.
- Multiview and mv2
- Restriction.
- MV 2 no restriction
- Q: Should we have such restriction? Or no restriction?
- If you have more restrictions: more optimization possibilities. This will gain GPU performance. Fewer restrictions: more flexible for apps, but apps won’t get the perf benefits?
- How many mobile platforms have MV extensions? Any GearVR device has OVR_multiview. Any phone that is intended for VR should have them in the future.
- What are the restrictions: only gl_Position may change based on the ViewID.
- What’s the significant difference with NVIDIA’s extension?
- One draw call submits 2 geometries.
- Potentially? Can be done in a geometry shader? GS quality can vary across platforms
- In Chrome: we can save calls in JS, and calls are sent over the command buffer architecture, but in the GPU process we can split them to universally support the extension. The GL driver may be doing the same.
- One proposal: Multiple levels of restrictions
- Q: Holographic API: are you given a buffer that both eyes render into? Now it is a 2-layer texture array. DirectX concept of stereo.
- Few ways this can be handled:
- Option A: WebGL multiview that renders side by side into a single texture. So no change in existing specs. Means no native extensions, or it could be a bit tricky
- Multiview will only be beneficial for WebGL 2 due to texture arrays; we could back-port to WebGL 1 though if the side-by-side option is chosen.
- Option B: a default framebuffer that would be layered.
- Option C: a stereo canvas element itself. Layered texture implementation. Individual left and right. Fairly big changes to the canvas. Could be used for DOM inside VR. Canvas elements in VR. Quad-buffer stereo.
- Q: What happens when you do readPixels? A: Just as today, the read framebuffer will be just one layer of the texture.
- Scenario outside of WebVR? Yes for example rendering a cube map more efficiently. NV extension is explicitly 2 views. Either way should not tie the multiview extension just to WebVR, but make it possible to use it more widely.
- Looking ahead: Timewarp/Spacewarp in WebVR. We don’t explicitly do it. There are good reasons to want to disable it (e.g. jittering shadows). Interesting to raise it up to the WebVR level where the application lives. Maybe good to give an option to explicitly turn it off.
- Takeaways:
- Need more discussion in general right mechanism into WebVR (efficiently)
- Probably start prototyping. Start with most restrictive. Proof of concept. Could be side by side so browser is emulating.
Conclusions: Need to be careful to specify the extension in a way that’s compatible with all the different display pipelines exposed by HMDs and inside browsers - should not trade draw call overhead to memory bandwidth overhead (extra copies). Extension does have uses outside also of VR, so need to consider this when specifying it. WebGL 1 compatibility might be possible if the extension is specified in a certain way, but this might limit performance so it could be made WebGL 2 only. Prototyping is required to determine performance impact, and should inform further work on the specs.
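For illustration, a sketch of what a multiview draw could look like; the extension, function and GLSL names follow the strawman and the native OVR_multiview extension, so treat them as assumptions rather than a finished API:

```js
// One draw call renders both eye views into a 2-layer texture array.
const ext = gl.getExtension('OVR_multiview2');    // name per the proposal

const fb = gl.createFramebuffer();
gl.bindFramebuffer(gl.DRAW_FRAMEBUFFER, fb);

const colorTex = gl.createTexture();
gl.bindTexture(gl.TEXTURE_2D_ARRAY, colorTex);
gl.texStorage3D(gl.TEXTURE_2D_ARRAY, 1, gl.RGBA8, width, height, 2);
ext.framebufferTextureMultiviewOVR(
  gl.DRAW_FRAMEBUFFER, gl.COLOR_ATTACHMENT0, colorTex, 0, /*baseView*/ 0, /*views*/ 2);

// Vertex shader: only gl_Position depends on the view ID (restricted variant).
const vertexShaderSource = `#version 300 es
  #extension GL_OVR_multiview2 : require
  layout(num_views = 2) in;
  uniform mat4 u_viewProjection[2];
  in vec4 a_position;
  void main() {
    gl_Position = u_viewProjection[gl_ViewID_OVR] * a_position;
  }`;

// ...compile/link as usual, then a single call draws both views:
gl.drawArrays(gl.TRIANGLES, 0, vertexCount);
```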
Declarative 3D
Proposer(s): Tony Parisi
Summary: Should we contemplate a 3D equivalent to the DOM? Aka a Scene Graph? Not low-level content like glTF but something more like A-frame or React, with a full object model with properties, baked into the browser, presumably faster, and interoperable between browsers.
Type of session: Discussion
Goals: To gauge interest in this as a set of built-in features to optimize performance and provide a reliable baseline of features across browsers… non-goal is to generate a proposal, though if people show up with some we can discuss.
Attendees: approx 25
Scene Object Model Discussion
Overall theme - browser as 3D presentation engine (vs 2D presentation engine) and application platform (vs. Declarative breakout, where the goal is to enable the creation of portable/archivable content… ?)
Resources: https://www.w3.org/community/declarative3d/
Some possible requirements
- Declarative files
- New markup syntax a la A-frame, React, X3D, GLAM, XML3D
- Loading glTF, OBJ and other 3D formats natively - this may be a separate API from the scene object model (a Model object, akin to the current Image DOM object)
- We will cover this topic in detail in the second breakout session
- Runtime object model (see the hypothetical sketch after this list)
- Properties and attributes
- Event model
- Picks, collisions, visibility, model load callbacks
- Host object set
- Usual suspects like Mesh and Group
- Supporting types like Matrix Vector3 etc…
- There are some existing DOM objects in other specs, like Rect
- Viewing/navigation/interaction models
- Built in viewing and navigation e.g. inspect, walkthrough
- Body position, sitting standing, hands stuff
- Built in controls - the scrollbars and hand controllers
- Presentation model
- How does this relate to existing page compositing model?
- Rendering model
- Built-in materials specification?
- Shading and programmability
- Backgrounds and layers (including passthrough for AR)
- Styling - “3D CSS”
- Separation of concerns - ability to define visual styles flexibly
- CSS also defines layout as well as colors, and layout engine currently causes all kinds of perf issues
- Media integration - sound and video
- Legacy web pages
- Web pages on a texture? Supports user interaction?
- Responsive design
- This is probably all in <meta> tags, media queries
- Physics
- Animation
- User preferences e.g. visibility, reduced motion and other browser features
- Text
- Personal information and other stuff that goes between WebVR sessions… e.g. showing controllers
- Real-time multi-user updates to object model? Out of scope?
- Link traversal animations and other transitions… -talk to link traversal group
- Extensibility
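To make the “runtime object model” and “event model” items above concrete, here is a purely hypothetical sketch; none of these names (xr-scene, createObject, the pick event) exist in any spec discussed at the workshop:

```js
// Hypothetical built-in scene object model, for discussion only.
const scene = document.querySelector('xr-scene');       // invented element

const box = scene.createObject('Mesh', {
  geometry: 'box',
  material: { color: '#4488ff', roughness: 0.6 },
  position: new DOMPoint(0, 1.5, -2),
});

// Event model analogous to DOM events: picking, collisions, visibility.
box.addEventListener('pick', (event) => {
  box.material.color = '#ff8844';
  console.log('picked at', event.intersection);
});

// Properties behave like other DOM object models (live, reflected to markup).
box.position.y += 0.25;
```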
What are the benefits of doing this native in a browser vs. at the JavaScript level?
- Optimizations per-platform - more are available if it’s implemented in C++/native? What are the special cases that we want to enable
- Piece-by-piece we can look at optimization - is that an option? Vs. a whole new declarative API e.g. optimize matrices
- Rendering- and API independent - New architectures like Vulkan and Metal and non-GL rendering pipelines
- Ability to embed pieces of DOM into the 3D context.
- Rendering quality control
- Consistency of interaction (picking, “cursor shapes” a la Boris’ ray picker), navigation (e.g. link traversal)
- Standardized feature set
- Security - payments, password entry etc. moves a lot of responsibility to browser which we think is better
Content follow up
- Will this open things up to content developers? Counter argument: who cares, JS or declarative? And most of the quality 3D will be done in tools.
- Nola: even if yes to tools, the content will be more efficient. Look at Unity
- Jason: people move down the stack as they get more comfortable, typically starting at the high level and moving down as they feel mastery
- Scott Singer: harder to write simple tools that deliver low level stuff than higher level stuff
Survey Questions
Q: We want to build a 3D runtime into the browser (at some level). Is this a
- Good idea? 85%
- Bad idea? 15%
Q: What do you think the biggest benefit is to building a 3D runtime into the browser? Pick one only:
- Performance: 15%
- Ease of authoring: 50%
- Consistency of implementation and user experience: 35%
Diego: extensible web manifesto. Standardize after identifying common patterns. Ada’s flipside: if developers are continually bending over backwards to do something maybe that’s a good candidate to standardize…
Tony says let’s do a Survey Monkey
James B would like to combine components from different systems like A-frame and his own engine
Shannon : if it’s built in, then it’s a standard. Not a random library.
Ada: even a small library is still a library.
Declarative 3D Discussion
Don Brutzman presentation on VR and X3D, the Extensible 3D Graphics international standard.
- X3D originated and extended Virtual Reality Modeling Language (VRML97) standard. We have been at this VR on the Web challenge for a while!
- Essentially “3D publishing for Web” with numerous players, codebases, converters etc.
- Device neutrality for content, rendering + navigation + user interaction in single scene, composability, aligning with Web architecture, multiple “lessons learned” of group value, forward-compatible evolution of capabilities through extensibility, and even (X3D+scripts+CSS)-inside-HTML offered as exemplars for how things can work.
- The Open Web Platform (OWP) already goes a long way - can VR become an actionable, bidirectional part of that ecosystem? Web3D community certainly thinks so. As snapshots & videos show, X3D implementations continue to demonstrate that, playing interactive content with HTML (and also in other devices - CAVES, phones, etc.)
- Putting on a head-mounted display (HMD) is an act of great personal trust… Data-centered security can go beyond custom device-server pairings and make VR across the Web trustable.
- Web3d participants have been working on some of the key technical bottlenecks in W3C. Authors can compress and gain data performance (thus reducing power consumption) using Efficient XML Interchange (EXI). EXI has recently been extended for JSON and CSS, also adding EXI Canonicalization (C14N). Authors can digitally sign (authenticate) and encrypt (preserve confidentiality) as well using W3C’s XML Security recommendations. p.s. W3C EXI group participants think this same tech combination can be used similarly in Internet of Things (IoT). (Gee what’s that “thing“ on your head?)
- Playing well with others in the hardware direction is also important to Web3D participants. Geometric compression and progressive-mesh streaming by Shape Resource Container (SRC) is aligning at a low level with glTF, standardization effort expected in early 2017.
- X3D version 4 efforts are aligning X3D closely with HTML5 evolution. Subsequent X3D version 4.1 expect to align with ISO-draft Mixed Augmented Reality (MAR) Reference Model. Further support for VR Web from this group is expected to be a natural area of activity for us… we want to define, play and interact with 3D content within VR.
- Web3D Consortium has been a W3C Member for many years, with many positive benefits continuing to accrue. We’re keen to continue contributing in this great effort!
Interesting “declarative” discussion continued, building on prior session for Scene Object Model.
A-Frame, X3D, GLAM vs ReactVR
Scott S - I can build A-Frame scenes out of Houdini by spitting out some python.
Survey Questions
Q: Which tags approach do you favor? Pick one:
- A-Frame (Pure Declarative) 60%
- ReactVR (Mixed imperative/declarative) 40%
Q: Would we prefer that something like A-Frame or a similar tag set eventually be built into a browser or remain in a library?
- Built-in: 100%
- Remain in a library: 0%
Don votes for all of the above - but it’s all about Web data, this is really a publishing medium. Some candidate “use cases” that can help to pull us through this design space:
- Archival recording of a VR session should play both ways. Can someone play back yesterday’s VR session as an observer? Can they then choose to be a protagonist? Will it work next month, next year, on newer gear, etc. etc.? This means declarative.
- Accessibility metadata is another major opportunity for declarative VR. Multimodal experience (visual aural haptic etc.) for the same place/experience? User preferences or accessibility preferences become very real to the individual… and leverage all the other W3C web recommendations/capabilities to best effect.
Tony says he thinks that Don’s statement implies the declarative side because it’s hard to do archival when you’re API driven.
Path to standardization - React data binding doesn’t provide that.
Another challenge associated with the programmatic approach is that a program can only do what it does, whereas content can be reused and open to reinterpretation.
PostScript and RIB will still render, because these are descriptions rather than code.
Shannon - Web Devs need HTML constructs to author stuff, that’s why we need onClick()... shouldn’t have to think about the programming. Prefers to be built-in, and it’s standardized so you know how it behaves. He doesn’t like jQuery and all these include libraries, React etc. depending on third party.
Very interesting session(s)... Scene object model “versus” declarative is more yin/yang than a fork in the road - they are two sides of the same coin. Benefits accrue from stability.
Accessibility for WebVR
Participants
- Charles LaPierre
- Casey Yee
- Dana Dansereau
- Wendy Seltzer
Need to know about the DOM A11Y
ARIA
Different media types Audio / Video.
Web VTT subtitles for the web be useful here?
Lots of metadata about an object (weight, center of gravity, etc.) must be attached to this content.
Is saying that it is a waterball enough?
ARIA roles / semantic description - does the dev platform need to define this, vs. a hosted solution?
What are the best practices, or is it up to the UA?
There is a whole accessibility DOM that is a complete representation of the traditional DOM
If we are talking about adding nodes into the page, would this scale? HTML imports to supplement it
Not sure how to add the ARIA but how would that scale.
Firefox add-ons are localized into 110 languages. Proposed RTL stylesheet
Layout and semantic are separate
Configuring text in VR to vertical, RTL or LTR
Snapchat: UI is gesture-based
iPhone accessibility with VoiceOver changes the gestures
Clicking on a link: you assume that a mouse cursor clicks on a link, but access to that link is different
Color blindness shift the hue of the display
One eye how do you deal with stereo
Hard of hearing and frequency shift.
Section 508, HIPAA - are there things that will need to be compliant in VR?
Content ratings providing or permitting access to certain age groups.
Oculus and Vive (comfort ratings)
A-Frame - add text descriptions to their object models.
Objective description of what you are seeing, machine access, screen readers, TTS and search.
You lose that with WebGL; it’s pivotal to the health of the web.
Material design, consistency - sounds like user-pattern boilerplates should all be a11y out of the box
Don’t worry about performance right away, just include the data and we can figure out performance optimization later
How to report to analytics or measure user engagement; how do I know if a blind user can use my system?
Add standardized way to get feedback from user and they can tell you that the site is inaccessible.
Privacy issues are a concern here (randomization of collection)
If you have this data it can be used to detail semantic and description. Delegating that to a trust
Being able to aggregate, the metadata that describes that gets passed back to the system
CSS media queries: you get pixel aspect ratio, mouse or touch display.
Some can be rendered on the client and not sent back to the server.
What is the difference between blind access vs. a person who just wants to hear the site, or whose video display is broken?
The web has a “skip to main content” accessibility link which skips any header boilerplate.
Controls should shift in VR if a person has limited mobility
Just like in Charles’s talk, when the computer had to be raised for his presentation - VR needs to do the same thing.
Ability to change your height by x %
Spatial sound: a person hard of hearing in one ear turns their head with the good ear towards the sound source
Octave difference for left/right audio
Bone-conduction headphones
Html-tags ARIA, alt tags, standards defined, you could come up with a 3rd party.
DSS offices do this in schools.
High-contrast version in VR - is it the browser’s job, or is it a modification of that tool?
It’s the developer’s job to tag it appropriately so that it can be changed.
Buffer: could modify the buffer before it goes to the screen, with a11y hooks for audio/video in order to modify it.
It’s up to a standards body; there would be nothing stopping this if the information is there.
Text rendered in WebGL: a machine reader could pull that out in realtime
It could be, but OCR in realtime… a standards body must say this is a problem and there should be a way to describe the space or object.
Sendero GPS - lookaround mode for POIs that you are passing
Describe the environment: as you walk down the street, describe the cool features around you.
Described Video. All video in television in Canada and how to describe.
http://www.crtc.gc.ca/eng/info_sht/b322.htm
TV Access for People with Visual Impairments: Described Video and Audio Description
Persons with visual impairments can use described video or audio description to access the visual portion of TV programs. Learn about the Accessible Channel.
How to describe location movement.
Poet described Images from http://diagramcenter.org/
Described VR.
Dynamic select and provide it when needed
Kind of information that needs to describe
A11Y features in browsers
3d party services to solve Captchas
Uber for web pages.
High-Performance Processing on the Web
Depth Sensing on the web
The Preliminary Use Cases
(pre-spec)
https://www.w3.org/wiki/Media_Capture_Depth_Stream_Extension#Use_Cases
Spec
https://github.com/w3c/mediacapture-depth
XDM Spec (for camera intrinsics/extrinsics)
https://software.intel.com/sites/default/files/managed/b0/bf/ExtensibleDeviceMetadataXDM.pdf
Experimental Scene Perception Web API demo: https://youtu.be/pXyDiYJO0nA
getUserMedia exposes a raw depth capability to the browser (depth camera stream); see the sketch below.
WebGL 2.0 R16 integer texture to upload the depth map into a shader.
Intent to implement already approved. No widely adopted camera API in the kernel. Need to convert the int16 pipeline.
Unsure of status in Edge - there is a team working on this. Perhaps ask Frank or Raphael.
USB-based cameras such as Kinect. HoloLens would be a good example of a device. Also Oculus is interested in inside-out processing.
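A sketch of the pipeline described in these notes, using the constraint name from the mediacapture-depth draft; treat the constraint and the direct video-to-integer-texture upload as assumptions, since support varies by implementation:

```js
// Request a depth stream and upload frames into a WebGL 2 integer texture.
navigator.mediaDevices.getUserMedia({
  video: { videoKind: { exact: 'depth' } }   // constraint per the draft spec
}).then((depthStream) => {
  const video = document.createElement('video');
  video.srcObject = depthStream;
  return video.play().then(() => video);
}).then((video) => {
  // Single-channel 16-bit integer texture (R16UI) for the depth map;
  // whether a video element can be uploaded directly this way is
  // implementation-dependent.
  gl.bindTexture(gl.TEXTURE_2D, depthTex);
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.R16UI, gl.RED_INTEGER,
                gl.UNSIGNED_SHORT, video);
}).catch((err) => console.error('No depth camera available', err));
```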
Rob: Range of use cases
Raw data vs surface meshes - can be implemented, but demand in the system is high - burns a lot of battery.
Different devices are expected to have their own capabilities - expect that quality of “baked data” will vary from device to device.
- Security: What kind of restriction should we consider for ensuring privacy and security.
- Permission fatigue (pose data plus sensor data plus room memory)
- Likely to use full screen api (or requestPresent)
- Does it make sense to chunk components (permissions) into a collective group? Are external sensors different from internal ones?
- What is the user’s expectation of permission models? Can we provide multiple levels of permissions wrt AR experience?
Other asks
- Relative location of sensors? XDM exposes an Adobe XMP model that reveals relative locations.
- Field of view, focal length, near/far.
- Stream sync: depth stream and video stream need to be synchronized.
- Also need to sync with IMU/HMD?
- Different devices have different framerates.
- Consider providing a synced framerate between devices.
- The Chromium CL for the depth stream: https://codereview.chromium.org/2121043002/
- On top of the raw data, what do we want?
- Buffer array of raw geometry mesh, linked to pose information
- Possibly something that can be updated to webgl shader
- Movement and points of interest.
- Simultaneous localization and mapping (SLAM) -
- Coordinate systems? What do we use to map them from frame to frame?
- Next steps?
- Combined interest group with other initiatives (webGL+ GetUserMedia + webVR + permissions)
- Interested? Talk to girard@google.com nell.waliczek@microsoft.com leweaver@microsoft.com simon.gunkel@tno.nl ningxin.hu@intel.com rob.manson@awe.media
Standardization Landscape
Dom: We’ll start our last session now. A couple announcements; a phone was found.
Another thing, we should do a group photo of the very first W3C Web VR Workshop.
We’ll go outside and take the photo as a memory of the event.
Neil Trevett: thank you W3C for organizing this. In July 2008 I met Bill @ at SIGGRAPH who said we should do a binding to Canvas and have eyeglasses everywhere. We took it into Khronos and now we’re here eight years later. Talking about the standards landscape: there is lots of interaction between what we are doing and what W3C is doing, and some other standards orgs we come in contact with.
For those of you who don’t know the background, we share the same ideals and processes as W3C; more common than different. We’re committed to open standards and royalty free. We are focused on silicon software part of things. We have every GPU vendor. Our primary activity is to figure out how to expose acceleration for graphics, computation and vision process with APIs and enable software to use them in real applications. We have about 120 companies.
Members keep joining each other [laughs]. How Khronos relates to AR and VR: we do file formats. COLLADA is one; we talked about glTF yesterday. It will save us all work, make applications more efficient and avoid siloed content.
WebGL is all on GitHub. GLTF Work is like a W3C Interest Group. We also have Working Group where people have signed up for RF. You can join and have a seat at the table. Core business is APIs for acceleration.
Timeline for where WebGL came from. OpenGL in 2008 was ubiquitous; it was available on all the desktops. OpenGL ES 2 was becoming ubiquitous on mobile. So it was becoming available anywhere a Web browser was running.
2011 WebGL1.0 launched. Four years not so bad. WebGL2 is based on ES3; four year pipeline or heartbeat.
The new stack we have been talking about: Vulkan is the new one.
Why is Khronos in WebGL? Because it is so closely linked to the native API and silicon roadmaps. Portability; you need the GPU guys around the table.
What other issues do we have? Lots of the silicon - my day job is video - has lots of VR capability, asynchronous contexts. We probably cannot expose all of them. Stereo was mentioned as a first step. Do we want to expose more VR capability, more functionality, at the JS level?
Cameras, in 21st century we are still running processing on the CPU. If you can get power processing...ultimately orders of magnitude more. Need to get vision processing off the CPU.
Vision acceleration API. May not lift up to JS. Perhaps people implementing tracking might use native APIs like OpenVX. Pokémon Go is awesome, but it is dispiriting to see that power users must turn off the AR because it runs better on the battery. Silicon has failed to enable applications for AR. Apache is available in the phones.
One potential effort is to use GPGPU in GL12. We’re talking more general and higher level language and access to language not just for vision but also audio and neural nets. We’ll want to accelerate that quickly.
We have WebCL: lift into JS, still kernels in C. Lesson learned: when a standard takes off and gets adopted, be thankful. WebCL was one thing we tried that did not work. Perhaps a JS extension is the way to go.
Good that the breakouts covered so much of this. The cameras are about to become really diverse in number; mobile phone has four. The wide angles, depth, stereo are coming. Just controlling the camera is going to be a point of fragmentation at JS and developer level. We have this nascent group called OpenKCam, intended to be an abstraction for how to control the sensor. This has not taken off yet; may not be needed. Maybe a cul de sac, or too early. Take a look at OpenKCam. Should we re-invigorate this group? Get lift into JS domain.
This is a cheeky slide: VulkanVR - is there a need? We’re in a weird situation with VR SDKs: all similar but also different, with apps running differently on each. Native rendering API things will consolidate quickly around Vulkan. @ has adopted Vulkan. UT, Epic are porting onto Vulkan.
It seems the differences between these environments are not a sustainable competitive advantage. Would it be good for the community building on all these SDKs not to have all that friction, not to do it all differently? If the native community gets its act together, that would make it easier to have more consistency at the native level. Interested in feedback.
Slide with big chart of SDO landscape
MPEG guys, video, audio, images. They have done work on 3D compression. They do have a declarative AR framework called ARAF.
Then OGC deals with anything around geospatial. Tony, get Mark to look and they could help.
Here are my suggestions. We should meet like this much more often. Discover problems that we can solve with standards. Some don’t need, but some do.
Figure out which SDO has the closest domain expertise. Make sure they stay on track. Ensure the community has a channel to feed requirements into the SDO. Continue regular meetings.
[Neil ends; applause]
Dom: I thought I would present what could be useful to standardize soon, longer term or never in W3C.
First, a taxonomy. The most well-known standards work happens in Working Groups with a formal IPR policy. We also have W3C Community Groups: WGs are members only; CGs are open to anyone. VR work has been done in a CG so far, which has a limited IPR commitment. We help to facilitate incubation. We also have Interest Groups, which have a more limited IPR policy.
Present what I heard, which is up for discussion. What should come up soon, later or not at all. And summarize what I heard over the past two days.
Existing relevant standards at W3C (slide 1)
Spatialized audio in Web audio WG
Gamepad API, Web Workers in Web Platform WG
Media Streaming handling in HTML Media Extension WG
Low-latency data and AV transfer, identity hook in WebRTC
@@
Existing relevant standards (slide 2)
Color space management in CSS WG
Performance metrics in Web Perf WG
UI Security in Web App Security WG
Payments in Web Payments WG
New standardization efforts soon? [slide6]
WebVR is the elephant in the room. Gamepad API extensions. Some proposals
Notion of having a VR mode in CSS Media Query. Not sure if it’s close or interesting. Broader conversation. Maybe something easy.
Notion of API that Samsung presented, putting context around content.
Has been a lot of work around Speech Recognition API in a Community Group
Interest in Web timing CG for Media synchronization; check out
Web Assembly is a CG at W3C. Still a lot of churn on this. VR might be another motivation to get done sooner v later.
Mention of proposal in a CG on Media Capabilities. May be too early and need more incubation.
[slide7]Longer term standardization targets?
Lot of discussion around declaratives around mark-up. An object model. Does that impact @
360 media streaming we heard some approaches; open issues there worth looking into
Navigation Transitions; notion of metadata
Discussion around having a unified user input model for VR the way pointer events simplified touch. Maybe get something similar here
Gesture recognition framework may be further away; some cultural aspects; explore some primitives
Out of my league for Web fonts for 3D context
Several conversations touched on scheduling what happens very precisely. We have some mechanisms. Had some suggestions on how to inspire @
For accessibility can we imagine applying ARIA.
Annotating VR entities; maybe touch metadata
Identity and Avatar management is at the frontier between maybe and not at all
[Slide 8]
Help?
Things where I am unclear
DOM to WebGL?
What would need to be standardized for 2D Web browsing in VR
We want innovation in this space but there is a cost; not have multiple viewports. Needs some coordination
UI patterns. We don’t want to standardize, but will need to be some kind of agreement on how you interact with this space
Social discussion around the binding of real and virtual worlds in terms of authenticity, geographic control. That is not necessarily ready for standardization, but worth watching
My proposal is that we extend by a little bit. You give us feedback on the priorities, what are the gaps. Idea is we use this white board to more easily brainstorm.
Rob: one thing missing is the permissions, especially in mobile. There needs to be a broader forum for discussion. Fusion brings APIs together. How to integrate as a whole.
Dom: Any other feedback on standardization landscape? Another interesting topic is there is somewhat clear separation between what SDOs do. Are there specific overlaps to pay attention to or carefully avoid. That would be useful.
Did we call out security explicitly?
Dom: Wendy, do you want to talk to that?
Wendy Seltzer: Thank you. Security is one of the areas W3C considers among our horizontal review areas, along with accessibility and internationalization. We look at all the specs from this POV. It is interesting to hear the ways VR and security intersect: the authenticity and integrity of environments, how we give users indicators, how to deal with social interactions and recognition. A great area to keep looking at.
Feature of device detection capability.
Dom: there was a media capabilities proposal. Something broader like head sets?
Olivier: Mention of media capability. There are broader questions: is this a 3D-environment-ready device I am using? There seems to be a need for that.
Dom: Capabilities. Any other input. What do you folks think we need to standardize tomorrow or should have standardized yesterday?
David: WebGL is blocking certain things you might prototype. Do it in WebVR
Dom: any ongoing work?
Tony: I saw a Chrome prototype from four years ago. That’s been stuck in the mud for years.
Brandon: this comes up a lot. We all need it but have higher priorities to work on, like WebGL 2. Hope to give it more attention. It would be a nice-to-have; it was not so much of a pressing need.
Tony: That TripAdvisor demo I did - do a rollover and you get a divet. I had to code every pop-up twice. I just wanted an element …
Brandon: I have made my WebGL voice known and said we need to do that now.
Dom: Any other input?
Anssi: Either people are tired or your proposals are near perfect
Tony: I’d like to talk about process
How to best participate; tooling, etc. We can use W3C infrastructure. What would you prefer?
How many people use Slack? Quite a few. I guess Slack, combined with Neil’s idea that we do this again sooner rather than later.
Dom: should we do another event like this a year from now? [hands go up]
Olivier: This felt like an unusual workshop. Usually people come with “I have a spec here” and see where it sticks. This workshop had a lot of topics but not a lot of new specs. So it sounds like six months’ time would be good, where people come back with these specs.
Wendy, W3C: I’m thinking about our strategy. We work well with proposals for concrete work. And we have places to help incubate things not yet at that phase. When we start a WG we draft a charter and get patent commitments. That is a great way to work on something that is ready to be specified and where we can describe to your lawyers what we are going to do. When we are not ready to do that, we have CG for incubation. We have discussions and repositories. And help discussions broader than W3C membership and look for when they are ready to become more solid standards. We’re seeing there is lots of other activity and we aim to be good colleagues with other SDOs, where we can be helpful to others. I am excited by the energy and cooperation that we see here.
Shannon: talk about follow-up. February 27 or 28 there will be a WebGL meetup on VR. I have not yet posted the event. We’ll have someone from W3C give us an update.
David: Seconding incubation and the use of the WICG and GitHub. Start with an explainer and don’t go directly to IDL.
Philipp: There is the High Performance Graphics conference covering issues like those we are discussing here. Consider contributing to it. There will be a track on compiler technology. The call for papers will be out in the next month or so, with a deadline in April. We invite not only academic but also industry contributions. It’s a great venue for a lot of the stuff we have talked about here.
Dom: Any last comments before we wrap up?
Anssi: Progressive enhancement - is that something to look at for mainstream content? How do you apply it to this world?
You have a large body of web content that is not VR.
Dom: Josh’s demo yesterday was a good direction. It think it’s time to stop. We have exhausted our brains and our energies. Thank everyone for coming. Thank the program committee, chairs, sponsors. Thank you Samsung for hosting. It would not happen without them.
The next concrete steps will be compiled in the report. Will be up on the website tonight or tomorrow. Let’s keep in touch by email and Slack. My email is dom@w3.org. Reply to me to get back in touch. Opportunity for another workshop or event in some time.
[applause]
Time for photo!