Transcript
Hi, I'm Sangwhan Moon from the Technical Architecture Group also known as the TAG.
And today I will be talking about how machine learning today will ideally become a part of the web platform.
To briefly introduce myself: as noted earlier, I am an elected member of the TAG, a team leader at an e-commerce oriented computers startup, and finally a PhD student in Natural Language Processing at the Tokyo Institute of Technology.
I would like to note that the presentation today carries more weight from a personal perspective on the topic rather than the formal group message from the TAG.
Machine learning, namely the sub-field that involves using large neural networks to approximate various functions through large amounts of training, has been a hot topic in the last couple of years.
Given the socio-technical impact this has had on our lives, it is a natural progression for developers and users alike to expect this functionality to become part of the web platform.
Many operating systems have started providing APIs that give you access to these capabilities, some of which wire directly to a neural-network-optimized co-processor.
First and foremost, we would like the work to have a stronger focus on maximizing impact with smaller, incremental changes.
Although this topic may still be open for debate, the way we see it is to focus more on inference rather than trying to tackle training and providing a full-blown set of APIs from the get-go.
The reason for this is that we see a larger pool of users on the inference side than on the training side.
This is mainly because having a browser tab open for weeks doing a training run seems like a less likely scenario.
Eventually we'd like to see more capabilities added, but incremental enhancements allow us as a platform to adapt to actual user needs in the world and to focus on fixing easier problems first.
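As a rough illustration of why inference is the lighter-weight starting point, the following is a minimal sketch of a forward pass in TypeScript. It assumes a made-up one-layer model; the names `weights`, `bias`, `relu`, and `infer` and all numeric values are hypothetical and do not come from any proposed web API. The point is only that inference reads fixed, pre-trained parameters and never updates them.

```typescript
// Hypothetical pre-trained parameters for a tiny one-layer model.
// The values are invented purely for illustration.
const weights: number[][] = [
  [2, 0],
  [0, 2],
];
const bias: number[] = [0, 1];

// Rectified linear unit activation.
function relu(v: number): number {
  return Math.max(0, v);
}

// Inference is a single forward pass: the parameters are only read.
// Training would additionally need gradients and repeated parameter updates.
function infer(input: number[]): number[] {
  return weights.map((row, i) =>
    relu(row.reduce((sum, w, j) => sum + w * input[j], bias[i]))
  );
}

const output = infer([1, -3]);
console.log(output); // [2, 0]
```

A training run would wrap a pass like this in a long loop over data with a backward pass and parameter updates, which is the part that seems less suited to a browser tab.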
One of the complications in defining a mechanism to run pre-trained neural network models in the browser is agreeing on a standard format for packaging and shipping these models.
Machine learning academia and the ecosystem of frameworks have still not agreed on a common format, which makes this challenging for us as a platform, as we must choose one of multiple competing proposals.
While we do not have a particular preference on the specific format, we'd like to see some form of convergence to make life easier for the users of the platform.
The same applies to the shader language for computation, which, as we understand, has also yet to achieve consensus.
Additionally, while it is possible to do a lot of this without a permission as of today, through capabilities already available in the platform such as WebGL and WebAssembly, at the point of standardization as a platform API we would like to see this behind a permission if possible.
The reason for this is not only the potential privacy implications, but also the power requirements that these APIs may bring to the table.
This can have negative effects on the battery life that users get out of a single charge.
So we believe that the users should have a choice to reject if they are in a situation where they would want to be conservative about power usage.
An extra point that we'd like to bring to the attention of the group is two open questions that remain for integrating these capabilities into the platform.
Current limitations of JavaScript, such as the lack of operator overloading, make it challenging to implement an ergonomic API for vector, matrix, or tensor operations on the web platform.
From a user's perspective, it's a lot more straightforward to have infix operators than to chain a pile of function calls.
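To make the ergonomics point concrete, here is a minimal sketch in TypeScript. The `Vec` and `Mat` classes and their methods are hypothetical, invented for this illustration, and are not part of any existing or proposed API. It shows how an expression such as 2 * (W x + b) has to be spelled as chained calls when operators cannot be overloaded.

```typescript
// Hypothetical minimal vector type, for illustration only.
class Vec {
  constructor(readonly data: number[]) {}

  // Element-wise addition.
  add(other: Vec): Vec {
    return new Vec(this.data.map((v, i) => v + other.data[i]));
  }

  // Multiplication by a scalar.
  scale(k: number): Vec {
    return new Vec(this.data.map((v) => v * k));
  }
}

// Hypothetical minimal matrix type, for illustration only.
class Mat {
  constructor(readonly rows: number[][]) {}

  // Matrix-vector product.
  mulVec(x: Vec): Vec {
    return new Vec(
      this.rows.map((row) => row.reduce((sum, w, j) => sum + w * x.data[j], 0))
    );
  }
}

const W = new Mat([
  [1, 2],
  [3, 4],
]);
const x = new Vec([1, 1]);
const b = new Vec([0.5, -0.5]);

// Without operator overloading, 2 * (W x + b) becomes a chain of calls:
const y = W.mulVec(x).add(b).scale(2);
console.log(y.data); // [7, 13]

// With infix operators (not valid JavaScript or TypeScript today),
// the same computation might instead read:
//   const y = 2 * (W * x + b);
```

The chained form is workable for a short expression like this one, but it gets harder to read as expressions grow, which is the concern raised above.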
The last question we have is whether there is a path forward to converge with non-browser JavaScript runtimes such as Node.js.
These are points that are open for further discussion.
The TAG has two sets of principles that might be useful for this group to move the work forward.
We have the base guidelines, the Design Principles document, which touches on how APIs for the web should be designed, and also the Ethical Web Principles, which we recently released.
As capabilities such as computer vision, which can be achieved with the work being proposed by this group, can also have wider social and ethical implications, we'd like the group to look into the larger-scale social impact of bringing these capabilities to the web, to mitigate any potential risks to the end user's privacy.
As this is a powerful and complex capability that we have not previously had on the web platform, the TAG would be very interested in doing early reviews on the topic.
We would be happy to review early proposals from the contributors of the group through our design reviews repository at github.com/w3ctag/design-reviews.
We'd also be more than happy to actively engage in any cross-working-group facilitation needed to make this work happen.
Thank you for listening to my presentation.
I'll be available in the live session for any questions you may have.