Slide 1
Fast client-side ML with TensorFlow.js 1 Ann Yuan, Software Engineer, TensorFlow.js, Google
Hi, my name is Ann, and I'm a software engineer at Google.
I work on tensorflow.js, a JavaScript library for machine learning.
Since we launched in March of 2018, we've seen tremendous adoption by the JavaScript community, with over two million downloads on NPM (a package management system for Node.js).
Meanwhile, a distinctive class of machine learning applications has emerged that leverage the unique advantages of on-device computation, such as access to sensor data and preservation of user privacy.
In this talk, I'll discuss how TFJS (TensorFlow.js, a JavaScript library for running machine learning models) brings high-performance machine learning to JavaScript through standard and emerging web technologies, including WebAssembly (WASM, a format for programs that can be executed much faster than JavaScript in browsers and that can be generated from existing codebases in non-JavaScript languages such as C, C++, and Rust), WebGL (a JavaScript API designed to run GPU-accelerated 3D graphics in browsers, which can also be used to tap the parallel computing capabilities of GPUs in general, a much-needed feature for running machine learning models), and WebGPU (an emerging JavaScript API that exposes GPU capabilities, including fast parallel computing, with deeper integration than is possible with WebGL).
Slide 2
What is TensorFlow.js? 2
[Architecture diagram: a Core API sits on top of swappable backends. Browser / native backends: WebGL, CPU, WASM. Node.js backends: TF CPU, TF GPU, Headless GL.]
TFJS defines an API for neural network operations, such as matrix multiplication (matrices are a mathematical construct used throughout machine learning algorithms), exponentiation, etc.
These operations call into kernels, which implement the mathematical functions comprising the operation for a particular execution environment.
For example, WebAssembly or WebGL.
A TFJS backend is a collection of such kernels that are defined for the same environment.
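As a rough sketch (illustrative names only; the real TFJS kernel registry is more elaborate), a backend can be thought of as a map from operation names to kernel functions for one execution environment:

```javascript
// Hypothetical sketch of the backend idea. A backend is a map from
// operation names to kernels implemented for one environment; here,
// a "plain JS" backend with kernels written in ordinary JavaScript.
const plainJsBackend = {
  // Elementwise exponentiation kernel.
  exp: (a) => a.map(Math.exp),
  // Matrix-multiply kernel over flat row-major arrays:
  // a is m x k, b is k x n, the result is m x n.
  matMul: (a, b, m, k, n) => {
    const out = new Array(m * n).fill(0);
    for (let i = 0; i < m; i++) {
      for (let j = 0; j < n; j++) {
        for (let p = 0; p < k; p++) {
          out[i * n + j] += a[i * k + p] * b[p * n + j];
        }
      }
    }
    return out;
  },
};

// The core API dispatches each operation to the active backend, so
// swapping in a WebGL or WASM backend changes kernels, not call sites.
let activeBackend = plainJsBackend;
const runOp = (name, ...args) => activeBackend[name](...args);

// 2x2 identity times [[1, 2], [3, 4]] leaves it unchanged.
console.log(runOp('matMul', [1, 0, 0, 1], [1, 2, 3, 4], 2, 2, 2)); // → [1, 2, 3, 4]
```

Registering another backend then only requires supplying a new set of kernels for the same operation names.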
Slide 3
Performance overview 3

                      WebGL    WASM  WASM+SIMD  Plain JS
  iPhone XS            18.1     140          -     426.4
  Pixel 3              77.3   266.2          -    2345.2
  Desktop Linux        17.1    91.5       61.9      1049
  Desktop Windows      41.6   123.1       37.2      1117
  MacBook Pro 2018     19.6    98.4       30.2     893.5

Inference times for MobileNet in ms.
Before diving into the details of our different backends, I'd like to provide an overview of how they compare in terms of performance.
This table shows how our WebGL, WebAssembly, and plain JS backends compare when it comes to inference on MobileNet, a medium-sized model with a few hundred million multiply-add operations (multiply-add is a frequently used primitive operation when running a machine learning model).
For this model, our WebAssembly backend is between 3 and 11 times faster than our plain JS backend.
Our WebAssembly backend is between 3 and 8 times slower than our WebGL backend.
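To make the multiply-add counts above concrete: a multiply-add is one step of the dot products that convolutions and matrix multiplies reduce to. A minimal sketch in plain JavaScript:

```javascript
// A multiply-add in its simplest form: the inner loop of a dot
// product, which convolutions and matrix multiplies reduce to.
// MobileNet performs a few hundred million of these per inference.
function dotProduct(weights, inputs) {
  let acc = 0;
  for (let i = 0; i < weights.length; i++) {
    acc += weights[i] * inputs[i]; // one multiply-add
  }
  return acc;
}

console.log(dotProduct([1, 2, 3], [4, 5, 6])); // → 32
```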
Slide 4
Performance overview 4

                      WebGL   WASM  WASM+SIMD  Plain JS
  iPhone XS            10.5   21.4          -     176.9
  Pixel 3              31.8   40.7          -     535.2
  Desktop Linux        12.7   12.6          -     249.5
  Desktop Windows       7.1   16.2        7.5     270.9
  MacBook Pro 2018     22.7   13.6        7.9     209.1

Inference times for FaceDetector in ms.
This next table shows how our backends compare when it comes to inference on face detector, a much smaller model with only around 20 million multiply-add operations.
In this case, our WebAssembly backend is between 8 and 20 times faster than our plain JS backend.
And it's actually comparable with our WebGL backend.
For example, on a 2018 MacBook Pro, our WebAssembly backend is twice as fast as our WebGL backend.
With SIMD (Single Instruction, Multiple Data, an approach to accelerating parallel computing operations on CPUs, a particularly needed feature for running machine learning models) enabled, it's 3 times faster.
These benchmarks demonstrate that there is no one-size-fits-all technology for machine learning on the web.
The best choice of execution environment depends on many factors, including the model architecture and the device.
Technologies such as WebAssembly and WebGL address different use cases.
And we as TFJS developers must invest in a wide swath of technologies in order to meet our users' needs.
Slide 5
The WebAssembly backend How it works: ● Language: C++ ● Compiler: Emscripten ● Accelerator: XNNPACK 5
Now I'll go into the details of some of our backends, starting with WebAssembly.
Our WebAssembly backend kernels are written in C++ and compiled with Emscripten (a toolchain that compiles code from various programming languages into JavaScript or WebAssembly).
We use XNNPACK, a highly optimized library of neural network operators for further acceleration.
As we've seen, our WebAssembly backend is ideally suited for lighter models.
And in the last year, we've seen a wave of such production-quality models designed for edge devices.
But WebAssembly is steadily closing the performance gap with WebGL for larger models as well.
A few weeks ago, we added support for SIMD instructions to our WebAssembly backend.
This led to a 3X performance boost for MobileNet, and a 2X performance boost for face detector.
We're also actively working on adding support for multithreading through SharedArrayBuffer.
According to our internal benchmarks, multithreading will provide an additional 3x performance boost for MobileNet, and 2x performance boost for face detector.
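The idea behind the SIMD speedups above can be illustrated conceptually (plain JavaScript cannot issue real SIMD instructions; this only mimics the access pattern):

```javascript
// Conceptual illustration only: with WASM SIMD, each group of four
// float32 lanes below would be handled by a single f32x4 instruction,
// rather than four scalar ones.
function addScalar(a, b) {
  const out = new Float32Array(a.length);
  for (let i = 0; i < a.length; i++) {
    out[i] = a[i] + b[i]; // one lane per step
  }
  return out;
}

function addFourLanes(a, b) {
  // Assumes a.length is a multiple of 4.
  const out = new Float32Array(a.length);
  for (let i = 0; i < a.length; i += 4) {
    // Four lanes per step: the shape of an f32x4.add.
    out[i] = a[i] + b[i];
    out[i + 1] = a[i + 1] + b[i + 1];
    out[i + 2] = a[i + 2] + b[i + 2];
    out[i + 3] = a[i + 3] + b[i + 3];
  }
  return out;
}
```

Both functions produce the same result; the difference is how many elements advance per instruction, which is where the 2-3x gains for MobileNet and FaceDetector come from.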
Slide 6
WebAssembly: our wishlist ● Broader SIMD, SharedArrayBuffer support ● Wider SIMD ● Quasi fused multiply-add ● Pseudo minimum / maximum ● ES6 module support 6
For these reasons, we expect adoption of our WebAssembly backend to continue to grow.
We're eager for more users to enjoy the benefits of SIMD and multithreading.
We're also closely following the progress of several evolving WebAssembly specifications, including flexible vectors for wider SIMD, quasi-fused multiply-add, and pseudo-minimum and pseudo-maximum instructions.
We're also looking forward to ES6 module support for WebAssembly modules.
Slide 7
The WebGL backend How it works: ● Data: GPU textures ● Computation: Fragment shaders 7
TFJS also offers a GPU-accelerated backend built on top of the WebGL API. We repurpose this API for high-performance numerical computation by representing data as GPU textures and using fragment shaders to execute neural network operations.
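The fragment-shader computation model can be sketched in plain JavaScript (illustrative only; the real backend generates GLSL shader source): a shader is one small program run independently for every output texel.

```javascript
// Sketch of the fragment-shader model used by the WebGL backend.
// A fragment shader computes one output texel at a time; here the
// "shader" computes ReLU for a single element of the input texture.
const reluShader = (inputTexture, index) =>
  Math.max(0, inputTexture[index]);

// The GPU runs the shader for all output texels in parallel; this
// sequential loop stands in for that parallel dispatch.
function runShader(shader, inputTexture) {
  const outputTexture = new Float32Array(inputTexture.length);
  for (let i = 0; i < inputTexture.length; i++) {
    outputTexture[i] = shader(inputTexture, i);
  }
  return outputTexture;
}

console.log(runShader(reluShader, [-1, 0.5, 2])); // lanes: [0, 0.5, 2]
```

Launching a shader has a fixed overhead, which is why this approach pays off for wide operations and large models.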
As we've seen, our WebGL backend is still the fastest for larger models containing wide operations that justify the fixed overhead costs of shader execution.
Slide 8
WebGL: our wishlist ● Improved portability ● Tools for memory management ● Callbacks for data download 8
Our WebGL backend is complex.
This complexity comes from many sources.
Firstly, WebGL implementations vary significantly across platforms, often with implications for numerical precision.
Much of our code is devoted to hiding these inconsistencies from our users.
Another significant source of complexity in our WebGL backend is manual memory management.
Because GPU resources are not garbage collected, we must explicitly manage resource disposal through reference counting.
To help our users avoid leaking memory, we expose a utility called tf.tidy that takes a function, executes it, and disposes any intermediate resources created by that function.
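The idea behind tf.tidy can be sketched in a few lines (a simplified, hypothetical version; the real implementation tracks tensors through the TFJS engine):

```javascript
// Simplified sketch of the tf.tidy idea: track every tensor created
// inside the tidied function and dispose all of them except the
// return value. FakeTensor stands in for a real GPU-backed tensor.
let tracked = null;

class FakeTensor {
  constructor() {
    this.disposed = false;
    if (tracked) tracked.push(this); // record intermediates
  }
  dispose() { this.disposed = true; }
}

function tidy(fn) {
  const outer = tracked;
  tracked = [];
  const result = fn();
  // Dispose everything created inside fn, except its return value.
  for (const t of tracked) {
    if (t !== result) t.dispose();
  }
  tracked = outer;
  return result;
}

let leaked;
const kept = tidy(() => {
  leaked = new FakeTensor(); // intermediate: disposed automatically
  return new FakeTensor();   // return value: kept alive
});
console.log(leaked.disposed, kept.disposed); // → true false
```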
Despite these measures, memory management remains a source of error in our WebGL backend.
Therefore, we're excited about new proposals for user-defined finalizers that would give us more security against memory leaks.
Finally, the lack of callbacks for asynchronous WebGL texture downloads means we must poll for download completion.
This has implications for both code complexity and performance.
Slide 9
The WebGPU backend 9 How it works: ● Data: Storage buffers ● Computation: Compute shaders
TFJS also offers an experimental backend built on top of the emerging WebGPU standard.
WebGPU represents an exciting opportunity for addressing the pain points of WebGL.
WebGPU promises better performance and a dedicated API for GPU compute.
As the successor to WebGL, WebGPU is designed to operate directly with low-level graphics APIs, such as D3D, Metal, and Vulkan.
The WebGPU shading language will be directly ingested, and will hopefully offer faster shader compilation, compared to WebGL.
Slide 10
WebGPU: Performance 10

                                            WebGPU   WebGL
  Discrete GPU (Radeon Pro 555X)              41.7    51.1
  Integrated GPU (Intel UHD Graphics 630)    119.5   106.5

Inference times for PoseNet (ResNet50 architecture) in ms.
This table shows inference speeds for the PoseNet model, built on the ResNet50 architecture, a large model with several billion multiply-add operations.
These benchmarks demonstrate the reality that WebGPU has not delivered significant out-of-the-box performance gains.
However, the technology is rapidly evolving, and we're continuing to track progress closely.
Slide 11
Future web standards wishlist ● Portability ● Tools for memory management ● Tools for model obfuscation ● Detailed profiling ● Low-level API
We're excited about the potential for future machine learning web standards to address the recurring pain points we faced in developing TFJS, including lack of portability and manual memory management.
Such standards also represent an opportunity to address the distinctive needs of machine learning-powered web applications.
For example, TFJS users have increasingly asked for ways to obfuscate their models in order to protect intellectual property.
We also hope that future standards will preserve the features that have made our progress thus far possible, such as detailed profiling and access to low-level APIs that give us the ability to define and debug operations at a granular level.
Alright, that's it for me.
Thank you very much.