W3C Workshop on Web and Machine Learning

Opportunities and Challenges for TensorFlow.js and beyond - by Jason Mayes (Google)

W3C Workshop 2020: Opportunities & Challenges for TensorFlow.js and beyond. Jason Mayes, Senior DA - TensorFlow.js, Google (@jason_mayes).

Hello, everyone.

My name is Jason Mayes, and I am the developer advocate for TensorFlow.js here at Google. Today we'd like to talk to you about some of the opportunities and challenges we've seen whilst creating and maintaining TensorFlow.js, and we believe these things will be applicable to the wider machine learning and JavaScript community as well.

So let's get started.

TensorFlow.js is an open source library for machine learning in JavaScript. ML in the browser / client side means lower latency, higher privacy, and lower serving cost. We also support Node.js server side for larger, more complex models.

Now, for those of you who are not familiar with us, TensorFlow.js is essentially an open source machine learning library built in JavaScript.

It allows you to do machine learning in the browser, on the client-side, which means you have lower latency, higher privacy, and lower serving cost of course.

And we also support other environments such as Node.js, which means we can execute in a whole bunch of places.
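
To make that concrete, here is a minimal sketch of the same core API running in either environment; the tensor values are just illustrative.

```js
// In the browser (via a script tag or a bundler):
import * as tf from '@tensorflow/tfjs';
// In Node.js you would instead import '@tensorflow/tfjs-node'
// to get native TensorFlow bindings for larger models.

// The core API is identical in both environments:
const a = tf.tensor2d([[1, 2], [3, 4]]);
const b = a.square();
b.print(); // [[1, 4], [9, 16]]
```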

Browser · Mobile · Server · Desktop · WeChat · Electron · IoT (via Node)

And in fact, if we look at the next slide, you can see all the environments we run.

And the reason I bring this up, is because when we're defining web standards, often these things trickle into these other environments as well.

So, we've got all the common web browsers there, but also, Node.js on the back end, React Native for mobile native apps.

We've got Electron for desktop native apps, and of course Raspberry Pi for Internet of Things, which we can access via Node.js.

So, I just want us to be mindful, when we are thinking about ideas today, that these things could trickle through to these other areas when people try to use machine learning in JavaScript in those environments as well.

Architecture: pre-made Models sit on top of the Layers API, which sits on the Core / Ops API (Eager). The client side (browser / WeChat / React Native) runs on WebGL, CPU, or WASM back-ends; the server side (Node.js) runs on TF CPU, TF GPU, or Headless GL. Keras models and TensorFlow SavedModels can be brought in via the TFJS Converter.

So for those of you who are not familiar with our architecture, this is the current stack.

Right now we have a bunch of pre-made models that sit at the very top there, which are super-easy-to-use JavaScript classes.

Just below this, we have a Layers API, a high-level API that lets you do machine learning more easily; it is very similar to Keras in Python, if you're familiar with that.

Below these we have our Core and Ops API, which is the more mathematical layer that allows you to do things like linear algebra, and so on and so forth.

And this can talk to different environments, such as the client-side or the server-side.

Now if we just focus on the client side for a second, you can see things like the browser, WeChat, and React Native sitting over there, and each one of these environments understands how to talk to different back-ends, such as the CPU, WebGL, or WebAssembly.

Now, of course the CPU is always available, but it's the slowest form of execution.

If a graphics card is available, we can leverage WebGL to get higher performance on the graphics card, and if WebAssembly is available, we can get high performance on the CPU by utilizing low-level instructions.
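
Selecting between these back-ends is already exposed to user code; here is a rough sketch, assuming the separate WASM backend package is installed.

```js
import * as tf from '@tensorflow/tfjs';
// The WASM backend ships as a separate package and registers itself on import:
import '@tensorflow/tfjs-backend-wasm';

async function pickBackend() {
  // Prefer WebGL when a graphics card is usable; fall back to WASM, then CPU.
  for (const name of ['webgl', 'wasm', 'cpu']) {
    if (await tf.setBackend(name)) break; // resolves true if it initialised
  }
  await tf.ready();
  console.log('Using backend:', tf.getBackend());
}
```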

Now, it should also be noted that people can convert models from Python into JavaScript using our converters, as you can see on the left-hand side, and this is something to bear in mind because people might try to load larger or more complex models in the future via this method.
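
As a rough illustration of that workflow (the file paths and URL here are hypothetical):

```js
// First, convert a saved Keras model with the converter CLI:
//   tensorflowjs_converter --input_format=keras my_model.h5 web_model/
// Then load the converted model in the browser:
import * as tf from '@tensorflow/tfjs';

async function loadConverted() {
  const model = await tf.loadLayersModel('https://example.com/web_model/model.json');
  const prediction = model.predict(tf.zeros([1, 224, 224, 3])); // assumed input shape
  prediction.print();
}
```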

3 key user journeys: Run, Retrain, Write. Run existing models: pre-packaged JavaScript or converted from Python. Retrain existing models: with transfer learning. Write models in JS: train from scratch.

Now we see three key user journeys right now, when people are using TensorFlow.js.

The first is the ability to run models that are pre-trained; that's the easiest route and what people often start with.

People then choose to retrain their models via transfer learning as their next step, to work with their own custom data, and the third journey is of course to write their own models completely from scratch.

And this might be in the browser entirely.

Or, it could be a combination of Node.js and then running the resulting model in the browser.
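
To give a feel for the first journey, here is a minimal sketch using the published @tensorflow-models/mobilenet package; the image element id is hypothetical.

```js
import * as mobilenet from '@tensorflow-models/mobilenet';

async function classifyImage() {
  const img = document.getElementById('my-image'); // hypothetical <img> element
  const model = await mobilenet.load(); // pre-packaged, pre-trained model
  const predictions = await model.classify(img);
  console.log(predictions); // [{ className, probability }, ...]
}
```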

For anything you may dream up: augmented reality, gesture-based interaction, sound recognition, accessible web apps, sentiment analysis / abuse detection / NLP, conversational AI, web-page optimization, and much more when combined with other web technologies...

And of course this can be used for anything you might dream up, and here's just a few examples of things people have been creating, that we've seen on the internet today.

Things like augmented reality, sound recognition, sentiment analysis, web page optimization, and much much more.

For anything you may dream up

Well, almost anything.

Limitations / Roadblocks: what have we learnt from creating TensorFlow.js?

And today, we'd like to talk to you about some of those limitations and roadblocks that we found whilst building and maintaining TensorFlow.js.

And we believe these will be applicable to any machine learning library created going forward.

So the first point we want to talk about is Float32.

Now, this is great for many tasks, and I know Float64 is even supported in JavaScript.

Float 32: JS / WASM support Float 32, but not 16. In machine learning there are times you want to change the accuracy of the model's weights to be, for example, Float 16 instead of 32. This allows you to: 1. Use less memory to store the model at runtime. 2. Increase execution speed. Currently JS / WASM only support Float 32 at the lower end of the scale. Right now TensorFlow.js quantization only has the effect of reducing the file size sent from server to client, but as soon as we load the model into memory we need to use Float 32 again, meaning we miss out on the runtime benefits of this quantization.

However, when we're doing model quantization we actually want support for Float16, and this currently does not exist in JavaScript or in WebAssembly.

And this is really important to us, so that we can execute models faster, and use less memory when doing so too.

And of course, you might get a 10% drop-off in your model accuracy by doing this, but for some environments that might be acceptable, especially on mobile or older devices, where you might not have the speed to begin with.

Right now, on the server side, we can actually store things in 16-bit.

However, when we load it into JavaScript memory, it then gets converted to Float32, and we end up using the same memory and have the same speed as before, which means no progress there for us.
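
To illustrate why this hurts, here is a sketch of what loading half-precision weights looks like today: JavaScript has no Float16Array, so the 16-bit values must be widened by hand into a Float32Array, doubling memory again.

```js
// Decode one IEEE 754 half-precision value from its 16-bit pattern.
function float16ToFloat32(bits) {
  const sign = (bits & 0x8000) ? -1 : 1;
  const exponent = (bits >> 10) & 0x1f;
  const fraction = bits & 0x3ff;
  if (exponent === 0) return sign * Math.pow(2, -14) * (fraction / 1024); // subnormal
  if (exponent === 0x1f) return fraction ? NaN : sign * Infinity;
  return sign * Math.pow(2, exponent - 15) * (1 + fraction / 1024);
}

// quantizedBuffer: raw bytes of Float16 weights downloaded from the server.
function widenWeights(quantizedBuffer) {
  const halfs = new Uint16Array(quantizedBuffer);
  const floats = new Float32Array(halfs.length); // twice the memory, unavoidably
  for (let i = 0; i < halfs.length; i++) {
    floats[i] = float16ToFloat32(halfs[i]);
  }
  return floats;
}
```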

Float 16: What if we could support model quantization to use less memory and gain faster inference in JS at run time? Whilst the model accuracy may decrease, the 10% drop or so may be acceptable if it means the model can run on devices that have less memory available, and run faster on lower-end devices. This would need to be available in JS and WASM.

So, what if we could support model quantization to use less memory and gain faster inference speeds in JavaScript at runtime?

This is the question we'd like to pose to you today.

Now, of course, to address this we'd need to do it in both JavaScript and WebAssembly, so that it supports all the environments we will execute in for the foreseeable future, as we showed at the beginning of this presentation.

Garbage Collection - WebGL: The JS garbage collector is great, but it does not deal with WebGL memory. In TensorFlow.js we have a tidy() function that understands when to clean up tensors that will no longer be used. People new to machine learning, however, may not be familiar with this concept, especially if they are used to JS cleaning up automatically, which can lead to memory leaks.

Next up, garbage collection for WebGL.

As you know, JavaScript is really great at cleaning up after itself when you write vanilla JavaScript code.

However, the same is not so true for WebGL.

And, as you know, TensorFlow.js uses WebGL to get graphics card acceleration for our Machine Learning models in the web browser and beyond.

So, right now we have a function called tf.tidy() that we've created to clean up after ourselves, if the user puts their code within this function.

However, not all users know about this at the very beginning, especially beginners, and for that reason it'd be really nice to have the same level of clean-up with graphics card memory as we do with regular JavaScript code.
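
Here is a short sketch of the pattern and the leak it guards against:

```js
import * as tf from '@tensorflow/tfjs';

// Without tidy(), every intermediate tensor holds WebGL memory that the
// JS garbage collector never reclaims:
function leaky(input) {
  return input.square().mean(); // the square() result leaks on the GPU
}

// With tidy(), intermediates created inside the callback are disposed
// automatically; only the returned tensor survives:
function safe(input) {
  return tf.tidy(() => input.square().mean());
}

// tf.memory() helps spot leaks during development:
console.log(tf.memory().numTensors);
```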

Garbage Collection - WebGL: How could we clean up WebGL memory too? Currently WebGL is the primary way we get graphics card acceleration for machine learning in JS, until WebGPU is more widely available. Is this a situation that can be solved in a future version of the WebGPU spec, or is this something we can address for WebGL too, which may benefit other use cases as well?

So the question here is: how can we clean up WebGL memory as well?

So we know that WebGPU is also coming down the line, but maybe this needs to be addressed in that specification as well.

Can we clean up graphics card memory both in WebGPU and WebGL?

And for the latter, this might also benefit people working with 3D graphics and other things beyond the machine learning space.

Graphics card acceleration: Currently we use WebGL to execute ops in the machine learning model. It would be more efficient if the browser exposed lower-level APIs to the graphics card, for more efficient utilisation of the hardware.

Next up, graphics card acceleration.

Currently, we use WebGL to execute ops in the machine learning model, as we previously discussed, but it'd be much more efficient if the browser exposed lower-level APIs to the graphics card so we could leverage the hardware more efficiently.

Graphics card acceleration: What lower-level support do we need for efficient ML when using the graphics card? Clearly WebGPU is on the way, but with an ML-specific lens, are there any other specific needs that may need to be part of such an API in the future?

Now, the question here is: what lower-level support do we need for efficient machine learning when using the graphics card?

And of course, WebGPU is on the way, but what else needs to be added to that spec to ensure we have something that works well specifically for machine learning?
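
For orientation, here is a minimal sketch of requesting lower-level GPU access with the draft WebGPU API; the surface shown may change as the spec evolves.

```js
async function initGpu() {
  if (!navigator.gpu) return null; // WebGPU not available in this browser
  const adapter = await navigator.gpu.requestAdapter();
  const device = await adapter.requestDevice();
  // Compute pipelines built on this device could run ML ops as compute
  // shaders, without going through WebGL's graphics-oriented abstractions.
  return device;
}
```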

Model Security: Many production use cases require that their model cannot be stolen and used elsewhere. We have seen a number of use cases where deploying to the front end is preferable for privacy / latency / offline usage, but the bottleneck right now is the concern that their hard work will be copied and used elsewhere. Some models can be a significant part of a company's IP and take a lot of money to develop. Whilst the resulting service may be free to use, companies are reluctant to give away the model itself.

Next up, we've got Model Security.

Now we see a lot of production use cases that require the model to be securely delivered to the client, in a way that it can't be copied and used on other websites.

Large corporate brands in particular spend a lot of money and time creating these models, and they won't just give away their IP for free.

Model Security: How can we prevent an end user from copying a front-end-deployed ML model in the browser? Could we have a secure way to download a set of files / JS code such that it is not exposed for inspection or saving locally, and instead provide a way to communicate with that code, passing data to it and getting results back, without being able to intercept the downloaded model or its pre / post processing code in any way?

So, our question here is: how can we deliver a machine learning model to the JavaScript environment in the web browser without revealing it? Maybe there's a secure way to grab some arbitrary JavaScript code from the server, which does the machine learning work, along with the model and the assets it needs to execute, and all of that could be downloaded by the browser behind the scenes into private memory, which cannot be accessed by the JS developer on the front end.

However, they can do some kind of remote procedure call to that code, so that they can execute it and get results back, without exposing the model itself.

And this is open for discussion, of course; that's just one example of how it could be solved, and it would require browser-level implementation support to do properly.
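
To make the idea concrete, here is an entirely hypothetical sketch; none of these APIs exist today, and the names are invented purely for illustration.

```js
// Hypothetical API: the browser fetches model + code into memory that page
// scripts cannot read, exposing only an opaque call interface.
const secureModel = await navigator.loadProtectedModule( // invented name
    'https://example.com/model-bundle.bin');

// The page can invoke the model but never inspect its weights or code:
const inputData = getUserInputSomehow(); // placeholder for real input
const result = await secureModel.call('predict', inputData);
```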

And currently this is a big barrier for many people who are trying to ship production use cases but still want the benefits of running on the client side, such as privacy, lower latency, and cost savings on the server.

And of course, as soon as you put the model on the server-side, those benefits disappear because you have to then send the data from the client to the server.

Model Warmup: Often it can take a couple of uses of the model before it runs at optimal inference speed. Currently developers have tried approaches to warm up the model on page load by sending zeros to the model as input, so that the next time it is called by the user for a real task it is ready to do so efficiently. We have also seen developers experiment with having a more efficient model run initially in WASM whilst a more complex model loads in WebGL, which is then switched to when ready.

And what about model warm-up? It can take a couple of uses before the model actually runs at optimal speed in the browser environment.

Of course, first of all you need to download the model.

Secondly, you need to load it into memory and parse all of that, and then thirdly, you need to run some data through it once to get everything else set up.

And this can take a non-trivial amount of time, especially for larger models.
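
Here is a sketch of the warm-up pattern described above; the model URL and input shape are assumptions for an image model.

```js
import * as tf from '@tensorflow/tfjs';

async function loadAndWarmUp() {
  const model = await tf.loadGraphModel('https://example.com/model.json'); // hypothetical URL
  // Run a dummy inference now, so shader compilation and memory allocation
  // do not happen on the user's first real request:
  const warmup = model.predict(tf.zeros([1, 224, 224, 3])); // assumed input shape
  await warmup.data(); // force the computation to actually complete
  warmup.dispose();
  return model;
}
```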

Model Warmup: What if there was a standard way to specify that a better model is available and should be prepared and swapped to when ready? Taking a hypothetical example of detecting an object in an image, the use case might be as follows:
1) Initially download a lightweight model like COCO-SSD that loads fast but only gives us bounding box data.
2) In the background, download a more advanced image segmentation model (which may take several seconds to load up) to upgrade the resolution of what is detected in the image, and swap to that model when it is warmed up and ready to use.
3) How could this be defined in a library-agnostic way? What other considerations should there be here?

So the question here becomes: what if there was a standardized way to specify that a better model is available and should be prepared and swapped to when ready, kind of like progressive enhancement?

Now, taking a very hypothetical example, maybe you've got an object recognition model, and this could be something like COCO-SSD, which gives you bounding box data.

This loads really fast in the web browser right now and can be used very quickly and efficiently.

But maybe your end goal is to actually have some kind of image segmentation model which might be heavier to load.

So what if you could take that initial smaller model, load that, get some results coming in straight away, but once the heavier model is actually ready, switch to that automatically?

And this could be very interesting as things progress and we start seeing larger models being used in the web environment in many years to come.
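
As a library-level sketch of that flow today (before any standard exists), one could hand-roll the swap with two published models; deeplab stands in here for the heavier segmentation model.

```js
import * as cocoSsd from '@tensorflow-models/coco-ssd';
import * as deeplab from '@tensorflow-models/deeplab';

let detector = null;
let segmenter = null;

async function init() {
  detector = await cocoSsd.load(); // lightweight: fast to load, boxes only
  // Upgrade in the background; swap once the heavier model is ready.
  deeplab.load({ base: 'pascal', quantizationBytes: 2 }).then((model) => {
    segmenter = model;
  });
}

async function analyse(img) {
  if (segmenter) return segmenter.segment(img); // richer per-pixel output
  return detector.detect(img); // bounding boxes until the upgrade lands
}
```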

See what the community has made: #MadeWithTFJS

And with that, thanks for watching and I encourage people to check out the #MadeWithTFJS hashtag on Twitter or LinkedIn to see what our community has been making.

You can see a whole bunch of awesome stuff that our community has made, which might also inspire other questions for this discussion.

People are really pushing the boundaries by combining TensorFlow.js with things like WebGL, WebRTC, and WebXR; all these other web standards are being combined with machine learning to do many great things.

So, feel free to check that out, and we'd love to talk to you later on for the full discussion about this topic.

Thank you very much for watching.


Thanks to Futurice for sponsoring the workshop!

Video hosted by WebCastor on their StreamFizz platform.