W3C Workshop on Web and Machine Learning

Privacy focused machine translation in Firefox - by Nikolay Bogoychev (University of Edinburgh)

Hello everybody, my name is Nikolay, and I'm a postdoc at the University of Edinburgh.

Today, I'll be talking to you about our project, Bergamot, which is privacy-focused machine translation inside Firefox.

This is a collaboration between Charles University, University of Edinburgh, University of Sheffield, Tartu University and Mozilla.

Let's start. So, the web is huge.

If you look at the languages of the top 10 million webpages, about 60% of them are in English, then you have Russian, Spanish, Turkish, Persian, and a very long tail of every other language.

On top of that, we have social media: every second, there are thousands and thousands of social media posts in a multitude of languages.

And people nowadays are used to consuming any content, regardless of the language it was produced in.

How do they do that?

They rely on machine translation services, which are immensely popular everywhere.

There's Google translate, there's Microsoft Bing translate, there's Facebook translate, there's Baidu translate, there's many other translation services.

However, they do come with a few issues.

If you use an online translation service, you lose your privacy.

Your data goes to the cloud, even if you pay for it.

And when they have your data, they might accidentally leak it, like translate.com leaked Statoil's employment contracts.

That's quite embarrassing.

And also you don't know how good a translation is.

If you remember a few years ago, there was a post in Arabic that said, hello, and it was translated as kill them all.

That's not good, we shouldn't allow that to happen.

Can we do better than that?

Yes, we can.

We can run the machine translation service locally.

But how do we do that? As far as everybody knows, machine translation is slow: it requires heavy-duty GPUs, lots of power, lots of disk space. But is that really the case?

Not anymore.

A lot of it has changed.

And this is where our project comes in: Bergamot is about building a machine translation service inside Firefox.

Now for a live demo. Let's go to the nightly build of Firefox, and we can go to any web page; let's go to the Spanish BBC. We load the page, then we click translate, and we get a translation.

And then we can click on any webpage.

And then we click translate and you have a translation in green, which means that the system is confident in the translation.

But sometimes when the system is not very confident, it shows the original in black, which means that if the user sees that something is not very certain, they can check it in a dictionary or something else.

They don't need to rely blindly on the machine translation software.
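The display rule described in the demo can be sketched roughly as follows. This is only an illustration of the idea, not Bergamot's actual code: the `Segment` type, the `confidence` score in the range 0 to 1, and the threshold of 0.5 are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One piece of page text (hypothetical structure, for illustration only)."""
    source: str        # original text
    translation: str   # machine translation of the source
    confidence: float  # assumed quality-estimation score in [0, 1]


# Assumed cut-off; the talk does not say how confidence is thresholded.
CONFIDENCE_THRESHOLD = 0.5


def display(segment: Segment, threshold: float = CONFIDENCE_THRESHOLD) -> tuple[str, str]:
    """Return (text, colour) following the rule described in the demo:
    confident translations are shown in green, otherwise the original
    text is shown in black so the user can check it themselves."""
    if segment.confidence >= threshold:
        return segment.translation, "green"
    return segment.source, "black"


if __name__ == "__main__":
    page = [
        Segment("Hola", "Hello", 0.93),
        Segment("Estar en las nubes", "To be in the clouds", 0.21),
    ]
    for seg in page:
        print(display(seg))
```

Keeping the decision in one small function means the threshold can be tuned, or replaced by a per-language value, without touching the rendering code.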

We can go to another web page, such as Wikipedia.

We go there, we load it, we click translate and there we have it.

And we have an article about Lovecraft; we can go to it.

We can click translate, wait for a second or two, because as you can see, the webpage is pretty huge, but we do end up with a translation in fairly reasonable time.

And yeah, that's it, simple demo.

Let's go back to our presentation.

So, as you saw the translation is quite fast, but how fast is it exactly?

Well, we benchmarked on various hardware, and we measured how many source words per second we can translate.

I'm running a 2019 desktop, which can do about 9,000 source words per second. We also tested older hardware, various desktops and laptops, and the slowest one, a laptop from 2012, so 8 years old, does about a thousand words per second.

Now I tried very hard to find the typical number of words on a webpage, but there were many conflicting sources.

A lot of them said between 600 and 800 words, which is probably a reasonable estimate.

And that means that even on 8-year-old hardware, we can still translate the webpage in about a second.
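The arithmetic behind that claim is simple enough to write down. The sketch below uses only the figures quoted in the talk (9,000 and 1,000 source words per second, 600 to 800 words per page) and assumes that translation time is just page size divided by throughput, ignoring model loading and page rendering.

```python
# Throughput figures quoted in the talk, in source words per second.
WORDS_PER_SECOND = {
    "2019 desktop": 9000,
    "2012 laptop": 1000,
}

# Rough page-size range quoted in the talk (words per page); an estimate.
PAGE_WORDS = (600, 800)


def translation_seconds(page_words: int, words_per_second: float) -> float:
    """Seconds needed to translate a page, assuming time = words / throughput.

    This ignores model loading and rendering, which the talk does not quantify.
    """
    if words_per_second <= 0:
        raise ValueError("words_per_second must be positive")
    return page_words / words_per_second


if __name__ == "__main__":
    for machine, wps in WORDS_PER_SECOND.items():
        low, high = (translation_seconds(w, wps) for w in PAGE_WORDS)
        print(f"{machine}: {low:.2f} to {high:.2f} seconds per page")
```

Under these assumptions the 2012 laptop needs between 0.6 and 0.8 seconds for a typical page, which matches the "about a second" estimate, and the 2019 desktop needs under a tenth of a second.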

How do we do that?

We did lots of complicated model and software optimizations.

You can check them out in our WNGT 2020 submission.

Another thing that I didn't demonstrate is foreign language forms.

A lot of times when you go to a foreign country and you want to use some service, you're presented with a form that you need to fill in, in the foreign language.

Now you can use any translation service to translate the text on the form, but you still need to write in that language in order to fill it in.

So we are going to provide an outbound translation service, where you enter the text in English, and then it is going to be translated into the language that you need for the form.

We are also going to provide the same quality estimates, so that when you see that the system is not very confident in translating your input, you can rephrase it and it can try again.

And yeah, that's it.

Machine translation is no longer just in the cloud: you can now use it locally and securely inside Firefox.

And that's it, thank you for your time.

Thanks to Futurice for sponsoring the workshop!
