W3C Workshop on Web and Machine Learning

Privacy-first approach to machine learning - by Philip Laszkowicz


Hi, I'd like to dive straight into this discussion on privacy-first machine learning for web applications by outlining why I feel machine learning for the web is an essential feature.

Accessibility is part of a wider effort to make applications more inclusive.

Inclusivity means taking into account the diverse individuals using our software and hardware and making sure there is equal and fair access, without barriers.

It's inclusive as it takes in both short-term and long-term accessibility concerns, but when we look into these aspects, we're often looking at physical considerations only, such as visual or limb impairments.

Considerations outside the physical are often ignored when we're examining inclusivity, and this can mean individuals from diverse backgrounds are left behind when we build software.

Take an application that is large and slow to load: when we build it, we consider the market demographics to decide which devices, operating systems and browsers to support.

At its most fundamental, this is financially driven.

Of course, there's always a cost and a return attributed to most development efforts, even if the cost is measured in time rather than money.

With the web we have a partial solution to this issue: progressive enhancement.

We can deliver content easily, and we have been for decades, enabling delivery of most web applications without it being necessary to bundle heavy assets.

Those assets are often enhancements to a product but not essential.

We can mitigate issues with underpowered hardware and underdeveloped infrastructure by focusing on content delivery first and content enhancements second.
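
To make that ordering concrete, here is a minimal sketch of content-first loading in TypeScript; the renderCoreContent and enhanceWithHeavyAssets helpers are hypothetical placeholders, and the Network Information and Device Memory APIs used to gate the enhancements are not universally supported.

```typescript
// Sketch only: renderCoreContent and enhanceWithHeavyAssets are hypothetical
// placeholders, not a real API.
function renderCoreContent(): void {
  // Render the semantic HTML that works on any device and any browser.
}

async function enhanceWithHeavyAssets(): Promise<void> {
  // Dynamically import optional bundles: rich media, fonts, interactive widgets.
}

async function bootstrap(): Promise<void> {
  // 1. Always deliver the essential content first.
  renderCoreContent();

  // 2. Layer on heavy enhancements only when the device and network can afford them.
  //    navigator.connection (Network Information API) and navigator.deviceMemory
  //    (Device Memory API) are not available in every browser.
  const nav = navigator as any;
  const slowNetwork =
    nav.connection?.saveData || /(^|-)2g$/.test(nav.connection?.effectiveType ?? "");
  const lowMemory = nav.deviceMemory !== undefined && nav.deviceMemory < 2;

  if (!slowNetwork && !lowMemory) {
    await enhanceWithHeavyAssets();
  }
}

bootstrap();
```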

We can look at simple metrics like bandwidth, latency and feature availability (which is typically indicative of a browser version), but we should also understand the human side of this: lower bandwidth, higher latency and older browser support are typically indicative of less wealth.

So by excluding those demographics to reduce development costs, no matter how small the market, you're excluding poorer individuals from your content.

This is directly against the Web Accessibility Initiative's goals of accessibility, usability and inclusion.

The web is designed with accessibility in mind with varying degrees of success.

So this is where mobile applications differ.

Consumer electronics carry a premium and an inherent exclusivity. Sure, they have great support for accessibility features, but they lack some of the core benefits of the web: they're typically not great at backwards compatibility, whereas web development is careful not to break experiences as standards evolve.

If we want to deliver to a lot of old or less expensive handsets with native code, we have to support all the APIs, and in some cases divergent code bases, which becomes incredibly costly.

Weighing up the cost-benefit of developing for a particular handset is entirely justifiable for most projects, and mobile prices reflect this discrepancy in quality of content.

The solution to this legacy support issue for developers can often be to go web first: deliver the content, enhance the content and then deliver a native app if it makes sense for specific features or markets.

Of course, I am simplifying a complex discussion here, and much doesn't fit into that categorization.

And native apps sometimes come first.

This is often the case with games, for example.

Privacy is often a marketable feature of a product.

Companies sell privacy as a value-added feature all the time.

There are pro versions of apps whose defining feature is that they don't have advertising built in.

And we know advertisements often come with tracking codes.

So for the sake of simplicity, we can assume ads and trackers are one and the same.

This isn't a criticism of advertising itself.

If ads were delivered in an absolutely privacy friendly way, and user experience wasn't reduced due to increased latency or broken accessibility features, then there'd be little to complain about.

When you position privacy as a luxury, then you're valuing it as a convenience.

This is a naive and privileged view of privacy and exclusionary at best.

Without going into real cases where weakened privacy has a lethal or material impact on lives, let's recognize that this view of privacy is often held by people in the tech industry who are privileged enough to earn a decent salary and can afford to decide if and when to give up their data.

By defaulting to the stance that privacy is a commodity and data is tradable like oil, we're defaulting our products to be exclusive and treating privacy as a convenience which costs extra.

This is exploitative in most markets as in many cases, individuals will opt for a free service and give up their privacy if given the choice, and many don't have a choice in the first place.

Let's take a look at how privacy has negatively affected the user experience on the web for a moment.

With the EU increasingly adding privacy protections into legislation, web developers have devised annoying modal dialogues which request numerous opt-ins to enable trackers in a legally friendly way.

Many developers have taken issue with the EU law and have aggressively argued that it's the only way to deliver web apps to the European market.

Many of the apps featuring these modal dialogues have nothing to do with advertisements or e-commerce.

Developers are simply defaulting to tracking users on first load, and are reducing the quality of the user experience of their apps along the way, due to laziness, not legislation.

You can deliver advertisements without tracking.

So if you're monetized by ads, you don't need to destroy your user experience along the way.

If you're monetized by data and tracking, then it's time to rethink your business model.

How would the web look with ad-based monetization, but no tracking or data gathering, for example?

Essentially it would look the same, but without the annoying needless modal dialogues.

Developers are to blame, not the EU, California or the end users, so we as developers need to do better.

So how does this affect machine learning?

This is where mobile apps can currently be much better.

Thanks to hardware access, we have the capability to deliver personalization in mobile apps using machine learning.

We can currently build offline-enabled models with existing tooling and bundle them with our apps, enabling personalized experiences by learning about the end user without sending data anywhere.

We can even add differential privacy and federation to our models, features also available on edge devices and in the internet of things.

This means we can do clever personalized analytics without looking at data.

And the most we would need to do is send the model weights to a service for federation before updating the model again.

We're not transferring data here, we're transferring anonymous weights only.
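
As an illustration of that flow, here is a rough sketch using TensorFlow.js; the bundled model path, the /federate endpoint, and the optimizer and loss settings are all assumptions for the example, and the server-side federated averaging is not shown.

```typescript
import * as tf from '@tensorflow/tfjs';

// Sketch of on-device personalization with federation. The model path and the
// /federate endpoint are hypothetical; only weight updates leave the device.
async function personalizeLocally(localXs: tf.Tensor, localYs: tf.Tensor): Promise<void> {
  // Load the model that was bundled with the app; no user data is sent for this.
  const model = await tf.loadLayersModel('/assets/model.json');
  model.compile({ optimizer: 'sgd', loss: 'categoricalCrossentropy' });

  // Fine-tune on data gathered locally from the user's own interactions.
  await model.fit(localXs, localYs, { epochs: 1, batchSize: 16 });

  // Share only the updated weights for federated averaging, never the raw data.
  const weights = await Promise.all(model.getWeights().map(w => w.data()));
  await fetch('/federate', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(weights.map(w => Array.from(w))),
  });
}
```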

This means mobile apps can create much better experiences than web apps when privacy is valued.

In this way the web is far behind in terms of privacy features.

It doesn't matter what cookie protections, tracking blockers or anonymization features you attempt to build in, if you're lazily transferring data to a central resource, you're probably not privacy enabled.

Take, for example, those privacy-aware solutions that hash an individual's email address instead of storing it in plain text.

When the information leaks, there's no personal identifiable information, right?

Well, as most developers use the same hashing algorithms, there's a good chance you can connect disparate sources of data together to form a rather solid picture of someone. Hashes are not the finishing line for privacy; they're hardly the starting line.
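
As a small illustration (with made-up records), a bare SHA-256 of an email is deterministic, so two unrelated leaked datasets that used the same scheme can be joined on the hash:

```typescript
import { createHash } from 'node:crypto';

// Two unrelated services both "anonymize" emails with a plain, unsalted SHA-256.
const hashEmail = (email: string): string =>
  createHash('sha256').update(email.trim().toLowerCase()).digest('hex');

// Hypothetical leaked records from two different products.
const serviceA = { userHash: hashEmail('alice@example.com'), purchases: ['running shoes'] };
const serviceB = { accountHash: hashEmail('alice@example.com'), city: 'Helsinki' };

// The digests match, so the datasets join trivially on the "anonymized" key.
console.log(serviceA.userHash === serviceB.accountHash); // true
```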

Let's segue into looking at most machine learning infrastructure currently powering moderately successful web apps for a moment.

Typically these days we're building data warehouses and data lakes to support data aggregation, analysis, model building and deployment.

At various scales, this architecture is essentially ubiquitous and has been for some time, whether you're building one model or a thousand models.

This architecture is lazy, and most developers who take this blueprint and continue to implement it are being lazy too. It's commercially focused: driven by what data can be collected, what models can be built and what analytics dashboards can be provided.

But it's not driven by the end user, it's not driven by individuals, and it's not driven by humans.

It's also expensive in the long term, but that's another discussion.

So what does good data science architecture look like for the web?

Firstly, like apps it should be privacy first.

We shouldn't be vacuuming up data and then attempting to apply anonymization techniques to that data.

That's like committing a crime and then bleaching the hotel room.

I swear, if you go in with a UV light, you'll still find evidence, no matter how hard you worked.

So let's stop trying to fix the data and fix the architecture instead.

Differential privacy is a great tool if applied correctly, but it's still not the first feature we should be looking at.
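
For completeness, here is a minimal sketch of one such tool, the Laplace mechanism applied to a simple count query; the epsilon value and sensitivity are illustrative assumptions, and the noise sampling is the textbook inverse-CDF method rather than production-grade code.

```typescript
// Illustrative Laplace mechanism for a count query (sensitivity 1).
// Not production-grade: sampling and parameter choices are for demonstration only.
function laplaceNoise(scale: number): number {
  const u = Math.random() - 0.5; // uniform in [-0.5, 0.5)
  return -scale * Math.sign(u) * Math.log(1 - 2 * Math.abs(u));
}

function privateCount(trueCount: number, epsilon = 0.5, sensitivity = 1): number {
  return trueCount + laplaceNoise(sensitivity / epsilon);
}

// e.g. report roughly how many users enabled a feature without exposing the exact figure.
console.log(privateCount(1342));
```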

Decentralization, distribution and federation are modern approaches to data science architecture.

They provide the best features to the end user, including protection by default, and they only take a little more work.

Now, this may be hard for a lot of developers and data scientists to hear, but we're meant to be doing that hard work.

That's what programming is, reducing the complexity of a set of tasks for the end user.

So what does this approach give us?

It enables us to run inference on edge devices, and in some cases it allows us to run some of the training too, while central infrastructure is there for the heavy lifting only.

If we distribute the training workload we can even reduce data I/O by mitigating the need to centrally store data in some gigantic data lake.

And data lakes have their uses, but they're often overused, when all we really need to know about is events and weights.

Once ML operations are first-class features of the modern web browser, we can add progressive enhancement to our apps by allowing models to be executed within the browser using the best available hardware.

Architecturally, our browser becomes our OS in this ideal, and we can also offload some of the inference from the legacy central platform to the end user, providing them with added privacy, more responsive resources and more personalized experiences.

We can also reuse the same models that we've been building in mobile apps.

Bringing browsers in line with native apps in terms of model inference capabilities, by optimizing usage of hardware like central, graphics and neural processing units to process models, allows us to build more inclusive solutions; without that, we can't build truly privacy-friendly, machine-learning-powered apps.
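
A hedged sketch of what that hardware selection might look like today with TensorFlow.js (one possible library, not a web standard); the backend names are specific to that library and the /model.json path is a placeholder.

```typescript
import * as tf from '@tensorflow/tfjs';
// Optional accelerated backends; shipping them is itself a progressive enhancement choice.
import '@tensorflow/tfjs-backend-webgpu';
import '@tensorflow/tfjs-backend-wasm';

// Try the most capable acceleration first and degrade gracefully to CPU.
// Backend names are TensorFlow.js specific; '/model.json' is a placeholder path.
async function loadWithBestBackend(): Promise<tf.GraphModel> {
  for (const backend of ['webgpu', 'webgl', 'wasm', 'cpu']) {
    if (await tf.setBackend(backend)) {
      break; // first backend the current device actually supports
    }
  }
  await tf.ready();
  return tf.loadGraphModel('/model.json');
}
```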

It lowers the cost of entry from the latest luxury electronic device to a favorite free browser download, and this is key to ensuring the web is a free and safe ecosystem for everyone.

Thank you for listening.

