W3C Workshop on Web and Machine Learning

Empowering Musicians and Artists using Machine Learning to Build Their Own Tools in the Browser - by Louis McCallum (University of London)


Hi, my name is Louis McCallum, welcome to our talk at the W3C Workshop on Web and Machine Learning, covering the topic of empowering musicians and artists using machine learning to build their own tools in the browser.

Over the past two years, as part of the RCUK AHRC funded Mimic Project, we've provided platforms and libraries for musicians and artists to use, perform, and collaborate online using machine learning.

Although machine learning has a lot to offer these communities, their skill sets and requirements often diverge from more conventional machine learning use cases.

During this short talk, we will address three key requirements when designing for these users.

Primarily we're describing machine learning programs that run in real time.

That is, they receive rich data from microphones, cameras, external sensors and controllers in real time, whilst concurrently running inference and generating media output within the browser without audible and visible interference.

Secondly, we focus on supporting the needs of end user machine learning, where end users themselves collect data and train and evaluate models on their own browsers.

This is a use case distinct from other web projects that seek to provide pre-trained models for using and generating media in the browser.

Thirdly, we support an iterative approach to training and evaluating models in a real time or near real time feedback loop.

This is sometimes known as interactive machine learning.

It allows users from a wide range of technical, non-technical, and creative backgrounds to develop interactive systems with complex media data that they'd have struggled to program by hand, perhaps also with behaviors they wouldn't have thought to program by hand.
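
To make that loop concrete, here is a minimal sketch, using TensorFlow.js as an assumed stand-in for the libraries we actually use: the end user records labeled examples from a live feature stream, trains a small model directly in the browser, and immediately runs it on new input.

```js
// A minimal sketch of the interactive machine learning loop, using
// TensorFlow.js as an assumed stand-in for our actual libraries.
import * as tf from '@tensorflow/tfjs';

const examples = [];   // each entry: { features: number[], label: number }
let model = null;

// Called repeatedly while the user holds a "record" button for a given class.
function addExample(features, label) {
  examples.push({ features, label });
}

// Train a tiny classifier on whatever the user has recorded so far.
async function train(numFeatures, numClasses) {
  const xs = tf.tensor2d(examples.map(e => e.features));
  const ys = tf.oneHot(examples.map(e => e.label), numClasses).cast('float32');
  model = tf.sequential({
    layers: [
      tf.layers.dense({ inputShape: [numFeatures], units: 16, activation: 'relu' }),
      tf.layers.dense({ units: numClasses, activation: 'softmax' }),
    ],
  });
  model.compile({ optimizer: 'adam', loss: 'categoricalCrossentropy' });
  await model.fit(xs, ys, { epochs: 30 });
  xs.dispose();
  ys.dispose();
}

// Run on every new frame of features once a model exists.
function classify(features) {
  if (!model) return null;
  return tf.tidy(() =>
    model.predict(tf.tensor2d([features])).argMax(-1).dataSync()[0]);
}
```

Because data collection, training, and inference all happen locally, the user can record a few more examples and retrain in seconds, which is what makes the feedback loop feel interactive.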

Beyond this it's been useful for working with children with special educational needs and disabilities.

For example, see the Sound Control project; for gaming in Unity, check out the InteractML project.

This approach to machine learning is also excellent in educational settings.

Working with interactive media can be a really accessible stepping stone to data literacy.

So why would we want to do this in a browser?

Building browser-based tools is useful because they require little of the installation and dependency management that many other data science endeavors do.

Additionally, JavaScript is a fast growing language with low barriers to entry.

It's often taught to computer science and creative computing students, which is useful for us.

Moreover, developing code in browsers and embracing technologies like web sockets opens up great opportunities for real time, remote code and non-code based collaborations.
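
As a rough illustration, and not a description of our actual implementation, a collaboration channel of this kind can be as simple as broadcasting control data over a WebSocket (the server URL below is a placeholder):

```js
// A minimal sketch of real time collaboration over web sockets: a performer's
// model output (or control data) is broadcast to remote collaborators.
const socket = new WebSocket('wss://example.com/session');

// Send each new frame of model output to everyone else in the session.
function broadcast(params) {
  if (socket.readyState === WebSocket.OPEN) {
    socket.send(JSON.stringify({ type: 'params', params }));
  }
}

// Apply parameters arriving from a remote collaborator to local synths.
socket.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === 'params') {
    // e.g. update oscillator frequencies, effect settings, visuals…
    console.log('remote params', msg.params);
  }
};
```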

While some solutions for increasing the efficiency of in-browser machine learning rely on a client-server model, this is not suitable for us, for reasons of both performance and privacy.

The latency of using a remote backend to do machine learning will almost certainly be inappropriate for many real time performance use cases.

Further, as users will be recording their own datasets, sensitive data may be required to remain on their local machines.

Finally, our own experience, as well as user feedback has informed us that in situations of live performance, educational settings, and long running installations, relying on internet connections can be infeasible.

As such, we see in-browser solutions for training, data storage, and inference as highly preferable.

So, given these clear advantages to developing end user machine learning tools in the browser, we also seek to address the non-trivial technical challenges of connecting media and sensor inputs from a variety of sources, running these alongside potentially computationally expensive feature extractors, running lightweight machine learning models, and generating audio and visual output, all in real time, all concurrently, and all without interference.

For example, we might want to use the BodyPix feature extractor to get skeleton data from a webcam, and use a regression model to control multiple synthesizers.
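
A minimal sketch of that pipeline might look like the following, assuming the @tensorflow-models/body-pix package, a user-trained TensorFlow.js regression model (`regressionModel`), and a `<video>` element already on the page; it illustrates the idea rather than reproducing our actual code.

```js
// BodyPix extracts pose keypoints from a webcam, a small regression model
// maps them to synthesis parameters, and Web Audio renders the result.
import * as tf from '@tensorflow/tfjs';
import * as bodyPix from '@tensorflow-models/body-pix';

const ctx = new AudioContext();
const oscs = [ctx.createOscillator(), ctx.createOscillator()];
oscs.forEach(o => { o.connect(ctx.destination); o.start(); });

// `regressionModel` is a user-trained tf.LayersModel mapping keypoint
// positions to one value in [0, 1] per synthesizer.
async function run(regressionModel) {
  await ctx.resume();
  const video = document.querySelector('video');
  video.srcObject = await navigator.mediaDevices.getUserMedia({ video: true });
  await video.play();
  const net = await bodyPix.load();

  async function frame() {
    const segmentation = await net.segmentPerson(video);
    const pose = segmentation.allPoses[0];   // skeleton keypoints, if anyone is in shot
    if (pose) {
      // Flatten keypoint positions into one feature vector.
      const features = pose.keypoints.flatMap(k => [k.position.x, k.position.y]);
      const params = tf.tidy(() =>
        regressionModel.predict(tf.tensor2d([features])).dataSync());
      // Map each regression output onto a synth parameter, here pitch.
      oscs.forEach((osc, i) =>
        osc.frequency.setTargetAtTime(100 + 900 * params[i], ctx.currentTime, 0.05));
    }
    requestAnimationFrame(frame);
  }
  frame();
}
```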

Alternatively, we might want to use a neural network to generate frames of spectral data, and then turn this into audio.
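
Again as a sketch rather than our implementation, the spectral route could use TensorFlow.js's inverse FFT together with the Web Audio API; the `generatorModel` below is a hypothetical network that outputs one frame of magnitudes at a time, and phase estimation and overlap-add are omitted for brevity.

```js
// Turn model-generated magnitude spectra into audio with TensorFlow.js
// and the Web Audio API (simplified: zero phase, no overlap-add).
import * as tf from '@tensorflow/tfjs';

const ctx = new AudioContext();
const numFrames = 64;

async function spectralFramesToAudio(generatorModel) {
  const samples = [];
  for (let i = 0; i < numFrames; i++) {
    // Predict one frame of magnitudes from a random latent vector.
    const mags = generatorModel.predict(tf.randomNormal([1, 16]));
    // Zero phase for simplicity; a real system would estimate phase
    // (e.g. Griffin-Lim) or have the model generate it.
    const complexFrame = tf.complex(mags, tf.zerosLike(mags));
    // Inverse real FFT back to a block of time-domain samples.
    const frame = await tf.spectral.irfft(complexFrame).data();
    samples.push(...frame);
  }
  // Copy the concatenated blocks into an AudioBuffer and play it.
  const buffer = ctx.createBuffer(1, samples.length, ctx.sampleRate);
  buffer.copyToChannel(Float32Array.from(samples), 0);
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  source.start();
}
```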

Whilst technologies like AudioWorklets address this to some extent, and we welcome their recent introduction into Firefox, there remain some issues with implementation and adoption.

For example, issues with garbage collection created by the worker thread messaging system caused wide-scale disruption to many developers using AudioWorklets, and were only really addressed by a ring buffer solution that developers had to integrate themselves, outside of the core API.
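
The pattern, sketched below rather than reproduced from any particular library, is to share a lock-free ring buffer between the main thread (or a worker) and an AudioWorkletProcessor via a SharedArrayBuffer, so that audio moves across threads without per-block postMessage allocations; note that SharedArrayBuffer now requires a cross-origin-isolated page.

```js
// A sketch of the ring buffer workaround: samples written on the main thread
// are read lock-free inside an AudioWorkletProcessor, avoiding the
// per-block message allocations that triggered garbage collection pauses.

// --- main thread ---
const FRAMES = 8192;   // ring capacity in samples
const sab = new SharedArrayBuffer(
  FRAMES * Float32Array.BYTES_PER_ELEMENT + 2 * Int32Array.BYTES_PER_ELEMENT);
const ring = new Float32Array(sab, 0, FRAMES);
const indices = new Int32Array(sab, FRAMES * Float32Array.BYTES_PER_ELEMENT, 2);

// Push a block of generated samples into the ring (no overflow handling here).
function writeSamples(samples) {
  let w = Atomics.load(indices, 0);
  for (let i = 0; i < samples.length; i++) {
    ring[w] = samples[i];
    w = (w + 1) % FRAMES;
  }
  Atomics.store(indices, 0, w);
}

async function initAudio() {
  const ctx = new AudioContext();
  await ctx.audioWorklet.addModule('ring-reader.js');
  const node = new AudioWorkletNode(ctx, 'ring-reader', {
    processorOptions: { sab, frames: FRAMES },
  });
  node.connect(ctx.destination);
}

// --- ring-reader.js (runs on the audio thread) ---
class RingReader extends AudioWorkletProcessor {
  constructor(options) {
    super();
    const { sab, frames } = options.processorOptions;
    this.frames = frames;
    this.ring = new Float32Array(sab, 0, frames);
    this.indices = new Int32Array(sab, frames * Float32Array.BYTES_PER_ELEMENT, 2);
  }
  process(inputs, outputs) {
    const out = outputs[0][0];
    let r = Atomics.load(this.indices, 1);
    for (let i = 0; i < out.length; i++) {
      out[i] = this.ring[r];          // underflow simply replays stale data
      r = (r + 1) % this.frames;
    }
    Atomics.store(this.indices, 1, r);
    return true;
  }
}
registerProcessor('ring-reader', RingReader);
```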

At the time, because of this bug in Chromium, there were two to three months over the winter of 2019 where running computationally heavy machine learning processes and generating audio at the same time caused horrible pops and crackles.

And because AudioWorklets was only implemented in one browser, that's Chrome, we were unable to offer our users any alternatives.

We welcome the efficiency and usability increases to in-browser computation made available through the WebGPU API. It's crucial that it's adopted as a standard across all browsers, and that the API itself, and any machine learning libraries using it, take real time media into account in their implementations.

Finally, although the capabilities of in-browser media creation are expanding, the majority of practitioners still use software tools outside of the browser to generate sound and visuals.

Until browser support for media generation is improved to allow an ecosystem of tools similar to that existing outside of the browser, this is going to continue to be the case.

So, serving the dual purposes of getting data into the browser to build datasets to train models, and outputting control data generated by machine learning models, we seek to further increase the adoption of connectivity technologies such as Web MIDI and WebBLE.
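
For instance, a minimal sketch of what Web MIDI gives us: capturing controller input that could be logged as training data, and sending model output back out to hardware or software synths as control change messages.

```js
// Use Web MIDI both to capture training data from a hardware controller
// and to send model output back out as control changes.
async function setupMidi() {
  const midi = await navigator.requestMIDIAccess();

  // Incoming controller values can be logged as features for a dataset.
  for (const input of midi.inputs.values()) {
    input.onmidimessage = (e) => {
      const [status, controller, value] = e.data;
      if ((status & 0xf0) === 0xb0) {        // control change message
        console.log('CC', controller, value / 127);
      }
    };
  }

  // Model output (0..1) can be mapped back to a CC message for a synth.
  const output = midi.outputs.values().next().value;
  function sendControl(ccNumber, normalised) {
    if (output) output.send([0xb0, ccNumber, Math.round(normalised * 127)]);
  }
  return sendControl;
}
```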

At the Worldwide Developers Conference in 2020, Safari actually disavowed both of these as a fingerprinting security precaution.

Safari has also given little indication that it's going to adopt the AudioWorklets infrastructure.

So, to wrap it up, we seek to prioritize accessible machine learning.

There's a focus on end users building their own datasets dynamically and training their own models on the fly, in conjunction with real time media analysis and generation.

There's a community of developers, researchers, educators, and creatives, who are able to produce software and resources to enable end users who want to use machine learning in this manner.

However, we're looking to the W3C and the developers of web browsers to provide the performance and the connectivity to make these techniques viable, sustainable, and accessible, and to keep us, and this particular use case, in mind when defining standards or choosing to adopt standards.

We'd like to call on those watching this video with an interest in the web and machine learning to actively work to uplift black voices within the research community.

A good place to start is checking out the Black in AI, online community, as well as adhering to the praxis of The Cite Black Women Collective; read black women's work, integrate black women's work into the core of your syllabus, acknowledge black women's intellectual production, make space for black women to speak, give black women the space and time to breathe.

Okay, thanks for listening.

Check out the MIMIC website and enroll in the free FutureLearn course if you want to learn more.

You can also read some of the publications, I think we'll put up a slide at the end with some of that information on.

Thank you.

