Participants

Alex N. Jose, Pat Meenan, Andrew Galloni, Nic Jansma, Alex Christensen, Sean Feng, Ian Clelland, Marcel Duran, John Engebretson, Giacomo Zechini, Nitish Mittal, Ankit Jain, Alon Kochba, Hadrien Raffalli, Michal Mocny, Philip Walton, Noam Helfman, Yoav Weiss, Neil Craig, Tim Kadlec, Carine Bournez, Philip Tellis, Xiaochen Hu

Minutes

A/B testing - Alex N. Jose

Presentation recording

Alex: Discussions last year about this
A/B testing is about applying changes to a web application in the browser - typically cosmetic changes
JS is used to perform those changes, which often results in a performance cost - the script is fetched, parsed and executed, and if done in a blocking way that incurs a penalty in FP/FCP
If we’re not blocking, the result is a flash of the pre-modification document
It’s a flexible solution that enables non-technical folks to run experiments and is very popular
Even for teams that do server-side A/B testing, client-side enables them to offload some cosmetic efforts
Discussed last year. Basically we want the outcomes of A/B testing without the performance penalties
Had ideas around standardizing a transformation language, using the edge as the insertion point, or blocking just rendering in the browser without blocking the parser
Took some of those ideas and explored how that would function, resulting in a spec sketch and a prototype. Hoping to get feedback on that.
Conceptually, A/B testing is composed of a control document and a set of transformations, expected to be idempotent, so if you apply them multiple times, there are no side effects. E.g. if you have a list of items and the user goes back to a page, the items won’t be duplicated.
We don’t define in the spec how these transformations are idempotent, only that they should be
We broadly classify the transformation to be of 2 types: pre_ua and on_ua
The pre_ua ones are equivalent to server-side transforms
The on_ua ones are those that require client-side JS - e.g. for SPAs
Want to apply both without performance degradations
What would a transform look like?
Examples include color changes, title changes, different text, coloring the first item in a list, etc
Sequence diagram of a prototype through a Cloudflare worker
For the prototype I’m using a git based mechanism, not a real A/B provider
Code + on_ua transforms are injected to the head of the document
The prototype is using a mutation observer that listens to all the DOM changes, and that’s the only blocking work
The rest of the work happens async, as the browser parses the document and when JS is making changes to the DOM. Each matching element gets processed through the transform functions.
Transform functions are currently generic JS, want to open that for debate
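For illustration, the split described here (a small blocking observer, with the transform work done asynchronously) could look roughly like the sketch below. It is not the prototype's code: `BatchedApplicator`, `enqueue` and the injected `schedule` function are hypothetical names, and nodes are kept generic so the logic can run outside a browser.

```typescript
// Sketch only: coalesce many change notifications into one pass per frame.
// All names here are hypothetical, not taken from the prototype.
type Schedule = (callback: () => void) => void;

class BatchedApplicator<T> {
  private pending: T[] = [];
  private scheduled = false;
  private readonly matches: (node: T) => boolean;
  private readonly transform: (node: T) => void;
  private readonly schedule: Schedule;

  constructor(
    matches: (node: T) => boolean,
    transform: (node: T) => void,
    schedule: Schedule
  ) {
    this.matches = matches;
    this.transform = transform;
    this.schedule = schedule;
  }

  // Cheap, synchronous part: remember the node and request a single flush.
  enqueue(node: T): void {
    if (this.pending.indexOf(node) === -1) this.pending.push(node);
    if (this.scheduled) return;
    this.scheduled = true;
    this.schedule(() => this.flush());
  }

  // Deferred part: run the transform once per batch, on matching nodes only.
  private flush(): void {
    const batch = this.pending;
    this.pending = [];
    this.scheduled = false;
    for (const node of batch) {
      if (this.matches(node)) this.transform(node);
    }
  }
}
```

In a browser, `enqueue` would presumably be called from a `MutationObserver` callback for each added node, with `requestAnimationFrame` passed as `schedule`.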
Demo is using Cloudflare workers with Low Latency HTML parser and using a GH gist as the JSON source.
Used the React TodoMVC app for the demo.
Title is created in JS along with the rest of the app. Want to change it as it’s created.
Control json
Includes the 2 variants
The transformation itself is JS
Could create issues because the language for on_ua and pre_ua is not the same, and edge may not run arbitrary JS
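For illustration, a control JSON along these lines might have the shape sketched below. The field names (`selector`, `phases`, `code`), the experiment id and the selectors are all assumptions made for this example; they are not taken from the spec sketch or the demo.

```typescript
// Hypothetical config shape; every field name is an illustrative assumption.
type Phase = "pre_ua" | "on_ua";

interface Transform {
  selector: string; // CSS selector for the elements to change
  phases: Phase[];  // where the transform should run
  code: string;     // transform body; arbitrary JS in the prototype
}

interface Variant {
  name: string;
  transforms: Transform[];
}

interface Experiment {
  id: string;
  variants: Variant[];
}

const experiment: Experiment = {
  id: "red-heading",
  variants: [
    { name: "A", transforms: [] }, // control: the document is left as-is
    {
      name: "B",
      transforms: [
        // Server-rendered markup can be changed before it reaches the browser.
        { selector: "h1", phases: ["pre_ua"], code: "el.style.color = 'red'" },
        // Items created by client-side JS only exist in the browser.
        { selector: "li:first-child", phases: ["on_ua"], code: "el.style.color = 'red'" },
      ],
    },
  ],
};

// Select the transforms that one stage (edge or browser) should apply.
function transformsFor(exp: Experiment, variantName: string, phase: Phase): Transform[] {
  const variant = exp.variants.filter((v) => v.name === variantName)[0];
  if (!variant) return [];
  return variant.transforms.filter((t) => t.phases.indexOf(phase) !== -1);
}
```

The `code` strings show the mismatch raised here: an edge that cannot run arbitrary JS could not execute the `pre_ua` entry as written.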
We have an ongoing test, so that as more items get added to the list, the styles get re-applied to whatever is now the first item - done for demo purposes
*demos that page is switching between variant A and B*
No flash of pre-variant content, and variations are applied on an ongoing basis as items are added to the list
From a perf characteristics standpoint, there’s no visible cost to running the experiment
In repeated tests, it doesn’t show significant perf costs
Key lessons - The edge/CDN based approach enables parallel A/B configuration fetch
Having the transformations before document creation is key to performance - head injection
CDNs may not be excited about arbitrary JS
Where do we go from here?
Could standardize the transforms and move away from JS, even though that’s different from what A/B testers are doing today.

Could use mutation records
They would be more verbose - could avoid some problems by moving the implementation of the “applicator” into the browser

Could move the “edge” part to the origin - doesn’t have to be a CDN, which may be an architectural change
If we move this to the browser, there would be a cost, because the browser can only fetch the config after seeing the document, rather than in parallel
Spec in progress - “blocking=render” implementation, that could enable us to do the same thing without the edge, would block rendering without blocking parsing

Could be a significant benefit for current A/B testing

Interested in opinions on how we can take it from here
Michal: Small question. In the TODO MVC example, removing item 1 made item 2 become red only after a while, because MutationObservers take time. Is that inherent to the prototype or is this how client side A/B testing works?
Alex: I haven’t seen the flash. The mutations are deferred to the next animation frame, so maybe there’s a one frame lag. Should look at that in detail. In my testing I haven’t seen such flashes
Michal: There’s also the question of how long it takes for mutation observers to be applied
Alex: Hoping to find out how complex transformations can be
Pat: When prototyping, were there browser capabilities you found were missing? Are MutationObservers good enough? Did you need to polyfill something?
Alex: Capability to install a SW as part of the first request could have avoided the need for the Edge here. But there are some challenges with that
… For mutation observers specifically, that part is working great, but need to test that with complex pages
Pat: Didn’t run into any transforms that you couldn’t do?
Alex: No because I allowed transformations to be arbitrary JS. If we move away from JS, then the capabilities may be limited, and then we could run into problems when doing complex things.
Nitish: I represent an A/B testing provider. Using MutationObservers on a complex page, we’ve seen deadlocks created - the application tries to create components while the script tries to make changes, so both parties try to apply changes at once. We’ve seen this as a practical limitation. Would definitely need a way to avoid conflicts when 2 parties try to apply changes to the same element.
Alex: Is that because we’re applying changes sync? One of the things I’ve done is to defer all the required changes to the next paint cycle. But I can see the problem you mention if the application depends on DOM state
Nitish: Seen changes get delayed, even if they are applied in the next cycle. There’s a chance of users seeing portions of the control page.
Alex: We could use a different construct in the browser that helps to give priority to these mutations.
Nitish: When applying mutationObserver on the body, the number of mutations is huge and they happen very frequently. If every mutation would apply a DOM change, that could be very expensive in terms of computation
Alex: The performance will depend on the transform code, and selector performance. Prototype defers transforms until the next paint cycle, to reduce computation and avoid acting on each MutationRecord.
Pat: That’s what I was getting to - need a mutation observer with the selectors built in. That lets React let you hook into the VDOM rather than the real DOM
Alex: Standardization could help on that front. Having the transformations applied by the browser
Nitish: If the operations are applied again and again, their results should be the same. If I add items to a list, the end output needs to remain the same regardless of how many times the function was applied
Alex: Yeah, that’s one of the requirements. The spec is expecting that you check the conditions to make these transforms idempotent. One can shoot themselves in the foot if, say, a navigation item was added multiple times as the user is navigating back and forth between pages in an SPA. Transforms have to ensure that they are idempotent.
Nitish: If you’re applying the changes to a DOM attribute, it might re-render itself entirely or partially.
Alex: Should have an example of that. Could check if we’re really applying the transformation. You can have conditional checks as part of the transformations. The expectation is that the A/B transform is created in a way that’s idempotent.
That’s harder in the version where we have a dedicated transformation language
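For illustration, the difference between a transform that checks conditions and one that does not can be sketched as below. `NavItem` and the function names are hypothetical, and a plain array stands in for the navigation menu so the example runs without a DOM.

```typescript
// Hypothetical stand-in for a navigation menu; not a real DOM API.
interface NavItem {
  id: string;
  label: string;
}

// Not idempotent: every application appends another copy.
function addPromoNaive(nav: NavItem[]): void {
  nav.push({ id: "promo", label: "Sale" });
}

// Idempotent: checks the current state first, so repeated runs are no-ops.
function addPromo(nav: NavItem[]): void {
  for (const item of nav) {
    if (item.id === "promo") return;
  }
  nav.push({ id: "promo", label: "Sale" });
}

// Idempotent by construction: sets the state outright instead of toggling it.
function highlightFirst(highlighted: boolean[]): void {
  for (let i = 0; i < highlighted.length; i++) {
    highlighted[i] = i === 0;
  }
}
```

Applying `addPromo` on every SPA navigation leaves a single promo item, while `addPromoNaive` adds one per application, which is the foot-gun described above.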
Hadrien: PM for Google Optimize. Excited about this. Probably our biggest problem
… would it be helpful if we sent you examples of pages whose performance customers complain about? What would help stress-test this? Especially over mobile
Alex: That’d be awesome. I chose to test this over 3G/Mumbai. But more complex pages would be great
Michal: Is there a tool that creates these configs?
Hadrien: Google Optimize product has a wysiwyg editor. Sophisticated customers would write code inside the editor, but most don’t
Michal: Can we create an experiment where these transforms are the output?
Hadrien: probably
Alex: Optimize currently transforms everything to JS, right?
… How much JS do we need? Would a finite operation set be sufficient?
Hadrien: Not sure what the limitations would be for a more restricted set. Can you walk through both cases?
Alex: With on_ua you have fully capable JS, but if you wanted to define it as a set of DOM operations, these operations limit your capability. E.g. a click adds something to the cart, which modifies the JS state and is not a DOM operation. It’s theoretically possible, but not sure if that’s something someone is doing
Hadrien: We would go for more capabilities. I imagine it comes at a cost. Customers do a wide range of things. Instinctively, arbitrary JS.
Pat: Wondering if we’re trying to standardize the entire package, or are there atomic pieces that we should standardize and not try to specify the client-side transformations themselves? i.e. A spec for edge HTML rewrites that all CDNs (and servers) could implement that would allow for fetching of the experiment definition, group selection and HTML transforms (one of which would be embedding any client-side JS and transform definitions that need to be written but that should be provider-specific). Are there browser-side APIs that need to be improved to make the client-side transforms more efficient?
… How we standardize the HTML rewriting on the Edge feels important.
Hadrien: Our customers are moving to server-side experiments which are much more costly
Neil: Work at the BBC. Going through a process of site-wide A/B testing. What Pat just said - we’re really struggling because we have to do everything on the server side, which explodes the cache.
… Mandating edge service - would be great for us to be able to do everything client side without any edge servers
… otherwise, can the A/B policy be cached client side?
… Struggle around making experiments sticky, where users switch groups. Ensuring that the A/B group is sticky would be great.
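For illustration, one common way to keep group assignment sticky is to derive it deterministically from a stable visitor identifier rather than choosing randomly on each request. This is a sketch under assumptions: `hashString` and `assignVariant` are hypothetical names, and where the visitor id comes from (for example a first-party cookie) is left open.

```typescript
// Simple deterministic string hash (illustrative only; not cryptographic).
function hashString(input: string): number {
  let hash = 0;
  for (let i = 0; i < input.length; i++) {
    hash = (hash * 31 + input.charCodeAt(i)) >>> 0; // keep within 32 bits
  }
  return hash;
}

// The same visitor and experiment always map to the same variant.
function assignVariant(visitorId: string, experimentId: string, variants: string[]): string {
  if (variants.length === 0) throw new Error("no variants configured");
  const index = hashString(experimentId + ":" + visitorId) % variants.length;
  return variants[index] as string;
}
```

Because the result depends only on its inputs, an edge, an origin server and the browser could each compute the same group without sharing state. A real deployment would likely want a better-distributed hash and a way to change the traffic split.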
Nitish: Thoughts on standardization. Full JS power gives you a ton of flexibility - hybrid approach would be great, enabling custom JS for some things, but standard operations for most things.
Andrew: Thoughts - why would you need differences between pre_ua and on_ua? Any edge hop in between can perform transformations based on capabilities
Alex: The A/B provider is best placed to make that judgement. Some transforms have to be applied on the client, e.g. for an SPA. Some DOM elements are only created on the client.
Another example is swapping a stylesheet - needs to be done on the server before the client fetches the control stylesheet
An instance where we need both ON_UA and PRE_UA on the same transform is a server side rendered SPA, where markup will be present during PRE_UA, that needs to be transformed, and later ON_UA, where it would be potentially re-rendered. Thus transforms at both places are needed.
… But I like the suggested direction of taking a hybrid approach. Have a defined spec that can be applied in a performant way by the Edge/UA, while the spec also allows JS
Andrew: Any API you want on the Edge that’d make this easier?
Alex: Asking for an Edge implementation from someone who doesn’t have one requires an architectural change. Browser changes could avoid mandating the edge - moving the edge to the client
Nic: best forum?
Yoav: WICG repo, probably

Chat Log

You 1:03 PM

https://forms.gle/QDUYYAQBrhh1q13A8

You 1:05 PM

https://docs.google.com/presentation/d/1-cxHITwVtWJ5x3ev0__XzDtDtJn2cB9CAgN9Mkia3Ag/edit#slide=id.g11de5b0bf6b_0_304

Hadrien Raffalli 1:30 PM

sorry mic problems

will rejoin

Hadrien Raffalli 1:37 PM

https://support.google.com/ads-help/answer/7367525?hl=en

Just to give you an idea of how much A/B testers allow in terms of code change size, the max container limit on Google Optimize is 400kb

Hadrien Raffalli 1:39 PM

Changes are not just changing css + text, customers might inject javascript for new kinds of interactivity

In rare cases, we grant customers container size increases

Ankit Jain 1:52 PM

Hey, I work with VWO (an A/B Testing provider). For creating idempotent operations, and supporting visual changes, full JS would be required.

Ankit Jain 1:58 PM

For Pre-UA, limiting it to specific operations is a good choice. For Post-UA, if we limit it to specific methods, I foresee a lot of custom JS being appended to the head to counter it.

Nitish Mittal 2:00 PM

I also work with VWO, and can help by providing practical use cases

Hadrien Raffalli 2:02 PM

Thanks for inviting me! This was/ is very exciting 👋

Neil Craig 2:02 PM

Avoiding the necessity for a CDN would be great, we use a mixture of commercial CDN and in-house - for cost reasons

Tim Kadlec 2:02 PM

Gotta drop, but super interesting stuff here. Thanks for presenting, Alex.