WebPerfWG call - September 16th 2021

Participants

Yoav Weiss, Nic Jansma, Alex Christensen, Andrew Galloni, Andy Davies, Behdad Bakhshinategh, Benjamin De Kosnik, Cliff Crocker, Corentin Pescheloche, Giacomo Zecchini, Michal Mocny, Michelle Vu, Noah Lemen, Sean Feng, Andrew Comminos, Lan Wei, Annie Sullivan, Carine, Steven Bougon, Patrick Meenan, Jansen Tolle, CP Clermont

Next Meeting

September 30, 2021 @ normal time

Topics

Diagnostics for Animation Smoothness - Michal Mocny

Michal: Discuss efforts on diagnosing animation smoothness
... Link to previous presentation on motivation and examples
... Explain how it works in Chromium
... The animation smoothness goal is visually complete animation frames, delivered in a timely manner during animations
... "No jank"
... We measure "missed opportunities" to show expected animation updates
... Very common feedback from developers:

... Most of these questions are not possible to answer with web platform APIs today
... And can be difficult to diagnose even with local tools
... [links to where to get more details in deck]
... What a single frame takes:

... The GPU process gets a signal and decides whether it's worth attempting a frame. We request the renderer to begin a frame
... The compositor gets the signal and updates its data structures. The compositor could decide not to start the main thread
... In parallel, the compositor and main thread start updating a frame
... Lots of work goes into painting; skipping a lot of details here
... Eventually the renderer gets a signal that the main thread is ready to commit
... We raster a bunch of pixels that are pushed to the GPU
... Swap on the GPU
... If we get everything done before the vsync deadline, that's when a frame is presented
... There are lots of reasons we may not get a frame out in time. One example is main thread updates taking too long
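To make "everything done before the vsync deadline" concrete, here is a minimal sketch that counts how many vsyncs a frame misses given per-stage durations. It assumes a fixed 60 Hz refresh and strictly sequential stages, which is a simplification (real pipelines overlap compositor and main-thread work); `missedVsyncs` and `vsyncsUntilPresented` are hypothetical names and the stage timings are made up.

```typescript
// Simplified model: the frame starts on a vsync boundary and its stages run
// back to back. Real Chromium pipelines overlap work across threads.

/** Vsyncs elapsed from the frame's start until it can be presented (at least 1). */
function vsyncsUntilPresented(stageDurationsMs: readonly number[], intervalMs: number): number {
  const totalMs = stageDurationsMs.reduce((sum, d) => sum + d, 0);
  return Math.max(1, Math.ceil(totalMs / intervalMs));
}

/** Vsync opportunities missed compared with presenting on the very next vsync. */
function missedVsyncs(stageDurationsMs: readonly number[], intervalMs: number): number {
  return vsyncsUntilPresented(stageDurationsMs, intervalMs) - 1;
}

const INTERVAL_60HZ_MS = 1000 / 60; // ~16.67 ms between vsyncs

// Hypothetical stage timings: [begin frame, main thread, raster + swap].
console.log(missedVsyncs([2, 6, 5], INTERVAL_60HZ_MS)); // 13 ms total -> 0
console.log(missedVsyncs([2, 15, 5], INTERVAL_60HZ_MS)); // 22 ms total -> 1
```

A slow main-thread stage alone is enough to push the frame past the next vsync, which is the failure case described above.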

... We still produce a new frame update, but it may not have all rendering changes
... Eventually that frame will complete, and the compositor will pick up whatever is available
... Often there's a steady stream of updates, but they're just not complete
... What if the main thread doesn't have any visual updates?
... Then all past partial frames were actually complete (in hindsight)
... What if we get a new, but slightly delayed, frame update for every frame?
... What does that update look like to the user? If it's driven by user input, it may feel sticky (e.g. scroll not smooth). But for an animation it may look buttery smooth.
... Tools

... Sample rendering trace

... Pipeline here with messages across boundaries, can be daunting to follow
... Effort to simplify diagnostics in Chromium, "squashing the layers"

... PipelineReporter events describe the path of a single frame end to end
... Duration for each stage within that pipeline, so you can draw conclusions about where frames end up relative to vsync boundaries
... We have to wait for the future to report about the past. It's easier to understand by looking at recorded traces, but at the time frames are recorded you can't know this in real time
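Because per-stage durations are only known after the fact, one natural post-processing step is to ask which stage dominated each recorded frame. A minimal sketch follows; the record shape and stage names (`mainThread`, `raster`) are hypothetical and do not match Chromium's actual PipelineReporter stage names.

```typescript
// Hypothetical per-frame record; stage names are illustrative only.
interface StageDuration {
  stage: string;
  durationMs: number;
}

interface FrameRecord {
  frameId: number;
  stages: StageDuration[];
}

/** The single longest stage in a frame, or undefined for an empty record. */
function slowestStage(frame: FrameRecord): StageDuration | undefined {
  let worst: StageDuration | undefined;
  for (const s of frame.stages) {
    if (worst === undefined || s.durationMs > worst.durationMs) worst = s;
  }
  return worst;
}

/** How often each stage was the slowest one, across recorded frames. */
function slowestStageCounts(frames: readonly FrameRecord[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const frame of frames) {
    const s = slowestStage(frame);
    if (s !== undefined) counts[s.stage] = (counts[s.stage] ?? 0) + 1;
  }
  return counts;
}

// Made-up trace data for three frames.
const recorded: FrameRecord[] = [
  { frameId: 1, stages: [{ stage: "mainThread", durationMs: 12 }, { stage: "raster", durationMs: 3 }] },
  { frameId: 2, stages: [{ stage: "mainThread", durationMs: 4 }, { stage: "raster", durationMs: 9 }] },
  { frameId: 3, stages: [{ stage: "mainThread", durationMs: 20 }, { stage: "raster", durationMs: 2 }] },
];
console.log(slowestStageCounts(recorded));
```

Here the main thread dominates two of the three frames, which is the kind of conclusion the stage breakdown is meant to support.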
... Defining animations:

... Lot of data for each frame:

... We can look at animations and the frames they produce

... TouchScroll animation and how many frames were expected vs. produced
... Took 4,097ms but at 60FPS you'd expect ~ 240 frames
... For this scroll, the page might not be producing steady movement the entire time
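The "expected vs. produced" arithmetic can be sketched as below. It assumes a fixed refresh rate, which doesn't hold on variable-refresh displays (raised later in the call). For the 4,097 ms scroll it gives about 245 expected frames, in line with the rough ~240 quoted; the produced-frame count in the usage line is hypothetical.

```typescript
/** Frames a display at `refreshHz` would show over `durationMs` (rounded down). */
function expectedFrames(durationMs: number, refreshHz: number): number {
  return Math.floor((durationMs * refreshHz) / 1000);
}

/** Fraction of expected frames that were not produced (0 when none expected). */
function droppedFraction(durationMs: number, refreshHz: number, producedFrames: number): number {
  const expected = expectedFrames(durationMs, refreshHz);
  if (expected === 0) return 0;
  return Math.max(0, expected - producedFrames) / expected;
}

console.log(expectedFrames(4097, 60)); // 245
// The produced count (196) is made up for illustration.
console.log(droppedFraction(4097, 60, 196)); // 0.2
```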
... Dropped frames

... An imperfect frame, at a time when it mattered to the user
... Lots of nuances
... Type of interaction, idle periods
... You could have a perfectly smooth animation but poor quality (e.g. video over a poor network)
... Another example, at Edge: an attempt to make scrolling buttery smooth so it wouldn't be as sticky. There's a tradeoff.
... Some animated icons are barely important, size-wise
... It's hard to know which animations are important to the user
... Given these, a dropped frame isn't a boolean value; it's fractional, or a probability that it mattered to the user
... We know there were missed updates, but what is the likelihood there was jank noticeable to the user?
... I doubt there's a single correct answer. If your goal is pixel perfection, you want to know every detail about dropped frames. If your goal is to identify user pain from real-user data, you need to gather different data
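One way to read "a dropped frame is fractional" is to weight each missed update by an estimated probability that it mattered, then sum those weights. The sketch below treats the probabilities as given inputs; how to estimate them (idle periods, icon size, user attention) is exactly the open question here, and `pNoticeable` is a hypothetical field name.

```typescript
// Each frame carries a hypothetical estimate of how likely a missed update
// there was noticeable to the user.
interface FrameSample {
  missedUpdate: boolean;
  pNoticeable: number; // assumed to be in [0, 1]; clamped below
}

function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

/** Expected number of noticeable drops: a sum of probabilities, not a count. */
function expectedNoticeableDrops(frames: readonly FrameSample[]): number {
  return frames.reduce((sum, f) => sum + (f.missedUpdate ? clamp01(f.pNoticeable) : 0), 0);
}

const samples: FrameSample[] = [
  { missedUpdate: true, pNoticeable: 1 },    // during a user-driven scroll
  { missedUpdate: true, pNoticeable: 0.5 },  // small, less important animation
  { missedUpdate: false, pNoticeable: 0.9 }, // frame was fine
  { missedUpdate: true, pNoticeable: 0 },    // user not watching the screen
];
console.log(expectedNoticeableDrops(samples)); // 1.5
```

Three frames missed updates, but under these made-up weights they amount to 1.5 "noticeable" drops.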
... Quick recap:

... Why FrameTiming v1 doesn't cut it

... No coverage for compositor updates
... No notion of active animations at all
... Seeking an alternative to Frame Timing v1
... Screenshot of how this looks on real-life sites

... There are always frames in flight
... PipelineReporter is meant to simplify these things, but it's still very complicated
... Some of the post-processing I've worked on

... Large red area is missing updates from main thread animations
... Yellow is similar but it ignores ???
... Blue is when there are no animations
... Green is checkerboarding
... Later on, scrolling is smooth and that's what the user is doing
... Plenty of imperfection, but the page experience felt pretty good to users
... On the other hand, there are a lot of issues that felt worse to users:

... The big spike at the end is the Android notification tray being swiped down; the user wasn't paying attention to the screen update
... We try to combine these signals, with different weights and areas, into a merged timeline. This has to be opinionated.
... We can try to convert that timeline into a single score; we're trying a couple of things
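A minimal sketch of merging per-frame signals into a timeline and then a single score follows. The signal categories loosely follow the colored timeline described above, but the weights and the "max per frame, mean over frames" rule are placeholder assumptions, not what Chrome computes.

```typescript
// Hypothetical signal categories with placeholder weights.
type JankSignal = "mainThreadMissed" | "compositorMissed" | "checkerboard";

const SIGNAL_WEIGHTS: Record<JankSignal, number> = {
  mainThreadMissed: 0.5,
  compositorMissed: 1,
  checkerboard: 0.75,
};

interface TimelineFrame {
  animating: boolean; // was any animation or scroll active?
  signals: JankSignal[];
}

/** Per-frame badness in [0, 1]: the heaviest signal present, 0 when idle. */
function frameBadness(frame: TimelineFrame): number {
  if (!frame.animating) return 0; // idle time counts as smooth
  return frame.signals.reduce((worst, s) => Math.max(worst, SIGNAL_WEIGHTS[s]), 0);
}

/** Single score in [0, 1], where 1 means every frame was smooth. */
function smoothnessScore(frames: readonly TimelineFrame[]): number {
  if (frames.length === 0) return 1;
  const total = frames.reduce((sum, f) => sum + frameBadness(f), 0);
  return 1 - total / frames.length;
}

const timeline: TimelineFrame[] = [
  { animating: true, signals: [] },                                   // smooth
  { animating: true, signals: ["compositorMissed"] },                 // badness 1
  { animating: true, signals: ["checkerboard", "mainThreadMissed"] }, // badness 0.75
  { animating: false, signals: [] },                                  // idle: 0
];
console.log(smoothnessScore(timeline)); // 0.5625
```

Taking the maximum weight per frame, rather than summing, keeps each frame's contribution bounded; it is one of several defensible, and necessarily opinionated, choices.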
... In Chrome Canary you can see a Performance HUD

... CWV scores plus information about dropped frames
... All of this continues to evolve. You can play with early revisions to see how the reported experience matches your expectations
... Early entries into Chrome UKM field data

... Some details are in DevTools now

... Interpretation of these depends on your goals:

... If your goal is field data, we have a lot left to learn
... Last time I presented, Ryosuke asked why we're focused on dropped frames, especially in a world of variable refresh rates
... Very good question. We've been talking about it in terms of a percentage of dropped frames, because we consider idle time good (smooth)
... Percent dropped is kind of like FPS, but even within games there's a trend towards more accurate measurements, like frame-to-frame delays
... If your goal is to require that frames take no longer than 10 ms, you could take this data and interpret it for your goals
... Seeking feedback
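Frame-to-frame delays, as mentioned above, don't depend on a fixed refresh rate, so they fit variable-refresh displays. A minimal sketch, assuming you have presentation timestamps for frames during which an update was expected (idle gaps would otherwise count as "slow"); the 10 ms budget and the timestamps are illustrative.

```typescript
/** Gaps between consecutive presentation timestamps (ms). */
function frameToFrameDeltas(presentationTimesMs: readonly number[]): number[] {
  const deltas: number[] = [];
  for (let i = 1; i < presentationTimesMs.length; i++) {
    deltas.push(presentationTimesMs[i] - presentationTimesMs[i - 1]);
  }
  return deltas;
}

/** Fraction of gaps exceeding a per-frame budget, e.g. 10 ms. */
function fractionOverBudget(presentationTimesMs: readonly number[], budgetMs: number): number {
  const deltas = frameToFrameDeltas(presentationTimesMs);
  if (deltas.length === 0) return 0;
  return deltas.filter((d) => d > budgetMs).length / deltas.length;
}

// Hypothetical timestamps from a variable-refresh display.
const presented = [0, 8, 16, 40, 48];
console.log(frameToFrameDeltas(presented)); // [ 8, 8, 24, 8 ]
console.log(fractionOverBudget(presented, 10)); // 0.25
```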

Ben: Any context on the percentile metrics?

... Not in Chrome; instead, it uses a specific definition of dropped frames that evolved from that
... Max dropped frames is the "worst window"; the 95th percentile avoids extremes
... Similar to CLS (Cumulative Layout Shift)
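The "worst window" vs. 95th-percentile distinction can be sketched with sliding windows over per-frame dropped flags. The window size, the nearest-rank percentile definition, and the function names are assumptions for illustration, not Chrome's exact definitions.

```typescript
/** Fraction of dropped frames in every sliding window of `windowSize` frames. */
function windowDropFractions(dropped: readonly boolean[], windowSize: number): number[] {
  if (windowSize <= 0 || dropped.length < windowSize) return [];
  const fractions: number[] = [];
  let inWindow = 0;
  for (let i = 0; i < dropped.length; i++) {
    if (dropped[i]) inWindow++; // frame entering the window
    if (i >= windowSize && dropped[i - windowSize]) inWindow--; // frame leaving
    if (i >= windowSize - 1) fractions.push(inWindow / windowSize);
  }
  return fractions;
}

/** Nearest-rank percentile (p in (0, 100]); returns 0 for empty input. */
function percentile(values: readonly number[], p: number): number {
  if (values.length === 0) return 0;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}

const perFrameDropped = [false, false, true, true, false, false, false, false];
const windows = windowDropFractions(perFrameDropped, 4); // [0.5, 0.5, 0.5, 0.25, 0]

// Hypothetical: 19 smooth windows and a single bad one.
const mostlySmooth: number[] = [];
for (let i = 0; i < 19; i++) mostlySmooth.push(0);
mostlySmooth.push(1);
console.log(Math.max(...mostlySmooth)); // 1 (worst window)
console.log(percentile(mostlySmooth, 95)); // 0 (the single outlier is ignored)
```

The worst window reports the single bad stretch, while the 95th percentile discounts it, which is the "avoids extremes" tradeoff noted above.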
Andy Davies: In the RUM world, we would want less detail rather than more. That said, we'll want to know whether the user was trying to interact with the page at that point
Michal: We've thought about user input in two ways. Animations driven by user input are obvious. Also, immediately after an interaction the user may be paying more attention. Should we weight those more heavily?
Andrew Comminos: We care a lot about direct animations resulting from user interactions. We have responsiveness metrics based on Long Tasks and Element Timing, but little insight into the compositor side.
CP Clermont: Are we going to have a way to figure out the cause of jank, or to label its sources? I tracked FPS after clicks, so I could tell that a given click caused a given jank. Is there something similar here?
Michal: Probably too early to know what we could do as far as attribution. I've shown that you can slice by frame (for what was actually shown), but you can also slice by animation over time. One way is to say "this" animation is particularly janky, so you'd know which animation and at which times. There are options here. In terms of what led to frames being skipped, that's tough.

Priority Hints - Patrick Meenan

Patrick: It's been two or three years since Priority Hints were active in this group. Looking forward to resurrecting that work and pushing it forward in standards.
... Ability to tag elements and resources (link rel=prefetch, images, the Fetch API) with importance high/low/auto.
... For the browser to "maybe" do something specific with that
... In Chrome 96 we're hoping to bring it back as an experiment with an Origin Trial
... In Chrome there are 5 priority levels; images are at the bottom of that scale until they're visible in layout.
... Being able to tag the LCP image as important matters, so it loads during the early phase of page loading
... Ability for scripts to boost priority
... Current implementation extends it to virtually all content types and the Fetch API
... E.g., autocomplete calls on keyboard input can be high priority
... Wanted to see if there were any concerns
... In Chrome, high/low will end up in same buckets
... The exception is render-blocking CSS: the most you can lower it is one notch
... Lets you straddle the early vs late part of the waterfall
... Gives you fine-grained control over order
... Priorities go out the window over multiple origins
... Chrome has concurrent request limits (e.g., it will throttle at 10+ low-priority requests)
... For the most part, prioritization matters within the same origin
... For the Origin Trial: tagging LCP images to boost their priority, and reducing link rel=preload hacks for async scripts
... Is there interest from developers?
... Are current priority hints of high/low enough?
... We'll follow up through the spec process to get it standardized
... Want to make sure we're going in right direction after pause of work
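As a rough illustration of how high/low hints could reorder otherwise discovery-ordered requests within a single origin, here is a toy model. It is not Chrome's implementation (which uses five internal levels, render-blocking phases, and concurrency limits); the request shape and `dispatchOrderByOrigin` are hypothetical, and the hint values follow the high/low/auto naming in these notes.

```typescript
type Importance = "high" | "auto" | "low";

interface PendingRequest {
  url: string;
  origin: string;         // passed explicitly to keep the sketch self-contained
  importance: Importance; // hint value as named in these notes
  discoveryIndex: number; // order in which the parser/script found the resource
}

const IMPORTANCE_RANK: Record<Importance, number> = { high: 0, auto: 1, low: 2 };

/**
 * Toy model: per origin, order by hint, then by discovery order.
 * Ordering across origins is left undefined, reflecting the point that
 * prioritization mostly matters within a single origin.
 */
function dispatchOrderByOrigin(requests: readonly PendingRequest[]): Record<string, string[]> {
  const byOrigin: Record<string, PendingRequest[]> = {};
  for (const r of requests) {
    const list = byOrigin[r.origin] ?? [];
    list.push(r);
    byOrigin[r.origin] = list;
  }
  const order: Record<string, string[]> = {};
  for (const origin of Object.keys(byOrigin)) {
    order[origin] = byOrigin[origin]
      .slice()
      .sort(
        (a, b) =>
          IMPORTANCE_RANK[a.importance] - IMPORTANCE_RANK[b.importance] ||
          a.discoveryIndex - b.discoveryIndex,
      )
      .map((r) => r.url);
  }
  return order;
}

// Made-up page: the LCP hero image is discovered after the app script.
const reqs: PendingRequest[] = [
  { url: "https://a.example/app.js", origin: "https://a.example", importance: "auto", discoveryIndex: 0 },
  { url: "https://a.example/hero.jpg", origin: "https://a.example", importance: "high", discoveryIndex: 1 },
  { url: "https://a.example/beacon", origin: "https://a.example", importance: "low", discoveryIndex: 2 },
  { url: "https://a.example/icon.png", origin: "https://a.example", importance: "auto", discoveryIndex: 3 },
  { url: "https://cdn.example/font.woff2", origin: "https://cdn.example", importance: "auto", discoveryIndex: 4 },
];
console.log(dispatchOrderByOrigin(reqs));
```

The tagged hero image jumps ahead of the script and icon, while the low-priority beacon drops to the end; untagged requests keep their discovery order.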
Yoav: Interest in feedback from tools folks and RUM providers
... Would we be able to recommend this to customers and users?
... Or are there missing pieces outside of priority hints that would be helpful?
Cliff Crocker: Interesting for synthetic testing, for advice we can give users about how they can improve. Maybe more challenging for RUM. Could show up as a Lighthouse audit for improving LCP. Useful for our customers
Patrick: In RUM, you should get LCP element information, and RT could let you figure out if it's delayed.
Cliff: No-brainer for synthetic, more challenging for RUM
Andy: Giving developers another tool other than preload is the important thing. We all see preload overused. Preload is causing more problems than it's solving.
Patrick: ???
Boris: Preload misuse is causing issues; priority hints misuse could cause the same
Patrick: Similar to lazy-load. Hopefully hints in two directions (important or not) can limit amount of damage that can be done. If you flag every image as important, nothing is important. Hopefully we have enough gates around TTI, LCP, FCP that that kind of mistake will get caught very quickly.
... Does provide a little bit of a footgun
... There are some protections, in Chrome at least: if you deprioritize parser-blocking scripts, as soon as the parser gets blocked on one, Chrome increases its priority itself to continue parsing
... Everyone has broken their prioritization support in H2, so hopefully that'll be better with H3
Yoav: In terms of other browser folks, I suspect that Chromium has more client-side knobs to play around with priority than other browsers. But I believe all browsers support on-the-wire priorities?
Benjamin: I'm a little worried this will be a bigger gun for causing more problems. If we could show this fixes some of the problems of preload overuse, I'd be more interested. Right now I feel like this is giving 1-5 more ways to create confusion.
... Anything that's focused on reducing damage would be great
Patrick: WebKit and Gecko have priority schemes to some extent. Once it gets to the wire on H2/H3, it's very similar to Chrome. Hopefully useful in Firefox as well.
... Particularly useful in Fetch for user-responsive API calls vs. background activity in JS
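To illustrate the Fetch use case, here is a sketch that maps why a call is made to a priority hint: calls the user is waiting on go high, background work goes low. The call categories and the mapping are hypothetical, and the `importance` key mirrors the terminology in these notes; treat it as an assumption and check the current spec for the option name a given browser actually supports.

```typescript
type PriorityHint = "high" | "low" | "auto";

// Hypothetical categories of API calls made from page JS.
type CallPurpose = "autocomplete" | "userAction" | "analytics" | "backgroundSync";

// Hypothetical mapping: user-responsive calls high, background activity low.
const HINT_BY_PURPOSE: Record<CallPurpose, PriorityHint> = {
  autocomplete: "high",
  userAction: "high",
  analytics: "low",
  backgroundSync: "low",
};

// Init options intended for fetch(). The `importance` key is an assumption
// based on the terminology in these notes, not a guaranteed field name.
interface HintedRequestInit {
  method: string;
  importance: PriorityHint;
}

function requestInitFor(purpose: CallPurpose, method = "GET"): HintedRequestInit {
  return { method, importance: HINT_BY_PURPOSE[purpose] };
}

console.log(requestInitFor("autocomplete").importance); // high
console.log(requestInitFor("analytics", "POST").importance); // low
```

In a page, passing the result as the second argument to `fetch()` would attach the hint; since browsers ignore unrecognized init dictionary members, unsupported browsers would simply fetch at their default priority.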
Alex Christensen: I think if we expose only high/low priority, we are quickly going to have people saying they want a medium priority, or a medium-high priority, etc.
... The biggest issues we've seen with priorities relate to low-priority requests not actually being sent
Patrick: From Chrome's perspective, the only reason to have two directions to go in is whether you load while in render-blocking mode or not. Otherwise, things load in discovery order. You have some control: put things in the markup in the order you want them to load. With more layers, you add more complexity than you can trust developers with.
... With H3, it has levels and/or exclusive requests. Do we need a mechanism at markup level or can we leave at protocol level to figure out.
Yoav: For images, it depends on the format. For progressive images you may want one mode; for hero images you want each one fully downloaded, rather than all of them half-downloaded.
Patrick: I'm not sure either developers or browsers can make an intelligent decision about that
Michelle: What was the advantage of using high-priority async scripts instead of preloading the scripts again? Or is the advantage mostly to encourage developers to distinguish high vs. low priority?
Patrick: I'm hoping the semantics for preload hints are the right way to hint priority, and priority would be for things the parser didn't otherwise figure out. Usually scripts get high priority unless they're async. Preload as=script implies to Chrome that it's a high-priority script. It feels like one of those things that works, but is Chrome-specific and isn't explicitly intentional markup.