WebPerfWG call - June 4th 2020

Participants

Nic Jansma, Yoav Weiss, Benjamin De Kosnik, Gilles Dubuc, Peter Perl, Andy Davies, Michal Mocny, Nicolás Peña, Alex Christensen, Philip Walton, Annie Sullivan, Steven Bougon, Michelle Vu, Tom Maccabe, Ilya Grigorik, plh

Next Meeting

June 18th @ 10am PST / 1pm EST

Presentations

Web Vitals

Ilya: What is Web Vitals?

Announced a few weeks back. A program with multiple components:
We’ve been evangelizing web performance, but there are too many tools and metrics.
Google is committing to making the tools more coherent, focusing on a small set of metrics
Providing a more predictable cadence for the key metrics (Core Web Vitals), so the ecosystem knows when they are likely to change
Roughly annual update cycle
For 2020, we focused on 3 metrics: LCP, FID and CLS
Also providing guidance on thresholds for good, needs improvement and poor experiences
We wrote down how we arrived at these thresholds, based on research and feasibility - couldn’t start out too aggressively, but that might change in the future
Created a lot of content to educate folks: web.dev/webvitals
Last week, Search announced that they’re evaluating page experience as part of ranking
That’s not new in itself, but unifying on CWV as the metrics for it is
Strong push to unify all the tools: PageSpeed Insights (PSI), Chrome UX Report (CrUX), Search Console (SC), DevTools, Lighthouse & the Web Vitals extension
Philip open-sourced web-vitals.js to collect those metrics, which includes all the best practices for collection

Encourage everyone to look into the implementation
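The good / needs improvement / poor thresholds mentioned above can be applied with a small helper. This is a sketch that assumes the 2020 values published on web.dev (LCP 2.5 s / 4 s, FID 100 ms / 300 ms, CLS 0.1 / 0.25) and the web.dev guidance to rate the 75th percentile of field samples. The helper names (rate, percentile) are made up for illustration and are not part of web-vitals.js.

```typescript
// Sketch: rating a metric value against the 2020 Core Web Vitals thresholds.
type MetricName = "LCP" | "FID" | "CLS";
type Rating = "good" | "needs-improvement" | "poor";

// Upper bounds for "good" and "needs improvement"; anything above `poor` is rated poor.
const THRESHOLDS: Record<MetricName, { good: number; poor: number }> = {
  LCP: { good: 2500, poor: 4000 }, // milliseconds
  FID: { good: 100, poor: 300 }, // milliseconds
  CLS: { good: 0.1, poor: 0.25 }, // unitless layout shift score
};

function rate(metric: MetricName, value: number): Rating {
  const t = THRESHOLDS[metric];
  if (value <= t.good) return "good";
  if (value <= t.poor) return "needs-improvement";
  return "poor";
}

// Nearest-rank percentile, used to rate a set of field samples (e.g. the 75th percentile).
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no samples");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.min(sorted.length, Math.max(1, rank)) - 1];
}
```

For example, FID samples of 50, 100, 200 and 400 ms have a 75th percentile of 200 ms, which rates as "needs-improvement".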

What about Web Vitals 2021?
SPAs are a common theme where current metrics fall short
FID is great, but we care about responsiveness in general, across all input
CLS already covers the whole page lifetime, but that creates other challenges: long-lived pages are penalized. Maybe we can normalize it?
Frame Timing, Event Timing, normalization are potential answers
Would be great to get consensus on how to present metrics in different performance tools
SPAs lack a standard API for page transitions, which makes it hard for e.g. CrUX to collect that data. Some telemetry may look worse for an SPA, and we don't want our metrics to disincentivize building SPAs.
Priorities
Questions?
Steven: For us, SPAs are the most important one. It seems like you dropped TTI in favor of input delay. Why?
Annie: TTI can be used in both lab and field. In the field, TTI can take a long time to complete, and the user may leave the page before it does - abort bias
... FID gives us a more direct measure of UX
Steven: But FID depends on user behavior, how can we detect regressions in the lab?
Annie: We compared approaches for a lab metric - Total Blocking Time (TBT) was highly correlated with FID
... I think Wikipedia did something similar
Michal: TBT is defined in terms of TTI, so it gives you a measurement of how busy the page was before it stopped loading. More elastic
Annie: We have studies on that, I can share in the chat
... TBT is bounded by TTI, but is much more elastic
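To make the TBT discussion concrete: TBT sums, for each long task between First Contentful Paint (FCP) and TTI, the part of the task beyond 50 ms. This is a simplified sketch that skips tasks straddling either boundary instead of clipping them, so its numbers may differ from tools such as Lighthouse. The TaskSpan type is a stand-in for long task entries.

```typescript
// Stand-in for a long task: start time and duration in milliseconds.
interface TaskSpan {
  startTime: number;
  duration: number;
}

const LONG_TASK_THRESHOLD_MS = 50;

// Sum of blocking time (duration beyond 50 ms) for tasks fully inside [fcp, tti].
function totalBlockingTime(tasks: TaskSpan[], fcp: number, tti: number): number {
  let tbt = 0;
  for (const t of tasks) {
    const end = t.startTime + t.duration;
    // Simplification: tasks crossing FCP or TTI are ignored rather than clipped.
    if (t.startTime < fcp || end > tti) continue;
    tbt += Math.max(0, t.duration - LONG_TASK_THRESHOLD_MS);
  }
  return tbt;
}
```

One reading of "more elastic": two pages with the same TTI can have very different TBT, because TBT grows gradually with the amount of blocking work instead of jumping with the position of the last long task.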

Interactions, normalization and SPA Navs

Michal: We want to move from tracking input delay to responsiveness in general
Not just first input, but all input
Event Timing gives us input responsiveness, but cannot handle async work that results from the event
Resource fetches, API calls, queued tasks
Event Timing shipping in Chrome 85
Also exploring heuristics for async tasks

Resource fetches, XHR, scheduled tasks
Can paints be attributed to tasks?
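To illustrate the gap Event Timing leaves: an entry's processingStart minus startTime is the input delay (what FID reports for the first input), and its duration runs until the next paint after the handlers finish, but fetches or tasks kicked off by a handler fall outside that window. In this sketch AttributedAsyncWork is hypothetical, since how async work gets attributed to an event was still being explored, and the end of that work is used as an approximation of when its result is visible.

```typescript
// Subset of PerformanceEventTiming fields (times in ms on the performance timeline).
interface EventTimingLike {
  name: string;
  startTime: number; // when the input happened
  processingStart: number; // when the first event handler started running
  processingEnd: number; // when the last event handler finished
  duration: number; // startTime until the next paint after processing
}

// Hypothetical: async work (fetches, queued tasks) that some heuristic attributed to the event.
interface AttributedAsyncWork {
  endTime: number;
}

// Input delay, as reported by FID for the first input.
function inputDelay(e: EventTimingLike): number {
  return e.processingStart - e.startTime;
}

// Latency including attributed async work: from the input to whichever ends later,
// the event's own next paint or the last piece of follow-up work.
function interactionLatency(e: EventTimingLike, work: AttributedAsyncWork[]): number {
  const syncEnd = e.startTime + e.duration;
  const end = work.reduce((latest, w) => Math.max(latest, w.endTime), syncEnd);
  return end - e.startTime;
}
```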

Normalization
As we cover more of the page lifecycle, we don’t want the most interacted-with pages to be judged worse
Options: sum of all durations, average, max?
CLS already has some of these problems
Long-lived sessions with repeated shifts may have large scores
Maybe separate CLS scores for each section?
Adjusted vs. non-adjusted - similar to long tasks, maybe we can define a budget for each interaction and count only what exceeds the budget
Per page load / per session
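The normalization options can be compared side by side. This sketch computes sum, average and max over per-interaction durations, an "adjusted" budget-based score in the spirit of TBT, and per-section CLS where sections are delimited by hypothetical boundary timestamps (for example, SPA route changes). Any budget value passed in is a placeholder, not a proposed threshold.

```typescript
type Aggregation = "sum" | "average" | "max";

// Collapse per-interaction durations (ms) into one page- or session-level number.
function aggregate(durations: number[], how: Aggregation): number {
  if (durations.length === 0) return 0;
  const sum = durations.reduce((a, b) => a + b, 0);
  if (how === "sum") return sum;
  if (how === "average") return sum / durations.length;
  return Math.max(...durations);
}

// "Adjusted" score: like TBT for long tasks, only time beyond a per-interaction budget counts.
function overBudget(durations: number[], budgetMs: number): { count: number; excessMs: number } {
  let count = 0;
  let excessMs = 0;
  for (const d of durations) {
    if (d > budgetMs) {
      count++;
      excessMs += d - budgetMs;
    }
  }
  return { count, excessMs };
}

// Subset of layout-shift entry fields.
interface ShiftLike {
  startTime: number;
  value: number;
  hadRecentInput: boolean;
}

// Per-section CLS: `boundaries` are ascending section start times (hypothetical, e.g. route
// changes); section i covers [boundaries[i], boundaries[i + 1]).
function clsPerSection(shifts: ShiftLike[], boundaries: number[]): number[] {
  const scores: number[] = boundaries.map(() => 0);
  for (const s of shifts) {
    if (s.hadRecentInput) continue; // CLS excludes shifts shortly after user input
    let section = -1;
    for (let i = 0; i < boundaries.length; i++) {
      if (s.startTime >= boundaries[i]) section = i;
    }
    if (section >= 0) scores[section] += s.value;
  }
  return scores;
}
```

Sum grows with session length, average hides outliers, and max ignores how often things go wrong; the budget-based count sits in between, which is the trade-off the options above are weighing.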
Important to consider SPA routes, and match metrics to specific routes
Goal is to have accurate heuristics
Need real world data
Want to start with annotation by frameworks
Can leverage User Timing via a convention, without changing the spec or implementations
Use the "detail" field to add a navigation attribute
Considering passing a Promise to signal the end of a navigation, to measure things during the transition

Not yet proposals, but considering
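A sketch of what such a User Timing convention might look like, assuming User Timing Level 3 (marks and measures that accept a detail option). The mark names and the shape of the navigation attribute are hypothetical; the notes only say that a navigation attribute would live in detail. The Promise-based end signal is not covered here.

```typescript
// Hypothetical naming convention; only the use of `detail` comes from the discussion.
type Phase = "start" | "end";

function markSoftNavigation(route: string, phase: Phase) {
  // User Timing Level 3: performance.mark() accepts { detail } and returns the mark.
  return performance.mark(`soft-nav-${phase}:${route}`, {
    detail: { navigation: { route, phase } },
  });
}

function measureSoftNavigation(route: string) {
  // Measure between the two marks; tools that know the convention can read `detail`.
  return performance.measure(`soft-nav:${route}`, {
    start: `soft-nav-start:${route}`,
    end: `soft-nav-end:${route}`,
    detail: { navigation: { route } },
  });
}
```

A router would call markSoftNavigation(route, "start") when a route change begins, markSoftNavigation(route, "end") once the new content has rendered, and then measureSoftNavigation(route); analytics scripts could pick these up with a PerformanceObserver for "mark" and "measure" entries.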

How do we add this to frameworks?
Usually easy to track start, but automatic end conditions are hard
An approach similar to TTI can help us to know the page is idle
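One way to read the TTI-like approach: after a transition starts, look for a quiet window with no long tasks, and treat the end of the last long task before that window as the end of the transition. This sketch borrows the 5-second quiet window from TTI but, unlike TTI, ignores network activity. Returning null while no quiet window has been observed also shows why reporting at the end is exposed to abandonment.

```typescript
interface LongTask {
  startTime: number;
  duration: number;
}

const QUIET_WINDOW_MS = 5000; // borrowed from TTI's quiet window

// Estimated end of a transition that started at `start`, or null if no quiet window
// of QUIET_WINDOW_MS has been observed by `now`. Network activity is ignored.
function estimateTransitionEnd(start: number, tasks: LongTask[], now: number): number | null {
  const relevant = tasks
    .filter((t) => t.startTime + t.duration > start)
    .sort((a, b) => a.startTime - b.startTime);
  let candidate = start;
  for (const t of relevant) {
    // A long enough gap before this task means the page went idle at `candidate`.
    if (t.startTime - candidate >= QUIET_WINDOW_MS) return candidate;
    candidate = Math.max(candidate, t.startTime + t.duration);
  }
  // No gap between tasks; check whether the time since the last task is long enough.
  return now - candidate >= QUIET_WINDOW_MS ? candidate : null;
}
```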
Questions?
Steven: If we add an API for SPA navigation that resets all the metrics, would that cover most of the use cases? Routing frameworks would call it
Michal: Worried about how we’re introducing it. With SPAs there are partial loads, so it’s harder to naively reset paint timing.
... If we rely on developer hints for resetting timing, it can introduce bias
... We want to capture the user experience, so we want the hints to inform the metrics
Nic: You were talking about offering a way to measure; third-party (3P) analytics want to know when things are starting, so having a way to hint at the beginning would be great for 3P analytics
... We’re looking at the network. It can also be interesting to look at visual input. Akamai’s approach doesn’t always get it right on complex pages
... Not sure if the browser can know more, or if the developer needs to annotate
... Would be great to do smarter things automatically
Michal: If we have good metrics that give us a best guess
Reporting at the end has issues with abandonment
Reporting at the beginning and updating at the end will have issues with the UserTiming API as it is now
Benjamin: Should implementers and developers think of this as separate and distinct from page navigation?
Second question: should implementers do the same thing?
Michal: We’re not currently thinking of resetting metrics
... But important to be able to aggregate normally.
... Because SPAs have page transitions, it’s not trivial to reset metrics
Benjamin: Previously undecided in the group, but I tend to agree

Measuring smoothness on the web

Michal: *presenting*
Goal: track smoothness of scrolling and animations
Number of dropped frames, variation between frames, stale frames
Old Frame Timing API was more about the event loop and frames that exceeded time budgets. This is a higher level API
Two different cases among dropped frames that we need to address
Same number of dropped frames, but different user experiences
Sometimes, dropped frames are not an issue, because there’s nothing to show
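The dropped-frame distinction can be sketched with a hypothetical per-frame record: a frame that was not presented only counts as dropped if there was a pending visual update to show. The longest-run helper is one illustrative way to tell apart two animations with the same number of dropped frames; it is an assumption for this example, not taken from the proposal.

```typescript
// Hypothetical per-frame record: was a new frame presented, and was there
// any pending visual update that should have produced one?
interface FrameRecord {
  presented: boolean;
  hadPendingUpdate: boolean;
}

// A missed frame only counts as dropped when there was something to show.
function isDropped(f: FrameRecord): boolean {
  return !f.presented && f.hadPendingUpdate;
}

function countDroppedFrames(frames: FrameRecord[]): number {
  return frames.filter(isDropped).length;
}

// Longest run of consecutive dropped frames: the same total can feel very different
// when drops are spread out vs. bunched into one visible stall.
function longestDroppedRun(frames: FrameRecord[]): number {
  let longest = 0;
  let current = 0;
  for (const f of frames) {
    if (isDropped(f)) {
      current++;
      longest = Math.max(longest, current);
    } else {
      current = 0;
    }
  }
  return longest;
}
```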
API proposal
Entries contain refresh rate, attribution data and a list of frames.
Links to detailed proposal
Example usage:
Potential extensions to that:

Support custom recordings
Add attribution to make it more actionable
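A hypothetical consumer of such entries. The field names (type, refreshRate, attribution, frames) loosely follow the description of the proposal in these notes and are not its actual IDL. The sketch derives the number of frames skipped between presentations and the variation between frame intervals.

```typescript
// Hypothetical entry shape; names are illustrative only.
interface FrameInfo {
  presentationTime: number; // ms
}
interface SmoothnessEntryLike {
  type: string; // e.g. "scroll"
  refreshRate: number; // Hz
  attribution: string[];
  frames: FrameInfo[];
}

function summarize(entry: SmoothnessEntryLike) {
  const vsyncMs = 1000 / entry.refreshRate;
  const intervals: number[] = [];
  for (let i = 1; i < entry.frames.length; i++) {
    intervals.push(entry.frames[i].presentationTime - entry.frames[i - 1].presentationTime);
  }
  // Vsyncs skipped between presentations. A gap can also mean nothing needed painting,
  // so this overcounts unless the entry only covers frames with pending updates.
  const skipped = intervals.reduce((n, d) => n + Math.max(0, Math.round(d / vsyncMs) - 1), 0);
  const count = Math.max(1, intervals.length);
  const mean = intervals.reduce((a, b) => a + b, 0) / count;
  // Standard deviation of frame intervals as a simple measure of variation.
  const variance = intervals.reduce((a, d) => a + (d - mean) * (d - mean), 0) / count;
  return { type: entry.type, skipped, meanIntervalMs: mean, stdDevMs: Math.sqrt(variance) };
}
```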

Questions?
Nic: I like having attribution out of the gate
Michal: It’s still early but want folks to look at it
Gilles: For previous APIs, the buffers behind the buffered flag held entries in the 100s. Here that would only cover a few seconds
Michal: You don’t get one entry per frame, but an entry per animation
... How would we report things with very long lifetimes? Maybe an entry every few seconds?
... But shouldn’t have too many entries
Ilya: How would scrolling be manifested in the timeline?
Michal: At the end of the scroll you’d get an entry with a subtype of “scroll”, and more details per frame in the list of frames
Unlike the previous proposal, things like scroll can be handled on the compositor, so scrolls can run even when the main thread is blocked
A whole bunch of animations run at different paces, like rAF vs. scroll, etc.
Ilya: Would we want to emit these for every animation? Or only for bad ones?
Michal: The proposal is for everything, but it’s an interesting question
... Do we want to confirm that everything is going well? Or is it more important to track surprises?
Ilya: One consideration: what’s the overhead of measuring this? For long tasks (LT), we only surface the bad stuff, so it’s hard to understand the ratio of good to bad tasks.
Michal: No intuition yet on the overhead
Nicolás: The overhead is not just with graphics, as you’d be accumulating JS objects. Maybe we can use counts, like we do in EventTiming, and avoid emitting the good animations
Ilya: Ben and Alex - thoughts?
Alex: Already expressed my concerns about async scrolling and overhead
Ilya: We can get more data on that and come back
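Nicolás's suggestion, counting every animation but emitting entries only for poor ones, could look roughly like this. Event Timing does something similar with performance.eventCounts; here SmoothnessReporter and its dropped-frame ratio threshold are hypothetical. The counts also give the good-to-bad ratio that is missing for long tasks.

```typescript
interface AnimationSummary {
  id: string;
  frames: number;
  dropped: number;
}

class SmoothnessReporter {
  // Cheap counters kept for every animation.
  readonly counts = { total: 0, smooth: 0 };
  private readonly maxDroppedRatio: number;
  private readonly emitted: AnimationSummary[] = [];

  constructor(maxDroppedRatio: number) {
    this.maxDroppedRatio = maxDroppedRatio;
  }

  // Every animation is counted; only those above the threshold are kept as entries.
  record(a: AnimationSummary): void {
    this.counts.total++;
    const ratio = a.frames === 0 ? 0 : a.dropped / a.frames;
    if (ratio <= this.maxDroppedRatio) {
      this.counts.smooth++;
      return;
    }
    this.emitted.push(a);
  }

  entries(): readonly AnimationSummary[] {
    return this.emitted;
  }
}
```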
Ilya: What’s the plan for Frame Timing? Resurrect that? New proposal?
Michal: Decided to file the issues in the original repo, and this is related enough. OTOH, it’s rather different. Should we keep it in the same repo or move it elsewhere?
Ilya: Would we still call it Frame Timing?
Michal: Good question
... Frame Timing for scrolls and animations
Ilya: If we keep the Frame Timing name, let’s make sure we point people at the latest proposal so it doesn’t get buried in the old discussions
Yoav: Because the word frame is so overloaded (iframes, etc.) and since this focuses on animation, maybe it’s worthwhile to rename it to AnimationTiming (as a means to measure smoothness)
Alex: Not sure if that’ll allow you to see across origins
Michal: We’ll want to cover security implications more deeply
Alex: Can also indicate things about the user’s system, e.g. compiling in the background