WebPerfWG call - Feb 2nd 2023

Participants

Amiya Gupta, Andy Davies, Annie Sullivan, Barry Pollard, Giacomo Zecchini, Noam Helfman, Patrick Meenan, Philip Tellis, Rafael Lebre, Sean Feng, Yoav Weiss, Nic Jansma, Abhishek Ghosh, Alex Jose, Michal Mocny, Dan Shappir, Lucas Pardue, Carine Bournez, Sia Karamalegos, Nicolás Peña Moreno, Jennifer S, Hao Liu, Alex Christensen + George (dog), Ian Clelland, Steven Bougon, Boris Schapira, Aoyuan Zuo, Mike Henniger

Admin

Next Meeting: March 2nd, 2023 @ 11am EST / 8am PST
Charter extension

Likely to get a charter extension until the end of August
Asked to move charter-related work to GitHub (currently being worked on in GDoc)
We’ll explicitly state open privacy issues with PING in order to drive discussions

https://github.com/w3c/resource-timing/pull/343

Minutes

Rage clicks - Annie Sullivan

Recording

Annie: Talk about metric, not considering for Core Web Vitals
… Chrome is really interested in it
… Wondering if we should standardize it, and in this group?
… Reported in many tools already
… Implementation of Rage Clicks in Chrome already
… Correlates with CLS and INP
… We think Rage Clicks are indicative of a poor user experience
… Open questions about how they should be defined:
… Chrome uses a 500ms max delay between clicks, others don’t have any delay limit?
… Same area? Same X/Y or same Element?
… Should it be counted through whole lifecycle or beginning?
… Is triple-click too mouse-centric?
… Was able to use RUM Archive to understand how definition affects what we see
… Chrome vs. RUM Archive for mPulse
… Rage Clicks on Android are different between the two
… mPulse tracks the element differently, timing is different, causing differences in what we see between implementations of Rage Clicks
… Chrome will be experimenting with some definition differences
… See if what we measure lines up more on mobile w/ desktop
… When talking to people, some people think Rage Clicks are a perfect measure of UX
… But I don’t think it will be
… “Worse” webpages can be things like cookie clickers
… Not a CWV candidate metric, but a useful signal
… Are Rage Clicks a good topic for the WebPerf WG?
… Standard? Position/timing?
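
A minimal sketch of one possible definition from the open questions above, to make the trade-offs concrete. It assumes the 500ms maximum gap mentioned for Chrome, treats "same area" as "same element", and uses three clicks as the minimum burst. The record shape, the function name and the three-click minimum are assumptions for illustration, not any vendor's implementation.

```typescript
// Hypothetical click record; field names are assumptions for this sketch.
interface ClickRecord {
  time: number; // timestamp in milliseconds
  targetId: string; // identifier for the clicked element
}

// Counts bursts of at least `minClicks` clicks on the same target where each
// click follows the previous one within `maxGapMs`.
// Assumed defaults: 500 ms gap and 3 clicks; neither is a standardized value.
function countRageClickBursts(
  clicks: ClickRecord[],
  maxGapMs: number = 500,
  minClicks: number = 3,
): number {
  const sorted = [...clicks].sort((a, b) => a.time - b.time);
  let bursts = 0;
  let runLength = 0;
  let prev: ClickRecord | undefined;
  for (const click of sorted) {
    const continuesRun =
      prev !== undefined &&
      click.targetId === prev.targetId &&
      click.time - prev.time <= maxGapMs;
    runLength = continuesRun ? runLength + 1 : 1;
    // Count each burst once, at the moment it reaches the minimum length.
    if (runLength === minClicks) bursts++;
    prev = click;
  }
  return bursts;
}
```

Changing `maxGapMs` or swapping `targetId` for an X/Y bucket changes which sessions count as rage clicks, which is one way implementations could end up reporting different numbers.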
Sia: What do we consider good topics for WebPerf WG?
Yoav: Topics that this group would have good feedback on
… Eventually standardize or decide not to
… Any web standard
… Decide to standardize this as part of WG, even if it’s not something that’s an API – higher-level metric that our APIs already enable, but standardize for RUM providers
… So they can make sense of the metric even if it’s reported by different vendors
Nic: Quick comments. Glad that we’re talking about this. mPulse does this, with a slightly different definition. Would love to see guidance or a definition we can all align on. Otherwise it’s confusing to customers when they see a discrepancy.
… Not sure it needs to be a standard but a good topic
… Maybe can provide a recommendation but not a spec
… FWIW, definition in mPulse hasn’t been recently worked on, so we haven’t evolved it but would be interested in a standardized, more reliable, etc
Dan: 2 comments. Excited about the metric in the context of hydration, rage clicks can be an important metric. Hydration blocks the main thread, or elides the interaction. So can cause rage clicks. Great to pursue the metric
… maybe useful to couple this with INP - multiple clicks before next paint would indicate no response to the first click. Indicate performance issue rather than bad design
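
A rough sketch of the coupling Dan describes, assuming each click is available as an Event Timing style record whose duration runs from the event's start to the next paint. The record shape and function name are hypothetical. It counts how many later clicks started before the first click's paint.

```typescript
// Hypothetical shape modelled on Event Timing fields; an assumption for this sketch.
interface ClickTiming {
  startTime: number; // ms, when the click was received
  duration: number; // ms, from startTime to the next paint
}

// Returns the number of clicks that started before the earliest click
// had been painted, i.e. while the user had seen no response yet.
function repeatClicksBeforePaint(clicks: ClickTiming[]): number {
  const sorted = [...clicks].sort((a, b) => a.startTime - b.startTime);
  const [first, ...rest] = sorted;
  if (first === undefined) return 0;
  const paintTime = first.startTime + first.duration;
  return rest.filter((c) => c.startTime < paintTime).length;
}
```

A non-zero result points toward a responsiveness problem rather than a design problem, though it says nothing about why the user repeated the click.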
Annie: Seeing big correlation between rage clicks and poor CLS
Yoav: Annie you meant CLS or INP?
Annie: CLS
Dan: Something I’d be happy to talk more about
Ian: +1 to previous discussions.
… Not sure we’d want to standardize
… But we could come up with best practices
… e.g. this is the best way to track Rage Clicks in 2023
… Useful resource for people to standardize around
Michal: Sean just asked if it correlates with INP?
… That’s the chart on the right, but INP has a unique issue where the more you spam the page, it can lead to longer INP
… To Nic’s point maybe this becomes a convention that we publish, advice instead of creating a spec.
… If it’s inefficient or not performant and we didn’t want to see websites do something
… i.e. EventTiming where we don’t report over X threshold
… Rage Clicks won’t be reported to EventTiming – we increment click counts on page, but not where they were
… If EventTiming entries reported how many events targeted the same element in the window
… Could make this easier to do
… I think there’s room personally
… Finally, in terms of CLS and metric definition. If I’m about to hit a button and it moves out of the way, it won’t be an event target. I then click elsewhere to the correct target.
… Could we know what the target would have been had we not done the paint right at that moment? Even though it was switched on them?
Ian: With event targets, there’s a pitfall with registering handlers everywhere
Michal: Events have original target and current target. Handler at top-level but have descendant target. But you can differentiate whole-page document bubbles.
Ian: Technically user did click on this tiny box, vs. the one next to it
Michal: Wondering if we can use LayoutShift data itself w/ its bounding box
… Maybe no existing Rage Click metric handles CLS
Philip: Two points. One is I have seen correlation between when Rage Clicks happen and long frames. We don’t look at it as a primary metric, but you’ve had long frames and had rage clicks so something to look at.
… Second is that we shouldn’t look at Rage Clicks directly as a signal. One signal, good/needs/poor but that may differ by user. Rather than using static thresholds, we look at all of these signals. JS Errors, CLS, etc all feeds into whether that experience was good or not.
Dan: We’re lumping several distinct things under the category of Rage Clicks. I don’t consider clicking repeatedly on a button that does nothing to be the same as clicking again after missing a button the first time. In the second case I wouldn’t be happy about it, but it’s not Rage Clicking. A third scenario is rage clicking between clickable regions (a UI/design issue and not a performance issue).
… Lumping a lot of things into a single category and that’s problematic
Annie: We don’t really understand why the user rage clicks, it’s just correlated with poor user experience
Noam: It seems like if we want to better understand the context of EventTiming entries, it seems like more and more cases are coming where we would like to have either a reference or a copy to the original events that trigger the ET entries. We passed the target, original information, timestamp – do we want to pass the whole event?
Yoav: Or Event ID?
Noam: Coming up in various cases
Alex: Jumping in: what about measuring what happens when, or how often, a page becomes completely unresponsive?
Yoav: I wonder for the case Alex mentioned, if page is frozen, we won’t have any web-exposed data. At best we can report that internally in browser process but not web.
Nic: Unless it’s a Reporting API thing.
Yoav: It could be either a browser issue, or an infinite loop in script issue. Triggers an unresponsive page.
Dan: You’re causing me to miss the Page Stopped issue from IE
Barry: Are Rage Clicks also a way to measure interactions before JS is loaded, when event handlers are not registered? Hamburger menus, etc.
Michal: We still report EventTiming before they’re attached
… Browser has default actions on a bunch of things
Barry: So if JS takes a while, it might be reported. If it’s done loading, the next paint could happen and it would be missed.
Philip:
Dan: My big hope for INP is that it would resolve it. Scenario happens with hydration and pages that use a lot of JavaScript. Users try to interact with the page before it starts.
… Page responded really quickly to an event that does nothing

Event Timing updates - Michal Mocny

Recording

Michal: Bunch of little EventTiming updates
… Issues filed that aren’t getting attention so I’ll do a rapid-fire walkthrough
… Lots of topics revolve around interactions
… EventTiming reports on timing of individual events
… Interactions trigger variable number of events, different timestamps and render times, different animation frames
… Isn’t always a simple mapping of events to interactions, it’s context dependent
… Wanted to represent ways users interact with page to have N interactions based on their human events
… As we looked, we realize there’s a lot of overlapping events in time
… Originating input, all of them update the DOM and appear in the next paint
… If you care about counting interactions, hand-pick some important ones and label them
… interactionID in Chromium
… Sufficient to highlight those important events
… Also sufficient to measure INP
… But when you want to dig deeper, not always sufficient
… Every event or task on main thread plays a role in responsiveness
… Multiple rage-clicks on a page can affect interactiveness
… Depend on target of event and state on page
… What is part of an interaction
… e.g. on Mobile you can’t hover over elements, so the moment you click on an element we start with a hover event. Not a discrete interaction but it always fires first, before the click event
… Some sites add animations and it delays the interaction
… Not part of the interaction strictly by our definition but the source of many performance issues
… e.g. this pointer down was part of an interaction but it played an important role
… We considered exposing more events with an interactionID
… But there are a lot of them and they can differ, e.g. multiple interactions in flight and it’s not clear what caused what
… Treat everything as part of an interaction? But there can be overlaps
… Three different pages interacted with, shows a list of events that fire
[chart]
… Another example where one is an interaction and the other isn’t (but they look the same)
… https://github.com/w3c/event-timing/issues/124
… Only events over the default durationThreshold of 104ms are reported
… Problem is the default buffer from page load will always use 104ms and you can’t override it
… I’ll drop a snippet on a page that has already been around, and I’ll expect to see events but get none, because you can’t apply a threshold retroactively
… One option is to decrease default threshold, or to exempt interactions
… Another issue is performance.interactionCount
… Simplified, recently added to Chromium.
… Useful for INP, i.e. ignore 1 outlier for every 50 interactions
… Another topic is FID vs. Interactions
… EventTiming exposes both “first-input” and “event”
… Former is always shown, regardless of duration threshold
… FID criteria is different than what the definition of an interaction is
… Situations where FID is not reported, vice-versa
… Can we redefine FID to be the delay of the first interaction?
… https://github.com/w3c/event-timing/issues/131
… Events and RenderTime
… Our Paint APIs expose renderTime property
… Meant to be the most appropriate approximation of animation time
… Entries expose startTime, and renderTime if passing TAO
… EventTiming doesn’t expose renderTime but has something similar, renderTime=startTime+duration
… However duration is rounded to 8ms, so you can get renderTime < processingEnd
… Events are hard to group
… +/- 8ms for 60 Hz, 120Hz worse
[chart comparing PerformanceObserver vs. Chrome Tracing]
… But if you measure via PerfTimeline the end of events stagger
… Proposal to expose renderTime to match paint timings, rounding to nearest 8ms is fine
… https://github.com/w3c/event-timing/issues/123
… Issues related to “What is the correct timestamp”?
… Sometimes there is no next render/paint and we fall back to measuring a lesser time, but the spec isn’t clear on this
… Sometimes there are alternative endpoints, e.g. visibility change for tab switch
… A next paint was requested, but it doesn’t get there
… Arbitrarily long time from when it was in the background
… Also Page Unload, might not get next paint
… https://github.com/w3c/event-timing/issues/130
… Requests for adding more event types
… focus, forms, discrete events triggered by interactions
… Other types of discrete events but could be considered new interactions
… e.g. hitting back button in browser
… drag-and-drop API
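
As a rough illustration of the interactionCount use mentioned earlier in this update (ignore one outlier for every 50 interactions), the sketch below picks an INP-style value from a list of observed interaction durations. The function name and the plain-array input are assumptions. In a page, the count would come from performance.interactionCount where available, since entries under the duration threshold are not reported and the observed list alone undercounts interactions.

```typescript
// Picks the interaction duration to report after skipping one of the
// slowest interactions for every 50 interactions on the page.
// `durations` holds the observed interaction durations in ms (any order);
// `interactionCount` is the total number of interactions, which may be
// larger than durations.length.
function estimateInp(
  durations: number[],
  interactionCount: number,
): number | undefined {
  if (durations.length === 0) return undefined;
  const slowestFirst = [...durations].sort((a, b) => b - a);
  const outliersToSkip = Math.floor(interactionCount / 50);
  // Never skip past the end of what was actually observed.
  const index = Math.min(outliersToSkip, slowestFirst.length - 1);
  return slowestFirst[index];
}
```

With fewer than 50 interactions nothing is skipped and the slowest observed interaction is returned.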
Noam: Regarding the last issue around application events, I’m not sure we requested adding focus, we just wanted to understand whether focus should be a tracked event.
… Same regard, mouse ‘click’ event shouldn’t be tracked, it’s a logical event from others
… Either be more inclusive to be all application logic events, focus, reset, hover
… Or just reduce it to just user events (and not engineering events)
… User can invoke some interaction w/ some hardware and expect some response
Michal: ‘click’ might not always be dispatched with pointerup. If you have touch event listeners (deprecated), you could preventDefault and it won’t be dispatched, so we have to wait to run touchend to check if pointerup was called before we can dispatch ‘click’
… Often means you’ll miss a frame, different EventTiming w/ different start time etc
… Commonly used
… Amount of time spent on main thread in processing event
… That’s uniquely important
… Hover events are a common cause of interaction issues on many sites that I trace locally
… Slippery slope, what constitutes an interaction?
… Do you measure all LongTasks that aren’t event based, and smush them into a sandwich that relates to an interaction?
Noam: Keypress, Keydown, Keyup is a similar story
Michal: If you actually go through many traces on live sites, it’s not uncommon for the types of events we chose to represent interactions to overlap in time, but the reason isn’t directly attributable to them
… Hard to know what’s important to focus on
Yoav: Regarding duration vs. renderTime
… Motivation for 8ms rounding was security reason
… Yes we’re exposing for PaintTimings, but there are significantly more events than PaintTiming entries
… Security was concerned around exposing for every paint
… If we do rounding from renderTime, all events could be coalesced, vs today where they have similar start times but durations are all different
Michal: We can still round but same rounding so they’re consistent
Yoav: Consistency, otherwise people wonder why startTime+duration!=renderTime
… Overall feels reasonable to me
Noam: I concur with Yoav
… To address this issue we do heuristics and do our own rounding so we can correlate events across
… If it was a part of the API and spec it would simplify a lot
Michal: …
Noam: Discrepancy between rounded number and duration could be confusing
Yoav: Round render time and align duration to be correct calculation
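
A sketch of the kind of client-side heuristic Noam mentions, under the assumption that render time is estimated as startTime + duration and that estimates within 8ms of each other belong to the same frame. The record shape, function name and grouping rule are illustrative assumptions, not the approach any participant described in detail.

```typescript
// Hypothetical subset of Event Timing fields used by this sketch.
interface TimedEvent {
  name: string;
  startTime: number; // ms
  duration: number; // ms, rounded to 8 ms by the browser
}

// Groups events whose estimated render time (startTime + duration) falls
// within `toleranceMs` of the first event in the group.
function groupByEstimatedRenderTime(
  events: TimedEvent[],
  toleranceMs: number = 8,
): TimedEvent[][] {
  const sorted = [...events].sort(
    (a, b) => a.startTime + a.duration - (b.startTime + b.duration),
  );
  const groups: TimedEvent[][] = [];
  let groupStart = 0;
  for (const ev of sorted) {
    const estimate = ev.startTime + ev.duration;
    const current = groups[groups.length - 1];
    if (current !== undefined && estimate - groupStart <= toleranceMs) {
      current.push(ev);
    } else {
      groups.push([ev]);
      groupStart = estimate;
    }
  }
  return groups;
}
```

If the API exposed a consistently rounded renderTime, as proposed, grouping could key on that value directly and the tolerance window would no longer be needed.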
Michal: To Dan’s question, if a page is loading and it looks visually complete, you may choose to interact with it but you’re not running JS yet. I think that problem exists in PHP pages that use jQuery to decorate and not just a problem w.r.t. Hydration
… INP on its own is just a performance metric, saying there’s no chance there’s any feedback because it was impossible to get stuff on screen
… Doesn’t say this button was useful or what the user expected
… One example is event handlers didn’t load yet, or code or backend was broken
… Very early interactions w/ repeat clicks could answer those types of user quality questions, different tools for different jobs
… INP can’t answer some of those questions
Dan: I do think it’s related to hydration and SPAs. It’s an order of magnitude different from MPAs with jQuery: with frameworks and SPAs we’re shipping significantly more JavaScript, and unless you use progressive enhancement, you won’t get any interactions defined if the JS hasn’t run yet.
… Realistically, it’s dramatically more significant in modern frameworks
… Problematic because if the page is slower, and JS takes longer to download, we may seem to get better metric results than we would otherwise. This is one of the reasons I consider FID useless
… In some cases people click on something before hydration starts
Michal: I don’t have hard numbers, but in traces I find that JS begins to load quickly and modern frameworks use task schedulers that yield often, and don’t capture end-to-end delays.
… Recent perf.now() talk about this problem
… Worked around the FID problem but fallback to a responsiveness issue
… More modern frameworks are yielding rendering
… Take event, queue in dispatch, INP also doesn’t capture that
… By the numbers, is it the majority?
… Haven’t studied that
… Not sure if Rage Clicks would address that directly
Philip: “Dead clicks” where if you click on something and nothing happens