WebPerfWG call - May 12 2022

Participants

Nic Jansma, Yoav Weiss, Boris Schapira, Mike Henniger, Michal Mocny, Amiya Gupta, Jeremy Rose, Noam Helfman, Alex Christensen, Giacomo Zecchini, Pat Meenan, Andrew Galoni, Carine Bournez, Alex N Jose, Ian Clelland

Admin

Next meeting - skipping due to holiday conflict
Back on June 9th, later slot
Nic: Sent a survey around meeting times, on the alternating slots
... Please respond and we’ll present the aggregate feedback
... https://forms.gle/U6LgVSZhJZEEPSwBA
Nic: For TPAC - planning to do a hybrid meetup
… Chairs will be there
… Thinking of doing 9am-12pm Pacific, in 3-hour slots Mon-Thu
… Same as what we’ve done in the last few years
… Let us know of feedback, we’ll do our best to support remote
… Try to get someone to champion for the remotes in the room
Amiya: Do you have a venue that would accommodate all that?
Nic: The conf is at the Sheraton. TPAC org will provide screens, camera and microphones
… We’ll make sure it works
Amiya: The MS office is 5 blocks from there, if needed.
Nic: We’re trying to get more details, but good offer. Thanks!
Michal: Are all tracks in parallel again? Or will they split it?
Nic: Not sure. They sent out a survey and are letting groups decide when they want to meet.
… With everyone in the same time zone there may be conflicts
… If there are any conflicts with other groups, let us know so we can try to shuffle things around
… Same for combined session, if someone thinks it’d be helpful
Yoav: Otherwise, published LCP and Event Timing as FPWD
... Also see announcements on the agenda

Misc

Michal: Blog post on a responsiveness metric INP
… If you haven’t had a chance: https://web.dev/inp
… Recently added to page speed insights
... WebPageTest added support.
… Wanted to make sure it’s on folks’ radars
… Also, BlinkOn is next week and anyone can sign up
… We’d have slots on those discussions
Nic: Also, report back to the group about those discussions
Michal: Would be recorded but we’ll summarize

Minutes

https://github.com/w3c/user-timing/issues/88

Nic: Filed a month ago. Uses user timing to track application performance, but sees a blind spot in how layout performance is captured
… Have a workaround to try and capture some of it. Good discussion on what we could do there
Michal: I assume the reporter is not here, so I’ll try to summarize. I have had almost the exact same request. On real sites, input delay is less often the cause of responsiveness issues.
… Time spent inside the event handlers is easy to tackle
… Biggest mystery is you throw work at the DOM, cause huge layout update and this is magic - there’s no insight into that
… There’s some way to fix the snippet, but wanted to know - is there an interaction that takes a lot of time due to layout. It’s not user timing, but maybe event timing and paint timing
… We can try to address this directly - expose those times around event timing and JS profiling.
… So is this broadly related to rAF? Is it specific to user timing?
NoamH: An observation - if we adopt that proposal the way it’s proposed, it’d be possible to implement event timing in JS. Would give us the option from the interaction to figure out the next frame
… Adding some of the information inside events might be helpful. There were proposals around that as well.
Michal: Linking the event handler to the event timing entry is interesting
… The way we expose duration is pixels on screen, not the time to rendering
… Time to layout is a different period of rendering work, then there’s compositor thread and gpu work
… Exposing this time can help with interop as well
… We’d be able to be explicit about what’s exposed. Could be a secondary benefit
Jeremy: This is specific to layout timing, but maybe there’s also future work that can pinpoint specific slow selectors, etc that caused the slowness.
Michal: Not clear how feasible that is, but it could be useful
… Layout can happen at any time. It can happen in an idle task, later on, etc
NoamH: We could split the layout thrashing from regular layout. Our measurements show that input delay is not the main contributor, like you said.
Michal: it isn’t all, but when input delay is the problem, it’s easy. That last part is difficult to tackle and requires re-architecture.
… Need to ask the browser for less
NoamH: More information about the root cause is indeed important. Was investigating today clear layout thrashing, but still it was a very hard problem to determine the cause. A real problem.
Yoav: I don't think anyone disagrees about the importance of this info, and benefits that would be in providing that information to developers.
... For the general case, it would be hard for us to link DOM operations that happen in various tasks that were up to the point layout was happening to the layout that happens after that.
... Would essentially need to tie information to various elements being touched and would need to read it back. Not sure how one would go about that. Would need to talk to someone who knows about layout more.
... Maybe there's a subset of layout thrashing and sync layout in the middle of a task that's an easier problem. Detect while it's happening, we know which task.
... Find cause of which script injected it, which function triggered the thrashing.
... Are there two different problems here, one easy and one hard? Or do we need to solve the whole thing to get value?
Michal: As a followup to that question, I wonder what needs to be exposed to RUM vs. what's sufficient to expose to lab/tracing
... Developer needs to see wide signals for where to focus, then follow up locally to dive into specifics
NoamH: What’s the use case we’re solving? Monitoring vs. investigation tools?
… Using these APIs for both. Not a separate problem
Michal: A tangible thing we’re seeing with Event Timing. You get the time inside handlers and can get a point where all handlers finished. If there’s a diff to the next paint, it could be more task, rendering task that wasn't scheduled, layout thrashing in other tasks, etc.
… You have no idea which is it.
… Marking which stage would already be valuable
NoamH: It would be valuable. We patched all the scheduling options and crossed that with event timing timestamps. For us it’s mainly the sync processing of event handlers. But there could be other cases. No solutions for the rendering phases, but assumed everything not JS is rendering
… If we had that info it’d have been much easier
Nic: What would be the next good step? Discussing with layout experts to understand what can be exposed? Try to solve the easy problem first?
Michal: For us there’s an appetite. Security and privacy review will need to approve. There will be concerns around exposing detailed paint timing
… Curious about the original requester. Are they running it all inside rAF loops, or would working with user interactions be enough for them?
Nic: Tying it to user interactions can lead to the easy problem.
… We can summarize this discussion on the issue and continue the discussion there
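
A minimal sketch of the breakdown Michal and NoamH describe above, assuming only the standard Event Timing fields (startTime, processingStart, processingEnd, duration). The names EventTimingLike, InteractionBreakdown and breakDownInteraction are hypothetical, introduced here for illustration. It shows what can be derived today: input delay, time inside handlers, and a single unattributed remainder up to the next paint. That remainder cannot currently be split into layout, other tasks, or compositor/GPU work, which is the blind spot under discussion.

```typescript
// Subset of PerformanceEventTiming fields used by this sketch.
// Defined locally (hypothetical name) so the example runs without DOM typings.
interface EventTimingLike {
  startTime: number;       // when the input was received
  processingStart: number; // when the first event handler started
  processingEnd: number;   // when the last event handler finished
  duration: number;        // startTime to next paint, as reported by the browser
}

// Hypothetical result shape for illustration.
interface InteractionBreakdown {
  inputDelay: number;   // waiting before handlers could run
  handlerTime: number;  // time spent between first handler start and last handler end
  unattributed: number; // handlers finished, frame not yet presented
}

function breakDownInteraction(entry: EventTimingLike): InteractionBreakdown {
  const presentationTime = entry.startTime + entry.duration;
  return {
    inputDelay: entry.processingStart - entry.startTime,
    handlerTime: entry.processingEnd - entry.processingStart,
    // duration is coarsened by the browser, so clamp to avoid a small negative value.
    unattributed: Math.max(0, presentationTime - entry.processingEnd),
  };
}

// Example with made-up numbers: most of the interaction is spent after the handlers ran.
const example = breakDownInteraction({
  startTime: 100,
  processingStart: 104,
  processingEnd: 130,
  duration: 200,
});
console.log(example);
```

In a browser, the entries would presumably come from a PerformanceObserver observing the "event" entry type; that wiring is left out so the sketch stays self-contained.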

https://github.com/w3c/largest-contentful-paint/issues/91

Yoav: Chrome is not marking LCP when the largest contentful element is not visible
... So when entire page is hidden by A/B testing snippets (one particular case)
... Issue where we were seeing lower LCP than FCP, and this is one case
... Reason is that there was a bug with FCP that was fixed (put that aside)
... LCP we only expose renderTime when same origin or TAO opt-in. Otherwise we expose load time
... When FCP element wasn't shown and then A/B snippet shows entire page after N seconds, we mark rendertime and FCP
... That means when LCP is sameOrigin all is well
... But when LCP is X-O and not marked with TAO, we just expose loadTime as startTime, which is an arbitrary time earlier than when it was displayed
... Reason we don't expose renderTime across origins is it can reveal details about the image itself, i.e. how long it took to expose. Can be a cross-origin leak
... But in case where renderTime is not function of image itself, but function of logic on page (invisible to visible), then I think it's safe to expose renderTime for these
... Doesn't reveal info about element itself, but about page
... Maybe there's a threshold where we need to ensure there's a large enough difference in times to make sure it's safe to expose
... And then avoid the case where RUM providers are seeing large discrepancies between LCP and FCP that are bizarre
... Needs a security review. Initial replies indicated that this seems fine.
... Looking for a general feel on whether this is fine or not
Alex: Could you repeat description of circumstances under which we can expose X-O timing?
Yoav: X-O image that is LCP. It is being loaded, "rendered" but page's styles are keeping it hidden. In those cases we report renderTime when image was displayed to the user, not when it was decoded.
... In these cases, the renderTime we'd expose is the "3 second timeout" that the A/B provider gave the page so it would be displayed once everything was settled and A/B testing magic had happened
... I believe exposing the renderTime as the value that was set by A/B testing script is safe because it doesn't expose anything about the image itself. It exposes processing on the page.
... Then it's fine to set renderTime to when it's revealed
Alex: Thanks, you've answered my question
Michal: If you load a page today and switch tabs to make that in the BG, then you switch back later, you'll get a paint time that's arbitrary. We mark that, the value of LCP is arbitrary, it's less useful.
... Analogy similar here, image was hidden and then later visible. Difference here is that it's not a user action that made it visible, it was the page. Makes sense the timing should be exposable and used for LCP.
Amiya: Does it always ensure that LCP will be >= FCP, or are there other cases?
Yoav: There shouldn't be cases where LCP < FCP. Not saying it's impossible there are bugs, but this would be considered a bug if so.
Amiya: There could be a race condition where image is being un-hidden but image has not yet decoded fully.
Yoav: From my perspective we cannot report, maybe even with some threshold of safety, from decoding to render time, we'll still report loadtime. In those cases we'll have a mis-reporting of LCP < FCP, but I think that's fine. But would significantly lower instance of those cases.
Nic: Why not just set FCP to LCP?
Yoav: Then we'd be mis-reporting LCP. Or just say not reporting renderTime at all and not loadTime
... So either we're lying and you as a RUM provider are not aware. Or if LCP < FCP and we just get loadTime, so I didn't get the actual render time.
Patrick: Does FCP leak the info you didn't want to expose with LCP?
... Can it just be MAX(FCP, LCP) so it can never be lower?
Yoav: FCP doesn't guarantee it's related to LCP element. We could set it to be the MAX() of both, but maybe we're hiding information.
... Reporting won't be any worse
Nic: Had a lot of LCP<FCP, having more reliability in the data would help
Michal: Yoav presented a nice simple case where no content was presented. It can be common that something contentful appeared after load time of the image. E.g. When sites are preloading images and only showing them later
… Max would be better as it’d be reliable
… Maybe we can just provide the render time in those cases
… Let’s say you’re using react and vdom, and the image is not attached to the dom up until later
… So the paint time of the image can be significantly later
… How can we know that there’s a delta? Can we always know when LCP would have rendered, so know we can expose that time?
Ian: Wondering if the simplest thing we could do is expose that this was the case. Then we don’t have to worry about clamping. If we just say that the load time is the start time?
Yoav: That’s already what we do.
Ian: Maybe we can expose the TAO status and let developers know that they can just use FCP instead?
Nic: Possibly. Having this knowledge, we probably could act on it differently. Wasn’t obvious initially when we captured this data
Michal: Not directly related. This exact problem of load/render time exposed as duration, related to the user timing issue. The same timestamps are used to mark different things, so this relates to the previous conversation. This can also relate to interop issues between browsers and reduce them.
Nic: The heuristic that you’re proposing would automatically solve the issue for folks collecting this data. Ian said that folks could also look at the existing entries.
… Seems worthwhile to solve for existing users
… Other question - should we synchronize LCP to FCP when it’s less than FCP?
Michal: Makes sense to me in case of real LCP to have that startTime. But I worry about reusing renderTime to expose it. Wonder if we need a third property: “this LCP was large because it was hidden” vs “added it aggressively to the DOM”.
Yoav: The LCP was large because it was hidden, or what was the other option?
Michal: Because you were adding it to DOM
... This is a unique case where renderTime isn't affected by the bytes, but by user/site behavior
... A third timestamp could be useful
Yoav: loadTime, renderTime, imageReadyTime
Michal: Leave renderTime as it has an existing connotation
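
A minimal sketch of how a RUM script could act on the existing entries, combining Ian's point (detect that only loadTime was exposed) with the MAX(FCP, LCP) idea Patrick raised. This is not an outcome the group agreed on. It assumes renderTime is reported as 0 when it is withheld for a cross-origin image without TAO; LcpLike, LcpReport and reportLcp are hypothetical names introduced for illustration.

```typescript
// Subset of LargestContentfulPaint fields used by this sketch.
// Defined locally (hypothetical name) so the example runs without DOM typings.
interface LcpLike {
  renderTime: number; // assumed to be 0 when withheld (cross-origin image, no TAO)
  loadTime: number;   // when the resource finished loading
}

// Hypothetical report shape for illustration.
interface LcpReport {
  value: number;               // value to send to the RUM backend
  isLoadTimeFallback: boolean; // true when only loadTime was available
  clampedToFcp: boolean;       // true when the raw value was below FCP
}

function reportLcp(lcp: LcpLike, fcp: number): LcpReport {
  const isLoadTimeFallback = lcp.renderTime === 0;
  const raw = isLoadTimeFallback ? lcp.loadTime : lcp.renderTime;
  // The largest contentful paint cannot precede the first contentful paint,
  // so a smaller raw value means the timestamp is not a real paint time.
  const clampedToFcp = raw < fcp;
  return { value: Math.max(raw, fcp), isLoadTimeFallback, clampedToFcp };
}

// Example with made-up numbers: cross-origin image loaded at 800 ms,
// page revealed by an A/B snippet at 3000 ms (which is also FCP).
console.log(reportLcp({ renderTime: 0, loadTime: 800 }, 3000));
```

Keeping the two flags alongside the value lets the collector separate real render times from fallbacks instead of silently hiding the discrepancy, which is the concern Yoav raised about MAX().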