WebPerfWG call - September 30th 2021

Participants

Corentin Peschliche, Cliff Crocker, Andrew Comminos, Steven Bougon, Scott Gifford, Alex Christensen, Ian Clelland, Yoav Weiss, Nic Jansma, Benjamin de Kosnik, Pat Meenan, Noam Helfman, Giacomo Zecchini, Michal Mocny, Carine, Timo Tijhof, Nicolás Peña Moreno

Admin

Next meeting on October 14th
Last meeting before TPAC (October 25-29th)
Trying to figure out the best time slots for TPAC; there’s still room for more ideas and topics.
Free registration link: https://www.w3.org/2021/10/TPAC/Overview.html

Exposing VM state for JS Self-Profiling

Andrew: Working on JS profiling, chatting on updates and ideas on extending it
… Web exposed sampling JS profiler accessible by script. Sites can inspect JS execution locally or on the server.
… See real RUM metrics
… Unbiasing from high-end machine and seeing weird configurations in the field
… Shipped on Chrome 94
… Simple to use API - sync constructor. A lot of the logic you want to profile can’t wait for an asynchronous call, and the API doesn’t require one
… 2 parameters, sampleInterval and maxBufferSize, to ensure the developer can’t go overboard and collect more samples than they can process
… Example
... Particularly useful to debug input latency in combination with Event Timing
… One addition, using DocumentPolicy in order to avoid incurring the hit of getting the initial metadata, ensuring the document can start profiling really quickly
… Example for tracing format
… Trie representation
… Initial data is better than anticipated - Enabling the profiler slowed load time by less than 1%, so overhead is minimal.
… Provides a drop in solution for perf analysis
… Seeing strong adoption from industry (e.g. Microsoft)
… What could be better?
… Non-JS execution is hard to identify. UA work (paints, layouts, garbage collection) is hard to tell from idle time
… Some tasks are longer than they should be if they trigger GC
… Corentin is working on extending the profiler
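A minimal sketch of the API shape and trie representation described above. The interface and field names follow the trace format as understood from the JS Self-Profiling draft and should be checked against the spec; the helper names (startProfilerIfAvailable, expandStack) and the example trace are hypothetical.

```typescript
// Assumed trace shape: samples point at a stack entry, and each stack entry
// points at its parent, so shared call prefixes are stored once (a trie).
interface ProfilerFrame { name: string; resourceId?: number; line?: number; column?: number; }
interface ProfilerStack { frameId: number; parentId?: number; }
interface ProfilerSample { timestamp: number; stackId?: number; }
interface ProfilerTrace {
  resources: string[];
  frames: ProfilerFrame[];
  stacks: ProfilerStack[];
  samples: ProfilerSample[];
}

// Start a profiler only where the constructor exists and is allowed
// (assumption: the page opted in via Document Policy). Returns null otherwise.
function startProfilerIfAvailable(
  sampleInterval: number,
  maxBufferSize: number,
): { stop(): Promise<ProfilerTrace> } | null {
  const ProfilerCtor = (globalThis as any).Profiler;
  if (typeof ProfilerCtor !== "function") return null;
  try {
    return new ProfilerCtor({ sampleInterval, maxBufferSize });
  } catch {
    return null; // e.g. profiling not permitted for this document
  }
}

// Walk from a leaf stack entry up through parentId links and return the
// frame names ordered root-first (outermost caller first).
function expandStack(trace: ProfilerTrace, stackId: number | undefined): string[] {
  const names: string[] = [];
  let current = stackId;
  while (current !== undefined) {
    const stack = trace.stacks[current];
    if (stack === undefined) break; // malformed trace: stop walking
    names.push(trace.frames[stack.frameId]?.name ?? "(unknown)");
    current = stack.parentId;
  }
  return names.reverse();
}

// Hand-written example trace (not real data).
const exampleTrace: ProfilerTrace = {
  resources: ["https://example.com/app.js"],
  frames: [
    { name: "main", resourceId: 0, line: 1, column: 1 },
    { name: "render", resourceId: 0, line: 10, column: 1 },
    { name: "layoutRows", resourceId: 0, line: 20, column: 1 },
  ],
  stacks: [
    { frameId: 0 },              // main
    { frameId: 1, parentId: 0 }, // main > render
    { frameId: 2, parentId: 1 }, // main > render > layoutRows
  ],
  samples: [
    { timestamp: 100, stackId: 2 },
    { timestamp: 110, stackId: 1 },
    { timestamp: 120 }, // no stackId: no JS stack was captured
  ],
};

const expandedStacks = exampleTrace.samples.map((sample) =>
  expandStack(exampleTrace, sample.stackId).join(" > "),
);
```

The third sample expands to an empty stack, which illustrates the limitation raised above: without extra markers, UA work such as GC or layout looks the same as idle time.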
Corentin: Want to update the spec
... Wants to highlight work performed by the UA
... Use information from traces available already on DevTools
... Want a marker to trigger profiling
… GC, parsing (HTML/JS?)
… Proposed modification
… We’d have longer traces that include more details (e.g. marking samples as impacted by GC)
… There are security and privacy concerns. We should not expose work done on a cross-origin document
… Need to be careful not to include information from other origins, may require cross-origin isolation
… Want to know if there’s interest in breaking down the paint marker into style, layout and paint
… Should we require COI?
… Finally, is this the right place to expose this information? Or should it be exposed as part of the performance timeline?
... PR https://github.com/WICG/js-self-profiling/pull/55
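A rough sketch of how a consumer might use per-sample markers if the proposal lands. The `marker` field, its values, and the helper name timePerMarker are assumptions for illustration and may not match the PR; each sample is credited with the time until the next sample, which is only an approximation.

```typescript
// Hypothetical marker values, loosely based on the categories mentioned in
// the notes (GC, parsing, style, layout, paint).
type Marker = "gc" | "parse" | "style" | "layout" | "paint";

interface MarkedSample {
  timestamp: number;   // milliseconds
  stackId?: number;    // present when a JS stack was captured
  marker?: Marker;     // hypothetical: UA work observed at sample time
}

// Sum the approximate time attributed to each category. A sample's duration
// is estimated as the gap to the next sample, so the last sample adds nothing.
function timePerMarker(samples: MarkedSample[]): Map<string, number> {
  const totals = new Map<string, number>();
  let previous: MarkedSample | undefined;
  for (const sample of samples) {
    if (previous !== undefined) {
      const key: string =
        previous.marker ?? (previous.stackId !== undefined ? "script" : "unattributed");
      const elapsed = sample.timestamp - previous.timestamp;
      totals.set(key, (totals.get(key) ?? 0) + elapsed);
    }
    previous = sample;
  }
  return totals;
}

// Hand-written example samples (not real data), 10 ms apart.
const markedSamples: MarkedSample[] = [
  { timestamp: 0, stackId: 0 },
  { timestamp: 10, marker: "gc" },
  { timestamp: 20, marker: "gc" },
  { timestamp: 30, stackId: 0 },
  { timestamp: 40, marker: "layout" },
  { timestamp: 50 },
];

const markerTotals = timePerMarker(markedSamples);
```

With these samples, script and GC each account for 20 ms and layout for 10 ms, which is the kind of breakdown that is not recoverable from stacks alone.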
--- discussion ---
Yoav: You mentioned COI. Did you do analysis on the different markers and how each would be impacted by the inclusion or not of non-isolated cross-origin resources?
Andrew: Main concern is GC. It could be challenging to perform attribution in this case, so COI makes sense for exposing this kind of marker. In other cases, it’s easier to attribute. GC is hard because of opaque scripts. It seems most markers will require origin attribution. We’ve also looped in some security folks so we will have another update at TPAC.
Alex: You mentioned that enabling the API makes it less than 1% slower load. But have you measured battery life or CPU usage, FPS, etc?
Andrew: Hard to measure those, we hope to use smoothness API later on. But on local measurements we have not seen regressions on those metrics. Raw cycles seem to be an extra 1-3%. Perhaps you can assume that is a reasonable bound for the impact.
Alex: Assume this is for facebook.com.
Andrew: Yes
Alex: Curious if anyone has data on JS heavy canvas drawing game.
Timo: Exposing GC - it’s already exposed because you’re not receiving samples
Andrew: Spec defines stack unwinding, so that you’d see nothing during GC - you’d get an empty sample
Timo: I don’t think we can mask that it actually happened, unless we make up fake frames. So it would be trivial to deduce that this is what occurred during that time.
Andrew: Yea, true.
Michal: Followup to Alex. 1-3% is not quite clear, but I wonder about the flip side of that. At some point you catch a bug that pays for that permanently.
Andrew: In the OT we shaved a bunch of random issues on the site. e.g. Observing a script that we didn’t know was running, which gave us 4-5% win.
Michal: You sample the clients as well?
Andrew: Yea
Michal: At some point you may taper off, so is it worthwhile to reduce the number of samples at some point?
Andrew: Potentially. Also, code coverage and perf are different use cases and may require different sampling sizes
Michal: Do you feel you need as much level of detail from field data? Or would it be sufficient to know that layout has been exceeding a certain amount?
Andrew: 2 main use cases. 1) how much layout is taking 2) when is JS actually running. E.g. when is GC interrupting JS execution. If we have another API that provides aggregate metrics, that could also cover that use case
Michal: Yea. Layout is tricky as it could have an impact on stack samples. I guess others could have an impact as well.
Noam: followup on Alex’s question. We use the profiler on Excel Online. A lot of heavy JS is running. We have not observed substantial regressions in our measurements of long JS and long frames, but we are still evaluating and analyzing the data. It feels relatively performant. We use it on a sampling basis, i.e. periodically on some clients. The API has helped analyzing bottlenecks and it has helped us enhance our responsiveness. We found some issues that we would not be able to find in test environments. Still find cases with stack traces where we don’t understand what’s going on. So I think it’s very important to improve.
If this comes at the cost of COI, I think it may be problematic for us to implement. We couldn’t implement it before. That would mean that we won’t be able to leverage the capability.
Andrew: COI is a last resort measure.
Noam: Would be good to do a security analysis for it, and only enforce it when required.
Andrew: We are trying to start security conversations as soon as possible
Yoav: There was recently a credentialless extension which allows you to incorporate third parties but send the requests without credentials. Does that help? Or what are the adoption hurdles?
Noam: Our limitation for implementation of COI is that once a resource has it, everything has to be restricted. For COOP especially, or one of them, it propagates. And we have a lot of scripts and resources, some are third party and we don’t have full control over them.
Yoav: I think it might be interesting for y’all to look at credentialless.
Noam: I think it is currently in Origin Trial, so we’d have to look into it. I don’t know if it supports credentialless?
Andrew: Well, it does not require COI.
Benjamin: Want to go back to using this to find regressions. Are there examples where you were able to isolate specific hardware using this?
Andrew: In many cases, if we’re running an experiment on a feature, we want to dive into how something regressed. And it’s very hard to figure out what part of the code was responsible for a regression. So with this API we can know which stacks were most prominent. Profiler API coupled with looking at the P95s has been very helpful.
Michal: it sounds like in practice this API also helps you go over a bunch of fixes much faster.
Andrew: Yes, people will have more energy to address perf problems with the correct tools.
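The discussion mentions profiling only a sample of clients. One possible way to pick that sample deterministically is sketched below; the hashing approach, the function names, and the idea of keying on a client identifier are assumptions, not something described in the meeting.

```typescript
// FNV-1a-style hash over UTF-16 code units, returning an unsigned 32-bit value.
function hashToUint32(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Decide whether a client should run the profiler. The same clientId always
// gets the same answer for a given rate, so a client stays in or out of the
// sample across page loads.
function shouldProfile(clientId: string, sampleRate: number): boolean {
  const bucket = hashToUint32(clientId) / 0x100000000; // in [0, 1)
  return bucket < sampleRate;
}

// Example: profile roughly 1% of clients (hypothetical identifier).
const profileThisClient = shouldProfile("client-1234", 0.01);
```

Lowering sampleRate over time is one way to act on the point that returns may taper off once the obvious issues have been found.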

Landing navigations (Yoav)

Yoav: I have a cold
… Exposing landing navigation to RUM
… Reason I looked into this for SPAs and MPAs
… Looking at their performance along with Michal
… What we noticed was in terms of performance characteristics, we have two distinct types of navigation
… Landings and “follow-ups”
… Landings defined as first navigation in browsing session (HTML concept), session history that happens within a tab
… Correlates with what we define as landing
… Cross-origin navigation or opened in new tab
… One characteristic is they cannot be developer intercepted
… When looking at MPAs and SPAs, SPAs only have “landing navigations”, while MPAs have a mix of landing and followup navigations
… Different performance characteristics, via internal Chrome metrics, e.g. in terms of CWVs
… Expose to RUM?
… Split on that navigation type as a dimension. Performance can sometimes be traded off on landing where that saves some work on followups
… Split on dimension can help reduce noise and clarify data
… For MPA and SPA comparisons, only landings are apples-to-apples
… Would allow RUM providers to do those types of splits and reports
… Developers would be able to monitor a migration between architectures during rewrites (e.g. moving from MPA to SPA) and make sure they’re not losing performance.
… Keeping track of landings between both sides of that architectural fence
… Theoretically that is already available through session storage (polyfillable), but session storage is slow and busted and we don’t want anyone to use it (sync API)
… At the same time it’s safe to expose this information. Haven’t run this by security folks yet.
… In terms of API shape:
… We add a boolean ‘landing’ which won’t require updates once we start reporting other types of navigations.
... PR in progress: https://github.com/w3c/navigation-timing/pull/161
… This is relatively easy to add because these concepts are already defined in HTML.
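A sketch of the session-storage approximation mentioned above, written against a small storage interface so it can run outside a browser. The key name, the function name isLanding, and the referrer-origin check are assumptions added for illustration; this is not the definition from the PR and it shares the drawbacks of sessionStorage noted above.

```typescript
// Minimal subset of the Web Storage interface; sessionStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const SEEN_KEY = "landing-polyfill:seen"; // hypothetical key name

// Approximate "landing": the first page view this tab has seen for the
// origin, or a page view that arrived from a different origin (or with no
// referrer at all).
function isLanding(
  store: KeyValueStore,
  ownOrigin: string,
  referrerOrigin: string | null,
): boolean {
  const seenBefore = store.getItem(SEEN_KEY) !== null;
  store.setItem(SEEN_KEY, "1");
  const cameFromSameOrigin = referrerOrigin === ownOrigin;
  return !seenBefore || !cameFromSameOrigin;
}

// In-memory stand-in for sessionStorage, used for the example below.
class MemoryStore implements KeyValueStore {
  private items = new Map<string, string>();
  getItem(key: string): string | null {
    return this.items.get(key) ?? null;
  }
  setItem(key: string, value: string): void {
    this.items.set(key, value);
  }
}

const tabStore = new MemoryStore();
const firstView = isLanding(tabStore, "https://example.com", "https://search.example");
const secondView = isLanding(tabStore, "https://example.com", "https://example.com");
const reentryView = isLanding(tabStore, "https://example.com", "https://other.example");
```

In a page this might be called as isLanding(sessionStorage, location.origin, document.referrer ? new URL(document.referrer).origin : null). A native boolean would avoid both the synchronous storage access and the guesswork around referrers.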
--- discussion ---
Cliff: I think this is awesome, but I’m a bit confused about the differences between all these navigation types. Why are these first class but not other kinds such as bf cache navigations?
Yoav: Is that something where changing the API shape would help? Or is this a comment on the fact that we have many types of navigations that need to be addressed?
Cliff: We need to be able to explain what all the kinds of navigations are and how we can view them.
Yoav: Another option considered was to add another value to an enum, but then it became two values: for history navigations, if you’re in a landing and then move forward and backwards, I thought it should still qualify as a landing.
Cliff: Right, a reload could still be a landing, a bf cache nav could still be a landing. So maybe we don’t call them navigations just to make it more clear.
Yoav: May be worth scheduling a TPAC session for name bikeshedding.
Nic: Yoav gave me a preview, and it was timely for us as RUM providers. We added a landing-or-not dimension to our data, which we implement with a cookie.
<Shares the screen>
Nic: This is the data from my own site. Segmenting the data by navigation type changes the data significantly, for every metric.
Of course this depends on the site too, but the metrics do differ significantly.
… For an individual user, this is one of the dimensions that changes user experience over time.
… First experience is probably worse than later experiences.
… I find this a very intriguing dimension to split data by, and some of our customers are using it for their own analysis.
… I think it would be valuable for us to have it standardized in our own RUM data.
… The definition described here seems similar to ours, but better.
Timo: Which combination of APIs do you use to approximate this today?
Nic: Our sessions are defined as a sliding 30 minute window.
… This is just a limitation of what we have today.
Timo: If you reenter from a cross-origin page, it may appear as a followup even though it is a landing.
Nic: Yep
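A sketch of a sliding 30-minute session window of the kind described above, assuming the last-activity time is persisted somewhere such as a cookie. The type and function names are hypothetical, and this is not a description of any RUM provider's actual implementation.

```typescript
const SESSION_WINDOW_MS = 30 * 60 * 1000;

// State persisted between page views (e.g. serialized into a cookie).
interface SessionState {
  lastActivityMs: number;
}

// A page view counts as a landing when there is no prior state or the
// previous activity is older than the window. Every page view refreshes the
// timestamp, which is what makes the window "sliding".
function classifyPageView(
  previous: SessionState | null,
  nowMs: number,
): { landing: boolean; next: SessionState } {
  const expired =
    previous === null || nowMs - previous.lastActivityMs > SESSION_WINDOW_MS;
  return { landing: expired, next: { lastActivityMs: nowMs } };
}

// Example timeline, in minutes since the first page view.
const minute = 60 * 1000;
const view1 = classifyPageView(null, 0);
const view2 = classifyPageView(view1.next, 10 * minute);
const view3 = classifyPageView(view2.next, 39 * minute);
const view4 = classifyPageView(view3.next, 70 * minute);
```

Here the third view is 39 minutes after the first but only 29 after the second, so it stays a followup; the fourth is 31 minutes after the third and becomes a landing. The scheme is purely time-based, which is why a cross-origin re-entry within the window shows up as a followup.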
Timo: Navigation type preload, back-forward are pretty special. Do we want to integrate these or leave that up to clients?
… Offline cache: we tried to reliably detect this. A couple of hacks related to ResourceTiming and transferSize 0.
… We have a bug open to make this a random constant.
… Caching has a pretty big impact on perf characteristics
Yoav: for the first question, the reason I chose to split it out is because it intersects with all other types in the enum. So it makes sense to break it apart into its own boolean, and then have client side scripts combine that information.
… for the second question: this is already fixed in the spec. We are exposing it as 300 for header vs 0 for nothing. Feel free to open an issue to improve ergonomics.
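A heuristic sketch of how client scripts might read cache state from Resource Timing sizes given the 300 vs 0 behaviour mentioned above. The exact rules (0 for a local cache hit, 300 for a revalidated response, body size plus 300 otherwise) are an assumption based on one reading of the spec, browsers may differ, and the function name guessCacheState is hypothetical.

```typescript
// The size fields used here exist on PerformanceResourceTiming entries.
interface ResourceSizes {
  transferSize: number;
  encodedBodySize: number;
  decodedBodySize: number;
}

type CacheGuess = "local-cache" | "revalidated" | "network" | "unknown";

// Assumed fixed value reported in place of the real header size.
const HEADER_SIZE_CONSTANT = 300;

function guessCacheState(entry: ResourceSizes): CacheGuess {
  if (entry.transferSize === 0) {
    // All-zero sizes are also what entries without timing access report,
    // so only claim a cache hit when a body size is visible.
    return entry.decodedBodySize > 0 ? "local-cache" : "unknown";
  }
  if (entry.transferSize === HEADER_SIZE_CONSTANT && entry.encodedBodySize > 0) {
    // Only the header constant was transferred, but a body exists.
    return "revalidated";
  }
  return "network";
}

// Hand-written example entries (not real data).
const cacheGuesses = [
  guessCacheState({ transferSize: 0, encodedBodySize: 1000, decodedBodySize: 4000 }),
  guessCacheState({ transferSize: 0, encodedBodySize: 0, decodedBodySize: 0 }),
  guessCacheState({ transferSize: 300, encodedBodySize: 1000, decodedBodySize: 4000 }),
  guessCacheState({ transferSize: 1300, encodedBodySize: 1000, decodedBodySize: 4000 }),
];
```

In a page, the input would typically come from performance.getEntriesByType("resource"). Combining this guess with a landing flag is left to client-side scripts, in line with the answer to the first question.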