Participants
- Nic Jansma, Yoav Weiss, Guohui Deng, Jose Dapena Paz, Noam Rosenthal, Barry Pollard, Andy Davies, Ansh Mishra, Michal Mocny, Ian Clelland, Aoyuan Zuo, Sean Feng, Patrick Meenan, Carine Bournez
Admin
- Next meeting: Oct 31, 2024
- Costumes optional, but kinda required!
Minutes
LCP/element-timing: Unzeroing renderTime for cross-origin images in a security-blessed way - Noam
Recording
- Noam: Zeroing renderTime when LCP image is cross-origin and doesn’t have Timing-Allow-Origin
- … Always very confusing to people when renderTime is zero – should it be at least loadTime
- … Some timestamps after loadTime we already know
- … Other thing we found is that security reasoning for this is to avoid exposing decoding time of cross-origin images
- … You can already know the loading time with ResourceTiming and load Event, and size of the image
- … Idea is that given you already have size and loading time, if you have renderTime it could give you additional information about decoding
- … Kind of a weak attack vector
- … Also, we found that when we protect it with Timing-Allow-Origin, you can piggyback on same-origin images
- … Load the cross-origin image, make sure it’s loaded, then add another small same-origin image; you can get the renderTime of the cross-origin one if it’s in the same frame (if decoding is not asynchronous)
- … Ineffective security measure
- … What can come instead?
- … Render times are common, they’re not resource-specific
- … Measuring the system, you can find out things about cross-origin things maybe
- … Protection we have for that is cross-origin isolation, CORP and COEP
- … Idea we came up with, together with security folks, is we expose renderTimes, but we would coarsen
- … Granular enough to not affect LCP in meaningful way if it’s wrong
- … Coarse enough it doesn’t give you more information
- … The delta between that and decoding time is marginal, and it’s rare you could do anything with it at that granularity
- … Given that rendering involves a lot of work, and many things can affect rendering
- … Michal asked me if this would apply to other render times in EventTiming, FCP, etc
- … Not yet, but I think in the future we’d make renderTime mean the same thing everywhere. Maybe expose it in LoAF.
- … My ask for this group is to ratify this, see if there are questions
- … Then we’ll have a written proposal, reviewed by security folks
- Yoav: Regarding the 4ms coarsening
- … Is there any public reason for “4”?
- … What is the risk these 4ms will protect against?
- … I think it’s fine to coarsen it, but I’m curious on the reason
- Noam: Let’s say you add an image based on whether or not you’re logged in, cross-origin
- … One is very small clip-art, another is large image in high-res
- … Both the same size
- Yoav: You think the difference between those two images will not be greater than 4ms
- Noam: Probably a lot smaller than that anyway
- … Want a number greater than half a microsecond
- … Better safe than sorry measure
- … Mostly also because we have 4ms in other places, like setTimeout()
- … Somewhere in PerformanceTimeline
- … Not first time we use this measurement, so for something coarse, it’s coarse-enough but doesn’t affect LCP in meaningful way
- Yoav: Coarsening we can “afford”
- Noam: Easier to make it more granular in future than to coarsen it
- … The coarsen-time algorithm uses 5 microseconds
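The coarsening under discussion can be sketched as a pure function (a hypothetical illustration; the 4ms granularity is the value discussed above, not final spec text, and a UA could coarsen more):

```javascript
// Hypothetical sketch of renderTime coarsening, assuming a 4ms granularity.
// Rounding down to a multiple of the granularity means the exposed value
// never reveals sub-granularity decode timing.
function coarsenRenderTime(renderTime, granularityMs = 4) {
  return Math.floor(renderTime / granularityMs) * granularityMs;
}

// e.g. a renderTime of 103.9 would be reported as 100
```

As noted, making this more granular later is the easy direction; coarsening an already-shipped value would be harder.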
- Sean: Will talk to Mozilla folks about this
- … If we use cross-origin isolation with COEP credentialless, that allows a website to display an image without the resource having to opt in
- … I’ll ping other folks
- Noam: It means when there’s cross-origin isolation there can be much bigger timing attacks than this
- … If other browsers wanted to coarsen things it would be per-spec
- … How coarse it is could be somewhat left to the User Agent (UA)
- … UAs can also coarsen more than the spec
- Michal: My understanding is colored by how I think it works in Chromium
- … Presentation times are vsync-aligned, we’ll coarsen a bit, but they’ll be aligned
- … Will get frame it’s presented in but not the very specific time of that frame
- … For image decode, you’ll get insights if it’s in the deadline of a specific frame or multiple frames
- … Don’t know how long it took, you’ll see how many deadlines it took
- … You’re measuring how many units of frames there were of delay
- … Makes sense to me this is already measurable through other mechanisms
- Yoav: I think you’re saying it’s already coarsened to 16ms (or an expected 8ms), because it’s aligned to vsync
- Michal: More or less, platform details, implementers have different abilities to measure things
- Yoav: Vsync alignment isn’t something that currently happens, but is something we considered?
- Noam: Yes, vsync alignment is a detail not in the spec yet.
- … None of it is exactly testable in an interoperable way
- Yoav: Some platforms have these things that are 8ms long
- Noam: What we measure in spec is rendering opportunities
- … You may have a few more of them before you have a vsync
- … Don’t want to align too much to how the underlying system works
- … I think vague isn’t necessarily bad here
- … For example, if Chromium would say it’s at least 4ms aligned because it’s aligned to vsync, that’s OK.
- … 4ms means we can do a lot of things
- Michal: I like it, I think it’s a good change, will make things more consistent
- … In practice I don’t think it will change much
- … This is just covering our bases and the coarsening isn’t going to be an issue in practice
- Noam: We may not need extra coarsening, but may rely on the fact that vsync is coarsened anyway
- Yoav: RUM providers, what do you think?
- Andy: Getting rid of zero is the most useful thing, people question why zero there
- … Have to understand the details of the spec
- … Takes away a lot of the questions
- … Coarsening not an issue for us
- Nic: +1
- Yoav: Ship it
- Noam: With secondary security stamp on the actual proposal but yes
- … Will also put in the spec, reiterate UAs can also coarsen more
- Yoav: Sean you’ll kick off convo on Mozilla side?
- Sean: Yes
- … I’ll comment in the issue
Smarter buffers - Michal
- Michal: At TPAC we talked about issues with buffering of entries
- … Most of the context was around UserTiming
- … Set of discussions related to field data / tracing use-cases
- … Using LoAF to measure parts of the timeline that need fine-grained UserTimings
- … UserTiming buffers unlimited in size, can get full quite a bit
- … Some sites over-use, potentially have 100k+, can be a performance issue
- … Some thoughts that if no one is listening, they’d get tossed
- … I’m interested from EventTiming perspective (and other types that do buffering)
- … EventTiming issue 124
- … Policy for EventTiming where we’ll only buffer entries that are above the default duration threshold of 104ms
- … Otherwise there’s a lot of events and it can fill up quickly
- … With an Observer, you can specify a threshold; most use-cases use a lower threshold
- … Lowest is 16ms. web-vitals.js default is 40ms
- … Register Observer for ET with custom duration threshold, and you get some, but not all entries
- … API quirk that people complain about
- … People are using this to get interactions, e.g. for INP
- … Any interactions on that page that aren’t >104ms won’t get reported
- … Many scripts have to fall-back to first-input entry, which has unconditional buffering
- … Should we change the default duration threshold to something like 40ms? You’d run out of buffer space quicker.
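The quirk being described can be simulated outside the browser (assumed, simplified semantics: the UA only buffers entries over the 104ms default, and a late observer calling something like `observe({ type: 'event', durationThreshold: 40, buffered: true })` — with the threshold clamped to 16ms — can only replay the already-buffered set):

```javascript
// Simplified simulation of the EventTiming buffering quirk, as discussed.
// Thresholds are the values from the discussion, not normative constants.
const DEFAULT_BUFFER_THRESHOLD = 104; // only these entries get buffered
const MIN_OBSERVER_THRESHOLD = 16;    // observer thresholds clamp to this

function bufferedEntries(allEntries) {
  return allEntries.filter((e) => e.duration >= DEFAULT_BUFFER_THRESHOLD);
}

function entriesSeenByLateObserver(allEntries, durationThreshold) {
  const t = Math.max(durationThreshold, MIN_OBSERVER_THRESHOLD);
  // Only already-buffered entries can be replayed to a late observer,
  // hence the quirk: a 40ms threshold still misses 40-104ms interactions.
  return bufferedEntries(allEntries).filter((e) => e.duration >= t);
}
```

This is why scripts registering late with a 40ms threshold fall back to the first-input entry, which is buffered unconditionally.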
- … OR should we always buffer interactions above non-interactions, because they’re more important
- … And out of that came an idea: can we do dynamic buffering? When there aren’t many events we can buffer everything; otherwise cherry-pick the most important ones
- … Maybe longest 150 events instead of first 150 events
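The “longest 150 instead of first 150” idea could be sketched as a bounded buffer that evicts its shortest entry when full (a hypothetical illustration; class name and policy are mine, not spec text):

```javascript
// Hypothetical "keep the longest N" buffer: once full, a new entry only
// gets in by displacing the currently shortest buffered entry.
class LongestDurationBuffer {
  constructor(maxSize) {
    this.maxSize = maxSize;
    this.entries = [];
  }

  add(entry) {
    if (this.entries.length < this.maxSize) {
      this.entries.push(entry);
      return;
    }
    // Find the shortest buffered entry; replace it only if the new entry
    // is longer, so the buffer converges on the longest N seen so far.
    let minIdx = 0;
    for (let i = 1; i < this.entries.length; i++) {
      if (this.entries[i].duration < this.entries[minIdx].duration) minIdx = i;
    }
    if (entry.duration > this.entries[minIdx].duration) {
      this.entries[minIdx] = entry;
    }
  }
}
```

Unlike the current first-come-first-served policy, this keeps the slowest interactions regardless of when they happen on the page.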
- … How often does this happen? In Chromium, once we fill buffers we stop trying to add, so we had some bugs. We don’t have field data reports of how often this happens.
- … If I load Wikipedia and scroll, and register an observer, I’m not going to get a single entry
- … Site is so fast, none of the interactions on this page register
- … Maybe it’s not a problem to not have any ET if they’re too fast
- … But in terms of understanding and attributing, causing a loss of data
- … Another issue is if I reload and clear, and add a little bit of jank, it doesn’t matter if the interactions are slow, stream of events, especially with rage-clicks
- … Now I have 51 ET, one jank on the page can cause these things
- … I took a look at CNN, I’m going to refresh CNN, not going to interact or add jank, register first observer, a few seconds in, had 150 entries and 86 dropped without me interacting at all
- ... Even just loading, having a mouse hover over, will cause 100s of events to fire over 104ms
- … Interactions would get dropped
- … Pages like Wikipedia have a lack of insight during load. Pages like CNN there’s too much data.
- … Actual entries, there are a lot that overlap the same timeline
- … When I pointerover, you get pointerover enter, stacked DOM tree with nested containers, they each get triggered because they have a different hit test
- … Dozens or hundreds of events fired over a single mouse move
- … Maybe change ET to say specific things aren’t buffered, or collapse the data
- … I don’t think all entries is the best way to implement a buffer
- … It’d be neat if each entry type could have an order of importance rather than just insertion order
- Nic: https://www.w3.org/TR/timing-entrytypes-registry/
- Michal: RT is bounded, LongTasks is unbuffered
- …
- Nic: RT started as 150, but changed it to 250 as buffers gotten full
- … it also has a “clearBuffer” function, but it’s not an API we want, as any entity on the site could clear it and it affected others
- Michal: Element timing is interesting, I could see a lot of Element Timings being applied in the future
- … Also, the first 150 are not necessarily the most important
- Nic: Like the proposal of having a prioritized list of entries, so we’d throw out the lower priority ones
- … I don’t think we discussed that in the past
- Yoav: I think a lot of need here for buffering that goes beyond default is just a question of very-late registration and lack of an opt-in to increase buffer that is not late-registration
- … ResourceTiming has two mechanisms: (1) an increase in buffer size
- … (2) clearing the buffer, which is a mistake
- … But bumping up the size is a non-destructive operation and can be something useful for websites to do as an early way to opt-in
- … Also discussed in past headers, if you need more room for X entry type
- … Headers have deployability issues for some RUM providers
- … Maybe not right mechanism
- … I’m wondering if all that logic and complexity, all we need is a simple mechanism to say we need more than 150
- Michal: With the default buffer, all sites bear the cost of measurement, and don’t know if there will ever be an observer
- … So you start with a bounded amount of work, and in theory it could be unbounded
- … Doesn’t address cases like UserTiming, where a site could be poorly designed
- Yoav: Sites are creating too many entries
- … UserTiming issue is different than the others
- … Others we want a higher initial buffer, or an opt-in to thresholds
- … Interested in data and will collect it, but at the same time, for UserTiming, we want a limit
- … Site can do limiting, it’s possible 3P code is doing shenanigans
- … So site could lower buffer from infinite to 1k or 10k
- … Providing that opt-in which seems simpler, will give us enough control here
- Michal: We struggle with RUM providers augmenting meta tags, or whatever
- … Easy defaults is good
- Noam: Adding more knobs is not where I would go with this
- … Wondering how late do RUM providers get to register
- … Think about bearing the cost here, 150 entries is barely 10kb of data
- … It’s not a lot of memory for the beginning of application, accumulating state
- … There’s a point in time where everything’s registered and you don’t need to keep buffering
- Yoav:
- Nic: we talked about a “point in time” in the past, but it always comes down to determining that point in time, and onload is not a good one
- … The point in which our RUM library loads, in a deprioritized manner, it will often load after onload, and some of our customers intentionally don’t load us until everything is “done” to avoid it affecting metrics. So it’s useful to look at past entries and these buffers help us do that
- … Whatever flexibility we get there would help us
- Andy: In terms of GDPR consent, we lose control of when the script loads, as we don’t load until the user has consented, and the frameworks for that are super slow
- … We have a mark to record when our scripts start running and they are slow
- Noam: So no buffer would be big enough
- Nic: The value of later events is also diminished. Interactions are not a good example; for load data, early entries are the most meaningful
- … Smart defaults are good. But Michal’s pointing at some cases that can be improved
- … For RT, going from 150 to 250 helped, and customers that need more can augment the buffers
- Michal: Is it necessary to measure all 1000?
- Nic: Few specific customers that really do load lots of resources, and wanted to get a better understanding of that
- … Applying “smartness” to RT wouldn’t make sense, as the waterfall won’t tell you what’s not there
- Andy: Smart buffering for event timing would be useful. If we look at CNN, the INP is probably at the start
- Noam: Some things would require running a script early (e.g. to change the buffer size).
- … How is that different from running your own early observer that doesn’t send anything
- Nic: Want to load the majority of our RUM library later. A single line can be inlined by the customer. In some customers we can add ~1K of script that can do the buffering, but it’s added friction and gets pushback
- Michal: Buffer sizes have a memory overhead, but that’s not a primary concern
- … Processing to measure things that may not need the measurement
- … In Chromium that’s already cost that we’re paying, so we make sure the overhead is minimal anyway
- … Worried about JS processing to go over the entries - if you’d filter these entries anyway. For event timing, a lot of the entries get removed later on, so we could have removed them upfront.
- … If we had prioritization, the browser could have done that filtering ahead of time
- Noam: I like the idea of e.g. not buffering anything below the median once the buffer is half full
- … For event timing we could have a default that counts the median, may not apply to other entry types
- Michal: For event timing we can just ignore all mouse moves. There’s no reason to buffer those.
- … So we can take a simple approach to not buffer them, which would reduce the frequency with which the buffers are exhausted
- … Otherwise, event timing with non-zero interaction IDs could get higher priority, and not be evicted
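The eviction rule Michal describes — entries with non-zero interaction IDs outrank non-interactions — might look like this (a hypothetical policy sketch, not spec text; the function name is mine):

```javascript
// Hypothetical priority-aware insert: when the buffer is full, a new
// interaction entry (interactionId !== 0) may evict a buffered
// non-interaction entry, but never the other way around.
function addWithPriority(buffer, maxSize, entry) {
  if (buffer.length < maxSize) {
    buffer.push(entry);
    return true;
  }
  if (entry.interactionId === 0) return false; // full: drop low-priority entry
  const victim = buffer.findIndex((e) => e.interactionId === 0);
  if (victim === -1) return false; // everything buffered is an interaction
  buffer[victim] = entry;
  return true;
}
```

Combined with not buffering mouse moves at all, this would keep the CNN-style flood of hover events from crowding out the interactions RUM scripts actually need.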
- Yoav: makes sense for EventTiming specifically, not sure if it applies to others
- Michal: Do all types have a way to prioritize or not
- … Eg. Element timing?
- … Resources is simpler
- Nic: LCP could be a circular buffer for example
- Yoav: Do we have browser data on how often entries get dropped before a PO is registered
- Michal: It’s exposed at PO registration, so RUM providers could look now
- … Couldn’t find use-counters in Chromium
- Yoav: Think it’d be very useful for this conversation where to focus efforts
- … e.g. For PaintTiming it’s not a problem, buffer is 2 and there’s only 2
- … Would be interesting to see how often we get dropped entries
- Noam: That’s one AI
- … Second one is a reasonable defaults proposal for EventTiming that we can look in detail
- Michal: For UT, some UT are meant to be buffered, e.g. during load, we want to read those later
- … But some UT, if you have a registered observer and you want tracing, here’s a detailed list of UT
- … Potentially not intended to be buffered is one option
- … Or we shouldn’t do any processing on them if no Observer
- … Don’t have to serialize data that’s there, etc
- … Additional value
- Yoav: Feels like a different use-case
- … Can continue at another call
- … Found very useful!
AI: Everyone needs to wear costumes for next call