WebPerf WG @ TPAC 2024
bit.ly/webperf-tpac24
Logistics
TPAC 2024 Home Page
Where
Hilton Anaheim, California, USA
When
September 23-27 2024
Registering
- Register by July 16 for the Early Bird Rate
- All WG Members and Invited Experts can participate
- If you’re not one and want to join, ping the chairs to discuss (Yoav Weiss, Nic Jansma)
Calling in
Join the Zoom meeting through:
https://w3c.zoom.us/j/4556160000?pwd=2024
Or join from your phone using one of the local phone numbers at:
https://w3c.zoom.us/u/kb8tBvhWMN
Meeting ID: 455 616 0000
Passcode: 2024
Masking Policy
We will not require masks to be worn in the WebPerf WG meeting room.
Attendees
- Yoav Weiss (Shopify) - in person
- Nic Jansma (Akamai) - in person
- Noam Rosenthal (Google) - in person
- Shunya Shishido (Google) - in person
- Noam Helfman (Microsoft) - in person
- Barry Pollard (Google) - remote
- Leon Brocard (Fastly) - remote
- Sean Feng (Mozilla) - in person
- Dave Hunt (Mozilla) - in person
- Philip Tellis (Akamai) - in person
- Keita Suzuki (Google) - in person
- Jason Williams (Bloomberg) - in person
- Kouhei Ueno (Google) - in person
- Michal Mocny (Google) - alive and in the flesh
- Ian Clelland (Google) - in person
- Scott Haseley (Google) - in person
- Bas Schouten (Mozilla) - in person
- Nidhi Jaju (Google) - in person
- Andrew Comminos (Meta) - in person
- Jose Dapena Paz (Igalia) - remote
- Patrick Meenan (Google) - remote
- Lucas Pardue (Cloudflare) - remote
- Benjamin De Kosnik (Mozilla) - remote
- Utkarsh Goel (Akamai) - remote
- Jeremy Roman (Google) - in person
- Nishitha Burman (Microsoft) - remote
- Hiroshige Hayashizaki (Google) - in person
- Eric Kinnear (Apple) - in person
- Alex Christensen (Apple) - in person
- Brian Strauch (Meta) - in person
- Anudeep Palanki (Meta) - in person
- Carine Bournez (W3C) - remote
- Adam Rice (Google) - in person
- Tim Kadlec (Invited Expert, SpeedCurve) - in person
- Guohui Deng (Microsoft) - in person
- Victor Huang (Microsoft) - remote
- Fabio Rocha (Microsoft) - remote
- Joone Hur (Microsoft) - remote
- Sebastian Käbisch (Siemens) - in person
- Erik Anderson (Microsoft) - in person
- Pete Gonzalez (TikTok) - in person
- Yujie Hao (TikTok) - in person
- Colin Bendell (Shopify) - remote
- Simon Pieters (Mozilla) - in person
- Evan Stade (Google) - in person
Agenda
Lightning Topics List
Have something to discuss but it didn't make it into the official agenda? Want to have a low-overhead (no slides?) discussion? Some ideas:
- Lightning talks
- Breakout topics
- Q&A sessions
- Questions for the group
- Short follow-ups from previous sessions
Add your ideas here. Note, you can request discussion topics without presenting them:
- Please suggest!
- ✅ [Dave] Working on A Primer for Web Performance Timing APIs
- ✅ [Michal]: LCP Breakdowns: resource vs element “slot” available (followup from TTFB conversation)
- [Michal]: Smarter buffers (followup from: why is User Timing buffer unlimited?)
Times in PDT
Monday - September 23
Recordings
-1 Lower Level - Catalina 5
Tuesday - September 24
Recordings
-1 Lower Level - Catalina 5
Wednesday - September 25
(breakout sessions)
Recordings
4 Concourse Level - Huntington
Friday - September 27
Joint Meeting with WHATWG
https://w3c.zoom.us/j/7639586116?pwd=1NQ1Zj2DjGVdP0uSVTJYcbizkQQM5c.1
https://github.com/whatwg/meta/issues/326
https://whatwg.org/chat
https://app.element.io/#/room/#whatwg:matrix.org
Minutes
4 Concourse Level - Capistrano
Timeslot (PT) | Dur | Subject | POC |
14:00-14:40 | 40m | FetchLater consolidation | Noam |
14:40-15:20 | 40m | Scheduling API updates | Scott |
15:20-16:00 | 40m | CompressionStreams | Adam |
Meeting Minutes
Day 1 - Monday
Intro - Nic Jansma
Nic: Highlights..
… Rechartering - mostly clarifications and Event Timing adoption
… We’ve been evolving ideas around adapting our APIs with privacy in mind
… Working on a `confidence` attribute to enable privacy preserving dimensions
… Been also talking about LoAF and Element Timing for containers
… Worked with IETF and WHATWG on Compression dictionaries and fetchLater
… Closed 77 issues (last year was better)
… Charter: we’ll be talking about it again in the middle of next year
… Goals: there’s a WG primer that we want to revitalize, rework, and modernize
… Another deliverable we haven’t updated in a while, called performance APIs, security and privacy
… we have thoughts on that subject, and we’ll talk about principles related to it later in the week
… A few of us submitted a proposal for a RUM Community Group. We’ll talk about that more tomorrow
… Submitted last night, but there’d be a voting process at some point
… Lots of incubations that we’re interested in following:
… Interesting to go through these past discussions and see if it’s still relevant
… Could be interesting to potentially do a lightning talk about these
… Was interesting to go over the conversations we had in the last year
… Market adoption (based on Chrome)
… Small dips with Server Timing and Reporting Observer, everything else went up!!
… A few new members joining: Techfriar, Netcetera, Datadog and Tim Kadlec
<intros>
… Been thinking about lightning topics. If there are small things you want to talk about, have questions, etc. let us know
… There’s a section in the agenda where you can suggest ideas. If that’s useful, we’ll schedule some time for it
… <Housekeeping note>
… <health rules>
… Follow up with a poll on meeting cadence
Container Timing - Jason Williams (Bloomberg)
Summary
- Good technical progress on a prototype
- Lots of interest in the use case - both Chrome and Mozilla seem generally supportive
- Lots of open questions regarding performance, configurability, and edge cases
Minutes
- Jase: Focus on performance in the Bloomberg ecosystem
- ... Here with Jose (Igalia), who helped us with the implementation
- ... Bring everyone up to speed first
- ...
- ... Talking about LCP and ElementTiming, wish developers had a more bespoke API
- ... i.e. for X to know when the first tweet was painted
- ... Not very easy to do something like this today
- ... We have ElementTiming, subset of individual elements
- ... Bad for developers doing components, not easy to know when a component as a whole is painted
- ... Challenging measuring the whole thing when components like ratings may come from a third party
- ... Want to know when this component was painted for the user
- ... Bloomberg uses Chromium
- ... Use INP, Long Frames, etc.
- ... LCP uses this element in the upper-right
- ... For users, table isn't useful until it's had its initial paint
- ... When cells have been populated
- ... What we want to track
- ... Slowness is what we want to know about
- ... For us, table is the most important part of this application, not the little text on the top right
- ... Can we fix this?
- ... Can we have Container Timing as well as Element Timing?
- ... Can we know when a container has been presented? Using it in some way?
- ... Can we offer developers more control than what we have on Element Timing?
- ... API suitable for component design
- ... Maybe have a different team building a component that we want to track separately
- ... How do we know when something has been painted?
- ... Settled on a candidate model (similar to LCP)
- ... Rather than guess, we have an API when there's new events in that container
- ... Developer has a choice to choose the current entry, or wait for another one later
- ... Tried other things, looking into arbitrary size
- ... % of painted content
- ... Picking arbitrary thresholds didn't work too well
- ... We just give events instead
- ... Started off with naive proof of concept
- ... When you mark a container with an attribute, it's a container root
- ... Started off with ElementTiming (ET) for everything inside
- ... Collect ET events and bubble up to container
- ... Gets you most of the way there
- ... In field, sometimes containers swap elements around
- ... Element swap triggers new ET and you get more paints
- ... Not as useful as this is after the component is painted
- ... Instead, look into algorithm for only painting new areas
- ... User marks the container they're interested in
- ... When we get a root, internally we create a region
- ... Most browsers can do this
- ... Elements come in, add those to our region as they have a paint
- ... If elements are removed or components swapped out, we track areas painted
- ... i.e. for carousel, no-op, no new events
- ... If we go back to API, you're getting elements populating components, on swap, you wouldn't get a new event
- ... Much better for knowing first paints
- ... We have explainer on Bloomberg repo, and Chromium implementation
- ... Also a userland/JavaScript implementation, Chrome extension
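A minimal sketch of the usage model described above. The attribute name ("containertiming") and PerformanceObserver entry type ("container") are illustrative assumptions, not confirmed names; see the Bloomberg explainer for the actual shape:

```ts
// Markup (illustrative): <div containertiming> ... </div>

// Observe LCP-style candidates bubbling up from the container root:
const po = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    // Each candidate reports a paint that grew the container's painted
    // region; the developer can take the current candidate or keep
    // waiting for a later one.
    console.log(entry.name, entry.startTime);
  }
});
po.observe({ type: 'container', buffered: true });
```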
- Bas: What does “last” in this context mean?
- Jase: A list of elements, one of the ones painted
- ... (example)
- ... Nested containers, how do we deal?
- ... We have 3 modes
- ... Default to ignore, the outer will ignore anything happening on the inner container
- ... "transparent" outer container treats inner as its own and counting anything inside of there
- ... "shadowed" similar but things like last painted element and size would be the root of the inner, not individual elements inside
- ... get info about root but nothing inside
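A hedged sketch of the three nesting modes named above. How a mode is declared is an open question (the WG later discusses attributes vs. PerformanceObserver options); the attribute values here are assumptions for illustration:

```ts
// Mark an outer container root (attribute name is an assumption):
const outer = document.createElement('div');
outer.setAttribute('containertiming', '');

const inner = document.createElement('div');
// 'ignore'      - the outer root skips anything painted in this inner root
// 'transparent' - the outer root counts the inner root's content as its own
// 'shadowed'    - the outer root records the inner root itself (last paint,
//                 size) but nothing about its individual children
inner.setAttribute('containertiming', 'shadowed');
outer.append(inner);
```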
- ... Questions to put to the floor
- ... Would like to know if it's possible to add options on a per-type basis?
- ... More people using e.g. SVGs
- ... Can we add those to Element Timing?
- Bas: Polyfill uses ElementTiming, what functionality can polyfill not support that UA can provide?
- Jase: Yes it uses ET, we do a few things. We need to load a MutationObserver as early as possible
- ... Load as early as possible in the HEAD, now they have a blocking script in the HEAD
- ... In UA it could be more important and efficient
- ... i.e. with Rects and intersections it could be more efficient
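A sketch of why the polyfill needs that early blocking script: a MutationObserver only sees nodes parsed after it is registered, so registering late misses container content (this is not the actual polyfill code; the attribute name is an assumption):

```ts
// Register as early as possible in <head>, before observed content parses:
const mo = new MutationObserver((mutations) => {
  for (const m of mutations) {
    for (const node of m.addedNodes) {
      if (node instanceof Element && node.hasAttribute('containertiming')) {
        // Hand the new container root to the polyfill's tracking here.
      }
    }
  }
});
mo.observe(document.documentElement, { childList: true, subtree: true });
```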
- Bas: Tracking rect per region populated?
- Jase: Yes
- Jose: As we're measuring performance, big overhead. One of the concerns is why we want to use native. For polyfill with all ET events, we need to process a lot of JS events. Significant impact.
- ... With native implementation we try to be careful with the data we store per root
- ... One of the nice things of Container Timing is that we have just the data we need
- ... If it takes a lot of memory or CPU overhead, or creating a lot of JS objects it takes a toll on the JS heap
- ... Try to reduce the number of objects we pass to JS world
- ... We're based on ElementTiming impl
- ... At the end of each node-processing event, we collect all container root elements that were updated and that's what we calculate
- ... Internally not creating all of the ElementTiming events
- ... We could create the JS elements only on demand when the API for accessing them is used
- ... We wanted to actually check if internals in Chromium were allowed to implement this easily
- Michal: On performance I have a few follow-ups
- ... On algo stopping at interaction, same stream as used for LCP
- ... One of the questions going into this, is the tree and marking up and whether an ancestor is marked for container timing? Previously you had published some perf benchmarks on polyfill, any update?
- Jase: No, hope to update soon
- Michal: I would imagine there's no additional overhead, the hard work is already done
- Jose: In this implementation we're using one of the few remaining flags in DOM, don't know if that's fine or if we need to do something else for this
- ... ElementTiming and container root collects this information. One of the main issues.
- ... Other is the depth, if 20 or 30 levels they need to traverse to get the root. Very fast but depends on the depth.
- ... I think we'll need more iterations for making this fast
- Michal: Many of those were Chromium specifics
- ... Observed paints based on ElementTimings is a pretty scoped stream
- ... That's when you have to do the walk
- ... Not just total number of paints * full depth of tree
- ... I suspect it's not tremendous overhead
- Jose: In testing it's really fast
- ... Container timing on top-nodes, but maybe that's not the common case
- ... In desktop lots of memory overhead
- ... For a typical page, you get 4/5/6 entries and then when you start scrolling you may get some new events
- ... So far it's better than I expected
- Bas: How are you handling when elements move?
- Jose: Not handling yet.
- ... Something still not considered at all in spec
- Jase: Maybe try that out in the field
- Michal: Maybe 2 questions, I interpreted it differently
- ... 1. ElementTiming itself has challenges with how it measures. Building on top of existing limitations.
- ... 2. Unique to container ET: you get one paint reported per nested element, but the nested element can move around, or the container can re-layout, etc.
- Bas: Could get up to two areas of the container that are now wrong
- Jose: Should we track the area with respect to where the container is painted, not the root. But that doesn't solve the problem.
- Michal: To question about PO options, is it required that each observer configures how it is observing? Could it be an attribute on the container?
- Jase: Yes, considered that. More attributes on the container.
- ... Are there other proposals where there would be per-type options?
- Michal: I would hope how you bubble things up would depend on the container, whereas however many wrapper containers I'm not sure if you want that flexibility
- Jose: One of the things we discussed. Far more useful to have a nested strategy on the top level where they're registered. We need info for both, nested and non-nesting.
- ... If the Observer register for nesting or vice-versa, we answer based on that
- ... If we knew from DOM attributes what mode, we could store less information
- Michal: Can you help me understand the use case for ignore?
- Jase: Websites that wrap ads or components pulled in from third-parties (3P) and they didn't want to track that
- Michal: I wonder if the way you layout your page, or in userland you could just subtract the area?
- Jase: You could subtract the area if you know what it is?
- Bas: Complexity of use for someone looking to use this?
- ... Initial proposals downsides of simpler approaches
- ... User defines container, UA defines (via magic) when it's painted
- ... I understand the complexity, but maybe it would be easier to use?
- Jase: I think it comes down to the correctness side of things
- ... Terminal is a niche example of this, where we could have a huge chunk of this painted
- Michal: Debate to be had if this is only for userland to markup and consume
- ... But if there's a browser primitive, it would be great to build on top of
- ... For example, I think we fail in cases like tables and canvases
- ... Some use-cases, I'm working on a project related to Soft Nav Heuristics to leverage container timing. A few use-cases where we may want to use this primitive.
- Bas: Some of that is moving complexity, the solution from the UA to the userland consumer, now as a user I need to track changes and figure out when it's painted. Gives them more flexibility though.
- Jase: We've tried to work from a similar model to LCP, candidate model, I'm hoping it shouldn't be any more complicated than LCP today
- Philip: I like this from a RUM POV, being able to measure full containers
- ... Look at this from a sub-document. Layout-shifts for example, I only care about LS from a certain container. INP as well.
- ... You mentioned earlier, things are not usable until the buy-now button is loaded. Difference between visible and usable.
- ... We proxy Time to Interactive where we proxy long-tasks and other things that would be interesting from the measurement POV
- Jase: Interesting mentioning that, earlier revision had sub-metrics. Removed it to keep the API simple and clean.
- Philip: IFRAMEs which may be cross-origin and may not have access to, how do you see that working here?
- Jase: Don't have a good story, they may just be ignored
- Jose: At this stage, we do nothing about them
- ... Blackbox from security POV
- ... Not sure what we could really expose. Maybe a high-level IFRAME event?
- ... Tried with Chrome extension, see if we can get events from several contexts. I can get a container root from IFRAMEs. From a tooling POV, where we can skip some isolation, it can be added to e.g. web inspector.
- ... Not only adding to JavaScript, we could add to devtools, so it's easy to understand impact of performance on their site
- Jase: Curious about bringing this into the WebPerf WG
- ... Should we create an intent to experiment and we don't have a
- Tim: I would love to see this pushed forward, anecdotally, in general ElementTiming feels like one of the more important things online
- ... ET fits into a small category, one of the more important types of metrics available on the web
- ... Progression of how people are building for the web
- ... Anecdotally, from submetric perspective, I have a company that has custom timing around a buy-it-now button around their component
- ... They had been using it as their gold standard metric for 13 months, but it was firing at the wrong time. Doesn't necessarily get solved by sub-metric things, but it could be solved by something like container timing. Reduce the risk of that kind of error
- Bas: So a container in that situation would function more as a definition of a unit of UX as a modular portion of the page. Every aspect of that would function as a unit.
- ... Contained to that particular element, similar to an iframe
- Philip: Interactions we stop measuring (for some) of that interaction, we'd look at it for that container
- Jase: One of the reasons it was removed from an earlier version
- Michal: EventTiming reports the element; for SVG that could be just "LINE". The target isn't even an interesting event listener. Common request to improve the attribution for targeting. Walk the whole tree.
- ... Maybe there's a bigger problem of how you mark up the page's container tree
- Bas: How do we solve for SVG and canvas? Do we solve for ElementTiming first?
- ... I don't see a reason to not solve for Container Timing and not ElementTiming?
- Jose: I think for SVG and Canvas, video media, the same thing -- we could incrementally add support for more things, could be useful for Container Timing
- Barry: Related, as far as I know ElementTiming is WICG.
- Michal: We've done a refactoring, some things are adopted
- Barry: My understanding is it's not replacing
- Michal: Not replacing ElementTiming directly
- Barry: 90% of ElementTiming has to be implemented before doing this.
- Nic: waiting for interest from UAs; don't recall past discussions on ElementTiming
- Sean (Mozilla): didn't see a use case that benefited enough from it
- … main concern is the performance aspect of implementing it, especially for container timing, the attribute might be misused
- Bas: still trying to understand how much you really need to track
- … any new drawing just has to be compared to the region, right?
- Michal: in Chromium, already tracking all of this to track LCP candidates
- … that same stream goes to element timing; if it doesn't have the element timing, we don't do anything else
- … the only difference is that element timing doesn't stop after LCP
- Bas: some of the LCP restrictions, like scrolls, might apply here too
- Nic: Alex C (WebKit), do you have any thoughts?
- Alex: I have no recent thoughts.
- Bas: now that we have LCP, we have a lot of the groundwork, right?
- Sean: (inaudible)
- Jase: does the WG have recommendations on next steps?
- … or would you like to see more performance analysis?
- Bas: past concerns around ET revolved mainly around how broad the usability was – whether enough content creators could make meaningful use, to justify the investment
- … probably the same question applies here
- … want to know how easy it is for any content creator to use, as opposed to just big teams like Meta & Bloomberg
- … serving web at large vs just large creators
- … personally, it seems clear to me that there is a valid use case, but haven't discussed this internally
- Michal: seems there is demand, and it's hard for folks to do on their own right now
- Nic: some of the more advanced companies would take advantage; RUM providers etc could enable customers
- Anudeep? (Meta): authors often have a specific notion of what it means for the page to be loaded; right now the notion is fuzzy
- Bas: playing devil's advocate, if we did allow people to simply contain LCP to one or more sections of the page, how much of the value would already be delivered?
- Tim: you're saying LCP for pages + LCP for sections?
- Bas: yeah, exactly
- Jase: if we did have sub-LCP, it probably wouldn't fix the Bloomberg terminal example, because it would just pick some cell
- … I think other components would have sub-LCP pick something that doesn't reflect the component as a whole
- Michal: What if I just say that a container reflects LCP?
- Jase: browser would need to know what makes the container painted
- Tim: isn't the goal not necessarily the largest piece of content, but when one or more most important things (like Buy button) are painted?
- Michal: I'm excited with the progress you've made & regular updates, would like to see this move forward
- … if we can get useful defaults so you can usually just say "this is a container", that would be interesting
- Jase: we should talk more this week
Happy Eyeballs Reporting - Akamai
Summary
- Problem: No information on partial network failures for navigation/resources
- Other CDNs and operators agree the information is critical
- Unclear if it’s privacy safe to expose it
- Differential privacy may be the path forward for aggregated reporting of these issues
- Need to collaborate with relevant IETF WG
Minutes
- Nic: presentation from a few of us at Akamai interested in pushing forward the Reporting API or something else, around getting insight into "happy eyeballs" behaviors
- … we work on IETF related things, and happy eyeballs is a key thing there we want visibility into
- Utkarsh Goel (Akamai): in the last decade or so, we've all worked on new standards (v6, H2, H3-over-quic, etc)
- … ISPs have different configurations, e.g. v6 supported some places, elsewhere not, elsewhere doesn't work well, sometimes routing misconfig
- … similar with QUIC
- … happy eyeballs tries to prefer newer protocols (AAAA before A, h3 over h2, etc)
- … initially a client would prefer an IPv6 connection, AAAA DNS lookup, tries TCP handshake, and then request/response happens
- … but because the client doesn't know whether the v6 connection will succeed, after a delay, it starts an IPv4 lookup & TCP handshake
- … in the meantime, the IPv6 handshake may succeed, and the IPv4 connection is abandoned
- … on the other hand, if the IPv6 connection fails or is slow, the IPv4 connection succeeds first and the IPv6 connection is abandoned
- … current browsers report absolute successes or failures
- … the problem is that a lot of things are tried in the middle, and none of that is reported
- … NEL assumes A happens, then B, then C, but that's not always the case
- … some clients even do parallel DNS lookups, some do it serially; similar for TCP connections
- … there are things that are retried but aren't an absolute success or failure, because there was a fallback
- … some connections start a TCP handshake while waiting for QUIC
- … some DNS records (e.g. service binding records) guide connection behavior if present
- … in the red group, we have errors; right side we have successful connections that have 2xx responses
- … in the overlap, we have fallback that is not visible today
- … there are also post-navigation events that are not reported today, like TCP resets due to NAT rebinding
- … path MTU is another thing where clients send packets smaller than MTU, but later packets are bigger and break the IPv6 connection – but we lack visibility into those details today
- … we can't understand what the problem really is because there's always a fallback
- … what we really don't know is: when do fallbacks happen? Is it because an AAAA lookup timed out? Maybe AAAA doesn't exist? tried IPv6 but timed out? was slower and caused fallback to IPv4?
- … did the IPv6 connection break because packet sizes were too big?
- … QUIC problems: maybe some ISPs are dropping packets, but happy eyeballs falls back to a non-QUIC behavior
- … maybe ISPs are rate-limiting QUIC packets, which is hard to identify today
- Nic: Erik filed issues covering some of these details: w3c/network-error-logging 175 and 176
- … 175: tried v6, fallback to v4, maybe it was slower than desired
- … useful for CDNs and ISPs to monitor problems
- … can help reduce unreliability of clients interacting with CDNs and ISPs, which is hard to monitor today
- … 176: when we do have a successful connection or even navigation, streams can get lost, TCP stream might need to be restarted, etc
- … rather invisible right now, but might be indicative of CDN/ISP issues that would be useful to monitor
- … wanted to start discussion with this WG to see if there are other entities interested in understanding this better
- … are there new insights we can gain?
- … why did the fallback happen? Was it too slow and eventually succeeded? Was it broken? Was some ISP or network unable to satisfy the connection? Which protocols were tried? How long were they tried for before abandonment?
- … would it have eventually succeeded? maybe browsers could get more insight into the timeouts they're using for connection establishment
- … was it a problem with the network, server, client, or some combination of them?
- … today NEL is somewhat similar, part of the reporting API
- … can register to learn about fraction of requests which failed or succeeded
- … lots of questions about whether this is possible to surface, privacy implications
- … is NEL a good fit to be extended for this, or should we do something else?
- … questions
- … NEL has "phases": DNS, TCP, HTTP request, etc
- … this is slightly different from reality in some cases (might be happening in parallel, might be multiple happening)
- … would we have to re-model those concepts to reflect this?
- … Do we try to fit this into NEL, or do we try to fix the specific use case of happy eyeballs, and have some other reporting mechanism?
- … IETF is considering a working group for happy eyeballs evolution; we should coordinate with them
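For reference, today's NEL registration and report shape, per the W3C Network Error Logging spec (all values illustrative). A report carries a single phase and one terminal type, which is exactly why fallback chains stay invisible:

```ts
// NEL is configured via response headers:
//
//   Report-To: {"group":"network-errors","max_age":2592000,
//               "endpoints":[{"url":"https://reports.example.com/nel"}]}
//   NEL: {"report_to":"network-errors","max_age":2592000,"failure_fraction":1.0}
//
// A resulting report body (sketch):
const sampleNelReport = {
  type: 'network-error',
  body: {
    phase: 'connection',    // 'dns' | 'connection' | 'application'
    type: 'tcp.timed_out',  // one outcome; abandoned parallel attempts are not captured
    server_ip: '203.0.113.7',
    protocol: 'http/1.1',
    status_code: 0,
  },
};
console.log(sampleNelReport);
```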
- Erik Nygren (Akamai): Every time happy eyeballs come up at IETF, one of the concerns that is raised is the degree to which it hides problems with networks and servers, and the need to gather information about it.
- … reporting from the client has been seen as outside the remit as IETF, so NEL might be an opportunity to gather this
- … beyond v6/v4 fallback, evolution would cover cases like QUIC and service binding records, etc
- Nic: deployment challenges negatively affecting users and origins – would you mind sharing an example we've come across as a CDN?
- Erik: we've seen quite a few cases:
- … end-user networks breaking IPv6 and keeping IPv4 working, hard to track down but causing performance problems with users on those networks
- … some clients handle happy eyeballs well, but not all
- … inability to catch those problems causes content getting switched to IPv4 only to make older clients work better at the expense of more modern clients
- … another example: a number of large server operators have broken IPv6 in common ways: MTU discover (?), often hidden behind happy eyeballs, but having better visibility on the server operator side would make it easier to operationally respond to these issues
- [end recording]
- Adam Rice (Google): presentation didn't mention: connection to first IP in a DNS result fails and then connects to the second; this is similarly hidden
- … in Chrome (unsure about other browsers), if we don't feel OS SYN retry is aggressive enough, after some number of milliseconds we start a second connection which might win over the first one, but this isn't visible to the page
- … it might also be interesting to surface this in JS resource timing API
- … we're doing a lot more stuff than Resource Timing exposes, so we fiddle with the numbers to align with what the spec expects; might be worth revisiting
- … assuming, of course, that the privacy issues are somehow resolved
- Michal: Nic, can you say more about resource timing?
- Nic: most of the NEL stuff today is focused on the navigation, but most of this applies to the resources as well (connection establishment, etc)
- … does NEL give you success reports for subresources?
- ?: just failures
- Nic: some indication that there was a sub-optimal success
- Lucas Pardue (Cloudflare): I agree with everything the Akamai folks presented; I spend a lot of time triaging customer issues that happen and then don't happen anymore
- … real durability gap affecting our customers
- … users might reload or switch networks
- … this causes frustration because it's hard to track down
- … I've been trying to optimize our QUIC implementation, and the browser is doing the best job it can trying to keep the experience seamless, but frustrating when things break and there's no way to tell
- … requested something from Chrome (?), would need an origin trial, privacy considerations
- … any more visibility than today would be helpful
- … doing it in the simplest way, though, might not give us the correct answers without the full picture
- … a resolution might require more insight than the first failure
- … some customers use multiple CDN vendors, and the pace of deployment across them can vary
- … we run our own VPN, and if you try to use h3 or QUIC it will fail, and Chrome will mark it as broken in an Alt-Svc cache, even if you access it on another network later
- … we're adding like 10 dimensions of possible failure; the timing and error reporting model is not fit for purpose in my opinion
- … would like to improve it, have some ideas, would love to agree we should work on that
- … posted in the Zoom chat about an issue a month or two ago, re. near zero DNS time for h2 but not h3; Barry Pollard put me in touch with someone else on Chrome who found this was a known issue which has since been fixed (ht to the people who fixed this)
- Utkarsh: there are privacy challenges about the infrastructure the client is using and where they are physically
- Nidhi Jaju (Google): a lot of interest on the IETF side; I definitely think this is worth solving
- … this is definitely within the charter of the IETF group, worth solving
- … we don't delay the DNS query; we issue the A and AAAA queries in parallel, but we do delay the handshake
- … on the NEL side, if we have this API, it does make sense to bring it up there
- … but NEL spec has limitations around multiple IP addresses for an origin
- … we wouldn't be able to report errors for things we don't end up connecting to
- Utkarsh: two happy eyeballs behaviors I found between Android and iOS
- … I understood that it starts at AAAA, waits 300 ms, then starts A lookup
- … other one sends both queries at once
- … things may have changed recently, but my impression was that different platforms implement happy eyeballs differently
- Eric Kinnear (Apple): I own happy eyeballs on Apple platforms
- … do send both queries in parallel
- … it's not simply first-answer-wins; if the A comes back first we may wait a little while, since we prefer AAAA
- … there is spec text that should be cross-platform, but there is some variation
- … bigger concern is the privacy story: how do you know for someone that you've never connected to, how they would like to have errors reported?
- Bas: some sort of differential privacy? if you collect this on an individual level, (not viable?)
- … our negative position on NEL revolved around privacy issues; I know that has been improved and is being reviewed
- … but this opens a whole new can of worms; for the web as a platform it wouldn't be easy to get the actionable information you're looking for while also satisfying privacy concerns
- … those mechanisms aren't well explored yet for data collection on the web
- Lucas: colleagues in the research team are working closely with browser folks about private connection logging stuff
- … don't know if others in the room can talk more
- … some non-NEL thing might be able to provide aggregate information, even with higher latency than NEL
- … e.g. knowing with a 1 day delay that some ISP has been broken for a long time is okay
- … right now I'd take anything
- Eric: it does feel like it would be interesting to do two things
- … first, update existing spec to make fewer assumptions about a single address per origin etc
- … second, we've managed to find ways that are not super destination specific, to report aggregate stats
- … e.g. "this took too long and eventually succeeded later"
- … happy eyeballs works because most of the time you only talk to one or two endpoints
- … is it possible to define something much less specific – not fine-grained info about everything that was tried
- … but just saying "I didn't think QUIC was possible here" might be safe in a way that "I tried this IP, with this time limit, …" is too detailed
- Bas: probably not exposing new information, just telling them something is misconfigured?
- Eric: there's a paragraph in the spec (privacy considerations): not trying to tell the server anything it couldn't already know
- … e.g. the server knows it's not being talked to via QUIC
- Bas: [didn't catch]
- Utkarsh: reporting what happened vs why it happened
- Bas: intuitively, still seems like there are privacy concerns with that
- Eric: concrete example: we collect from clients the index of the attempt, which was the successful one
- … >90% the index is 0
- … if you're an operator, you notice that the peak is at 4, that says that you ought to go fire up a client and try it
- … on one hand, that's not something the origin previously knew
- … OTOH, it would be worth doing deeper analysis, but it doesn't seem like a very sensitive piece of information to expose
- … would something even at that level of abstraction be useful?
- Erik: comes down to where along that curve is the sweet spot between minimizing privacy impact and getting as much useful info for debugging as possible
- … may or may not be worth considering: what are the other straightforward ways a server operator wanting to abuse this for privacy reasons could collect the same data anyway?
- … if someone wanted to abuse happy eyeballs (e.g. to fingerprint clients), it might be doable and if so it's unclear that this proposal would have substantial additional privacy impact, as compared to observing client behavior in the successful case
- … Eric's example of index: is it just the index, or could it be the index + some additional attributes of the previous things tried, but without server IP addresses (e.g., v6 or v4, h3/h2, …)
- … how much worse off was the IPv6 one than the IPv4 one: would it have never succeeded, or taken longer?
- Bas: privacy mitigation methodology disappears if you have that much information
- Eric: Are you thinking about operators sending clients TCP resets after some number of milliseconds and then seeing that in the reports?
- Erik: e.g. giving different v4/v6 addresses to every client
- Eric: more of a problem as we move toward a model with secure DNS connection to resolvers
- … if I can give you a unique IP address, I don't need an error report to notice that
- … the spec today already covers making sure I cannot introspect someone else's behavior via my own reports
- Nic: two open issues on GitHub if anyone else has any thoughts
- … other next steps?
- Nidhi: would be worth meeting (with the IETF WG?)
- Bas: how to use differential aggregate privacy is an interesting question going forward
- … that's the solution to these problems in my mind, not "how much privacy can we sacrifice?"
TTFB - Barry Pollard
Summary
- Adding interimResponseStart for Early Hints changed the semantics of responseStart - we shouldn’t have done that..
- Developers now need browser-specific logic
- Unclear why we should treat Early Hints differently from early flush
- Discussion on whether we should change the semantics back, and add a finalResponseStart (to be continued in this issue)
- There may be a case for other TTXB metrics since ResourceTiming has holes (mostly due to “browser time”)
Minutes
Barry: Investigated TTFB and it’s painful
… TTFB is one of the oldest metrics and also one of the most ill defined
… There’s no spec for TTFB
… MDN defines it as responseStart - navigationStart
… uses the older navigation timing, so needs updating
… web.dev has a similar definition
… but also has alternative definitions.. :/
…
…
… not useful for what people say it’s useful for
… It tries to cover all of
… What we actually need is parts of that section
… I tried to do this in webvitals.js, but it turned out not being possible
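For reference, the definition under discussion in Navigation Timing Level 2 terms, rather than the deprecated performance.timing / navigationStart; a sketch (web-vitals.js additionally subtracts activationStart to handle prerendered pages):

```ts
const [nav] = performance.getEntriesByType('navigation') as PerformanceNavigationTiming[];
if (nav) {
  // For a navigation entry, startTime is 0, so responseStart alone is the
  // classic "TTFB" value.
  const ttfb = nav.responseStart - nav.startTime;
  console.log('TTFB (ms):', ttfb);
}
```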
… redirects have an old issue - it’s kinda measurable, but not really
… No timing for the HTTP cache, if there’s no worker you can kinda guess it
… connections get messy with http3
… because browsers use happy eyeballs, the connections used may not be the ones reported
…
… There are lots of bits where browsers are doing unmeasurable things
… <insert image on subparts>
… TTFB can’t tell you what is wrong, just that something is wrong. Very easy to misunderstand
… Also, what does the first byte measure?
…
… We fixed it (issue #345)
… Implemented firstInterimResponseStart and made responseStart back to be the old response start
… This was effectively a breaking change and browsers that don’t support this are sending the older value
… Now we’re getting different results from different browsers
… So RUM gives you very different TTFB results
… Also not implemented well across tools, and CrUX doesn’t use it
… …
… Changing ill-defined things is hard
(let the record show, that Barry is NOT throwing shade)
… so we just broke it more..
… Is it only Early Hints?
… But why are early hints different from early flushing?
… For both of them the goal is to let the browser start work earlier
… differences around HTTP status and rendering, but they are both similar
… Should we try to fix this? What should it be?
… Are Firefox and Safari interested in implementing firstInterimResponseStart?
… We need to do better and avoid changing semantics of existing things
… Do we also need to have more holistic solutions?
… Could we have solved early hints and early flush together?
… Can we try to revert firstInterimResponseStart? Or is it too late?
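A sketch of the browser-specific logic RUM scripts now need as a result. In browsers that shipped the change, responseStart reflects the final response headers, and firstInterimResponseStart (0 when there was no interim response, e.g. no 103 Early Hints) carries the old first-bytes meaning; elsewhere responseStart still means the first bytes:

```ts
const supportsInterim =
  'firstInterimResponseStart' in PerformanceResourceTiming.prototype;

const [nav] = performance.getEntriesByType('navigation') as PerformanceNavigationTiming[];
if (nav) {
  // firstInterimResponseStart isn't in all TS DOM typings yet, hence the cast.
  const interim = supportsInterim
    ? ((nav as any).firstInterimResponseStart as number)
    : 0;
  const firstHeaderBytes = interim > 0 ? interim : nav.responseStart;
  console.log('time to first (possibly interim) response:', firstHeaderBytes);
}
```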
Philip: TTFB came from “80% of load time is frontend”
… It was added originally as a comparison with existing tools
… I am OK with getting rid of the concept of TTFB
Pat: Less eager to get rid of it. Helpful to track down infra vs frontend
… Out of the gate, we need to stay away from durations as they don’t handle async things very well
… If we race things, durations just fall apart
… UNO represents a real pain. Renderers on slow Android can be very slow and developers can’t do anything about it
… being able to account for that is valuable
Barry: So we’d need to flesh out resource timing for that
Pat: How granular do we need to be? Should RT become tracing from the field?
Michal: My intuition was “time to first useful byte”. The “render blocking” concept may be it?
… Minimal time before the UA could actually start rendering anything at all. Not necessarily when it actually unblocked rendering, but the first chance it could have allowed rendering if it weren't blocked.
… May wait up for a second to get blocking resources if you had enough content to start rendering
… As long as there’s a stream of bytes waiting from the server and we haven’t started rendering..
Barry: Not just rendering. For LCP, it’s the first time you could start fetching an LCP resource
… Even while fetching blocking content, the fact you could fetch that LCP resource is useful to show to people
… Delaying that to render, is basically paint timing
Michal: even if we’re not rendering because we’re render blocked, it’d mark the first point in time where we could start doing work
Bas: You’d almost want to tell people how much time the UA spent idle
Barry: kinda what we measure in LCP sub-parts
NoamH: so “time to head read”, minimum bar to start reading more, fetch blocking scripts, etc
Yoav: </head> end?
NoamH: start of the head tag?
Michal: but start of head could still not be useful
Philip: that’s how we used to implement it - a script tag at the start of the head
Pat: The point at which you can start measuring things in script. So you could theoretically measure this in script
… but is the header the first byte of content?
Barry: early hints can be used to start useful work from the browser. So in some ways it’s better than early flush
Pat: But the results show up in later measurements.
Barry: In practice they won’t
Bas: So in the slide of “things TTFB is useful for”, are all of those things needed? Do we have ways of solving those?
Barry: Server timing could be useful for server response time.
… “cross-browser supported” is interesting.
… We kinda have metrics for most of them
Michal: Interop issue with firstinterimResponseStart, what part of this is exposed?
Barry: We’ve changed the semantics of responseStart
Bas: Doesn’t sound like it’d be difficult to change
Sean: responseStart changes - did Chrome do an analysis of the change?
… we haven’t so didn’t implement it
Barry: the original plan was for finalResponseHeaderStart
Bas: so there’s nothing that’s the current responseStart
Barry: so we’d need responseStart to still point at firstInterimResponseStart
Nic: This would allow RUM providers to choose which version of this they want to support
Bas: So chrome would change back
Jase: What's the baseline?
Barry: “for things like LCP, look at your TTFB first”
Bas: We also do both TTFB and time origin as a baseline
Yoav: With API owner hat on, this all depends on how big of a breakage we would have if we want to expose another thing, and keep interimResponseStart where it is, but at the same time change back the responseStart semantics.
... People that collect interim would be a good proxy
... If they got broken, they already got broken when Chromium shipped the first version
... Use counter data would be useful
Michal: I'll follow-up after
Barry: We use responseStart in web-vitals.js, just measuring firstInterimResponseStart won't measure breakage
Yoav: People who use firstInterimResponseStart and have Early Hints would be useful to measure on the Chrome side
... This could inform that decision
Barry: On Chromium side we haven't implemented firstInterimResponseStart in telemetry
Michal: What is firstInterimResponseStart if there is no response?
Barry: 0
Michal: What if we exposed all of the timings, and responseStart is the smallest of those if non-0/non-null
Yoav: You're suggesting two breaking changes rather than one?
Michal: Value that responseStart returned changed recently, the values in dashboards will change but usage won't break
Bas: What I like is that responseStart then do what the name suggests it does
Yoav: More complex from compat perspective if there's reasonable usage from interim one
Bas: From FF perspective we're interested in people having useful information
Yoav: w/ Shopify hat on, there is code that distinguishes between browsers that determines what the value means. Would be great to fix that.
Bas: I would prefer on a complete solution to only work on something once
Yoav: Figure out what's wanted end state
Barry: I will open an issue (update: commented on the existing issue which is still open)
Caching Pervasive Assets - Pat Meenan
Summary
- Lots of “pervasive” assets out there, being downloaded again and again
- Two ways to solve this
- Server-defined public resources + browser enforcement
- Browser-provided compression dictionary
- Broad agreement that this is an important problem to solve
- Concerns about a compression dictionary influencing the framework market
- Agreement that compression dictionary is likely to provide larger benefits due to constantly changing resources
Minutes
Pat: We've discussed this for decades, cache partitioning, etc
... Let's look at it again
... One of the things we did in HTTP Archive ~2mo ago was added sha256 hash of every response body, even things we don't store
... images, responses, etc
... So we can see duplication of resource across web independent of URLs
... e.g. jQuery shipped by thousands of sites
... 17,034 "pervasive" responses
... 150 common across > 1 million sites
... Since this is a SHA hash of the entire response body, it doesn't handle bundled code that's been minified and grouped together
... Anything edited to e.g. add/remove copyright
... Ran query against BigQuery for HTTP archive
... CSV of 17k of resources
... Basically 2 or 3 types of resources that showed up
... 3P embeds, YouTube desktop player
... Analytics, ads code. Common URL used by sites, controlled by 3P.
... Same thing goes for first-party things like Shopify, Wix, where they control the platform
... Then slower-changing long-tail of resources: libraries, CMS, wordpress, jQuery
... Rev less frequently but when new versions get released
... By far most popular things were scripts
... Surprised fonts showed as much as they did, including Font Awesome
... Some binary fonts showing up on many origins
... Some CSS and a few images from e.g. WordPress templates or Wix template
... By far most popular was Google Analytics
... On 14.5 M pages of ~50M
... WordPress by far drives the lion's share of standalone jQuery file usage on the web
... Libraries for jQuery e.g. jQuery UI
... Recaptcha is relatively large at 217K compressed
... YouTube largest one at 800 KB for player, a few million sites as well
... But not just largest libraries, Font Awesome, Google Maps, lot of Shopify code and libraries embedded in sites
...
... Cache partitioning and how we got to where we are
... Most browsers do triple-keyed cache for resources, so no more sharing of downloaded resources across top-level sites
... On Chrome it's top-level site, frame origin and resource URL (e.g. 2 resources from 2 iframes are fetched independently)
... The triple-keyed cache performance impact is negligible, but the data (from that report and UMA) is usage-based: people who revisit the same site won't see a difference; the cost is on a single or first page load, which doesn't show in the 75th percentile of UMA data
... Unpartitioned cache risks
... If you put unique data inside payload you can use it as a cookie
... Put User ID in response, re-generate as origin, you can use it as a cookie
... Or the existence of a file at all: e.g. with 32 files you have a 32-bit number; depending on which resources you put into a user's cache, the existence of those tells you who they are.
... Leaking the history of sites or technologies the user has been to is sensitive. Not a site decision, it's a user decision. Example is a map tile or auth provider, knowing you've looked at a map tile for a sensitive location could target you.
... Or knowing you've been to a certain banks' website makes you an easier phishing target
... Site may not care, but individual users could
... Unpartitioned cache doesn't solve jQuery problem as they're all hosted on many sites (unless you're using a well-known CDN which doesn't happen and has connection overhead and everything)
... If we have the client use integrity attribute on resource or on a fetch, we can eliminate the explicit tracking of the payload
... No way to add additional data to the client that the client didn't already know
... Can we identify resources that are well-known, public, static, unchanging. We allowlist them for example
... Is there a way to declaratively say they're immutable and public (opt-in for consideration)
... Need to know at time of fetch if you're looking at share cache if it's public
... In HTTP we have public and immutable cache-control directives, but we don't know that until the response arrives
... For now let's assume we have new attributes that says it's public and immutable, and you have the integrity
... If there's a trusted cache and it's requested by a lot of clients, can we get to a point where it's used as a shared cache, because we know it's pervasive, unmodified and not targeting specific users
... Not probe-able
... Doesn't solve problems, it makes them more probabilistic
... Doesn't completely eliminate the privacy concerns
... Still leaking information about some level of history
... A browser with an empty cache, probing with a known resource like GA. If a user doesn't have GA in it, maybe it's never been to a site with GA or maybe it's just not populated.
... Doesn't solve jQuery problem with sites saving same copy at different URLs
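A sketch of the fetch-time integrity check this idea leans on: an integrity-bearing fetch can only be satisfied by a body matching the hash, so a shared-cache hit can't smuggle per-user data. SRI on fetch() is real; the shared-cache opt-in itself is hypothetical, and the URL and hash below are illustrative:

```ts
async function fetchPervasiveAsset(): Promise<Response> {
  // The browser rejects the response if the body's SHA-256 doesn't match,
  // which is what would make a shared cache resistant to per-user payloads.
  return fetch('https://example.com/jquery.min.js', {
    integrity: 'sha256-o88AwQnZB+VDvE9tvIXrMQaPlFFSUTR+nldQm1LuPXQ=',
  });
}
```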
... Maybe use SRI for cache indexing? But security nightmare
... Resources served from a given origin, and for that origin to not be partitioned
... Nidhi and others can give more background on the experiment we can try
... Ran an experiment; the manually-curated list items had expired, difficult to tackle without automation
... Brings me to dictionaries
... Is a pervasive dictionary something safer to share perhaps
... May not solve the jQuery problem unless we look at a pre-installed dictionary
... Brotli shows that a small dictionary with common web terms in it is useful
... Can we ship with a larger dictionary, Compression Dictionary stuff, Available-Dictionary that the client has for compression
... Like Brotli dictionary but maybe larger, versioned over the years, 2024 version of "web" dictionary that has the current versions of React and other things commonly seen on the web
... We previously talked about this, e.g. there's no one version of jQuery everyone's using -- and we'd be pinning people to stick with the one we picked
... Compression Dictionaries are interesting because even if we ship with one version, subsequent versions would compress very well against it
... Site is sending what it will send, it's still on the sites to update as it wants
... Not just the one version we happen to include
... Deters updates a little bit, but solves problem self-hosting standalone or bundling jQuery within another file
... We will be favoring whatever resources in that dictionary -- does React have a perf benefit in e.g. 2024 that new libraries will have to overcome
... Tradeoff of pushing innovation on the web vs. savings bytes
... For jQuery where would we draw that line?
... Legal issues to sort out about what can be included in a dictionary, are there copyright concerns?
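For reference, the Compression Dictionary Transport flow (header names per the IETF draft) that a browser-shipped "well-known web dictionary" would build on; the dictionary id and hash values below are illustrative:

```ts
// 1. A response nominates a resource as a dictionary for future requests:
//      Use-As-Dictionary: match="/js/app-*.js", id="app-v1"
// 2. A later request for a matching URL advertises what the client holds
//    (the value is the SHA-256 of the dictionary, as a structured-field
//    byte sequence):
//      Available-Dictionary: :pZGm1Av0IEBKARczz7exkNYsZb8LzaMrV7J32a2fFG4=:
//      Dictionary-ID: "app-v1"
// 3. The server delta-encodes the response against that dictionary:
//      Content-Encoding: dcb   // dictionary-compressed Brotli ("dcz" for Zstandard)
```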
MNot: I like the first approach, it seems promising to me. I don't know if I would reuse Cache-Control: public and immutable; they're specific, and reusing them might cause confusion
... Sounds promising
... If predicated on privacy-preserving proxies, maybe we could do some interesting things with that
... The first approach doesn't have the problem of the 2nd solution, which is that by choosing what goes in the dictionary, we're choosing winners
... Distorts marketplace, seems problematic
Christian: Generally a statement of support
... Thanks for sharing data
... We have a breakout on Wed AM about the problem that could be solved with cross-site caching of pervasive assets
... We download country data on every store across many domains, if could cache it could help
Pat: Country data is example where both paths have intersection and divergence
... If we make it immutable, that exact copy from that origin may be available
... If we include JSON blob in well-known dictionary, even if it's not a library, we could compress everyone's data with the same thing
... Sites could still make those decisions without relying on one known shared version, still compress as well
... Libraries and site-specific code may be verboten? Large blobs in JSON maybe makes sense?
... Shame for one specific shopify dictionary be cached but other sites not be able to benefit
... Both potential ways to solve Shopify-specific case
Michal: Question about 2nd solution and "Deters updates/code changes"
... If updated to a newer version not included in dictionary, less efficiently compressed
... Better than the status quo today
... Any use-case I can think of, this only improves that problem (people not wanting to update because inefficient)
Pat: Unless there's a strong reason to move to the ".2" version if it's e.g. ~5 KB bigger, and a site w/ infrequent visits, they're usually downloading the full version today. We give them a magic bullet where jQuery is faster. Now if the two versions are a different size, the second version may be bigger.
Michal: Over the course of the year we'd slowly see it perform less, but it's better than the status quo
Yoav: Could be a monthly thing
Pat: Where origins or CDNs that have a rolling window of dictionaries available
... If we're asking sites to update dictionaries they're compressing against monthly, manual has coordination problem
... I think there's a people problem, yes cost was way worse yesterday, but new baseline going forward
Michal: Everyone after update is seeing less efficient resources, I think with this everyone would be more efficient going forward
Noam: Can you expand more on the SRI idea that had security concerns?
Pat: If you have the hash of jQuery "1234" and you want to access it from your URL as SRI://1234. If you pull it directly from the cache, you could in theory bypass CSP where given origin may not serve that resource, but since it was already in cache you could pretend it came from that origin without having to fetch it
Noam: No hash collision?
Pat: No stuffing something into the cache with SRI, it's available from every origin you want to pull it from
... You want to be able to go to an origin to ensure the SRI from that origin matches hash
Evan: When you update the dictionary, do you not need the old dictionary to decompress?
Pat: Browser impl detail -- Chrome is the only one that's implemented? Chrome caches the decompressed version of resources, otherwise you get into a cascade problem: fetch origin resource, then delta update, etc, you need to keep the whole chain
... Chrome's current version once fetched over wire, keeps the dictionary-decompressed version
... Whether that's compressed on disk or not
... If you have well-known dictionaries, you may want to keep older versions of that dictionary around
Nic: Is it possible to have multiple “standard” dictionaries?
Pat: It complicates things a bit and you’d have to have attributes on the fetch side
… You can use compression dictionaries with the link rel dictionary.
… you’d have to fetch it on demand, and it won’t be usable across origins
… That has privacy concerns
… We could have defined 10 well-known dictionaries, but it’s more complicated than having a single large dictionary
Bas: It changes the “pick the winner” dynamics
Pat: to avoid all privacy concerns, you want this to be something that the browser already downloaded
Yoav: To tackle the point of deciding winners, the dictionary, have you looked into diffs of React vs. Future React vs. Preact, how much of a difference does that make in practice?
... Do developers care about JS size a whole lot?
... Diff in libraries that perform similar tasks
Pat: React 17 vs. previous version, how effective, it was fairly effective. 50-60% maybe. How much of a complete rewrite?
... Wrinkle is bundlers, where they rename things differently, work with bundlers to be more consistent with e.g. naming so it compresses better
... If function names and vars, maybe the code flow still compresses away
... Idiosyncrasies that you get into when things get bundled
... How much does Preact look like React for core? Modules that depend on the two may be similar in how they use the libraries for example
Mnot: Maybe not target any specific libraries, or common functions. AI ALL THE THINGS. <joke>
Bas: As a matter of principle, I struggle with the idea of favoring any particular distributor of frameworks. Even 10-20% optics are very bad.
Yoav: Crawl the web, throw all resources into a thing.
Mnot: Creates a feedback loop
Eric: Is there a way where we can have some things that are popular, some of "new" upcoming things?
Bas: Whatever methodology would have to be agnostic
Eric: If I want to publish a new library, if I can get me and 100 friends be in the "newcomers" part of the list
Mnot: A new library already has a lot going against it, adding "now less performant", you'll have a less diverse field
Michal: Relative cost may decrease
... Today the overall increased cost, of a new library is a lot
... Fetching jQuery may be 100% efficient, but a new library is only 10%, but still a lot cheaper
MNot: Today but not for new baseline
Kannan: ??
Michal: Inequality
Bas: Optics of someone having to pick
Pat: Would we be whitewashing by throwing AI on it? Would pick popular things on the web today anyway.
... Whether or not that's explicit decision to de-prioritize some exact copies of React in library, I'm not sure just throwing machine algorithm at it
Bas: Bias with extra steps
Dominic: User Agents do this sort of thing today, optics not bad. Brotli dictionary. Non-perf things where top 200k sites get media auto-play.
... Other optimizations in the browser, bias exists
... e.g. JavaScript Map options
Bas: PGO
Yoav: Dictionary isn't necessarily standardized
Mnot: Might be interested in data with React version N, N+1 bump vs. another library with similar functionality but different code base
Yoav: How much bias would be introduced
Kannan: I feel like part of the discussion with picking winners and losers and biases, almost all decisions we make in standards picks some winners and some losers. Sounds like valuable optimization that may help users right now.
... Maybe log-scale weighting against popularity; present and publish that that's what we're doing
Pat: Other wrinkle to throw into it, do we include third-party embeds in this scrape? Always from same origin
... Like facebook events JS, ga.js, always coming from one URL?
... Pierce cache boundaries in some way?
... If we create dictionary crawl based, those would show up
... Or only things common from different URLs
Dominic: We want users to see the benefits
Mnot: Users also benefit from a diverse ecosystem
Bas: and innovation
Pat: May want to discuss this further
... Keep thinking about both paths, see if there are avenues to make it more valuable
Guohui: What if we throw in some randomness from the browser, so we can protect against history testing?
Pat: You can reduce the likelihood, but not eliminate it
... Is there a line where probabilistic coding goes away
... Some options where rolling the dice you miss the cache intentionally? Is that enough?
... Or only after seeing N number of sites?
... Some options to make it more probabilistic, but nothing that eliminates all concerns
... You wouldn't want to pre-download all 150 resources, to completely eliminate it
... Those resources get rev'd frequently
... There is some line where each browser vendor's team makes trade-offs, where it is and if it's absolute I'm not sure where it lands for everyone
... No way I've seen to be able to do some form of piercing
Yoav: One thing that could be interesting is to compare approaches from benefit perspective
... Seems to me naively Compression Dictionary approach is more resilient to changes over time
... Where exact-cache-match would potentially be less efficient
Pat: Dictionaries resilient to change, solve first-visit problem
... For something like a YT embed that pages put videos into, the exact match works now, but over the course of a year (YT revs code) the dictionary will age
Yoav: If we used shared cache for Compression Dictionaries that would be better
Pat: If you can exclude resources you put in shared cache from this dictionary it would be better
Philip: With Shared cache we'd want to avoid steering site developers toward specific vendors, e.g. Google Analytics and Boomerang are always top sites.
Pat: If we're OK with self-declaring immutable and it shows up on e.g. 5+ sites then you lower that bar significantly. Doesn't have to be GA well-known
Nidhi: In our experiment, several URLs were changing, but the changes were minor, so dictionaries would be more resilient to this
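A rough sketch of the flow discussed above, using the link relation and header names from the current Compression Dictionary Transport draft (hashes and URLs are illustrative, and details may change):

```
<!-- Page declares a dictionary for future same-origin fetches -->
<link rel="compression-dictionary" href="/js/app.v1.js">

# The dictionary response declares which requests it can serve:
Use-As-Dictionary: match="/js/app.*.js"

# A later request for /js/app.v2.js advertises the stored dictionary:
Available-Dictionary: :<sha-256 of v1>:
Accept-Encoding: gzip, br, zstd, dcb, dcz

# The server responds with a delta against v1:
Content-Encoding: dcb
```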
WebPerf Admin / Specs / Incubation
Summary:
- Someone™ needs to read both the MDN web performance section and the primer and figure out the diffs
- MDN may currently only contain ref docs, we need a guide for people to get into monitoring
- Dave Hunt and Barry Pollard are interested in getting involved
- Estelle and other MDN content folk are around so we should chat with them
- Links:
Yoav: We have LCP adopted but e.g. ElementTiming is not
Michal: Ian refactored, some of those things moved into LCP
... ElementTiming is minimal now
Michal: Chromium's implementation that supports LCP and ElementTiming was leveraged for Container Timing
... Under the hood, ElementTiming off, Container Timing could still work
... ElementTiming attribute
Yoav: Expand semantics of Element Timing or have opt-in to this new Container Mode
Bas: From the perspective of using Container Timing, would it be adopted in some form close to what it is now? Would it eliminate the need for ElementTiming or not?
Yoav: Feels like a superset
Barry: I thought we had not adopted ElementTiming because use-cases weren't clear. Isn't ContainerTiming a superset? How can we say one has a use case and the other doesn't?
Bas: Where the use-cases for ElementTiming weren't clear, do those concerns apply to Container Timing? Any more context?
... Seems like some clear use-cases for Container Timing that it seems reasonable
NoamH: Some argument that ElementTiming didn't support use-cases of containers, so a variant of ET with containers would solve problem
Bas: Individual elements may not have correct granularity so unless you have Container timing it's not useful enough
Michal: If you aggressively apply ElementTimings it gives you a lot of visibility into many timings
... Also argument that polyfills could do this
Yoav: Some argument about :visited visibility. Maybe those arguments are no longer valid now that the visited cache could be partitioned?
Michal: That's already leaky, and ET gives you a bit more control
Yoav: If we eliminate that leak, observing the paint of something that doesn't leak
Michal: FCP is exposing same details, for a specific :visited
Barry: Are we saying there's interest in CT vs ET, or both?
Yoav: Seems more interest in CT potentially
... Seems too early to talk about adoption, but glad to see interest
Michal: Web developers are positive, Chrome has shipped, if Firefox would ship would that be the bar?
Bas: Don't know we've done the work, LCP work would make it easier to do now
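For reference, a minimal sketch of how ElementTiming is consumed today (an elementtiming attribute plus a PerformanceObserver); a container mode would presumably hang off the same machinery:

```js
// Markup: <img src="hero.jpg" elementtiming="hero-image">
// renderTime requires Timing-Allow-Origin for cross-origin images;
// fall back to loadTime otherwise.
const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log(entry.identifier, entry.renderTime || entry.loadTime);
  }
});
observer.observe({ type: 'element', buffered: true });
```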
Nic: It’s a good rough start for people that may want to use web performance
Michal: there’s no way to know what resource to fetch before the first byte arrives
… but in practice you really need that resource to be fetched and a slot to stuff that resource into
… Early hints can get the resource fetch early, but that won’t help you if the LCP resource is not there
… The slot on the page is a different “byte”
… Both timings are interesting
Bas: assumption that other than the preloadScanner the static DOM contains all the slots
… that’s not true today
Michal: pendulum is swinging
Bas: DCL
Barry: We published some analysis on this, put some recommendations for people for how to improve their LCP
... e.g. we can say things like stop optimizing servers, or minimizing images, because developers will often concentrate on the wrong part
... e.g. the slow part is that the image in the HTML is discovered late
... Getting image downloaded faster isn't going to help
Yoav: I think TTFB isn't the right tool, ResourceTiming initiator, play around with resource loading graph
Bas: Complete critical path you'd analyze locally
... But to see if you did good and optimized
Yoav: Critical path for LCP could be consent management, third party
Barry: Redirect times, measure locally, go to site via an ad that goes to many things, you need RUM to measure
Michal: Don't disagree, but it's focused on resource flow. There's two flows.
Yoav: Create 2 initiators
Michal: TTFB in my mind it's saying what's the earliest time the slot could've been discovered? Resource initiator.
... And then you finally discovered the actual slot. Load delay.
Bas: For IMG, is it the time you get the IMG, the bytes describing the image tag, or when the renderer processed the IMG tag?
Michal: Prescanner would be sufficient I think
Yoav: RT gives you that
Michal: LCP gives final time after render blocking has been unblocked, you already have that
... If it took a long time, what were we waiting for?
Bas: ElementTiming or some API should have some field, this was the "last" bit of work I had to do to get it
... As a site creator you could make changes
... If decoding of IMG that was big to decode, the last step to get image on the screen it was decode. Decode was bottleneck.
... Decode faster, change decompression, then the thing you need to fix is the DOM element created or reflowed.
... Way to tell developer to optimize X to get faster
Barry: We've done this on the Chrome side; we have 4 segments: before TTFB (document problem); from TTFB until the resource download is initiated; download time; time after download until rendered (decode time). Those phases work pretty well for optimizations
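A sketch of computing those four segments from standard entries (the arithmetic mirrors commonly published guidance and is illustrative; text-only LCPs have no resource entry):

```js
new PerformanceObserver((list) => {
  const lcp = list.getEntries().at(-1);
  const nav = performance.getEntriesByType('navigation')[0];
  const res = lcp.url && performance.getEntriesByName(lcp.url, 'resource')[0];
  const ttfb = nav.responseStart;                                     // document problem
  const loadDelay = res ? res.requestStart - ttfb : 0;                // late discovery
  const loadTime = res ? res.responseEnd - res.requestStart : 0;      // download
  const renderDelay = lcp.startTime - (res ? res.responseEnd : ttfb); // decode/render
  console.table({ ttfb, loadDelay, loadTime, renderDelay });
}).observe({ type: 'largest-contentful-paint', buffered: true });
```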
Michal: What happens w/ Early Hints?
Barry: EH doesn't work; as implemented by Chrome it's busted: both get the resource, put it in the cache, then forget it ever got it. Not linked to download times.
... Duration effectively 0
Noam: Race to download fetches
Bas: I don't think we fix race cache with EH things
Barry: One reason why the Chrome devrel team didn't want to move responseStart is that it doesn't help w/ metrics
... Agree this is a powerful way of doing it
... As of today you can still carve up timings
Michal: ElementTiming, right now we have loadTime, and renderTime. Often there's some amount of delay from the actual paint instruction until you render. Some proposals to expose paint that issued the final image. First Layout where that image was in the layout.
... This slot was already in the page
... Start of the layout
Noam: Resource was ready?
Michal: Document was ready to have the resource slotted in if the document was available
... Those pieces all need to align
... Document-centric
... If there was no problem getting to layout to slot, then etc -- you look into ResourceTiming to find out why?
Michal: When is the first time the actual IMG element goes to get it
Bas: Creation of the frame, layoutstart
... Creation of DOM element, IMG maybe special-cased to decode earlier
... Not yet slot you're talking about
... What defines the slot here?
Michal: In my mind, if resource is fully loaded and cached and available, you could begin decode now
Bas: IMGs get cached decoded
Michal: Will improve performance of decode once requested
... Promise.all() on those things
... Sounds like timings are all more document-centric than expected
Barry: Always going to be boundaries when you have many different segments
... Due to X reasons
... Difference between download finishing and paint happening, and there could be various reasons
Bas: When you're exposing to web developers, you assume this is when the browser had doc, had your slot, up to user agent to do all the other work. You can't do anything about that anyway, so we don't give you times
... But you can affect it, so what should we expose?
Michal: Barry's breakdowns are useful.
... In particular let's say a 10 second long task, IMG maybe already decoded, we have a 10s render delay in breakdowns. An alternative is to split that further or move those timings. I want to know about the input delay.
Barry: We've been advocating these 4 breakdowns. Could have more, but won't cover every scenario. Simplicity is better, would rather have 4
Tim: Perf stuff is more like "enhance" on TV shows. Each time you zoom in you get more detail. Every metric is a bit of a black box consisting of sub-phases and breakdowns.
... Good exercise to make sure we can always answer "so what" do we do about it?
... Here's exactly what's delaying my render
... Do we have the proper metrics underneath, so I can figure out granular things for what to do about it
... Render delay is one of the harder ones to pinpoint the "why"
Bas: Can include anything browser wants to do
Barry: At what point do we gather too much info in field, and move to tracing instead
... Same discussion with INP. 3 phases instead of 4. Need more details.
... Latest proposal is everyone needs to collect 56 pieces of data and send back to RUM provider
... That's too much for average user, developer
... We'd all love that level of detail
... Too much is dangerous
... "Enhance" is great, but simplicity wins a lot of the time
Bas: Tim's suggestion is there is something relatively simple. If it is possible for UA to say primary factor in render delay is long running JS on main thread, or X X X, then actionable for most developers
... That's a thing that's hard for the user agent to do well
Barry: We've looked at this into things like Lighthouse
... Works great when 95% of the time it's X
... But harder when less percentages
... Gets really messy
... We want to do that, but we're having problems aggregating
Bas: Not impossible for UA to categorize things in its top level event loop
... Provide list (censored) saying N time decode, N time internal, etc
... Quite extensive thing to do
Noam: Usually I feel there's no such thing as too much data
... Even if not actionable by "regular" web dev, but could be useful if shared by UA vendors
... e.g. issue in 1% of cases, repro is hard. But if aggregate data and internal metrics, then you can share that with a bug report, it can be useful to solve issues.
Bas: We can already do that, repro in the wild. Same as grabbing a profile.
Noam: Without enough samples
Bas: We're not just sampling profiler, most essential information in there
... In FF that mechanism already exists
... I suspect the same is true for Chrome
Barry: Even w/out profiler, ResourceTiming has 100s of bits there.
... I'm trying to distill that down to guidance for developers
Tim: Can say same thing for any standard out there
... e.g. Push giving "foot cannon"
... We have to think of consequences, side effects. Onto developers and tools for how to use it.
... Going deeper enables people to get to the point where RUM tooling and monitoring solves problem
... You need someone to read/interpret to mine insights
... One of the things really exciting with attribution, RUM tooling isn't just data now
... LCP blipped, this element, particular delay
... Most people monitoring to solve problems
... The ability to slice and dice gives tools the power to solve problems
... Some tools will just get all data and overwhelm
Nic: LongTasks was a tough API to collect data, not much actionable
Bas: More interesting thing is what things are more actionable and implementation complexity
Adam: For TTFB we thought people always wanted to know about the time of the network, but discussions here it's time for when the rendering process reads it?
Michal: A bunch of mechanisms to feed bytes early, even coming from CDN, maybe improve performance of that fetch. But your specific request requires some server processing. A little bit of byte fetching, e.g. Early Hints, knowing when it first arrived is useful. But really what you want to know is when the server was really done processing, that's when TTFB matters.
Adam: When was the server done, from network POV, we know when they get headers. But there may not be a render process, other work needs to be done first.
Bas: Confusing to web developer if they included that time
... Slow machine can affect render process
Michal: If it takes 2.5 seconds to hit LCP, is it because render was slow from too much JS, or huge document, or because rendered is sitting around "idle", that distinction may be interesting
... 2 seconds in render heating up caches
... Could be useful on its own diagnostic
Philip: We've seen cases where boomerang.js has been 2 second load, and it's because host is too busy to process it
Michal: We've talked about gaps in NT. How do we turn it into 4 useful values?
... We're trying to say higher-level what's important, then "Enhance"
Eric: A set of things that preclude useful work
... And are useful work
... e.g. TTFB where I didn't have a connection yet
... If I don't have a connection yet, nothing useful will come across it
Anudeep: To distill this down: whenever TTFB comes up, browser intricacies differ in how they deal with it. Would it make sense to have a higher-level metric, the time it takes for the entire request to complete, where each browser exposes that number based on its own implementation specifics?
... If you want to monitor metrics, as Barry pointed out, you want to know if it's web server, CDN, front-end trying to do too much
Bas: RT already gives you that
Anudeep: Fetching resources, parsing resources; in the overall rendering process timeline, chunk it into 3/4 key parts. One is network request and network response. Give the developer insight into how much time the server is taking to get back.
... Give a simple number and how to interpret it
... Give FCP
Bas: Isn't that NT responseStart?
Michal: Maybe we leave TTFB exactly as undefined as it is, but a LCP breakdown or whatever it is, all complexity
Philip: TTFB is a duration, not a timepoint
... It has an undefined start and end
Barry: It has a defined start and undefined end
Bas: TTFB is relative to origin 0?
Michal: Yes
Jason: Other metrics using TTFB, are they not using time origin?
Barry: Can't understand LCP unless you understand TTFB
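For concreteness, the common operationalization: TTFB is a duration from the time origin to the navigation entry's responseStart, and it's the "first byte" semantics of that end point that are underdefined:

```js
// Defined start (time origin), loosely defined end (responseStart
// semantics vary: headers vs. body bytes, Early Hints, etc.).
const [nav] = performance.getEntriesByType('navigation');
if (nav) console.log('TTFB:', nav.responseStart);
```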
Day 2 - Tuesday
Summary
- Using lots of hacks to measure what’s needed
- Missing features:
- Resource Timing: content encoding, decompression timing, transport chunk timing
- Scroll performance
- Async UI latency
- Partially presented frames
- Worker performance
- PaintWorklet
Minutes
- NoamH: Work in Excel / performance
- ... Want to cover APIs and scenarios in Excel
- ... Talk about some pain points that we could improve
- ... Lots of different areas and topics
- ... Excel is a SPA complex app with many scenarios and dependencies
- ... Want to keep improving the user experience and performance
- ... Think about perf in 3 scenarios
- ... Page Load, interaction Responsiveness (e.g. similar to INP), Async UI latency (E2E)
- ... Animation smoothness
- ... Page Load experience first
- ... Resource Loading, JS resource loading is a big bottleneck for us
- ...
- ... Looking into using Early Hints, hoping for an improvement
- ... Downloading faster, want to make files closer to the client. i.e. increasing cache hit rate
- ... We do aggressive precaching of next versions. Prefetch them to the client, cache hit
- ... Second approach is reducing the file size
- ... Bundling, minifiers
- ... Looking into Dictionary Transport
- ... Running sooner or faster, how soon can we run it?
- ... Extensively using JS Self-Profiler API, look at call stacks for bottlenecks and optimize
- ... LoAF at boot time, look for anything not JS related
- ... For measuring main API we use Resource Timing, track and aggregate
- ... Second area is interaction responsiveness
- ... Metric based on average responsive action rate per actions we collect
- ... Next challenge is determining bottlenecks
- ... Experimenting with LoAF, correlating with EventTiming entries
- ... Also used for general high-level aggregation
- ... Third is Async UI latency
- ... We look from interaction to when the UI first responds. Could take a lot of time to show the context menu.
- ... End to end is UI latency
- ... Measured using ElementTiming API
- ... Create a DOM element that is empty / a mock, add it to the DOM just before we want to show the UI (see the sketch after these minutes)
- ... Correlate perf entries to that interaction
- ... Calculate at percentiles and look for bottlenecks
- ... Challenge is creating a metric that represents what the user feels around smoothness
- ... Right now taking measurements while scrolling, rAF loop
- ... Looking for how many animation frames in that chunk, create metric
- ... Challenge is when the user scrolls and there's jank; we recently saw LoAF is correlated with it, so we use that as one indication
- ... Also use JS Self Profiler API
- ... Correlate with call stacks to get root cause
- ... Mostly using setTimeout to schedule async tasks, we're starting to adopt scheduler.postTask() API to use priorities
- ... Since not all platforms support this, we have a polyfill (imitates with setTimeout, but some challenges with it)
- ... Reporting API we use NEL for errors, gives a good signal of what's going on
- ... Especially if there's a regression
- ... Crash reports which helps provide signals to platform folks
- ... Opportunities or pain points
- ... Compression Dictionary Transport
- ... Some challenges with shape of API
- ... First is measuring: the ResourceTiming API does not give us an indication that the delta-compressed file was used when the response comes back
- ... It's not always the case that the delta-compressed file is available, so it's hard to measure success or not
- ... Using heuristics w/ transferSize
- ... Useful to see huge benefit with CD Transport
- ... Once delta file arrives, no way to measure overhead
- ... How do we measure latency from interaction that starts scroll until the scroll actually starts?
- ... onscroll event may happen later
- ... We have a blindspot where we don't know what is the real scroll experience
- ... White areas, related to long animation frames: content doesn't show when it renders
- ... But not always the case
- ... But when we profile we see correlation with Partially Presented Frames
- ... Don't know magnitude
- ... Can't explain with just JS on the main thread
- ... Few metrics that help us understand situation w/ WebWorker (WW)
- ... Hard to know when WW is busy, when you send a request and there's not an immediate response, why? Contention? Tasks on workers taking a long time? Task prioritization?
- ... Thinking if there's a way to send tasks with priority that could help
- ... People can be discouraged from using WW and not sure why task isn't coming in fast enough
- ... Could provide some solutions to complex scenarios
- ... If [PaintWorklet] runs JS, there's no way to know when the JS fails; we cannot communicate with the worklet
- ... If we cannot detect failures in critical functionality, we'll be hesitant to use it in production
- ... We also don't know the timing of its runs
- ... Load multiple chunks during boot
- ... HEAD then other multiple chunks, until we have all of the necessary data for application
- ... Useful for performance, reduces a round-trip, almost a "push" concept
- ... No way to measure when chunks arrive
- ... <script> tags but don't trust those timestamps
- ... Would be helpful in understanding and optimizing things further
- Yoav: We talked in the past about memory, you collect crash reports, some of that is OOM? Also memory measurements, cross-origin isolation
- Noam: COI is too hard for us to use, COOP requirement
- ... Proposal that we're evaluating
- Yoav: You're loading 3P that can't have that applied?
- Noam: Lots of dependencies we have no control of
- ... Crash reporting does provide some hints
- ... Signal from memory more frequently, but the only way to figure out if it's related to the app is analyzing crash dumps. Tedious, but can be done.
- Michal: Impressed with breadth of solutions you've come up with
- ... Partially presented frames during scroll may be a Chromium bug
- ... Scott's working on some partial fixes
- Noam: We did discuss a few years ago when we started noticing it
- ... Trace analysis indeed it's Partially Presented Frames
- Michal: We really want to push frame updates and if we think the main thread will sacrifice scroll performance, we're aggressive in throttling. There's a line where we may be making a bad decision. May want to adjust that policy.
- Noam: Challenge even if it's a bug and fixed. No way to know, maybe we see it less? Cannot measure so it's hard to understand the magnitude of this.
- Michal: Strategy to request rAF from WW if you're using them, that gives you a frame-synced signal. Measure from worker, won't see drop.
- Guohui: Missing content indicator?
- Noam: Lots of times content is missing because a lot of JS is running, rendering happening in the background blocking the main thread. That's what we call Missing Content. Correlation with LoAF. By Optimizing rendering or JS we can improve the UX.
- Scott: WW prioritization, we've added to our explainer a use-case. Prototype works for message channels because you create a dedicated pipe between the ports. Neat to explore, run Origin Trial?
- Guohui: If postMessage to the main thread, the delay is similar to the lowest priority. If I want to offload some work to the WW, I will suffer a penalty. Hop will suffer 10x more delay.
- Sean: Measurements in other browsers, or just from Chrome?
- Noam: We measure in other browsers, but nothing specific to say.
- ... Gap for us and blindspot for understanding and improving UX on other browsers
- ... No measurement means no improvement
- ... APIs could help us improve UX
- Yoav: ResourceTiming feature requests. (1) Exposing Content-Encoding, e.g. for Compression Dictionaries.
- https://github.com/w3c/resource-timing/issues/381
- ... Decompression Timing for ResourceTiming. Maybe there's a gap when applying delta compression. Decompressing full Br also takes time. Do we run benchmarks?
- ... Things can be different in the wild.
- Noam: Gets asked many times when discussing this feature
- ... Network time vs. CPU time
- Yoav: Is Dict decompression more expensive than regular decompression? Not sure
- ... Different characteristics on Android vs. laptop SSDs
- ... Third request is more granular data on download chunks. h2/h3 streams.
- ... WebPageTest Pat exposed this in the UI and it makes a big difference in CDN IO issues, server processing issues
- Noam: Simulated environment
- NoamR: Exposable in ServiceWorker
- Bas: Potentially additional latency
- Yoav: Available as SW API means we're not exposing new data if going down that path
- ... How could we expose ergonomically?
- ... TT?B discussion
- Michal: Mentioned using ElementTiming a lot for async UI latency
- ... Presumably you already have to track that frame
- ... Have you done any tests to compare latency to the frame start, and perhaps main thread end
- ... LoAF provides some info on style and layout
- ... Which portion of work is most important to measure?
- Noam: Haven't done what you suggest, but we could analyze something like that
- ... Most important thing we're trying to understand
- ... We know the heuristics aren't fully accurate, but they approximate presentation time
- Michal: At least some of the content is visible
- Noam: We use as a signal
- Michal: If using sync image decoding, any images will freeze content
- Noam: We tried to play with 1px transparent image
- ... Had some issue, now using empty paragraph element
- Michal: Asking as I'm interested in how much in real world, some debate on style/layout, main thread
- ... How much is rendering delays?
- ... Moz folks have asked use-cases for ElementTiming, etc
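A sketch of the mock-element technique described in the minutes above (identifier scheme and reportLatency are hypothetical; an empty probe may need paintable content in some engines):

```js
// Insert a probe just before revealing new UI; its "element" entry
// approximates when that UI was first presented.
function markUiShown(container, interactionId) {
  const probe = document.createElement('p');
  probe.setAttribute('elementtiming', 'ui-' + interactionId);
  probe.textContent = '\u00a0'; // may be needed to generate an entry
  container.appendChild(probe);
}

new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (entry.identifier.startsWith('ui-')) {
      reportLatency(entry.identifier, entry.renderTime); // hypothetical reporter
    }
  }
}).observe({ type: 'element' });
```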
Conditional tracing - Noam Rosenthal
Summary
- Will research if we can make performance.bind() work with CPED to resolve the microtask issue. Otherwise there’s some consensus it’s worthy.
- Re. PerformanceTrack - since it's an object, it feels like something that would not contribute to RUM very well, as the RUM script would need access to the object. This makes it a 1p solution (+lab). Need to see if perhaps we can get away with a lab-only solution, or some other namespacing solution. Not a conclusive direction.
Minutes
- NoamR: wanted to resolve this issue if possible
- … Trying to solve a couple of different things
- … LoAF #3 - complaints on LoAF only exposing script entry points
- … sometimes not granular enough. E.g. in React you get a React callback that’s not helpful
- … there’s also the “blame the wrapper” problem, where everything slow is the fault of the messenger
- … separate issue related to that, about adding more information to user timing for lab testing and devtools
- …
- … use cases: diagnostics and attribution
- … Finding a problem and attribution/blame are different issues
- … attribution needs to be precise
- …
- …Having an object creates a natural namespace
- … when the observer fires, the entry now has tracks that were not emptied and can contribute to this particular LoAF
- … the console namespace is a way to communicate with the lab
- …
- … binding would allow authors to provide their own entry points that will enable precise attribution based on entry points
- … As a RUM provider, this is what you'd do to wrap functions (see the sketch after these minutes).
- Michal: When the perf observer fires, what does the track entry have?
- NoamR: user timing entry
- Bas: track comes from the idea that it’d be displayed in the perf profiler as a track?
- NoamR: different libraries would have different tracks
- Bas: So two independent proposals here, one for the track and one for the binding
- NoamR: We should start with the track one
- NoamH: Consumer is lab tools?
- NoamR: Also LoAF. Lab tools would get them regardless, Not a web facing one
- Tim: The only part that’s lab tools specific is “describeTrack”, right?
- … does it make sense to have that be just lab? Or would it be interesting for RUM as well?
- Bas: it enabled tools to display a color and a name to visualize
- Yoav: could be interesting for RUM visualizations
- NoamR: feels like a future thing
- Michal: previous proposals to namespace user timing. Attaching the observer to the track and if it’s not observed, the entries get blackholed.
- … the problem with user timing is that it’s a global namespace and any consumer would see all of them
- … a track is a namespaced user timing, it’s like a filter
- … An observer for a single track would observe all the entries that are pushed into it
- … You’d need a reference to that object and labtools may not have that
- … Don’t understand the appeal of LoAF clearing the buffer
- … I’d imagine a custom stream of timings that are more efficient
- … Why is track specific to LoAF?
- NoamR: could also be related to event timing. But e.g. doesn’t make sense for resource timing
- … It’s a clearance policy
- Michal: It’s observing a piece of the performance timeline, but then you clear it
- Bas: Why don’t we make the clearing explicit?
- Michal: There’s a buffer per observer and a global buffer. User timing adds to the global buffer.
- … A track could be a bufferless stream of timings, but when you register an observer, you get these entries
- … I like the object creation, but you need a reference to it
- … If react is creating a track, would they publish a public reference to their track? Probably not
- Bas: So the track is an interface and each observer observing the track would have their own version of the timing events registered to the observer
- Nic: Are tracks intended to be measured by RUM
- NoamR: as part of LoAF or Event timing
- Nic: How can RUM providers know about these tracks
- Michal: If we make them global… is this just user timing + namespace + less overhead via…
- … If user timing was a write only entry and didn’t support buffering and had no overhead unless observers are actually registered…
- Bas: It would essentially create namespaces in user timing where no work has to be done if that namespace is not observed
- Michal: Maybe a future namespace opt-in?
- NoamR: We can’t do this efficiently with the concept of buffered
- Bas: Chrome or firefox profiler would just expose them
- … If lab tooling is running, that implicitly registers for any track
- Michal: It’s about making sure that there’s no cost when you don’t need the feature
- … Instrumenting everything is too much
- NoamH: Another use case for not binding it to LoAF. If you’re using JS profiler
- … it’s a sampling profiler so it can miss short functions
- … This could enable tracking of a particular function that can help drill down frequent short functions
- Michal: It’s still an unbuffered use case but not the same as lab use case
- NoamH: Lab is interesting but limited
- … lab issues can not be represented in the field and vice versa
- Bas: It’s just a marker instrumentation
- Michal: one of the alternatives was to just do console apis
- NoamH: could be wrapped for RUM 😈
- Michal: What about bind?
- NoamR: bind can help avoid confusion around microtasks. Another option is to create an object to be used with function callbacks
- … when added directly to an event listener, it includes the whole thing including the microtask
- … can be specified if the measurement is until the end of the next microtask
- … but you don’t know who created those microtasks
- Michal: You still have the script entry point. The script already captures all end to end
- … even that is ok - to know these functions were involved
- … In chromium we have a running duration for a task that could be captured
- Scott: can bind that data to the continuation of a promise
- Yoav: There are conversations around AsyncContext, implementation around task attribution. Discussion around caller, which semantics we want attribution on.
- ... This feels essentially the same problem, but solved in a different way. Delegating the power to the author.
- ... Generally looking for attribution here you want caller semantics.
- ... Care about who called fetch that was wrapped, not fetch wrapper
- Michal: Task attribution builds on several primitives
- ... Simplest one, in a simple task, mechanism to say these are related. Helps power biggest project of task attribution across tasks
- ... List of functions bound within for LoAF
- Noam: Except not for blame
- ... If it doesn't include the whole thing this function does, still says the wrapper is creating the problem
- Michal: Could spec bound duration
- NoamH: Could change over time
- Michal: Go from N amount of total time, 10% attributed 90% black box, another browser gives you 40% time attributed
- NoamH: Maybe percentage not duration
- Bas: You know the total duration of the two are completely interchangeable
- Noam: Standardize once we can standardize the whole async stuff
- Scott: Need to pull something to get those values. Things get propagated. Needs something to read to be useful.
- ... Once each microtask finishes you pull that? Layering is tricky. Need sync across the API.
- ... AsyncContext built-in variable
- Yoav: AsyncContext discussions not fully baked
- Scott: Promise semantics baked, just platform stuff on top of that
- Yoav: What do you want to attribute to LoAF? Event handler? Entry point itself being wrapped.
- Noam: Wrapper usually do something, may be responsible for micro tasks as well
- ... Best thing we can do is this bind(), microtasks won't be attributed
- ... Rely on AsyncContext task attribution
- Bas: Microtask would be attributed to root/wrapper in this case
- Yoav: .bind() feels very similar to AsyncContext also. Similar parallel API.
- Michal: To implement AsyncContext, you need sync context. If I was writing my own function, I could start perf.now(), await promise, perf.now(). We're talking about auto-instrumenting all of those points.
- ... We want to track the final async one
- ... Really easy for developer to do this
- ... For wrapper it's quite difficult
- Bas: Maybe mentioned, but how would nesting work?
- Noam: Flattened, just another one
- Bas: Ordering misleading?
- Michal: If you just report duration without timestamps, you could just keep summing instead of range of times active.
- Noam: Could have several contributors to the microtask queue.
- Michal: Assuming CPED in Chromium could help you figure it out
- ... 3 binds, doesn't matter order of microtasks, you could attribute to the new function
- Bas: Simple proposal doesn't seem like much resistance
- Michal: If you could expose the name of the wrapped function, duration value.
- Bas: Downside of exposing start/end?
- Michal: Could be multiple points, long list.
- Bas: Thinking task calls something 1k times, cheaper if we just calculate duration and had a single entry
- Michal: For client, post-processing
- Noam: We should research better how this fits in across microtask contexts, and make it future proof.
- Bas: Downside function not being called 100x and one takes a long time, others were really fast
- Noam: Take that into a solid proposal
- Yoav: Let's continue discussing until we have a full resolution
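For context on the "blame the wrapper" problem, a sketch of what function wrapping looks like for a RUM provider today with User Timing: all synchronous time in the callback is attributed to the wrapper's measure, and any microtasks it spawns are invisible (button/onClick are hypothetical):

```js
function instrument(name, fn) {
  return function (...args) {
    const start = performance.now();
    try {
      return fn.apply(this, args);
    } finally {
      // Everything sync lands on `name`; spawned microtasks are missed.
      performance.measure(name, { start, end: performance.now() });
    }
  };
}

button.addEventListener('click', instrument('handle-click', onClick));
```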
Summary
- postTask shipped. Yield is shipping
- Other use cases: idle-until-urgent, wait for render, render now
- Render now is aimed as a hint for the parser to yield
- Link rel=expect could be an alternative
- Idle-until-urgent - use case is around analytics work at unload
Minutes
Scott: Update since last year
… Been focused on schedule.yield
… first proposed in 2019, but with the focus on INP that got shipped in Chrome 129
… TAG review wanted a big picture explainer
… outlines the direction and things we’re thinking about
… Also nascent ideas around extensions
… scheduling matters during congestion
… browsers chunk up work and prioritize it internally. This exposing it
… improving responsiveness is the focus
… long task can block event from processing, or frames from rendering
… Sites can use this to prioritize work and improve user perceived latency
… Lots of things run async, we keep adding more
… The HTML spec guarantees run order within each task source
… but different queues can be prioritized compared to other queues
… Rendering is special - it’s a separate task with variable priority
… Other browsers may vary the frame rate
… scheduler.postTask was the first API and thought of as a tool to improve responsiveness
… helps developers break tasks into individual pieces
… Modern API - promise, TaskController (inherits from AbortController), Signal
… defines the ordering between the tasks
… “user-visible” is the default
… scheduler.yield() - doesn’t require a function boundary, yield then resume
…
… heard a lot from developers that they are hesitant to give up the thread, as they’d need to wait until everything runs
… With yielding, if you’re breaking things up, setTimeout delays the continuation. With yield, the continuations continue to have the same priority as the task that scheduled them
… Yield inherits priority from the task that called it
… Treating it conceptually as a single async task
… Also works for requestIdleCallback. Yielding from rIC would keep the continuation as “background”
…
… Idle until urgent - you want to yield but you may lose data if e.g. navigation happens
… Related to async event listeners which may be related
… Rendering!
… You’re yielding but want the yield to only return after the render has actually happened
… proposed scheduler.render(), but people weren’t happy
… if the site knows that the above the fold content was loaded, and want to display ASAP
… currently no way to signal this to the browser
… seems bad to couple them
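A sketch of both pieces (API shapes per the Prioritized Task Scheduling spec; sendBeaconData and renderRow are hypothetical):

```js
// One-off prioritized work, cancelable/reprioritizable via TaskController.
const controller = new TaskController({ priority: 'background' });
scheduler.postTask(() => sendBeaconData(), { signal: controller.signal });

// Breaking up a long task: unlike setTimeout, the continuation after
// yield() inherits the priority of the task that called it.
async function renderRows(rows) {
  for (const row of rows) {
    renderRow(row);
    await scheduler.yield();
  }
}
```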
Yoav: For the renderNow() one, what do you think are the use-cases?
... How would developers know that all the content they need for rendering is already there?
Scott: Search has talked that they know
... Amazon may know this as well
Michal: Rumor: servers will pause the response stream (temporarily stopping bytes) so the parser is forced to a yield point
Yoav: Introduce that delay, in order to force parser heuristics, to flush
... Seems like you'd need a declarative element
... Better for this use-case maybe for others
Michal: Previous thinking: we have an API to request not resuming until after the next rendering is done
... Inline script that asks to not resume after rendering is done
... Nothing explicitly blocking parser from moving forward
... A bit of magic
Yoav: Sounds very indirect
Scott: Suggest de-coupling them
... I don't think it's the right fit, separate use-cases
Noam: Link rel=expect (without blocking=render) pointing to an ID, hint expecting a specific element, yield parser
Michal: Opposite of blocking=render, don't continue doing more work until
Noam: Link rel=expect Important milestone in document
Yoav: Wonder if other implementers have parser heuristics regarding blocking or not?
Olli: User-input very old heuristics
Yoav: No user input, loading HTML spec, when do you first render
Bas: Don't render in short period after page load
Olli: Timer based (short)
... Multiple heuristics
Michal: Chromium has some heuristics and timers as well
... e.g. first bytes of body start streaming
... Later than document.commit
Bas: Not relative to origin
Yoav: Not relative to request, relative to response
Bas: Main point is to avoid painting something useless
Yoav: You don't necessarily want to block here, but if you see this, render
... Different semantic, link rel=expect with some other signal
Bas: Worries me there are ways to use it that are stupid
Michal: Idle-Until-Urgent, status quo is anyone allowed to register event listener
... Default priority is infinite
... Hold that task as long as you want
... Lot of book-keeping happening
... If they try to be nice and be good citizens, they see a loss
... Guarantee will run before doc is unloaded
... Go a long way for better actors on the web
Noam: Run continuation before unload
Michal: Is it possible a task broken up with setTimeout, would just yield
Noam: If yielding for input, is it in an event-prioritized task?
Scott: No, main app would do small DOM update then do big thing
... Not update app state
Michal: Maybe if browser provided, script run in background as page is unloaded
... Maybe flush with higher priority
... Too much work and it doesn't run
Alex: Infinite amount of work and yielding, browser can say it's done
Michal: If I was blocking a doc, now yielding and I get 100ms, very good policy
Noam: Another thing we could do is priority if a navigation is pending, not yield in the same way
... Started nav but no response yet, several things we could do w/out additional API
... Polyfill-able
Olli: Plenty of idle time after beforeunload and when we actually unload
Michal: Reasons why existing players think it's not enough, lack of guarantee is important
Yoav: Would you get that guarantee in cases not today?
Michal: Crashes no, in case where browser is trying to unload, browser is too eager to unload previous document is feedback we hear over and over
Yoav: Browsers maybe be less eager?
Scott: Changed recently to be more eager, want to be as fast as possible
Christian: OS is probably interrupting app
Bas: Process will be at risk of being killed once it says no longer it should be in foreground
Yoav: Don't kill prev rendering process until the next one is ready
Michal: Is explicit signal important, or just anything scheduled to go FIFO, good enough?
Scott: One concern is if you have >100ms continuation in main app state change
Bas: Hydration taking >2s
... Nav to something else
Michal: A well-behaved site can abort doing that work
Scott: Minimal work, like to do the minimal thing and unblock as soon as possible
... Need to investigate when all continuations finish running. Good enough with guarantee?
Michal: Event listener and you yield, priority?
Scott: default, user-visible
Michal: Opt-in was scheduler.postTask() instead of scheduler.yield()?
Scott: You can add same task to either-or
Alex: There's a perverse incentive to do as much as possible, as aggressively as possible; we tried to solve that somewhat with fetchLater(). What other kinds of things do people want to do at unload?
Scott: Could be processing that you need to send that data
... Could do that in event handler and fetchLater()
Yoav: Processing has to happen right before unload
Michal: If you yield for being a good citizen, and doesn't run, fetchLater() doesn't get to go
Nic: in Boomerang we try to send resource timing data at the very end, as a concrete example
Scott: In this case it wouldn’t help
NoamR: another example is putting things in local storage
Michal: Could you do that work incrementally?
Nic: Depends. We use a trie of all the urls on the page
Bas: Is this a proposed API?
Scott: brainstorming. It’s similar to yield but a different concept
Michal: Only when we’re at that last moment do we know that this needs to be flushed
… A snapshot of the page needs to be very late
… Other things can run in the background, in idle periods
NoamR: I’d be curious to see how far we can go with the polyfill. Would inform decisions
… e.g. If you’re after “beforeunload” you don’t yield
… promise race between yield and visibility change
… Phil Walton did some work
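A sketch of that polyfill direction, assuming visibilitychange as the signal: race the yield against the document being hidden, so deferred work still resumes if the page is going away:

```js
const hidden = new Promise((resolve) => {
  document.addEventListener('visibilitychange', function onChange() {
    if (document.visibilityState === 'hidden') {
      document.removeEventListener('visibilitychange', onChange);
      resolve();
    }
  });
});

// Yield cooperatively, but resume immediately once the page is hiding.
function yieldUnlessHidden() {
  return Promise.race([scheduler.yield(), hidden]);
}
```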
Scott: maybe we need to come back to the group when we have more
Michal: Even if a polyfill works, having it baked in would help adoption and be worth the cost
Bas: What’s the downside? Could people abuse/misuse it?
Scott: We could add limits
NoamR: I can see libraries that use it and create too many such tasks
Michal: Right now events have no way to get interrupted. It can only be better than status quo
NoamR: It can make things worse with continuation task getting to block things
Michal: Only one of them
Summary
- RUM principles are something we should codify
- Long list of pain points
- Forming RUMCG to get more folks involved and try to address that (through lobbying/funding)
Minutes
Nic: We talked about that already in previous years
… Talked about
… Made pretty good progress on these things
… We’ve been thinking through the things that are important to RUM providers
… given that list, here are the things we’re struggling with
… So we started looking into forming a CG
…
… Hoping to get more folks involved beyond the WG (because no W3C membership is required)
… Not be developing specs, but bring inputs to the WG
… Waiting for the W3C to review the proposal
… << Click here to sign up >>
… Took a poll amongst the interested folks
… Non-blocking script - mPulse uses preload and a snippet as a fallback, but an attribute would be nicer
… Interested in server timing and increasing its semantic stability across the industry
… Better RT initiator support - building the resource fetching tree and do critical path analysis
… Small RT enhancements, LCP issues, Container timing..
Christian: LCP popup behavior?
Tim: you got a hover event that pops in content and that becomes your LCP
Bas: Standard position on Resource Timing enhancements?
Yoav: too early
NoamR: browser work funded by the interested parties would be great!
Nic: There would be challenges, but it would be interesting to consider
Bas: Non-blocking script loading. Are browsers even consistent in how they work?
Olli: Chrome changed async loading a few years ago
Philip: The issue is about blocking the onload event. Only preload doesn’t block the onload event (Ref to original bugzilla ticket: https://www.w3.org/Bugs/Public/show_bug.cgi?id=21074)
… Also true for dynamic imports that started loading before onload
… The iframe hack is tackling it
Yoav: preload was supposed to fix this. This would just be additive
Nic: Yeah but that would lead us to a better future
NoamR: Server timing standards were already discussed. So it might be something that the CG can take on and create a registry
Olli: question about observer effect - have we considered notifying the site when they are misusing the APIs. E.g. when adding a million user-timing entries
Yoav: maybe we should limit the buffer size?
NoamH: configurable limited buffer?
Nic: we have that for Resource Timing
Bas: a very common observer effect
NoamR: maybe we could make it so that all that work is done in a worker. Create a performanceObserver in the worker and the worker gets all the entries, but it’s hard
… In chrome you could move the objects between threads, but not serialize it
Nic: Otherwise, there’s a RUM archive session at 6pm
…
… Data on how real users are using the web
Day 3 - Wednesday (no WebPerf meetings)
(breakout sessions)
Day 4 - Thursday
Summary
- New deliveryType for SW cached resources makes sense. Name TBD
- Regarding new metrics, it’s worthwhile to measure internally first and see if they are negligible or not
- None of this should stop an OT, and new experimental metrics can be added as part of an OT
Minutes
- Shunya: continued conversation. Last time we talked was July
- Keita: we’d like to talk more about RT and SWs
- … SW can have high overhead on the critical path
- … static routing allow developers to control how certain resources are fetched
- … Developers provide router rules and specify which resource is fetched from where
- … 4 path rules:
- … fetch event - similar to the current SW default. Fetch event is triggered
- … network - go directly to the network without any SW interception
- … cache - get resources from the Cache API if they are already available
- … race - dispatches the fetch event and issues a network request in parallel, using whichever comes first (see the sketch after these minutes)
- … Many components involved and multiple behaviors
- … The behavior of these components may be unclear - e.g. we don’t expose how long the router eval took per resource
- … For race rules, we don’t know which path won
- … how long did the cache lookup take?
- … So we want to expose a few new attributes: workerRouterEvaluationStart, workerCacheLookupStart, matchedRouterSourceType, actualRouterSourceType
- … Got feedback
- … How beneficial would the new metrics be? We did a bit of further research, and our current decision is to not include workerRouterEvaluationStart, as its impact is not significant
- … Expand cache lookup to API access in general
- … decided to align fetchStart with responseStart
- … Decided to create a new delivery type called “sw-cache”, to differentiate it from the current “cache”
- … Currently sites have few rules, that aren’t a significant slowdown
- … Seems like useful information for the Cache API lookups in general (e.g. in SW fetch handlers)
- … Need to consider what happens if cache access happens multiple times. Also, what do we do when the response is not used?
- … Regarding the “delivery type”, we may want a new value
- … Need to bikeshed the name
- … Want to discuss
- NoamH: Is the cache lookup start supposed to be for static routing or in general?
- Shunya: Decided to expose it in non-static cases
- NoamH: There’s also cacheLookupEnd or are we relying on the response end? Developers would want to understand the race between the cache and the network. How can they analyze the race?
- … Adding a new delivery-type that splits the cache semantics, developers may need to be aware of them
- NoamR: Great to not add new timings and start measuring if and when they become non-negligible
- … confused about cache lookup in non-static, as you can perform multiple fetch lookups
- Shunya: We’re not sure, but it could be Promise.all of all the cache lookups
- … Thinking of having a counter of total duration and measure the total time
- NoamR: so cacheLookupDuration
- Shunya: need to still investigate
- Marijn: how do you deal with overlapping fetches? Which cache worked for which fetch?
- NoamR: That’s very difficult. SW have their own performance timeline
- … Attributing this directly to a fetch is complex
- … static routing thinks of SW as a fetch, so you can reason about it like a fetch
- … Isn’t the cache lookup start negligible? If you just have static routing, isn’t it in the microseconds?
- Keita: For cache lookup we don’t really know, because there aren’t in the wild websites yet
- NoamR: But its start should be close to the previous measurement. I’d expect that for this delivery type, the time between the requestStart and responseStart would give you the cache lookup time
- … requestStart and responseStart already have these semantics
- … we can bikeshed about the name
- … If you have multiple rules, exposing which rule were matched could be great
- … But maybe there’s no rush for it
- Kouhei: motivated by the customer, so there is a need to measure this
- … Should we wait for adoption?
- … We do want to proceed with an Origin Trial and one of the goals of the trial is to decide if this is useful
- Yoav: To Noam's point, it's worthwhile to measure if cacheStart is meaningful, or just a few ns after previous point in the timeline
- ... See if responseStart is not good enough
- ... Even if there's no adoption yet, measure that in the lab on 90th-percentile devices and see what those numbers give us
- ... Hearing there aren't any objections to a new deliveryType, bikeshed the name
- ... Regarding the semantics change, for static routing, we're not really changing the semantics of "cache". If we were, it could have compat issues in the wild.
- ... People who capture current "cache" may not capture this new "sw-cache" etc case
- ... We're not changing semantics
- ... For OT, you could also add whatever new metrics you want as part of the trial, help to decide whether these metrics are meaningful or not based on partner feedback
- Kouhei: Makes sense
- NoamR: Using static routing API for synthetic responses, may be even another new deliveryType
- ... Can just create a response in the router (proposal from yesterday)
- Yoav: New made-up delivery type
- Shunya: Conduct OT, can we start writing spec draft PR?
- Yoav: Sure
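A sketch of the four rule types from the top of these minutes, using the addRoutes() shape from the Service Worker Static Routing proposal (URL patterns are illustrative):

```js
self.addEventListener('install', (event) => {
  event.addRoutes([
    { condition: { urlPattern: '/api/*' }, source: 'network' },  // bypass the SW
    { condition: { urlPattern: '/assets/*' }, source: 'cache' }, // Cache API lookup
    { condition: { urlPattern: '/*' },
      source: 'race-network-and-fetch-handler' },                // whichever wins
    // Anything unmatched falls back to the fetch handler ('fetch-event').
  ]);
});
```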
Exposing style/layout
Summary
- Exposing Forced Layout + attribution to RUM seems important
- Could be as part of LoAF, maybe while tweaking the definition of “long”
- We should work on APIs to make it easier to avoid forced layouts, in collaboration with other WGs
- For layouts in general, it might be interesting to expose the affected rect. This can feed into decision making around layout containment.
- Hit testing can also force layouts
- To be continued
Minutes
- NoamR: Open discussion, identified as a blind spot in performance metrics
- ... No concrete plans about how to resolve this
- ... We have a lot of things that rely on style, layout and presentation time
- ... INP starts from interaction
- ... LoAF tells you about time in style + layout, but it's not actionable
- ... Have to go figure it out
- ... We see a lot of jank is spent in forced style + layout
- ... No insight about it
- ... Need to know why and how to resolve it
- ... After style and layout, there's compositing, decoding and images, etc
- ... Not a black box, a purple box, opaque
- ... A lot of time in hit-testing before and script is running
- ... Is this a devtools problem or a RUM problem
- ... Chrome get selector matching, etc
- ... We saw this about content-visibility, marketed as a solution to this, but hard to match it to a metric
- ... Hard to see step by step improvements for this in the lab or the field
- ... Do we need more tools to avoid in the first place, PositionObserver, to understand why better
- ... Fix in API level rather than metric level
- Bas: For the lab measurements, the information available isn't standardized
- ... Forced layout flush, what information does the lab tooling not give you that you'd like to have?
- ... Stack trace of initiator?
- NoamH: Initiator for what kicked it
- NoamR: Not statistically for the page, have to calculate it yourself
- ... A forced layout call doesn't always mean a layout flush actually happened
- ... You could have forced it, but the next things you do don't make it happen
- ... The insight from the lab is that it's kind of difficult to grasp whether that's what's happening or not
- Bas: Statistical aggregation over time of forced layout, anything else concrete?
- NoamR: More question is about RUM for this room
- ... Is Lab enough or do we need RUM?
- Yoav: Getting that information in the lab is very hidden; a lot of developers don't find it
- ... Having that info of "forced layout happened" is important thing
- ... Attribution that can point to what kind of read forced that layout would be super useful
- ... Beyond that we need better patterns to enable developers to avoid reads-after-writes
- ... Userland libraries kind of wrap things so you can try to manage it, those should be platform primitives
- ... Well-known pattern that are safe
- NoamR: Actual solutions may be beyond scope of this WG, but we can provide proposal (e.g. to CSS)
- Michal: Distinction between knowing "did it happen" and "why"
- ... e.g. LoAF provides some details, with long script total sum of duration of script and layout
- ... If a bunch of short scripts do layout thrashing, that's lost information?
- NoamR: Correct
- Michal: Collectively page is suffering from problem, should be aware of in the field
- Bas: Add a field to LoAF, forced layout happened?
- Michal: Total amount of time, you get duration per script. Timespan in rendering, you can add those up. But not all timeline is covered by that. Some scripts don't get reported.
- Yoav: Makes me think that this isn't necessarily LoAF metadata, something else?
- ... Forced layouts unrelated to LoAF
- NoamR: Then they don't really change UX
- Bas: Then maybe it becomes LongTask API
- Michal: We should change definition of "long", if we think 50ms not sufficient, we should work on that directly
- NoamH: Important for RUM, extremely hard to derive actionable items from lab results
- ... Signals from lab, not aligned with what really happens in the wild
- ... For forced layouts, what you want is actually the read that triggered it
- ... What is the first strike before that
- ... What steps do you need to take to fix it?
- NoamR: Very rich info
- NoamH: As a developer that's what you do, you see the read and decide if you can postpone or do a different ordering
- ... JS Profiler markers proposal, that provides additional information
- ... Proximity to the initiator
- NoamR: Interesting direction to tie this rich info to another rich info API
- ... Adding more things it can do perhaps
- Yoav: Unless gathering this rich info is itself costly, I would be cautious about tying it to a costly API
- ... Double sampling (profiler and users)
- ... This doesn't sound like it'll have that overhead
- Bas: On our side I see more potential to use this than JS Profiler API
- NoamH: JS Profiler API from our measurements isn't that much overhead. Depends on how you use it, low sampling approach, occasionally start it for short periods, sampling rate higher than 16ms, extremely helpful for large apps with a large number of users
- ... Large enough samples you're reconstructing flame chart of frames and their causes
- Yoav: Not saying JS Profiler is bad, has cost, requires caution that's not required here
- ... When we initially went with JS Profiler we got initial numbers; if y'all have other numbers it would be interesting to see them presented
- Michal: Beyond scope of this discussion, hit-test problem
- ... Before every event dispatch, do hit-test, do layout/style of page, make sure it's valid
- ... But default user action may update layout/style of page and dispatch more events
- ... A well-behaved app where developer tries to stagger reads/writes, but browser is thrashing page
- ... Separate issues with event dispatch and hit-testing, not great anyway, races and such
- ... Maybe there's room to change how some of these things work
- ... Not really on developer to understand that
- ... Expensive as layout is, it should only be computed once
- ... Often more than one frame of layout
- NoamH: hit testing comes up as bottleneck in local profiling
- ... Forced layout during JS or post-JS, actionable research can do to optimize
- Michal: You have a perf budget for total layout, effectively reduces perf budget
- ... If we remove the redundancy or optimize certain paths then you have a higher perf budget = better latencies or more complex sites
- ... We have powerful tools to help reduce cost
- ... Sometimes unfortunate to say "you are thrashing", there's too much layout happening
- NoamH: Can we expose the cause for the long hit test?
- ... No idea what contributed to that?
- ... Most likely layout sometimes
- Scott: Great if we could try to categorize sources of layout thrashing, see where the majority of problems are coming from
- Michal: Long list like update style that causes layout
- ... Isolation of styles, which we have capabilities for; promote those patterns
- Alex: Why do some people say it's outside the scope of this group to propose solutions that reduce thrashing?
- Bas: Some of those solutions are handled by other groups
- Alex: The majority of this group's focus has become RUM; I'm just saying that if we see performance problems the CSS group hasn't noticed, it's on us to come up with proposals and bring them to them
- NoamR: Mentioned PositionObserver here; it came up in many traces. For years webdevs have needed to do JS-based positioning; anchor positioning in CSS should reduce the amount of unnecessary work
- ... Still a lot of things rely on where things are and require JS
- Yoav: Like getBoundingClientRect() which forces layout
- ... Something we could propose
- Bas: Scheduled session on reduced layout thrashing
- Yoav: We can schedule something with them post-TPAC
- NoamR: Whole effort is just starting
- Yoav: It's early, but once we have a clear Line of Sight to a solution, bring it to them
- Bas: We have similar experiences with Firefox being an issue
- Yoav: Not browser specific, it's by design
- NoamR: Other solutions are missing; one is an async programmatic focus function
- ... Right now it thrashes because you rely on it to immediately send a focus event
- ... You have to know if the element is display:none, because then it can't be focused, and you can't know that until style is calculated
- NoamH: Async focus that returns promise?
- Yoav: For DOM writes, the non-urgent ones that you don't want to flush immediately
- NoamR: With duality of writes and reads, only fix one of them
- ... Fix writes, do all reads async
- ... One of the mistakes in FastDOM and similar libraries: they overshot on how to batch the writes together
- NoamH: Do all reads together
- Yoav: In some cases you need a "sync" read
- ... Async writes and sync reads
- NoamH: To solve that, what I see a lot of the time with sync reads that cause thrashing: you keep a parallel state that tracks what you expect. Not ideal, but that's what people are doing
- ... VirtualDOM idea
- ... If there was some way to automate it, if developers are doing it, maybe there is a way
- Bas: For me the most important thing is the least amount of work possible on developer to do this
- ... Yes async reads could solve this, but it would require developer to think much harder to do things
- NoamR: Async writes already exist (rAF then write; see the sketch at the end of these minutes)
- ... Is there something more insightful that is more interoperable?
- ... Number of elements visited vs. styles visited
- ... Common enough and baseline enough that can give insight about what happened during style and layout
- ... Careful to not be impl specific there
- Yoav: Magnitude of the layout
- NoamR: Actionable about layout or style
- ... Tried to compute 100 stylesheets or 1000 elements
- Bas: I think this could be done interoperably
- ... We already kind of know, if we implement LoAF
- ... How much time we spent, reasonably accurate, in style vs. layout
- ... Is that enough information to be helpful?
- ... That presumably other browsers could more or less know as well
- ... Not 100% clear
- ... Harder for me to know how interop it is
- NoamH: Selector matches
- Bas: UAs could do something smart that touches fewer elements, maybe?
- Yoav: Two separate things you want to report here
- ... One is info about layout/style itself
- ... Other is the trigger
- ... Two different actionable things
- ... Expensive layout, should make it quicker
- ... Other is the trigger, maybe don't do it
- ... Two separate things we need to try to attribute or give insights into both
- Jase: Something actionable by the developer. Splitting style/layout, what are you going to do with that?
- ... Someone did a talk on CSS selector perf
- Bas: To some extent, appreciate love for RUM, but if you know in the wild people are mostly style causing it, the analysis can be in the lab
- ... What is my style doing etc
- Yoav: Trigger needs to come from RUM, numbers, but if you know how to reproduce, you go to the lab
- Tim: As long as you know how to reproduce it
- ... Duration and reproducibility, deeper analysis is fine
- ... Trouble when reproducibility is difficult
- ... If this is tied with LoAF and have attribution, we should be able to reproduce
- Bas: Would love to increase reproducibility
- NoamR: Reporting-API style thing, state of the DOM
- All: Just PCAP it
- NoamH: For repro, you may know it happens but not magnitude
- Nic: Moving to the lab is great, but need data for reproducibility
- Sean: Have Layout Instability API. Can it be combined with LoAF
- Nic: Not always a shift that triggers
- Michal: Rerendering the world and seeing a tiny change that doesn’t shift anything is very common
- … so many libraries have great DX but make performance 10x worse, yet look fine on a MacBook
- Bas: How much of the page was invalidated? Is that valuable?
- Alex: Sounds like a lot of bookkeeping
- Bas: We’re already doing that book keeping. Unsure about other browsers
- NoamR: somewhat exposed with content-visibility. “Anything outside of this can be laid out”
- Bas: If LoAF would expose the dirty region, could that help you understand what it was doing? If you know what content changed that would give you a good hint to what happened
- NoamH: Region makes it hard to infer what element was actually on the screen
- Michal: LoAF should give you overall duration. Layout instability is a UX metric but it’s also a performance metric
- … Layout driven animations trigger it. Annoying vs. expensive to compute
- NoamR: I disagree. Layout Instability should remain about user annoyance. This is specifically about expensive sync layout
- Bas: Back to what information would be useful for folks
- … The dirty region suggestion, would that give a lot more information. It’s not perfect, but it would give you the part of the screen that impacts layout
- NoamH: Depends on zoom, screen size. Without capturing the screen it’s not useful
- NoamR: You can do element.at() to figure out the element
- Michal: If we knew the amount of needless layout beyond what’s necessary
- NoamR: I’m motivated to research this. Can we expose something that can feed into containment? Can we have input for a containment value?
- Yoav: Research into reasons layout is triggered in the first place
- NoamR: Thrashing aspect
- Bas: If we can solve problem, interesting avenue to explore
- NoamR: Anchor positioning, some things are coming also
- Yoav: Solving some of the need to do reads
- ... Avoiding the need to read is one, but a better way to read is two
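A minimal sketch of the read/write batching pattern discussed above (rAF then write), in the spirit of userland libraries like FastDOM; `el` is a placeholder element, and this illustrates the pattern rather than any proposed platform primitive:

```js
// Batch reads before writes inside one rAF callback, so a read never
// lands after a write and forces a synchronous layout flush.
const reads = [];
const writes = [];
let scheduled = false;

function measure(fn) { reads.push(fn); schedule(); }
function mutate(fn) { writes.push(fn); schedule(); }

function schedule() {
  if (scheduled) return;
  scheduled = true;
  requestAnimationFrame(() => {
    scheduled = false;
    reads.splice(0).forEach((fn) => fn());  // layout is clean: reads are cheap
    writes.splice(0).forEach((fn) => fn()); // then dirty it once, together
  });
}

// Usage: read geometry, then write, without interleaving.
measure(() => {
  const { width } = el.getBoundingClientRect();
  mutate(() => { el.style.width = `${width / 2}px`; });
});
```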
Soft Navigations
Summary
- Live demo using Chrome HUD integration. Showcasing how Interactions + Task Attribution + Dom tracking leads to paint area “attribution”
- Discussion on which types of interactions, which types of URL changes, and what paint area requirements are considered soft-navigations.
- Folks don’t like requiring heuristics. Desire to expose primitives and allow heuristics to be layered on top, rather than baked in.
- Analogy: the Layout Instability API exposes all shifts, but marks some with hadRecentInput: true, which the CLS algorithm uses as a heuristic to ignore them for score calculation (see the sketch after this summary).
- Maybe Non-user-initiated navigation should still get measured and reported, but can be ignored by the algorithm.
- Should developers be able to mark / describe what they consider a soft nav?
- Sure, developers might benefit from primitives, but the requirement is to have a performance timeline report by default, not the other way around. Feedback from Tim K that developers in the wild are getting heuristics wrong and messing up measurement. Feedback from Nic J about overlapping mPulse techniques.
- We want the API to focus on the common cases, but developers should still be able to control advanced use cases.
- Desire to expose to field data: e.g. CrUX. Enlighten and motivate improvements. Better insights into architecture choices. Frameworks are competing on metrics but there are huge oversights.
- Discussion of existing gaps and next steps.
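The analogy above as a sketch (simplified: the real CLS calculation also applies session windowing, omitted here):

```js
// layout-shift entries report every shift; the hadRecentInput heuristic
// is layered on top by the consumer rather than baked into the entries.
let cls = 0;
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    if (!entry.hadRecentInput) cls += entry.value;
  }
}).observe({ type: "layout-shift", buffered: true });
```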
Minutes
- Michal: hands on progress report
- … Lots of assumptions thrown around on SPA performance - don’t know if they are faster or slower
- … Nice to have a consistent way to answer that
- …
- … <live demo of paint flashing>
- … Every paint on the page flashes
- … Chromium has ElementTiming, which can measure the stream of paints
- … <demo page> Glowing regions that show what Chromium thinks were new paints
- … In order to solve SoftNavs, we track paints that are a result of a user interaction
- … JS driven paints would not glow, because they are not a result of an interaction
- … Thanks to TaskAttribution we can follow the tree of effects through scheduling and link it back to the original interaction
- Bas: Is TaskAttribution specified?
- Yoav: There’s a proposed spec in the soft nav spec
- Michal: Lots of related proposals. AsyncContext, etc
- … Now that we know parts of the page that changed as a result of an interaction, if we have an interaction that changed large portions of the page, we consider it a soft navigation
- … URL change is also required
- Anudeep: What would trigger a URL change? There are also hash changes that may not represent navigation.
- Bas: Not in love with the use of heuristics here
- Michal: Trying to minimize it. You could also argue that there’s no need for interaction (auto-advancing pages).
- … But the assumption is that it’s needed
- Bas: Is it only for DOM modification or also for style modifications?
- Michal: There are a lot of different change types, we’re working through the list
- Bas: If I changed styles only for soft nav, would that count
- Michal: It would eventually?
- NoamR: style change would be a DOM change
- Alex: Would an SPA that logs you out automatically be a softnav?
- Michal: Not necessarily
- Alex: We should let app developers tell us what's a soft nav
- Michal: We can expose the underlying primitives
- NoamR: Is the performance of non-user-initiated navigation important?
- Bas: in an e.g. gallery it’s important
- NoamH: Users would have no interaction expectation
- Bas: But users still have performance expectations
- NoamR: It’s a starting point
- Michal: Developers are marking their pages already
- Bas: Why not performance.navigationStart()?
- Michal: We have the navigation API
- … Might be difficult to know what changed on the page as a result of the navigation
- … Would be interesting to get these details
- Tim: People try to instrument it themselves and get it wrong
- … e.g. in Next.js's router people don't know how to start this
- … serious reservations about relying on developers to know how to do this
- Bas: Can’t developers know what their pages are doing? There could be cases where you wouldn’t change the URL but you are interested in the interaction.
- … Right now you need to time-correlate
- Michal: There are a bunch of ideas in the TaskAttribution talk
- … I’d love to explore EventTiming. E.g. UI latency till the change given an interaction
- Bas: So why not expose this to developers?
- Michal: LCP after every interaction seems valuable but there’s something unique about soft navigation
- … The URL gives you the view you are on
- … That’s the distinction between every event and all the interactions
- Bas: You’re assuming that developers are using the URL
- Nic: In our RUM monitoring tool, we’re doing similar things. Trying to do that in JS
- … There are always exceptions where developers think we should measure something where we don’t, and cases where we shouldn’t measure something
- … The browser can be smarter and has more tools, but it can be complex
- Bas: Weird things, e.g. content creator changes the URL in some way and their data changes because they fell off the heuristic cliff
- … Concerned about the stuff Nic described
- NoamR: The level of expectation from this solution is that it gives a reasonable default for the common cases. It can’t do edge cases
- Bas: What’s the downside?
- Michal: I work for a company that shares RUM data with developers. The more you slice a page, the data is less accurate.
- … If you publish an SPA, you're getting skewed results. SPAs may not navigate as much
- … So the navigations we see, may look worse
- Alex: I’m interested in solutions where developers can measure their things. Not interested in solutions to measure the web as is
- Michal: We’d like to expose this data to the timeline
- Bas: Having this in CrUX data that this would motivate developers to improve this
- Michal: Saw it in a case after case
- NoamR: People that have a choice regarding their application architecture, this gives them data to inform their decision
- Michal: Many frameworks have an MPA/SPA checkbox, and there are cases where the current dashboard…
- Yoav: There is no way for frameworks to designate how to opt in to a specific nav
- ... Or will it enable a dataset like CrUX to connect data from everyone?
- ... Maybe we should create one; seems orthogonal
- Bas: Need opt-in and opt-out
- Michal: Current limitations
- … For the paints we’re measuring we have some attribution issues
- … Choose to observe certain types of interactions: we will observe certain interactions
- … But if you’re using the wrong hooks, you might get lost
- … The team that’s working on AsyncContext specified their web API integrations
- … When we’re in a task that’s attributed, we’re observing every change to the DOM
- … If you clone a template, its children don’t get tracked
- … We can recursively walk the entire DOM, but container timing is related. We could say we’re interested in all the children of a container
- … For efficiency you only need to understand the region, but currently we’re doing that on the output
- … We think we know what the next steps are
- … The other half of the battle is increasing the quality of coverage
- … A soft navigation changes some of the content. If there’s an image tag that changed in subtle ways, what do you do?
- … There are wide gaps between what users would consider soft navigations and what Chrome considers a soft nav
- … If I soft navigate on a page, and half the page changes and a giant image on the page hasn’t changed - if you soft navigated to that page you’d get a different LCP than a hard navigation
- … So LCP is not the same kind of metric
- … Other issues - scroll restoration
- … It’s very common for SPAs to manually restore scroll
- … But we define our metrics to stop when scroll starts
- Yoav: Open issue Sean opened on LCP and programmatic scrolls
- Sean: Not directly related
- Michal: Reality is many SPAs will very quickly back-nav, then scroll to point
- ... It's messy for documentation sites not to do it
- ... Very common to have nav start, scroll restoration, then painting
- ... Still have to make hard decisions on performance timeline
- ... Retrospectively we know nav started, fetch, etc
- ... Time history changed (URL), time where enough paint, heuristics met
- ... Once we've done enough bookkeeping, gathered enough evidence
- ... Stream of paints to get LCP
- ... Some of these things we only know in hindsight.
- ... Previously talked about exposing navigation ID
- ... But we have a problem where we can't assign navigation ID before we have one
- ... Ideally you would say, which nav it belongs to, slice and dice and bucket
- ... May take 10 seconds
- ... Reason we're trying to measure is because they're slow
- ... 10s is not unheard of for many users
- ... A time-based cutoff doesn't work for every user
- ... Trying to help developers make choices
- ... With every new nav you can summarize previous one, etc
- ... Worry about having to buffer the world
- Bas: Easier if author of page indicating soft nav started
- Michal: Perhaps by exposing primitives they could do an easier job
- <demo of live SPAs>
- … While we gather evidence, only the green square counts, then all the squares count. I don’t like it
- … It would be cleaner to stick to the green ones
- Tim: Thanks and I appreciate that this is messy, but it’s a super important problem
- … Developers don’t know how to do this, frameworks don’t know how to do this
- … So developers have been making architectural decisions for the last decade or more
- … I feel that having good way to measure this is super critical for the web
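For reference, a sketch of how the experimental soft-navigation entries surface in Chromium today (behind a flag / origin trial, and the entry shape is still in flux):

```js
// Each entry represents a detected soft navigation; entry.name is the new URL.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log("soft nav to", entry.name, "at", entry.startTime);
  }
}).observe({ type: "soft-navigation", buffered: true });
```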
Long Animation Frames: Status & what’s next
Summary
- Some renewed interest in desiredStartTime (the queuing time of the task)
- See also the style/layout brainstorm, and performance.bind()
Minutes
- NoamR: LoAF shipped and in use by a few RUM providers and some sites
- … Wix are using it internally. We get a lot of feedback for it
- … Used it and found scripts that are doing bad things. First parties that use this to catch e.g. ad providers and blame them for slowness
- … Main pain points
- … a script that wraps everything gets blamed for others. LoAF only gives you entry points and it’s not good enough
- … Microtasks are weird and hard to measure. We should figure out a way to measure them
- … Similar with entry points that are promise resolvers (e.g. fetch.then())
- … Most of the time it's a single task, so that's often fine
- … Today we measure from the time the promise was resolved until the time the microtask queue was completely cleared
- … If we have a way to attribute microtasks to their originator that could be a solution for this
- … No style/layout info, no presentation time for LoAFs
- … Exploring ability to attach data to microtasks
- … performance.bind so that blame can be transferred to the right owner
- … style and layout attribution
- Yoav: Talking about CPED, can we cover what that is?
- Michal: We'll cover in the next session on task attribution
- Alex: Purple box?
- Bas: Layout is purple in dev tools
- NoamR: Screenshots of large purple boxes, why is it that long?
- Michal: We started with assumptions of (e.g. too much JS on the web), but purple comes up often. Lack of presentation times, etc.
- NoamR: We started with script (yellow) because it's the most obvious one with actionable output
- Bas: Considering limitations, without proper task attribution, it's biased towards purple
- ... Complicated apps with a lot of JS -- hm, would LoAF capture that?
- NoamR: Yes
- Bas: A lot of the time, janks are caused by unrelated scripts
- NoamR: Need to correlate event
- Michal: Today LoAF start time, starts w/ first task that ends up producing the update. But there's input delay.
- NoamR: Property firstUiTimestamp
- Michal: For arbitrary task there's a queuing delay
- NoamR: We removed it
- Bas: Once we've produced a frame, makes paint the highest priority there is
- ... We don't start scripts until we've done the work we need to do
- Michal: In Chromium if you rAF, enough to trigger start of a LoAF
- ... setTimeout(x), compute a bunch of stuff in memory, finish running: it's not a LoAF
- ... LongTask
- NoamR: LoAF without renderStart
- Michal: rAF with callback, measure at that beginning
- NoamR: Unrelated task that delayed something that produced rendering, LoAF starts after that one
- Bas: That's not great
- NoamR: Can be impossible to know
- ... Had property on LoAF in past, confusing to people
- ... desiredStartTime, queueing delay
- Bas: Looked at time task was posted to event loop
- Michal: If page was idle it would have run
- NoamR: We could look into adding it again
- Bas: Perfect is the enemy of good
- Michal: Good news is if you have long input delay, you will probably be in the same LoAF or have the same LoAF as before
- ... A lot of developers are doing this manually
- NoamR: Very specific set of problems
- Bas: If you have a task that does a lot of JS and schedules and updates, does it go back?
- NoamR: All the scripts from that task going back
- Bas: For every task you need to bookkeep scripts regardless as you don't know in advance
- NoamR: Only long scripts that are over 50ms or more. Need every script entry point.
- Bas: Low precision timer to determine if you need it? Can't do that either
- Michal: High-level goal is you're blocking the event loop, and if you yield, the browser has a chance to reflect rendered output. Browsers may choose to not do this. Just a task. If long, it gets reported.
- ... If visual update and we need rendering, we'll measure to the end of those phases
- ... Get full end-to-end delay
- ... Task boundary is still interesting
- NoamR: totalDuration and blockingDuration
- Bas: We'll find out when we implement
- Michal: Tooling, time+effort spent in this is leading to improvements in lab tooling
- NoamH: Regarding bind(), the idea in some form would be very useful to integrate. The main use case I see is not just adding it where they expect; I see it as a way to automatically inject tracing points into production in a declarative way via config. No code change.
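A sketch of LoAF observation as it ships in Chromium; the field names reflect the current spec draft and may still evolve:

```js
new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log("LoAF", entry.duration, "blocking", entry.blockingDuration);
    for (const script of entry.scripts) {
      // Entry points only: a script that wraps everything (the pain point
      // noted above) gets blamed for the work of others.
      console.log(script.invokerType, script.invoker, script.duration);
    }
  }
}).observe({ type: "long-animation-frame", buffered: true });
```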
Protobuf encode/decode
Slides
Summary
- Smaller analytics script => higher collection volumes
- Protobuf is significantly smaller than json
- Unclear what the schema would be - do we need to predefine it? Define it in the API?
- Need to compare with compression, especially in light of CompressionStreams and brotli support
- If compression is hard, we should make it easier
Minutes
- Colin: Talk about work we've done with performance measurements and musing around Protobuf
- ... Moved measurement from boomerang.js to web-vitals.js
- ... Resulted in 2% increase in volume of metrics
- ... 5-10% changes in LCP (at p75) from survivorship bias
- ... Many times that was an improvement
- ... Example payloads
- ... NavTiming and ResourceTiming metrics
- ... Can we use Protobuf instead of JSON for that payload?
- ... 2.5 KB + minimum
- ... Protobuf is basically JSON with a pre-shared schema
- ... Majority of payload is data not preamble or keys
- ... Future forward and backward compatibility
- ... New consumers can read it, backward compatible, old readers can read new messages
- ... Schema can evolve gracefully
- ... Example for beacon
- ...
- ... Under a quarter of the size
- ... Only payload on wire is just the bytes
- ... Have to encode via protobuf.js or google-protobuf.js, ~20 KB on the wire (see the sketch below)
- ... Exploring Protobuf as a first-class citizen just like JSON encoding is
- ... Similar byte compression with Br or other compression algos
- ... With Compression Streams available, but in Web Workers
- ... Chrome already has Protobuf as a vendored dependency, so not a net new thing
- ... Question for room if is anyone else using this
- ... Other use-cases worth exploring
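A sketch of the kind of encoding described above, assuming the protobufjs library; the schema and field names are illustrative, not the ones from the slides:

```js
import protobuf from "protobufjs";

// The schema is pre-shared; only field numbers and values go on the wire.
const { root } = protobuf.parse(`
  syntax = "proto3";
  message Beacon {
    string url = 1;
    double lcp = 2;
    double ttfb = 3;
  }
`);
const Beacon = root.lookupType("Beacon");

const bytes = Beacon.encode(
  Beacon.create({ url: location.href, lcp: 2140.5, ttfb: 320.0 })
).finish(); // Uint8Array, a fraction of the equivalent JSON

// Note: non-simple content types can trigger CORS preflights cross-origin
// (see the discussion below).
navigator.sendBeacon("/collect", new Blob([bytes], { type: "application/octet-stream" }));
```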
- Bas: Not a concern, Firefox already ships Protobuf
- ... We use Google implementation for Protobuf
- Michal: Significant difference in developer experience?
- ... Does it eventually turn into JSON or a different message format?
- Colin: Just on the wire
- ... Data payload, decode providing schema you have
- Jeremy: A specific format, or just Protobuf bindings for JS
- ... Encode and decode bindings.
- ... Proto-lite may not include reflection API
- ... There would be code to add to do this
- ... Would we ingest Proto schema in some way?
- Colin: Don't ship generated code, but schema itself
- Nic: How large is the .proto?
- Colin: 300 bytes .proto
- Jeremy: If only encoding this for your particular format, and you don't need decoding, have you looked at minimal things? Handful of KB maybe?
- Bas: Shipping KB in cache, per-origin
- Alex: What issues with Compression Stream?
- Colin: Just from WebWorker?
- Jeremy: Compression Stream not quite as small as Protobuf, does it work?
- Alex: We could get close to similar to Compression Streams somehow
- ... Niche field
- Nic: we’ve looked into protobuf as well, and use it for sending data from non-web context
- … we prefer to be able to use compression streams or have the browser be able to upload with compression
- … but if it were available, we’d probably be using it as well
- Colin: Are you shipping with an encoder?
- Nic: mobile SDK so using the native stuff
- … Also experimented with JS URL which helps pack data in some cases
- … I feel like compression streams would be the easiest thing
- NoamH: We explored it as a way to propagate state from backend to client. Wanted to avoid transforming binary payloads and avoid expensive JSON parsing on the client
- … We were thinking of prototyping keeping the buffer as is and write a thin wrapper around it, and only read the data as needed
- … Didn’t consider not to compress it, as some of the fields include the payload that’s highly compressible
- Bas: So you had a large binary blob you didn’t want to iterate over in the client?
- NoamH: Yeah, another use case
- Jeremy: You’d still need to turn it into a JS object, protobuf is not great at random access.
- Philip: What content type are you using for sending data back? Did you run into CORS preflights?
- Colin: In my experiments I didn’t run into those problems
- Philip: We run into that when using anything that’s not URL encoded
- Anne: That is a thing we can solve. A bunch of ideas around TLS extensions where the server opts-in to any requests. Heavy handed but could work
- Yoav: Some interest in Protobuf. Is it this vs. Compression or this+Compression?
- ... In terms of schema, open questions we need to answer
- ... My sense is there's demand
- Bas: I felt like I heard they weren't super keen on it
- Alex: Can you define a proto schema
- Jeremy: A text format, that can be converted to a schema
- Alex: Most of the benefit can be gotten just by Compression Streams
- ... Are ways to get even tighter than Protobuf
- Anne: Structured serialization at the byte level
- Jeremy: Scary
- Bas: Can choose to not support things
- Anne: Transfer cannot do, from server to device
- Yoav: Start with compression then see what's the delta beyond that?
- Alex: Protobuf is also popular and it has a space between most optimal and JSON
- Bas: Webkit has Protobuf
- Alex: Not sure if we have one already as a dependency?
- Anne: Problem with JSON?
- Colin: Too big; size affects the probability of getting those transmitted
- ... Compression Streams with Beacon API don't work together
- Anne: If we have fetchLater() that would work?
- Colin: Yes
- Jeremy: Do we not expose sync compression?
- Nic: I think to avoid it in unload handlers
- ... Would love to have fetchLater() or sendBeacon() have a compress:true option
- Yoav: Sync compression on the main thread, even limited in size, reaches a point where it's clunky
- ... Finding a way to use compression streams to beacon things up makes more sense (see the sketch below)
- NoamH: When compressing on main thread using JS library, see you have high variance on higher percentiles. "Only 30ms, it's OK". For some users it takes 300ms.
- Bas: Memory bus bound
- Nic: For beaconing cases like fetchLater(), we have some beacon size limits that may help
- Simon: For fetchLater() can we add plzCompress?
- Yoav: Open issue for it
- ... That's not the way to go for queuing a beacon
- ... Only compress at the end, when sending, not during
- Nic: https://github.com/w3c/beacon/issues/72 has some past discussions on this
- Anne: We should see about building this into fetch() itself if possible, not just fetchLater()
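For comparison, the compression route discussed above as a sketch: gzip a JSON payload with Compression Streams and upload it with a keepalive fetch. The /collect endpoint, and the server accepting Content-Encoding: gzip on requests, are assumptions:

```js
async function sendCompressed(data) {
  const stream = new Blob([JSON.stringify(data)])
    .stream()
    .pipeThrough(new CompressionStream("gzip"));
  const body = await new Response(stream).arrayBuffer();
  await fetch("/collect", {
    method: "POST",
    body,
    keepalive: true, // survives unload like sendBeacon, but with a ~64KB quota
    headers: { "Content-Type": "application/json", "Content-Encoding": "gzip" },
  });
}
```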
Task Attribution: what is it, and how it might help beyond soft-navs (...RUM tooling, LCP hover, async event timing, resource initiators…)
Summary
Minutes
- Scott: We want to track task state from some starting point to all future tasks
- … e.g. soft navigation
- … That breaks down into two parts - tasks and microtasks
- …
- … we have a context created in the beginning and we want to ensure it’s maintained
- …
- … v8 api that maintains that state
- … When a promise is created, this data is stored with it, and when a reaction runs it's restored with it
- Anne: Why is this just microtasks?
- Scott: For promises, we do this thing
- Anne: some are resolved in the same task. More often we queue a task
- Michal: There’s a “macro task” scheduled and within it there’s a microtask
- Scott: No need to propagate in the task
- … This just resolves a promise, but the state is already captured in “going to be restored”
- … In that microtask checkpoint there can be other continuations
- … there’s some overlap with APIs that are async across tasks
- Michal: If I save a scheduler.yield Promise to a variable, and later have multiple sync functions await on that same promise…
- … That pattern is not going to be common with scheduler.yield, but e.g. fetch() caches are very common, where the original promise is returned if already available (Note: React use() hook requires the exact same reference and has infra for caching any async value)
- Anne: Or document.ready
- Scott: Motivation was that we wanted to handle async functions. Wanted to handle the continuation case
- … context after each line should be the same
- Michal: The idea of initial context in terms of writing to it, and then we’re preserving it. You’re not creating new contexts
- Bas: What's the state?
- Scott: Some data associated with the task
- Michal: If set by something, it gets preserved
- Scott: If this is set by postTask, the state would be the priority
- NoamH: So the developer doesn’t need to propagate the context. Like AsyncContext
- Michal: AsyncContext exposes this to web developers
- Scott: this solves part of the problem
- … Bunch of APIs that don’t use promises. They schedule a task with a callback or events that may fire
- … For scheduling, CPED (V8's ContinuationPreservedEmbedderData) was sufficient
- … For TaskAttribution this had to go further than that
- … Initially we created a full tree of tasks, but now we grab the current state and we propagate it
- … and restore it when invoking JS
- … For XHR it was a little bit trickier
- NoamH: on all the events?
- Scott: yes
- Michal: older APIs are slightly harder to get right
- Scott: Tricky to generalize. E.g. storage API, which we haven’t handled yet
- … started with all the scheduling API and that caught a lot of things
- … but e.g. React uses postMessage which is doing an IPC hop
- Yoav: IPC hop in Chromium
- Jeremy: for MessageChannel specifically -- on a local WindowProxy we don't even get that far
- Scott: there’s a shortcut in mojo
- Bas: In Firefox IPC layer does it, but maybe not an actual IPC
- Anne: Not great if it doesn’t work in XHR
- Michal: if there are edge cases, documenting them can be sufficient
- Scott: AsyncContext is a general proposal for this, working on that in parallel
- … CPED made it easier for them. We looked at the problem and wondered what do we do to make this generic
- … We want to migrate this to AsyncContext, but there are different cases that may want different semantics
- … e.g. you may want to see all the work triggered by that interaction, but you may not want to see everything
- … If you want to get visibility into a certain interaction, you may not get all of it
- Michal: One effect happens but it’s a result of multiple things
- … If I construct a new Promise, schedule work and then the work resolves the promise.
- … If different actors did those things, who made it happen?
- … It’s sometimes clear and sometimes less so
- Scott: AbortSignal is an interesting case
- … Event listener for that. The reason to abort happens later
- … Folks want this to be the registration time but that may not be ideal
- Anne: the tricky thing with events is finding a general model so that new things get the right thing done for them
- … Platform API designers may not pick the right thing
- Michal: Eventually we’d regret any decision
- Anne: For event in particular, you’d always capture the context at addEventListener time, but for specific events where we want a different context, we can expose it on the event object
- Bas: So a single execution task can have multiple things that triggered it, but it can’t split. Can it have more than one state?
- Michal: Easier to see with the public API
- … reference to a certain value and you expect the value to not change. This gives you the consistency
- … If you have a different reference, you can always choose to switch to using a different value
- … it would be flexible, but can it be fast?
- Bas: In that case, it can work, but what about other cases
- Anne: events were the one thing where people’s requirements were all over the place
- …You want something that’s the same across events
- Michal: You can subscribe to a stream, but maybe it’s like an event?
- Scott: Important for the SPA navigation. We have heuristics and we have to solve this problem
- … I’d like to align with AsyncContext and make it work for us
- Michal: There's AsyncLocalStorage that builds on the same capabilities. A large class of consumers for that. AsyncContext tries to be more convenient and it's not yet set in stone
- Anne: AsyncLocalStorage will be subsumed by it
- Bas: Confused about the story of AsyncContext in relation to this
- Anne: independent efforts
- Bas: how do we consolidate
- Anne: once there’s AsyncContext in JS, we can build on it
- Scott: when the click happens we set AsyncContext on it
- Michal: If you load a page we measure LCP. If you interact with the page, you might have modified the page so we stop LCP. But if you hover, an image loaded can still be an LCP candidate
- … Imagine we could know that a DOM node was modified by the hover. We could use this to ignore that element as LCP
- NoamH: Plan to capture the stack?
- Michal: No
- Scott: we only wanted to expose this in aggregate ways
- Michal: it’d be exposed only if AsyncContext ends up shipping
- … talked about Soft Navs as the full effect of an interaction
- … We could extend eventTiming to be more powerful
- … “Why is this resource being requested?”
- … etc
- Bas: From a browser perspective there are a lot of reasons this information could be useful. That sounds great
- Michal: Talked about LoAF and time spent in script. It’s very useful to know that a single task awaited and resumed. That subset of the feature is very easy to build and very powerful
- … You don't need the API to solve the LoAF attribution problem
- Guohui: if we consider exposing this to the developer.
- Michal: That’s AsyncContext with use cases
- Bas: IIUC, you prefer not to wait for that
- Michal: Parallel streams
- Scott: they are trying to refactor AsyncContext on top of TaskAttribution
- Michal: we’re trying to work with them to align semantics
- Bas: concrete spec proposals that rely on this
- Scott: yield relies on CPED
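For context, a sketch of the TC39 AsyncContext proposal referenced throughout (not yet shipping, and the API shape may change); `button` and `render` are placeholders:

```js
// A Variable set around an interaction flows to all of its async
// continuations -- the same propagation TaskAttribution does internally.
const interaction = new AsyncContext.Variable();

button.addEventListener("click", () => {
  interaction.run({ id: crypto.randomUUID() }, async () => {
    const res = await fetch("/data"); // the context crosses the await
    render(await res.json());
    console.log(interaction.get());   // -> the { id } set for this click
  });
});
```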
JS Loading
Summary
- Lots of things to be improved about JS loading
- When they load/execute, should they block onload, bundling tradeoffs, import map usability, blocking execution, code caching, differential serving
- Discussed ideas to improve the above
Minutes
- Yoav: Rant on JS loading, maybe make it suck less
- ... A lot of times when talking about web perf, we're skating to where the puck was
- ... A good chunk of the web is JS
- ... Getting worse/bigger
- ... No good way to load JS
- ... Developer teams ask best way to load in specific scenario
- ... Compromises we can live with
- ... Blocking scripts still suck, block parser and rendering
- ... Developers shouldn't be using them. async is race-y
- ... Once script hits browser, it may-or-may-not be a good time to execute them
- ... Cases where loading at the wrong time (earlier/later) can degrade perf
- ... Deferred scripts we know the order they run at, they execute late in page loading process
- ... All execute in single task that's long
- ... Conversations about improving that, at least in Chromium
- ... Not clear if it's web-compatible to change to run in multiple tasks
- ... ES modules have a discoverability problem
- ... Creates a waterfall
- ... Don't start executing everything until we have the entire module tree
- ... Not the fastest option
- ... Tell developers to cache their JavaScript, immutable, change file URL with versions or hashes in order to apply changes
- ... Versioning essentially bubbles up either with explicit ES modules or implicit classic scripts
- ... Each reference bubbles up; people use hashes, and content-addressed hashes change all the way up the tree
- ... Import maps save us, but are fragile
- ... Cannot have more than one at the moment, can't have import maps load after modules
- ... Multiple actors involved in creation of the page can lead to breakage
- ... Import maps are a blocking resource
- ... You need to load a map of all modules this page will ever see as the first thing in your HTML
- ... Bundling improves things: reduces the number of requests, improves compression from larger bodies of text
- ... Also has its trade-offs
- ... Once bundle, every single change in any single file in bundle invalidates entirety of it
- ... Has execution costs, since execution doesn't happen until it's all downloaded
- ... Silver lining, there's blocking=render
- ... Gives browser some indication of what that script does, know this particular async script is actually important and should load ASAP
- ... fetchpriority is also a way to provide hints to the browser
- ... We don't have "executionpriority"; can't presume one from the other
- ... Dynamic imports, working well for lazy-loading large JS apps,
- ... No way to detect language support w/out UA sniffing
- ... Requires not using until broadly supported for all browsers in support matrix, or do UA sniffing with multiple bundles
- ... defer vs. async -> depends on context, continuous trade-offs, despite the race-y-ness
- ... Then use custom loading for stuff later on
- ... UA sniffing required
- ... discoverability with modules, multiple layers of waterfall. Small files compress poorly
- ... per-request overhead that is being paid
- ... Have to wait for all to execute
- ... Solved with bundles, which are not streamed and invalidate the code cache frequently
- ... Import maps blocking
- ... Not enough controls for execution
- ... Should that script block onload or not?
- ... Multiple past proposals
- ... Resource orchestration from a few years ago
- ... blocking=render maybe resembles it
- ... WebBundles JS serving ideas, plus some open HTML issues around options for yielding to event loop
- ... None of this happened so far
- ... Compression dictionaries could solve some of this on the wire
- ... Small changes remain small on the network
- ... Dynamic import maps being worked on, changing import maps so there can be more than one
- ... Not necessarily as first thing before modules, removes fragility and as a blocking resource
- ... Multiple phases of import maps instead of a huge one at the top
- ... Module declarations and streamed execution, wanting to prototype. Hacked version of Chromium allows you to define inline module specifiers.
- ... Ability to import an inline module into a different module tree. Trying to get streamed execution of a bundle in JavaScript. Prototyping on Chromium and Vite and they're not super flexible
- Jeremy: I think Rollup is the underlying bundler in Vite?
- ... Issues on bundler side
- ... Requires some opt-ins and streamed execution in some way
- ... Would be great to have per-module or per-function code caching
- ... If we start breaking up, sending down modules, not a meaningful boundary
- ... New keywords for easy loading of scripts to enable non-race-y execution that loads earlier than DCL
- ... No conclusions on differential script loading
- Bas: What is differential script loading?
- Yoav: Load ES2015 scripts vs. current
- ... New JS feature, how do we deploy that without 2 complete copies
- Bas: At Mozilla years back, there was BinAST, transferring mildly-compiled JavaScript, seems better than text it seems. Why did that fail?
- Yoav: Not on loading but on execution side
- ... Interesting but orthogonal
- ... Compression Dictionaries helps loading
- Bas: This could help as well because it can be smaller
- Jase: Could you explain apps need more than 1 import map, dynamic import maps, how does that work to load after?
- Yoav: Need it to load some modules but not all modules
- ... Real-life cases, platform code that is loaded as a module, merchant theme-code that loaded an import map after that. Everything breaks, multiple actors that aren't coordinated.
- ... Invest effort in coordinating at the back-end, a lot of complexity that's unneeded.
- ... Why would apps need more than a single import map? Multiple actors. Or e.g. apps with 5k modules, maybe don't need them all at the start. Need dynamic imports so need dynamic import maps
- ... Expanding on that, when they know they need this part of it as well
- Jase: To just not have a huge map?
- Yoav: Yes, and for some modules you may never load
- NoamH: Is the concern the import map too big and blocking, delays resource loading?
- Yoav: Yes
- NoamH: Import map, or a bunch of script tags. For core resources. Would it help if it was sent earlier in Link?
- Yoav: All inline today
- NoamH: Early Hint import map? In headers maybe?
- Yoav: I don't think that's the main bottleneck
- ... Be above modules script that depend on it
- ... But other modules can be before it that don't depend on it
- Jeremy: Modules have a waterfall problem. link rel=modulepreload could be used?
- Yoav: Module preload allows you to flatten the tree
- Jase: Multiple import maps, mental model is appending more modules later on
- ... Same mapping just dynamically appends later?
- Yoav: A single global map that you merge into (see the sketch at the end of these minutes)
- Bas: Excited most about reducing traffic that it takes to load a webpage
- Yoav: Better bundling and Compression Dictionaries are a way to help there
- ... Deploying it is going to take a while at scale
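For reference, the pieces discussed above in one place: a single blocking import map plus modulepreload to flatten the discovery waterfall. Module names and hashed URLs are illustrative:

```html
<!-- Today the map must appear before any module that depends on it. -->
<script type="importmap">
{
  "imports": {
    "app": "/assets/app.3f9c1a.js",
    "util": "/assets/util.8be2d4.js"
  }
}
</script>
<!-- Preload a known dependency so its discovery doesn't wait on the waterfall. -->
<link rel="modulepreload" href="/assets/util.8be2d4.js">
<script type="module">
  import { boot } from "app"; // bare specifier resolved through the map
  boot();
</script>
```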
Monitoring Principles
Summary
- Pivoting to a Monitoring APIs design principles doc and how it fits with other Principles docs
- Discussion about what that means in practice
- To be continued
Minutes
- Yoav: Previously we talked about some form of monitoring principles, or observability principles, etc
- ... Somewhat as a result of discussions earlier around Privacy Principles document
- ... A few of us talked through this
- ... My thinking has changed somewhat around this
- ... We have Web Platform Design Principles
- ... Document that outlines how one should design APIs
- ... A more recent is Privacy Principles document
- ... Some friction around ancillary data
- ... In document there is text that says there is no agreement around what happens with ancillary APIs that are using existing information
- ... Principles doc says that we "agree to disagree"
- Bas: What is an ancillary API?
- Yoav: PP doc defines ancillary data as uses not-functional on the site
- ... Ancillary API provides ancillary data
- Bas: LCP?
- Yoav: Yes
- ... Started doc around principles for this group to look into, and codify as ways we should define APIs
- ... Nic also put together other principles that make sense
- ... What we should be doing as a group: my initial thinking was we'd need a Principles document to balance out the other documents
- ... Enable us to have discussions around features and what they satisfy
- ... Privacy vs. utility in terms of monitoring
- ... But in talking to folks, what we really need a document that codifies best practices for monitoring APIs and clarifies how those set of principles align with other principle documents
- ... So we don't have any disagreements with other Principles documents, we just need to clarify our stance on various points
- ... In particular around ancillary APIs
- ... Interesting subject: even if we can't reach consensus with the Privacy task force, can we reach consensus amongst ourselves
- ... Same task force to hash through that kind of document, but spin it into something more than what we initially thought. Bring it forward to this group. Add it to the charter.
- Bas: TL;DR on where the Privacy task force and their doc didn't align with this WG?
- Yoav: If the information is already available and someone wants to abuse it, providing it through dedicated monitoring means does not make the abuse any cheaper or simpler.
- ... The argument on the other hand: providing that info — if we take LongTasks vs. setTimeout loops, the latter is costly for the user but enables you to measure the same thing effectively.
- ... Not exposing fundamentally new information
- ... Other side claimed if we're making it cheaper, more people might do it
- ... Whereas my claim (and others') is that people will get the information they want, and won't regard user cost as a deterrent
- Michal: Privacy Principles doc is great, should read it, by design there is a range of interpretations.
- ... Contradictions in the doc itself. Talked with folks like Jeffrey who had a hand in writing it.
- ... How we read and think about them and why, groups would create their own addendum documents, and others have done so
- ... We are not trying to be an exception to the rule
- Bas: We need to agree on an interpretation
- Michal: A range of individuals can interpret it on a spectrum
- ... Document expresses both at length how the user has a big cost when they have to make decisions and choices, and the person presenting information can influence. Safe defaults are good.
- ... The other group thinks we should only allow the user to choose for each API
- ... Devil is in the details
- ... Not even one right answer in every situation
- NoamH: That performance is not considered user functionality might be one thing
- Michal: Another contentious one: acknowledging this data is already readily available but takes significant effort to collect; making it easier would increase the rate of collection.
- Yoav: Not effort, but cost
- Michal: Carve-outs, really well written. Even the task force does not have consensus.
- Alex: Many arguments are of the form there's a side-channel where you can already get this info, so let's add a JS API to collect it more. Questions around should we allow that? Close side-channel or no, it's not exposing any additional information.
- Michal: Principle if there's a fingerprinting risk we should address it
- ... OTOH, there's a contingent that thinks we can opt out of all these features and there's no cost to us; someone else will bear that cost
- Yoav: Clear example of ancillary API, Reporting API as infrastructure vs fetch(). No one is talking about removing fetch() from the platform. Reporting API is doing fetch() but for reporting purposes. You can argue about specific features and whether they're using ancillary data. But I've heard Reporting API should be behind a user opt-in.
- ... But then people will just move to using fetch()
- Anne: For Reporting, for CSP reports, you can implement in fetch()
- ... Crash reporting is user opt-in
- Yoav: Specific features we can argue about, but Reporting as an infrastructure is not different from fetch()
- NoamH: Reporting API can send data to a URL without any ability to intercept.
- Bas: How do you gauge the sensitivity of data?
- ... Fingerprinting or PII is easy
- ... Within Mozilla we have three complicated categories
- ... "Gives you additional data" is broad
- Yoav: There isn't a distinction as far as I can tell in PP document
- ... From e.g. "data" to "private data"
- ... Hard distinction
- Bas: Reason it's difficult is a reason to think about it
- Anne: Example of a shared worker as potentially sensitive
- Bas: If it's a hard line we should shut the web down
- Michal: Examples in PP document that help identify the pillars
- ... Easy thing to bikeshed over
- ... Stated "Web APIs designed to support user's immediate goals are OK"
- ... e.g. DOM events
- Bas: API solving immediate goal but sharing all private information to the website
- Yoav: Other principles to balance it
- Michal: We are serving users' immediate goals by improving the quality of the platform at large via user experience and performance
- ... Within the constraints of the privacy principles, that has to be done, and there's room to do better
- ... If we remove wholesale all monitoring, that does not serve users' goals
- Jeremy: Common areas of disagreement: "the user" vs. "users"
- ... Aggregate performance of users is better if the developer puts that data to use
- ... That's why some privacy people approach it from the individual user
- Bas: Has there been some consideration around a group of APIs that are opt-out?
- Michal: Some principles in document and maybe we have some gaps that can be addressed
- ... Plenty of sections citing research on the negative repercussions of informing the user and letting them choose
- ... The work here isn't to not abide by these principles, but to figure out what the trade-offs are
- Anne: Time Check
Day 5 - Friday
(joint meetings with WHATWG)
Minutes
fetchLater
Summary
- After a lengthy discussion, we reached a compromise where every 3P iframe would get a small (e.g. 8KB) budget from the top-level frame without any opt-ins.
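For reference, a sketch of the proposed fetchLater() shape (still being specified; names like activateAfter may change, and `payload` is a placeholder):

```js
const beacon = fetchLater("/collect", {
  method: "POST",
  body: payload,         // counts against the frame's quota (e.g. the 8KB 3P budget above)
  activateAfter: 60_000, // send after a minute, or earlier on page unload
});
// beacon.activated reports whether the request has been sent yet
```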
Scheduling APIs
Summary
CompressionStreams
Summary
- No support for making brotli encoding optional
- Even though it seems safe for developers to only have decompression
- Only Chrome is sensitive to extra binary size there
- WebKit is planning to add brotli support soon
- Support for adding compression parameters