WebPerf WG F2F June 2019 Minutes

Noam Helfman - Microsoft: Gaps in web perf metrics: navigation, resource loading, interactivity

Nic Jansma - Akamai: Akamai and WebPerf

Gilles Dubuc - Wikimedia: Web Performance

Steven Bougon and Nolan Lawson - Salesforce

Thomas Kelly - Shopify

Amiya Gupta - Microsoft: Identifying ad interference with page load

Beng Eu and Warren Godone Maresca - WebPerf @ Google Ads

Ulan Degenbaev - Google: JavaScript memory API

Scheduling APIs

CPU reporting

Metrics for Single Page Apps

Group processes

Participants

Ilya Grigorik, Tim Dresser, Nicolás Peña Moreno, Sam Burnett, Staphany Park, Yoav Weiss, Steven Bougon, Andrew Comminos, Thomas Kelly, Todd Reifsteck, Philippe Le Hegaret, Ulan Degenbaev, Nic Jansma, Gilles Dubuc, Ben Kelly, Noam Helfman, Nathan Schloss, Ryosuke Niwa, Wooseok Jeong, Warren Godone Maresca, Beng Eu, Scott Haseley, Benjamin De Kosnik, Steve Souders, Amiya Gupta, Philipp Weis, Patrick Meenan, Nolan Lawson, Nicole Sullivan, Vikram Shankar, Andrew Clark

Gaps in web perf metrics: navigation, resource loading, interactivity - Noam Helfman (Microsoft Excel)

Slides, video

Talk

Questions

Ryosuke: what about mobile?

Noam: They use native apps, so presumably use APIs available there.

Yoav: what about paint timing?

Noam: we want per-frame information.

Tim: Element Timing exposes the paint time of new elements.

Noam: we also want long tasks caused by JS.

Can use the long tasks API to look at that, but attribution sucks.

Tim: hopefully in the future we can use the sampling profiler to expose better attribution.

Ilya: transfer size should already be available.

Noam: handling redirects is very tricky, need to pass on the timestamps, no standard way.

Yoav: re cached boolean, you can look at transfer size, but need some heuristics around that.

Noam: the heuristic we use is duration (if it’s very short, it’s very likely from cache).

Yoav: yes but there can also be long cache reads.

Ben: also, service worker could be serving from cache, but could take time to serve it.
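As a rough illustration of the heuristics discussed above (the property checks and the 30 ms threshold are assumptions, not anything the group agreed on), a cache-hit guess might combine transfer size and duration like this:

```ts
// Hedged sketch: guess whether a resource came from cache.
// transferSize === 0 usually means a cache hit, but a service worker response
// or a cross-origin resource without Timing-Allow-Origin also reports 0.
function probablyFromCache(entry: PerformanceResourceTiming): boolean {
  if (entry.transferSize > 0) return false;   // bytes actually crossed the network
  if (entry.decodedBodySize > 0) return true; // body present, nothing transferred
  // Opaque cross-origin entry: fall back to Noam's duration heuristic.
  return entry.duration < 30; // threshold is a guess; long cache reads exist
}

const entries = performance.getEntriesByType('resource') as PerformanceResourceTiming[];
for (const e of entries) {
  console.log(e.name, probablyFromCache(e) ? 'likely cache' : 'likely network');
}
```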

Akamai and WebPerf - Nic Jansma (Akamai)

Slides, video

Talk

Akamai is a CDN that does media delivery, performance, security, and monitoring. mPulse is a real user monitoring product that collects data via beacons. It measures more than a billion user experiences per day. It currently uses NavTiming, ResourceTiming, UserTiming, PerfTimeline, LongTasks, PaintTiming, EventTiming, ReportingAPI, etc.

Some challenges:

Top issues for ResourceTiming:

Top other issues

Love the new specs: long tasks, event timing, element timing, layout stability, JS self profiling, Javascript Memory API.

Links:

Questions

Yoav: are the bugs covered by web platform tests?

Nic: I’ve filed several of the bugs. A lot are fixed. We can do a better job at keeping track.

Steve: is there anywhere it’s not true that you can ‘look back in time’ when the analytics script loads and see past metrics?

Nic: for ResourceTiming we can, but want to make sure this is considered for anything going forward. We want metrics to be buffered.

Steve: what about long tasks?

Nic: Correct, those aren’t buffered, so we need to inject script early to start collecting data ahead of time.

Steve: In ResourceTiming, for cross-origin resources the time includes queueing time, which is very misleading. It does not seem like a security issue to break those two apart.

Nic: Is there an issue for that on the RT repo?

Yoav: not sure if there is an issue in GitHub, may be in the backlog, not L2 blocker.

Tim: we have a proposal for header registration for long tasks.

Steve: we have to initiate the PerformanceObserver in the snippet and customers don’t put it at the top. And there’s nothing we can do about that.

Nicolás: the alternative is to always measure so we can have a buffer for when it is requested.

Yoav: That doesn’t work for expensive metrics, e.g. longtasks may be expensive to measure.

Tim: will be more expensive with attribution.
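A minimal version of the early-collection snippet this exchange is about might look as follows; whether `longtask` entries are replayed for late observers via the `buffered` flag varies by browser, which is exactly the buffering gap being discussed:

```ts
// Early inline snippet: start observing before the heavy work happens.
const longTasks: PerformanceEntry[] = [];
new PerformanceObserver((list) => {
  longTasks.push(...list.getEntries());
}).observe({ type: 'longtask', buffered: true }); // buffered replay is best-effort
// The analytics script, loaded later, drains `longTasks` instead of
// observing from scratch and missing everything before it loaded.
```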

Gilles Dubuc - Wikimedia: Web Performance

Slides, video

Talk

We use everything except for long tasks (for reasons discussed before). We have public RUM dashboards and do perf regression monitoring. Some top suggestions:

Questions

Benjamin: You talked about categories, can you clarify?

Gilles: we need to know where long tasks come from (rendering, etc.)

Ryosuke: can you clarify about the Reporting API?

Gilles: right now we need to report JS errors manually, so load a library to do that. This should be done by the Reporting API.

Noam: +1 to avoid using expensive JS libraries.

Steve Souders: buffering can be an issue here as well.

Yoav: it is an issue for ReportingObserver, but not for reports that are beaconed to an endpoint.

Steve Souders: right now, if the onerror handler gets installed too late, you’re potentially missing errors.

Ilya: we recently added DeprecationReport. Someone should aggregate these and put them in a standard.

Yoav: maybe we should have header/HTML registration and ideally you can collect in a single endpoint.
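For concreteness, a hedged sketch of the header-registered reporting being discussed (the endpoint URL is made up and the serialization is simplified):

```ts
// Server-sent registration, shown as comments:
//   Report-To: {"group":"default","max_age":86400,
//               "endpoints":[{"url":"https://example.com/reports"}]}
//   NEL: {"report_to":"default","max_age":86400}
// In-page observation of deprecation/intervention reports:
new ReportingObserver((reports) => {
  const payload = JSON.stringify(reports.map((r) => ({ type: r.type, url: r.url })));
  navigator.sendBeacon('/reports', payload); // endpoint is hypothetical
}, { types: ['deprecation', 'intervention'], buffered: true }).observe();
```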

Ilya: have you looked at the RTT and downlink data?

Gilles: no I don’t think so, we will.

Ilya: Also, User Timing L3 should give you what you need with regard to custom entries.

Benjamin: memory pressure or device info would be useful?

Gilles: yeah, we can’t derive much right now from the UA string, for instance battery level. We want something that tells us how powerful the device is right now.

Yoav: UA string is likely to lose entropy in the near future, at least by default. does device memory give a good enough correlation to CPU?

Benjamin/Gilles: no, need both.

Todd: so we want slow/medium/fast. It’s hard to figure out a useful grade that does not leak a lot of information.

Gilles: even something in coarse buckets would be good to know.

Yoav: We need coarse enough buckets to be barely useful, to avoid adding unneeded entropy

Benjamin: Need to build those buckets from data

Noam: bucketing is hard, maybe use some sort of baselining.

Gilles: We can run a benchmark in a worker, but want to know if improvements are due to website improvement or device improvement over a long period.

Noam: Can we expose CPU cycles?

Pat: Not enough data, as there’s GPU, I/O, etc. Also, how many of these benchmarks do we want?

Yoav: we can’t run these benchmarks on the user’s device, as we’d drain their battery. We need to collect that data passively

Benjamin: Unless we expose something on that front, that’s what devs are currently doing

Ilya: For images, that’s what sites used to do: fetch an image of known size to determine the bandwidth.

Pat: For fingerprinting/entropy, it might be fine to expose as long as developers can already actively measure it

Yoav: Maybe. It also has to be active rather than passive fingerprinting. Sites need to explicitly ask for it.

Noam: also sensitive to timing attacks.

Todd: we should gather use cases and understand how people are solving them.
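The worker benchmark Gilles mentions could be sketched roughly as below; the workload and score are invented, and, as the discussion notes, this kind of active probing costs battery and conflates CPU with everything else:

```ts
// Run a fixed workload off the main thread and report a coarse score.
const src = `
  const t0 = performance.now();
  let x = 0;
  for (let i = 0; i < 5e6; i++) x += Math.sqrt(i);
  postMessage(performance.now() - t0);
`;
const worker = new Worker(URL.createObjectURL(new Blob([src], { type: 'text/javascript' })));
worker.onmessage = (e) => {
  console.log('benchmark ms:', e.data); // bucket coarsely before beaconing
  worker.terminate();
};
```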

Steven Bougon and Nolan Lawson (Salesforce)

Slides, video

Talk

Salesforce is a Web application (SPA). We transitioned away from server-rendered HTML five years ago, and we are embracing web components. Performance is key to us. It is a metadata-driven app: we need to dynamically add/remove components.

Current Perf API usage: high res timing, perf timeline, Resource/Nav/User Timing, beacon, page visibility. We have some in-process, some we do not use at all.

Suggestions:

Questions:

Pat: SPA navigations are also different from regular page navigations. E.g. there’s no cleanup for in-flight requests. Something broader to think about as we look at SPA navigations.

Yoav: Can cancellable fetches help?

Pat: Only if you initiated those requests with fetch. Not always the case.

Nolan: if there is XHR when we are navigating, we mark it as an outlier.

Noam: You mentioned out-of-process iframes. Chrome already supports putting cross-origin iframes in a different process, so is the suggestion for same-origin?

Steven Bougon: correct.

Yoav: Currently it’s cross-site iframes that go into a separate process, but not same-site cross-origin ones. Maybe we can add that to the Cross-Origin-Opener-Policy proposals to enable a site to request isolation from same-origin.

Ryosuke: would need to disable document.domain.

Yoav: Sure, but a site can opt-in to disabling document.domain with a Feature Policy.

Ryosuke: Yeah, you’d need to set that in a header. Also, any sync API that touches the opener would have to be disabled. We discussed COOP in the context of Spectre mitigations, but it may work for this as well. Maybe we can have a header / iframe attribute.

Yoav: for browsers without site isolation, how hard would it be to make that work?

Ryosuke: need full site isolation because that iframe can contain other iframes.
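The opt-ins being discussed, sketched with headers as comments; both proposals were in flux at the time, so names and values are illustrative:

```ts
// Response headers the page would send:
//   Feature-Policy: document-domain 'none'     // opt out of document.domain
//   Cross-Origin-Opener-Policy: same-origin    // COOP-style isolation request
// Under such a policy, setting document.domain should throw rather than relax the origin:
try {
  document.domain = document.domain;
  console.log('document.domain is still settable');
} catch {
  console.log('document.domain disabled by policy');
}
```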

Steve Souders: you are not using the long tasks API?

Steven Bougon: yeah, because of attribution. We do not know where the tasks are coming from.

Tim: I was hoping it would be useful for regression detection.

Steven Bougon: we already have some regression detection. We’d love to know where the long tasks are coming from.

Todd: so even to use it for regression detection, there needs to be some attribution, esp. with how complex Salesforce is.

Nolan: not sure it captures style and layout?

Tim: it does capture style and layout. We fixed the spec.

Steve Souders: everything is just reported as unknown. We do find it really useful, even though we don’t know where it comes from.

Ulan: for a GC API, everyone will start calling it and it would not be a good signal. We already try to do GC when the webpage becomes idle, so if it’s a frequent problem then maybe there is some improvement to be done.

Steven Bougon: yes, some customers add and remove components, and memory usage goes through the roof. And we don’t want to force a GC, just a hint to the browser.

Thomas Kelly (Shopify)

Slides, video

Presentation

Performance is key. When Thomas started it was quite different, using SpeedCurve; now Shopify collects data from all the stores, thanks to Lighthouse, with the ability to dive deep into issues affecting all merchants.

Shopify is a CMS in a way. Online stores can be anything, so huge variance from merchant to merchant.

Online stores are built around themes, which are very fast out of the box. But get slower over time when folks add things. So need to automate optimizations so that folks don’t have to care about them.

Things need to be updated as issues are found, much like the Chrome team does.

The biggest problems:

ElementTiming: this API is very interesting, but tied to a specific HTML element. LCP may be a good way to get that information for users that don’t add the attribute.
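For reference, the attribute-based usage being discussed looks roughly like this (the identifier and the entry handling are illustrative; the entry shape was still settling at the time):

```ts
// HTML side: <img src="hero.jpg" elementtiming="hero">
// JS side: observe render times for annotated elements.
new PerformanceObserver((list) => {
  for (const entry of list.getEntries() as any[]) {
    console.log('element painted:', entry.identifier, entry.startTime);
  }
}).observe({ type: 'element', buffered: true });
```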

Layout shift can help with images when completely implemented.

Client Hints for responsive images would also be great once implemented across browsers.

About WebP: is there a tradeoff between CPU and network? That’s an unresolved question for us still.

Experience with the WebPerf Group: wanted to introduce myself. I was late to the game; lots happened in 2017/2018. Shopify is involved in the Payment WG. My team and I are having challenges using the new APIs and learning from the specs, so I’m wondering how we can bring new members up to speed, how to onboard new members. Maybe a mentorship? Pairing with someone on issues in the repo?

When it comes time to test new APIs: we would like to try ElementTiming with an Origin Trial, but we have 500K domains.

Questions

Yoav: Are the different domains for Origin Trials all *.shopify.com?

Thomas: no. (“sellmydog.com” from Todd :-) )

Yoav: Around images, is there a reason why the image upload process could not reformat them, or provide a hint?

Thomas: We have an open lighthouse issue on this. It’s hard to know which images should be PNGs and which ones shouldn’t. We use transparency as a way to tell them apart.

Yoav: You can use webp for that in supporting browsers.

Thomas: Yeah, but ~10% of the traffic is still going to be requesting PNGs.

Todd: Issue with signaling. There are 2 issues: surfacing the problem with requested formats and the upload process which could fix it in this case.

Thomas: it’s up to us to fix the automation and the upload process.

Todd: it’s not just the size. Lighthouse is a tool for automation but not RUM

Yoav: Feature policy might help: it could look at bits per pixel, and above a threshold it would tell you to stop.

Todd: is there an API we haven’t exposed yet?

Yoav: Would you turn on a feature policy that would disable the download of images that are too large?

Thomas: yes, in the editor or as report-only

Tim: Current FPs may have different ratios per file type

Yoav: So need to make it configurable?
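A hedged sketch of what that configurable, report-only image policy could look like; the policy name `oversized-images` was an experimental proposal at the time, and the report-only header is an assumption:

```ts
// Hypothetical response header:
//   Feature-Policy-Report-Only: oversized-images 'none'
// Introspection via the (Chrome-only at the time) featurePolicy API;
// the feature name here is an assumption:
const fp = (document as any).featurePolicy;
if (fp && !fp.allowsFeature('oversized-images')) {
  console.log('oversized images would be blocked under this policy');
}
```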

Ilya: There’s an opportunity. At Google, we built a WordPress team to look into CMSes. They have the same problem: apps are installed and we don’t know their perf due to server-side extensions and plugins. So we went to WP and annotated it, so we now know that one extension costs X MB. Maybe there is a shared format for these annotations so we can see them in devtools?

Thomas: looking into something like that right now, with User Timing and Lighthouse

Yoav: could we use Server-Timing?

Ilya: might be an interesting use case for it. We see this across all the CMSes. For instance, I could install a gallery extension, which would inject the 5th version of jQuery.

Todd: Open questions on how to standardize it and enforce the standard.
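The Server-Timing idea floated here could look like the sketch below; the metric name and description are invented for illustration:

```ts
// Server annotates per-plugin cost:
//   Server-Timing: plugin-gallery;dur=120;desc="Gallery extension"
// RUM (or devtools) reads it off the navigation entry:
const [nav] = performance.getEntriesByType('navigation') as PerformanceNavigationTiming[];
for (const st of nav.serverTiming) {
  console.log(st.name, st.duration, st.description);
}
```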

Identifying ad interference - Amiya Gupta (Microsoft News)

Slides, video

Presentation

Amiya works at Microsoft News. MSN.com is monetized via ads, which add these challenges:

Identify bad ads:

Instrumentation: We use rAF + the User Timing API to add a marker. We log for 30 seconds, and it’s sampled. We get some useful metrics: Time To Interactive, and Max Frame Length (the longest frame on that page), which helps us identify bad ads.
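A loose reconstruction of that rAF + User Timing instrumentation (names and details are guesses based on the talk, not MSN’s actual code):

```ts
// rAF loop that marks every frame and tracks the longest one.
let last = performance.now();
let maxFrame = 0;
const deadline = last + 30_000; // "log for 30 seconds"
function tick(now: number) {
  maxFrame = Math.max(maxFrame, now - last);
  performance.mark('frame'); // User Timing marker per frame
  last = now;
  if (now < deadline) requestAnimationFrame(tick);
  else console.log('Max Frame Length (ms):', maxFrame);
}
requestAnimationFrame(tick);
```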

Amiya is showing some graphs using the data, identifying the bad partner. Now he is showing how he identified a partner and helped them remove a Flash tracker, showing immediate perf gains.

Downsides:

Questions

Ryosuke: attribution - is that for cross-origin or same origin iframes?

Amiya: Mostly same origin. Ad providers prefer friendly iframes

Todd: do the ad providers have an incentive not to be in a separate iframe (vs 3rd-party)?

Amiya: not sure, but there’s probably some monetary value there, as they can get more info as a first party iframe

Ryosuke: If the browser was offering frame timing, would that solve the use case for which you use rAF?

Amiya: yes, it would solve our use-case, especially if it’s buffered

Beng: Using a cross-origin iframe at Google Ads shows a drop in revenue, because the ads are slower. We are trying to encourage them to use sandboxed cross-origin iframes. But they’re slower, so users click less

Souders: why use rAF instead of Long Tasks?

Amiya: because of IE11 and the lack of a polyfill for Long Tasks.

Todd: Long frames were not reported by the Long Tasks API?

Amiya: yes, we had some with many short tasks resulting in missed frames.

Tim: The browser should try to schedule rendering between those short tasks

Scott: Not really shipped

Yoav: should implementation and scheduler solve this? Or do we need separate reporting for Frame timing and LongTasks?

Tim: we need a solution that differentiates between long-frame and long-task reporting.

Ryosuke: Yeah, there’s room for both. You can have LTs that don’t impact painting and frames.

Yoav: yeah, long tasks and frames also don’t have the same deadline

Souders: Can you capture the missing data around early abandonment via an unload handler?

Amiya: Maybe if we use sendBeacon there, but we’d still be missing reports
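The handler idea, sketched; `pagehide` is used here since it tends to fire more reliably than `unload`, and the endpoint and payload are made up. As noted below, the beacon itself is only best-effort:

```ts
// Best-effort abandonment beacon; still misses crashes and killed tabs.
addEventListener('pagehide', () => {
  const sawFcp = performance
    .getEntriesByName('first-contentful-paint', 'paint').length > 0;
  navigator.sendBeacon('/abandonment', JSON.stringify({ abandoned: !sawFcp }));
});
```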

Todd: We need a reporting API that is registered early enough and reports if we never reached some point in the page.

Yoav: so a “dead man trigger” reporting API

Todd: Something like that

Tim: Maybe we can use paint events as the default target, that if not reached, the report gets fired

Todd: What happened before the “dead man” trigger, that’s interesting to know

Souders: looks like there is general agreement that we can’t measure when a user abandons. Is unload not registered early enough?

Noam: Beacon is only 90% reliable

Ryosuke: sometimes the browser gets killed

Ilya: 3 to 4 problems: sometimes we can’t execute JS (and can’t register a DMT), but reporting API should help here. Crashes and OOM are specified in the reporting API

Yoav: It’s specified but not necessarily implemented/shipped.

Todd: This space for RUM is critical. Laying out the phases, web developer solutions and holes would be helpful to determine if we need new specs.

Ilya: In some markets we see 20% of users that don’t reach FCP, and it’s potentially invisible to analytics.

Tim: some users don’t even reach FCP (slow network regions)

Ilya: Even in the US, it’s in the low single-digit percentages.

Ryosuke: maybe users clicked the wrong thing

Tim: That would’ve been consistent across regions

Ilya: we can’t capture the abandonment ratio. We use FCP as the signal. It’s hard to understand the exact reason

Todd: We all looked into it, so there’s probably something interesting to expose there

Noam: could the browser save context and report it next session ?

Ilya: that’s the use case for Network Error Logging

Tim: May introduce bias towards users that never registered NEL

Yoav: Yeah, but if registration happens on early enough bytes, that should be minimal

WebPerf @ Google Ads - Beng Eu and Warren Godone Maresca (Google)

Slides, video

adSpeed team: trying to measure, understand and see the impact of ads on pages

We measure on ALL pages which have ads.

We are trying to make the speed of page with ads == the speed of page without ads :-)

WebPerf API Usage:

How do we use them:

We look at the speed of the device and choose appropriate ads; this feeds into an ML system. We use these APIs to reduce resource consumption. We try to lazy-load ads via IntersectionObserver.
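The lazy-loading approach mentioned above, as a minimal sketch (`loadAd` and the `.ad-slot` selector are placeholders):

```ts
function loadAd(slot: Element) { /* placeholder: inject the ad iframe here */ }

const io = new IntersectionObserver((entries) => {
  for (const e of entries) {
    if (e.isIntersecting) {
      loadAd(e.target);
      io.unobserve(e.target); // load each slot once
    }
  }
}, { rootMargin: '200px' }); // start fetching just before the slot is visible

document.querySelectorAll('.ad-slot').forEach((el) => io.observe(el));
```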

Wishlist:

Tooling:

Challenges

Measurement is easy, but using the data to improve is hard.

As a user you want to see your content sooner, but don’t care about empty rectangles

As an advertiser: capture the user’s eye

As a publisher: maximise ad revenue

As a tech provider: balance the satisfaction of users, advertisers and publishers

Real world ad publisher priorities:

Content could load so fast that users scroll and don’t click ads

Wishlist API:

Data experiment:

Latency abhors a vacuum:

Paint/Element Timing of Pages

Resource Timing

Long Tasks/CPU

Self Profiling

Resource Hints

Thread Scheduling

Beaconing

Memory Pressure/Leak

Questions:

Scott: Scheduling talk this afternoon. Will talk about scripts yielding and getting back control. There’s also a proposal for isFramePending.

Noam: are you considering aborting scripts?

Scott: More about chunking work and yielding at certain points

Yoav: aborting would mean that scripts would have to opt in to being abortable, which 3rd parties won’t be incentivized to do

Noam: We have a use case for that, but agree that 3rd parties may not

Tim: Risky to have scripts killed at any point with no chance to clean up state

Todd: Work Scheduling: there is a token to allow the ability to abort something in the queue

Tim: Before it started

Todd: yeah

Scott: with postTask, you can cancel tasks. Yield is different: there is currently no way to kill something in the queue. Yield returns a promise, and the browser will resolve the promise.

Todd: is it like async/await but with a different API?

Scott: correct. Yield returns a promise that resolves once the scheduler returns control to the script

Todd: so that API won’t support being cancellable

Scott: yeah, as currently proposed it won’t
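To make the distinction concrete, a sketch of the postTask proposal as described in this exchange; `scheduler` and `TaskController` were proposed globals whose shapes were still in flux:

```ts
declare const scheduler: any; // proposed window.scheduler
declare class TaskController {
  constructor(init?: { priority?: string });
  readonly signal: AbortSignal;
  abort(): void;
}

function backgroundWork() { /* placeholder low-priority task */ }

const controller = new TaskController({ priority: 'background' });
scheduler.postTask(backgroundWork, { signal: controller.signal });
controller.abort(); // cancels the task if it has not started yet;
                    // a yield() continuation has no equivalent kill switch
```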

JavaScript memory API - Ulan Degenbaev (Google)

Slides, video

Ulan: I work on V8

… this proposal is trying to take in all the learnings from previous attempts

… common thread and challenge is privacy

… we collected use cases

… Why should we measure it? Is the memory pressure API sufficient?

… They’re different APIs and signals. The memory pressure API does not address the use cases of A/B tests or regressions; heap size doesn’t tell you when you have memory pressure

… trade-offs: developers want complete picture, but full cross-origin attribution potentially exposes user data; implementers want easy and fast implementation.

… the complicated part is finding the “right” tradeoff

… our design is security-first

   … origin-based security (originally wanted site-based, but security folks convinced us otherwise)

   … account only objects that the calling context can access

   … we don’t want to rely on mitigations like buckets/noise, delaying, etc. Also didn’t want to rely on site-isolation.

… given these requirements, we only account for JS exposed objects

… the proposal is to provide a promise based API (e.g. performance.measureMemory()).

… could provide an escape hatch for some security-sensitive cases – e.g. throw an exception. Would also be possible to extend the same API to more types of memory (not just the JS heap) as security allows. Limiting to JS objects allows strictly defining accounting in terms of Realms. For cross-origin resources, they are either outside the JS heap, or we can run security checks on platform objects.

… The API returns a result as well as a range, in case precise calculation was too expensive. It can also provide optional per-frame attribution (optional because it may be difficult to implement in some architectures).
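As a usage sketch, the proposed promise-based call might be consumed like this; the result shape follows the explainer of the time and is illustrative:

```ts
// Feature-detect the proposed API before calling it.
if ('measureMemory' in performance) {
  (performance as any).measureMemory().then((result: any) => {
    // Illustrative shape: { bytes, breakdown: [{ bytes, attribution: [url] }] }
    console.log('JS heap estimate (bytes):', result.bytes);
  });
}
```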

Ryosuke: for same origin frames, how do we account for objects that move between realms?

Ulan: That could be implementation dependent, as some implementation may use an abstraction Realm in those cases.

Ryosuke: If you construct a function that has properties from multiple Realms, it’s ambiguous as to what we need to report.

Ulan: The ambiguity here seems fundamental, because objects can move around

Ryosuke: Right.

Tim: Any security issues with just using the creation context and reporting on that?

Ulan: That would work for same origin, less so for foreign origins

Todd: Also, there would be runtime implications, as implementations would have to remember where everything was created

Ryosuke: May have changed, but I think Gecko does that kind of object decoration. WebKit and Chrome do not

Ulan: I think it’s a corner case

Ryosuke: I think it’s pretty common. People create objects inside iframes and then move them out.

Todd: Links across iframe boundary is a *really* common source of memory leaks.

Ulan: Different performance characteristics depending on context.

… Single JS agent with same-origin realms or web workers can just return heap size.

… For different-origin realms on the heap, more detailed accounting is needed.

… options for different-origin realms

… origin isolation, hopefully, becomes a thing across browsers

… If not, browsers can throw security error or rely on memory-allow-origin similar to TAO, or CORP policy

… They can also iterate the heap and only report what they can. Could be rate-limited.

… They could also account on allocation or do heap segregation

… We have an explainer on GitHub and discourse thread, feedback welcome

… The meta insight here is that we need to focus on security first

… Questions?

Tim: You report this as a range. Where do the lower and upper bounds come from?

Ulan: Can be the same value if the implementation is certain of the result. If the implementation is less certain, the ranges can help implementations report something useful without crawling every single object.

Ryosuke: Crawling cross-origin objects can be used as a timing attack vector.

Ulan: Implementations can fuzz that time, as it is promise-based

Ryosuke: Yeah, but those attacks can be amplified (e.g. by creating a 1000 iframes)

Yoav: yeah, but response time can be infinitely fuzzed

Tim: There was a slide saying we want to avoid such mitigations

Yoav: yeah, regarding the return values, not the time it takes to get them

Todd: Fair to say that the time can introduce timing attacks, so maybe worthwhile to indicate the response has to be at least as slow as some value

Ryosuke: We could also wait for next natural GC. It’s just a bit risky to trigger new work based on the request.

Todd: folks that would consume this API and mentioned memory, would this solve the problem?

Ulan: mostly were requests around leak detection

Ryosuke: regressions should be detectable via this proposal, as long as it’s a same-origin leak

Ulan: This will signal that there is a leak

Ryosuke: Yeah, you can measure over time from known-good-state and see if it keeps increasing

Ryosuke: Would that expose live memory or GCed memory?

Ulan: Live as we don’t want this to force GC. This makes sense for large scale experiments when you can get data from production. For local experiments, you can expose GC via a flag

Steven: In the lab, this is exactly what we do. Iterate the same thing and trigger GC to see the current live size. This will be able to give us an indication something is leaking, but not beyond that.

Ulan: Yeah, this can be an indication, but you need another solution to make it actionable.

Ryosuke: memory can spike up until points where GC happens, which can make it tricky to find the exact point of the leak.

Yoav: This API is not something that’s intended to be used every 50ms. GC will kick in at some point.

Ryosuke: Eventually, but it may be hard to detect small leaks that way

Todd: Can this indicate how much is a site consuming at a particular point over time, over many users? (e.g. “how much memory am I using at page load time”)

Ulan: yeah, in such cases GC timing should average out

Vikram: our app (Microsoft Teams) is an SPA in Electron. It runs for days, so small leaks matter. Electron exposes this today. On the web we are currently blind. Something like this would be really, really helpful

Ryosuke: would be interesting to see if folks find live heap reports useful at scale. If that’s good enough, that’s great.

Tim: Naively, can we wait until after the next GC?

Ryosuke: That would leak GC timing

Andrew: What about the last GC time that’s revealed K seconds into the future? So you can see only the GCed memory, but can’t tell *when* it happened

Ryosuke: We need to do something like that.

Ulan: You might still be able to force GC so that might still leak some data on that.

Steven: full memory vs live, ideally we would surface live as that would make our attribution much easier and require less data

Ulan: our hope is that at large scale you can average out the differences and A/B detect regressions and leaks

Ryosuke: Concerned that not measuring GC times will make it harder to detect leaks on the same session, as not-yet-GCed memory will make it hard

Noam: Maybe we can measure the number of reachable objects, instead of memory?

Ryosuke: You’d have to run GC for that

Yoav: or you could remember that from the last GC

Ryosuke: If an attacker can force a GC to happen, that can leak information

Yoav: Is the connected node tree a sensitive cross-origin info?

Ryosuke: Yes. afaik, browsers don’t isolate heaps today so exposing GC times can leak data. You’d be able to infer how much memory was used by another cross-origin frame by triggering GC on your frame.

Ulan: we can report in regular intervals, no?

Ryosuke: Yeah, but fuzzing can be amplified away

Todd: there is existing API that this builds on, correct? And this will be a better version of that.

Ulan: This is significantly better, as right now performance.memory only reports global heap size, so you don’t know which frame the numbers are coming from

Ryosuke: Have people successfully detected leaks with that?

FB+Google: yes, we have examples where existing API helped identify major issues

Todd: Is that just leaks or memory usage growth?

Tim: A lot of that is leaks.

Ryosuke: great info!

Ulan: Also detected leaks in Chrome itself between different versions

Yoav: there is also the memory pressure API we’ve discussed before. We also talked about full process memory that maybe origin isolation primitives will enable (from a security perspective)

Ilya: Specific about memory pressure, 2 years ago there was a lot of interest. is there still interest?

Ryosuke: yes, we’re still interested

Benjamin: ditto

Todd: I think large properties said they are not so interested

Ryosuke: Yeah, folks said they are more interested in something like this to detect leaks. Memory pressure can be useful for games, where frame can free up memory to try to stay alive. It can also expose a different signal — e.g. uploading an image that causes pressure

Tim: Can’t they do that when hidden?

Ryosuke: App can still be visible when this happens

Nate: An API like the one proposed here is more useful than memory pressure. Memory pressure can be useful, but lower priority. Having the numbers from this can make memory pressure more useful

Ryosuke: Reporting API does report OOM

Todd: Correlating memory pressure to JS heap memory can lead you to insights. We could also include page visibility and reporting to tell website devs they are using too much memory.

Ryosuke: Yeah, right now developers just know that the user disappeared but don’t know why (unless they have reporting implemented)

Nate: We found multiple regression types with performance.memory and it does change how much time people spend on your site

 

Todd: Seems like the use cases are larger than the scope of this particular solution. Would be great to have a unified use case + solution doc

Ilya: Yeah, a Venn diagram of how this, memory pressure, reporting, etc all come together

Benjamin: this would be a Very Palatable™  way to start

Ryosuke: Previous API has massive security issues, this looks very promising.

Ulan: Regarding memory pressure, I wonder if JS weakrefs would resolve some of the use cases, as they allow creating caches that don’t prevent garbage collection

Ryosuke: We found that it’s not so much objects that need to be freed, but the way people use them (e.g. copying image to multiple canvases without waiting for GC and eventually crashing)

Ulan: But memory pressure would be an indication they should clear those weakrefs

Tim: In that example, you don’t want your canvases to be weakrefs

Ryosuke: Memory pressure can also be related to background user activity. That can happen fast.

Main Thread Scheduling APIs - Scott Haseley (Google)

Slides, video

Scott: long tasks are the enemies of responsiveness (but you know that)

… goal is to break up long tasks

… be responsive to input, keep animations smooth, minimize app latency

… however, breaking up long tasks helps to a degree

….. Helps with responsiveness and visual jank, but doesn’t solve all the problems

….. You can have buttery animations but high app latency

… apps like maps, or frameworks like react build their own schedulers

… user space schedulers have priorities, have a task queue model

… a scheduling task will run one or more queued tasks

… for the most part such schedulers work well, but there are limitations where we can help

… e.g. it’s a challenge to know when to yield: yielding adds overhead, and regaining control isn’t guaranteed

… one of the problems is that there is not a lot of incentives for yielding (when you have competing scripts, etc)

… similarly, if you break your long task but there are other tasks, you end up being delayed

… the idea with continuation is that it gives you incentive to resume

Noam: this would be really useful for the “selection” use case in Excel. Resizing a div based on mouse movements, while being able to handle more input events; isInputPending would help as well

Todd: So the Excel use-case is triggered from an input event, and they want the rest of the flow to inherit the scheduling of that initial input event

Tim: How often would folks use setTimeout vs. rAF in those cases?

Andrew: I thought setTimeout in the slide just refers to some task.

Scott: Yeah, but rAF is an implicit higher priority as it’s a “rendering opportunity”

Ryosuke: In Chrome

Scott: “Rendering opportunity” is a spec quote

Ryosuke: yeah, but we don’t prioritize it. We have no scheduler. Rendering may happen faster as an artifact, but not deliberately.

Scott: we’re proposing scheduler.yield with optional priority

… returns a promise that is resolved when higher priority work is finished

… priority argument integrates with scheduler.postTask()
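The continuation pattern being proposed, sketched under the assumption that `scheduler.yield()` exists as described (`processItem` is a placeholder):

```ts
declare const scheduler: any; // proposed window.scheduler

function processItem(item: unknown) { /* placeholder unit of work */ }

async function processAll(items: unknown[]) {
  for (const item of items) {
    processItem(item);
    // Resolves once the scheduler hands control back, letting input and
    // higher-priority tasks run in between chunks.
    await scheduler.yield();
  }
}
```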

Andrew Clark: we try to do this in React today. We have our own scheduler and plan to try and do the same for non-React code by replacing setTimeout with our implementation

Thomas: another use case: a dynamic buy button that depends on 3P code that needs to run to identify which provider is best for the user; ideally this would be done in the background

Noam: [Excel use case] - a long calculation would benefit from that and live on the main thread instead of moving to a worker. Would also benefit from an “is fetch handler pending” API

Scott: Looking into inheriting priority from network fetches, so you’d be able to check if high priority fetches are pending, or other high priority tasks.

Yoav: So “is high priority thing pending”?

Scott: Kinda

Steve Souders: Maybe LongTasks can also report what triggered them?

Tim: Fairly straightforward to add that and seems useful

Nicole: Frameworks including this would mean customers get this out of the box. That working in concert with the browser’s scheduler would be extremely powerful

Scott: priority is another common primitive across all the schedulers

… e.g. for maps you may have different priorities for stuff on screen, near the screen, etc.

… the problem is that schedulers can’t interact with other priorities

… Airbnb has its own scheduler, plus React and Maps, each of which has its own priority system

… we need a way to coordinate across schedulers;

… the idea is to provide a new set of inter-frame priorities

… introduce notion of tasks, tasks are posted to queues and each queue has priority

… each window has a scheduler — i.e. window.scheduler

… you can move across task queues

Thomas: Your example of multiple frameworks with different priority queues: a single queue would work only if they share the same priority semantics, right?

Scott: Yeah. There would also be a difference between userland “high” and browser “high”. Would be good to figure out meaningful names

Plh: So you could change the priorities or cancel tasks?

Scott: yeah, you could move tasks between queues. Also, we want to create user-defined task queues, so that developers will be able to lower the priority of the entire queue (e.g. if a series of tasks is no longer important because the user did something)

Andrew Clark: the names are important, “high” vs “low” is an arms race and in the face of starvation, devs are doing the wrong thing. We prefer basing names on user experience signals, or use “shaming” names for things we want to discourage

Todd: is there a way to tell which queue / priority?

Scott: yep, scheduler.currentTaskQueue.priority

Ryosuke: priority inversion?

Scott: It’s different from the classic priority inversion problems. It’s still an open problem, propagating priority to subtasks might help.

Ryosuke: Introducing that without priority inversion makes no sense. User action changes priorities of things, and it can create head-of-line blocking issues.

Andrew Clark: That’s already the status quo, as there’s no shared notion.

Ryosuke: this is a blocker. You need to solve this problem

Yoav: Is the problem that the current task is blocking, or the new task that we haven’t yet started?

Ryosuke: A new high priority task can rely on a current low priority one

Scott: But tasks are not interruptible unless they are yielding

Yoav: So you want reprioritization of submitted tasks that haven’t started yet

Ryosuke: right

Scott: That’s already in the proposal

Todd: Ryosuke, are you asking for declaring dependency

… we did this in .net

… the first primitive is you need a relationship. Then you need an algorithm that bump up priorities for everything together.

Scott: We’d consider related things to be in the same queue

Ryosuke: things may not be initially related. Background tasks may become relevant due to user action.

Andrew Clark: Hiding a subtree may be a good example. We’re hiding it in low priority and then if it becomes visible, all related tasks become high priority. We can dig further into this use-case, and it might be possible to do that with queues. But need to think this through.

Yoav: Can’t we build that on top of task queues?

Andrew Clark/Scott: Maybe. Need to flesh out the use case here

Todd: Can we list the scenarios we’re not sure are handled and try to push this proposal to those boundaries?

Nicole: Edge cases are very welcome

<discussing asset management server example>

Nicole: May need some things to live in userland at first

Ryosuke: Important to communicate the dependencies. In iOS we have multiple daemons running tasks, handling dependencies automatically

Todd: And you’re saying that if userland can’t communicate the dependencies that will not be automatic?

Andrew Clark: React users will use a React API that will call into this. So would be good to start with core use cases, rather than try to solve all edge cases upfront

Noam: Another use-case is scrolling. Fetch blocks for the current viewport and lazy-load the rest, but if the user starts scrolling, we need to bump up the fetch’s priority.

CPU reporting - discussion

Video

Steven Bougon: Users complain about Salesforce being slow. Currently, they need to ask the customer to run Octane as a baseline of “Is the machine/browser ok?”. Could this be integrated or could an API be provided?

Yoav: Can reveal background activity, which can be a privacy concern.

Tim: Can be short-lived to avoid fingerprinting concerns.

Gilles: Wikimedia is trying to figure out whether their website is getting faster/slower normalized against hardware. Without some type of API exposing the speed of the hardware, it is very difficult to reason about this. Today, Wikimedia does this by using Network Information to segment the data.
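The segmentation Gilles describes, plus the coarse device signals mentioned elsewhere in this session, sketched behind feature checks (the bucket shape is invented):

```ts
const conn = (navigator as any).connection; // Network Information API
const bucket = {
  ect: conn?.effectiveType,                      // '4g', '3g', ...
  rtt: conn?.rtt,
  downlink: conn?.downlink,
  deviceMemory: (navigator as any).deviceMemory, // coarse GB value
  cores: navigator.hardwareConcurrency,
};
// RUM metrics get segmented per bucket; note there is no CPU-speed
// equivalent, which is the gap this discussion is about.
console.log(bucket);
```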

Utkarsh: A customer asked for the CPU of the device in order to block 3rd parties: if there’s a low score from Geekbench, consider blocking 3rd-party content to improve performance. The UA string model number and www.gsmarena.com model data enabled this to be done for some devices. This is only an approximation, as battery throttling and core choice have an impact. We need to know the CPU on which the page was loaded, as opposed to all the CPUs available on the device.

Todd: 3rd party content blocking in order to improve 1st party performance?

Utkarsh: Correct

Gilles: Benchmarks can vary on the same device by 2X on a device with the same UA string.

Steven: Today, Salesforce uses Network Information to reason about the network. A benchmark can be run in a web worker to estimate CPU speed. Can the same use case be solved by directly exposing information?

Yoav: ECT is derived from bandwidth and RTT. What are the equivalents for processing?

Benjamin: CPU temperature is highly correlated to actual perf. Load is another signal. Clock speed vs. max clock speed can also be a good signal.

Gilles: The browser can also keep track of parse times and report that.

Tim: It should be based on an operation that’s more consistent

Todd: And GPUs or new CPU types will make this more complex…

… Use cases I’ve heard are around categorizing runtime perf. Another related to device score for debugging purposes.

Yoav: Maybe an “I’m currently unhappy” browser API can indicate that something else is happening on the machine, interfering with the browser’s resources compared to a “typical” run

Noam: Excel is interested in learning when not to do something. If Excel can know how fast the machine is, they can choose not to pre-emptively run background calculations.

Noam: The thing Excel cares about is how fast the Excel operation can be completed based upon the time needed to complete the operation.

Gilles: But we also want to know how devices are related to each other.

Yoav: So buckets of “device types” and in them, “under pressure” indicators.

Utkarsh: We must ensure that privacy/security is respected if anything is exposed. Also, recording CPU-related info in RUM reports can help associate web performance metrics with the CPU profiles on the device, rather than the device names/models. That association can also help build buckets to classify a device as slow/fast/etc.

Yoav: It sounds like there are multiple use cases. Are any volunteers available to write them up?

Utkarsh: I’m interested in writing this up. I’d like help.

Steven/Noam/Beng/Gilles/Benjamin are all interested in assisting.

Metrics for Single Page Apps - discussion

Video

Ilya: Many of our metrics are focused around page load. We hear that there’s a gap. Analytics are doing crazy heuristics, instrumenting frameworks in order to get a better view of soft navs.

We should be able to abstract some heuristics. Some of our metrics were designed to fire only once and we use that as a security boundary. We could reset when soft navs are detected, but it might not work for everything. We have many “first…” metrics, but don’t account for the rest of the lifecycle. Maybe there’s a way to define soft navs and all the current metrics would just work.

Yoav: Don’t have to reset, but maybe we can have more than one navTiming entry

Nic: in SPA mode we don’t report onload event as “primary time”. We start our own monitoring using mutation observer, look into “important” things that are loaded into the DOM, and once they have quieted down and added to RT, we consider the page done. Similar heuristics for in-page navigations - “soft nav”. We listen to many framework specific events (Angular/React events, history state). Also look back to the last input event to see when activity started from the user’s perspective. Don’t measure things like FirstPaint, etc in that mode, as the browser doesn’t provide those facilities.

... Lots of negative side effects of monitoring all of these. Sub-resources/CSS cannot be monitored. Want better understanding of in-flight requests. Standardizing something in this space could allow dev tools and different RUM providers to expose the same things.
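A loose sketch of the quiet-window heuristic Nic describes; the 1-second threshold and the “done” definition are invented, and real products layer framework events and request tracking on top:

```ts
const navStart = performance.now();
let quietTimer: number | undefined;

const mo = new MutationObserver(() => {
  clearTimeout(quietTimer);
  quietTimer = setTimeout(() => {
    mo.disconnect();
    console.log('soft nav "done" after', performance.now() - navStart, 'ms');
  }, 1000); // consider the page done after 1s of DOM quiet
});
mo.observe(document.documentElement, { childList: true, subtree: true });
```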

Yoav: 2 approaches: 1. Heuristics triggered by browser 2. App triggers a state change

Tim: Do we know what is a single page app nav? Seems like there is a lot of room between a small response to input and a full single page app nav. Should we consider exposing reactions to input to expose all interactions?

Nicole: Adding to that, Ilya mentioned it’s an antipattern to instrument frameworks. I think it can be useful for sites with similar stack to be able to compare themselves to one another.

Nic: Akamai allows the customer to control what is recorded. They can enable heuristics or they can trigger a start in their app.

Steve: Sounds like we should do everything, and the downside is: we have to do everything. A big customer that signals the beginning and end of soft navs had a lot of pageviews with a page load time of 0. They had instrumented a custom modal dialog as a soft nav and found they had exceptionally good times. But they wanted to measure that.

Nicole: Can also happen with prefetching.

Noam: A complex DOM might trigger layout, which could push that 0 higher even without the network

Steve: There are always customers who want to monitor everything or only to monitor a bit. Can we enable both? We need to give them a way to signal soft-navs

Ilya: Does push state indicate a soft nav is happening?

Many: Push state isn’t sufficient

Nicole: Modern frameworks may not implement a push state update correlated with user update

Ryosuke: Sometimes push state is too much. He has a charting app that updates push state regularly, but he wouldn’t want timing entries for those updates.

Nic: They have ~60-70% of users that opt in to push state. For many of the remaining users, push state won’t be sufficient.

Yoav: Having an explicit signal will enable us to evaluate heuristics against it

Todd: If push state is the signal and it covers 60-70% of the web, is that a good start?

Yoav: What happens with the others?

Steve: Could we start by asking developers to implement a start/end to declare soft nav?

Tim: On top of Event Timing, we’re considering adding a promise that completes when the event handling chain completes. It would resolve when the website author considers the nav done.

Ryosuke: Can this be tied to User Timing? For the chart app, the signal may cross input to fetch to paint. Maybe some state changes aren’t like navigations. They are state changes and have different characteristics from a load. It may not be possible to abstract this completely.

Yoav: Explicit signals can help us to kick off “navigation timing” entries as well as paint metrics after that.

Ryosuke: But maybe in some apps it’s different from a page load. Maybe the devs need to measure a subset of the current metrics for soft-navs. (e.g. the chart app and load event)

Yoav: Load events may not make sense but paint timing would

Ryosuke: maybe, but maybe not. In SPAs the metric may vary based on your app.

Tim: Devs want something out of the box that works for most, but maybe more control for the rest of the cases. Maybe we can’t get it perfect. Can we get something good enough for the 80% case?

Ryosuke: Should the things being tracked be a combination of Event Timing and Element Timing and Paint Timing? Maybe we need to answer “when is this element painted next time?”.

Nicole: This feels like it would be the new normal

Tim: I imagine ElementTiming for this usecase. But you want to set the right time base (e.g. the input event, not the page navigation). So maybe we need to be able to set the timebase in ET.

Steve: When a soft nav is triggered, a lot of RUM information is reset to track the new soft nav information. Fundamentally different than the navigation-based view. Sometimes resource timing analysis tied to soft navs requires this to understand heavy resource use/queueing.

Yoav: Maybe this is more related to resource timing dependency graphs than to soft navs?

Nicole: We wanted to start by instrumenting React. User interaction->DOM Change->done?

Yoav: Does react have a link between input and consequent changes?

Nicole: Not really.

Houssein: React is looking at wrapping component renders in User Timing

Ryosuke: Shadow root and resource loads from within the shadow root could allow understanding. Each component could have its own performance timeline.

Tim: But resource loads are global

Ryosuke: Yeah, but you could report it to the component

Yoav: You could, but that’s not currently the case. Theoretically, we could move to that model for RT as well, and the WC is itself its own navigation.

Nicole: Component can be very small or a very large part of the page

Ryosuke: For long tasks, that’s complicated as they are not tied to a component.

Yoav: Neither are resources

Tim/Nicole/Ryosuke: Both global view and local view are useful to understand performance

Ilya: Are those separate issues that must both be solved? Lifecycle of WC and lifecycle of the page both need to be measured

Tim: A flag in the global timeline would be problematic in cases where we have 2 independent “component navigations” happening at the same time.

Yoav: What would be the URL in that case?

Ryosuke: That would be a good case for component based measurements.

Noam: Excel workbook has multiple tabs. The URL stays the same but a soft nav occurs when the tab is changed. We potentially issue requests, change the DOM and expect to measure when all that work is finished.

Ryosuke: If another is clicked before the previous is done?

Noam: It will be considered cancelled. Also, this “component navigation” triggers changes in other page components.

Yoav: So is that a component navigation, page navigation or both?

Noam: Component navigation

Nicole: Played with ideas that if devs provided us about a dependency graph, we can better know what to load

Noam: For us it’s not a static tree, it’s totally dynamic

Steve: Components seems like a separate issue compared to SPA soft navs.

Ilya: Seems like there’s a lot of value in exposing “start” and “end” signals for developers, even beyond SPAs. Onload is not relevant, but there’s no way for devs to tell the browser what is.

Todd: We have that in User Timing and nobody ever used it

Ilya: People didn’t use them because they were hard

Ryosuke: Libraries are not consistent in when they kick off pushState.

Todd: So something like that could work?

Ryosuke: SPA can vary and end time is app specific.

Tim: Would that navigation initiation be associated with an event?

Ryosuke: Not necessarily. Could be timer based.

Ilya: Youtube kicking off another video after you finished the first is another case.

Tim: In cases where input is the trigger, we can use the HW timestamp

Steve Souders: We shouldn’t try to find the heuristics here. Need to define an API first.

Nicole: Can devs mark the “navigation ended” bit now? Do they have that info?

Nicolás: Once the developer marked navigation end and another started, we need to reset the performance timeline.

Yoav: Dunno if we want to reset, or just distinguish them from the new ones

Yoav: User Timing names didn’t work because they didn’t do anything. If they impacted NavTiming and Paint Timing, that would motivate developers to use the new signal

Nicole: Framework authors are motivated to show download impact. Currently only measuring the first load and not SPA navs which are supposed to be faster.

Steve Souders: Would be great if frameworks adopted heuristics, and users will get them automatically.

Nicole: Could indicate “next paint after this”.

Noam: Yeah, need that also on a DOM basis

Yoav: Element Timing can give you that.

Tim: Can we just create a new Performance Timeline?

Yoav: So an array of Performance Timeline?

Tim: Yeah, with a new time origin

Ryosuke: A little worried about N performance timelines.  

Tim: That would enable us to get first contentful paint again.

Ryosuke: so yes we need a new Performance Timeline. But there are security issues.

Todd: so what are the core requirements?

Ilya: it seems we need an API to mark start and end.

Nicolás: why do we need an end mark?

Yoav: to replace onload. We tell people to ignore onload, and have nothing better to give them. But maybe they don’t need to be coupled.

Ilya: over time we can provide better tools to get to the right end state.

Ryosuke: maybe in UserTiming have a flag to mark the ‘start’.

Steve Souders: I don’t think we want N timelines. But if we know the start and end times, then when someone at the end of an SPA soft nav asks for the total long tasks, the API could take start and end into account and only give the relevant entries.

Yoav: getEntriesByNav

Nic: and that’s essentially what we’re doing today.

Tim: it needs to be easy to make the start time be the input event.

Ryosuke: possible with UserTiming.
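Tim’s point, sketched with User Timing L3 mark options (the names are illustrative): the soft-nav start can be pinned to the input event’s timestamp rather than to when the handler ran:

```ts
addEventListener('click', (event) => {
  // Anchor the start on the input event's timestamp.
  performance.mark('softnav-start', { startTime: event.timeStamp });
});

// Later, when the app considers the soft nav complete:
performance.mark('softnav-end');
performance.measure('softnav', 'softnav-start', 'softnav-end');
```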

Tim: that does not get you a new FCP, but maybe that’s OK for a first attempt?

Todd: isn’t that the whole point? We need incentive for developers to do this.

Ryosuke: painting cannot account for cross origin stuff.

Ilya: any volunteers for documenting the use cases?

Tim / Nic / Noam volunteer.

Group processes

Video

Steven: We may want to pre-allocate scribes. Multiple scribes can be useful to cover for one another. Also, for the meetings it would be good to change scribes halfway. We could also create a pool of scribes and randomize between them.

Otherwise, we could have Slack as a complement to the meeting, e.g. to run quick polls.

Hangout chat is not great as it disappears after the call.

Yoav: Is there W3C slack?

plh: Yes and no. Some working groups are creating their own slacks

Ilya: Stuff in slack isn’t linkable.

Yoav: Let’s put this as something we can improve.

AI - Todd/Yoav look into an IM solution to augment the calls

Ilya: We could also delegate the meeting scribe to someone later after the fact.

Tim: We could keep recording and scribe’s job could be just recording action items, etc…

Ilya: That would help making scribe’s job considerably easier

plh: Automated scribing can produce a lot of garbage, and we’d end up spending a lot of time cleaning it up

Ilya: I’ve been getting many requests from people who want to get involved. Maybe some people in the community are interested in getting involved

Todd: I’m happy to get Skype team involved who are working on automated transcription

Ilya: It’s really hard to capture all the nuances of technical discussions happening at the speed of conversation

Nicole: I find past transcriptions to be very useful

Yoav: I often complete the minutes from the video

Benjamin: Agree that minutes are good

Steven: We could also use a paid scribe that knows the terminology

Ilya: Other feedback we got are creating onboarding guidelines and templates for use cases.

Yoav: Who wants to get more involved and don’t know how?

Beng, Warren, Thomas, Sam raise hands

Ilya: I have an onboarding guide in my team, and it has been very useful.

Todd: What’s the ratio of time we spend discussing new ideas versus working on old bugs?

Thomas: Old bugs tend to get into browser implementation details, which are highly irrelevant to my day-to-day work. It would be nice to separate things that are more relevant for browser implementers from ones that are more for web developers.

Yoav: Any issue could be relevant to web developers. For example, some issues might make it impossible to use Beacon, or Resource Timing can’t be used until some issue is resolved. It’s just that a lot of discussions involve three years of prior work.

Ilya: Maybe we could create more focused topic-based groups. We can still have a working group wide meetings but have separate topic based meetings.

Noam: What’s the current process like?

Ilya: The current process is to have biweekly meetings.

Tim: Overall, both design and triage are useful to the people contributing to them. I feel the balance between new features and issue triage is reasonable.

Todd: That was my impression as well. I wanted to solicit others’ opinions about spending 20+ minutes discussing old issues.

Yoav: Agenda should be available ahead of time, so folks can know if they should join or not.

Tim: Maybe we can put more exciting stuff at the beginning and let people leave as we get into more boring issues.

Todd: There is a lot of potential for a status dashboard. The working group has a list of standards we’re trying to publish, and each standard needs a set of tests. At a high level, we have a large checklist that we have to complete as the working group. Would be great to make all of that linkable.

Benjamin: Operationally, I think this working group is functional and in a pretty good state.

Plh: Great to hear that from Mozilla

Ilya: Thanks all!

Yoav: And see you at TPAC!!