Participants
Nic Jansma, Giacomo Zecchini, Dan Shappir, Ian Clelland, Hao Liu, Michal Mocny, Neil Craig, Amiya Gupta, Patricija Cerkaite, Leon Brocard, Andy Luhrs, Sia Karamalegos, Carine Bournez, Patrick Meenan, Sean Feng
Admin
- Next call - July 20th 1pm PT (!!)
- (skipping July ~4th week)
Minutes
Recording
- Nic: dug into the timeOrigin attribute in the last few weeks
- … use a lot of the performance APIs and need to convert timestamps into wall time
- … When reviewing the JS code for boomerang.js, found it uses timeOrigin and performance.timing.navigationStart interchangeably
- … would try to move to timeOrigin going forward, but wanted to talk to the group about it
- … The first NavTiming included performance.timing.navigationStart, a large number: milliseconds since the Unix epoch
- … Deprecated but still shipped everywhere
- … DOMHighResTimeStamp values from performance.now() and other timestamps give you milliseconds since navigation start
- … So we can do math with them
- … Then we added timeOrigin, potentially with higher resolution than navigationStart
- … Basically using timeOrigin where available and navigationStart when it isn’t
- … Moved to timeOrigin and got a lot of issues with their test suite
- … In some cases, timeOrigin and navigationStart show different distance from epoch
- … In Firefox/Safari they could be +- 1 ms (maybe due to rounding)
- … In Chrome there’s a difference between them because timeOrigin is more precise
- … Seems like these numbers should be the same
- … But can they be different?
- … We’ve said in the past that they should match, but looking through the spec they may be different when the browser is launched from a blank slate
- … Unclear if the spec allows them to be different
- … Also, seen browser bugs around all of these
- … Drift in timeOrigin in Chromium, resulting in a value different from what you’d expect
- … Firefox has a different issue where the timeOrigin was shared between different tabs in the process
- … Safari has an issue where timeOrigin drifts over time. Many days after the page was opened, there was quite a gap from its original value
- … If there’s an intentional difference between timeOrigin and navigationStart, how can the navigation entry’s startTime be 0? Some logical gap there
- … RUM can’t use timeOrigin at all ATM, which is not ideal
- … Should timeOrigin and navigationStart be the same?
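The "timeOrigin where available, navigationStart when it isn’t" approach described above can be sketched roughly as follows. This is an illustration only, not boomerang.js code: the `PerfLike` type and the `epochAnchor` and `toWallTime` helpers are hypothetical names, and the sketch assumes the anchor is picked once and reused so that all converted timestamps share the same base.

```typescript
// Minimal structural type (hypothetical) covering the two anchors discussed:
// performance.timeOrigin and the deprecated performance.timing.navigationStart.
interface PerfLike {
  timeOrigin?: number;
  timing?: { navigationStart: number };
}

// Pick one epoch anchor: prefer timeOrigin, fall back to navigationStart.
function epochAnchor(perf: PerfLike): number | undefined {
  if (typeof perf.timeOrigin === "number") {
    return perf.timeOrigin;
  }
  return perf.timing?.navigationStart;
}

// Convert a DOMHighResTimeStamp (ms relative to the time origin) into
// milliseconds since the Unix epoch, using the single anchor chosen above.
function toWallTime(perf: PerfLike, ts: number): number | undefined {
  const anchor = epochAnchor(perf);
  return anchor === undefined ? undefined : anchor + ts;
}
```

In a browser, the global `performance` object could be passed as `perf`. Mixing the two anchors within one set of measurements is what exposes the discrepancies raised in this discussion.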
- Yoav: In my opinion they should be the same
- … Any discrepancies we’re seeing are because no one cares much about the deprecated API
- … On the implementation side, there’s no energy looking at the deprecated API
- … Talking to Noam, it seems like all issues you’re outlining are symptoms that we have two different clocks that are inherently misaligned and we’re trying to align
- … Wall-clock that is NTP corrected, or a user could change, that is one clock
- … Tied to some form of external reality
- … Then we have a monotonic clock that is counting the seconds that the browser sees, render process sees
- … Depending on the implementation it does different things
- … Sync’d once render process starts (or when window is created?), synced to system clock and gets timeOrigin value
- … Could get out of sync
- … Because hibernation, etc
- … The Chromium bug: we changed syncing from every time a window is created to once when a renderer process is created. That created huge drifts, because render processes can be impacted by sync on timeOrigin
- … Any change in syncing frequency will impact that drift
- … One symptom is that navigationStart and timeOrigin were further apart
- … Essentially, navigationStart timestamps are synced more constantly, or are more point-in-time, than timeOrigin
- … If this is indeed a pain-point, that seems fixable.
- … Question is why does it matter
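One way to observe the misalignment between the two clocks described above is to project the monotonic clock onto the epoch via timeOrigin and compare it with the wall clock. The helper name `clockDriftMs` is hypothetical, and the sketch makes no assumption about when a given browser syncs its clocks.

```typescript
// Difference in ms between the wall clock and the monotonic clock projected
// onto the epoch through timeOrigin. Positive means the wall clock is ahead.
function clockDriftMs(
  wallNowMs: number,      // e.g. Date.now()
  timeOriginMs: number,   // e.g. performance.timeOrigin
  monotonicNowMs: number  // e.g. performance.now()
): number {
  return wallNowMs - (timeOriginMs + monotonicNowMs);
}

// Live usage in a browser or recent Node.js:
//   clockDriftMs(Date.now(), performance.timeOrigin, performance.now());

// Simulated case: the clocks were aligned at the origin, 60 s elapsed on the
// monotonic clock, and the wall clock was adjusted forward by 5 s meanwhile.
const simulatedDrift = clockDriftMs(1_000_000 + 60_000 + 5_000, 1_000_000, 60_000);
```

A value near zero suggests the clocks are still aligned; a value that grows over the life of a page would be consistent with the drift discussed here.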
- Nic: In our RUM script we prefer to always use the same monotonic clock. If the system clock changes, that doesn’t matter to us
- … At other times, we’re sending wall clock timestamps just for convenience
- … Every time we do that, we have a mixture of timestamps from both clocks, and they drift
- … Given 2 timestamps that are different - why are they different and which one should I use?
- … If the answer is “timeOrigin”, we need all the browsers to fix their bugs
- … e.g. in Safari timeOrigin drifts
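A small diagnostic for the "two timestamps that are different" question might look like the sketch below. The `compareAnchors` helper and its result type are hypothetical, and the 1 ms default tolerance is an assumption taken from the ±1 ms rounding difference mentioned earlier for Firefox and Safari.

```typescript
interface AnchorComparison {
  deltaMs: number;         // timeOrigin minus navigationStart
  withinRounding: boolean; // true if the gap is explainable by rounding
}

// Compare the two epoch anchors and flag gaps larger than the tolerance.
function compareAnchors(
  timeOrigin: number,
  navigationStart: number,
  toleranceMs: number = 1 // assumed rounding tolerance, not from any spec
): AnchorComparison {
  const deltaMs = timeOrigin - navigationStart;
  return { deltaMs, withinRounding: Math.abs(deltaMs) <= toleranceMs };
}
```

A RUM script could report `deltaMs` alongside its beacons to see how often, and in which browsers, the anchors diverge beyond rounding.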
- Yoav: If we were to fix navigationStart to have the same sync points as timeOrigin, timeOrigin did not sync mid-way (like in Safari), and we had well-defined sync points in the spec, everywhere?
- … If those things are correct, could you ignore the wall clock?
- … Use timeOrigin as your anchor to reality?
- Nic: We never compare to Date.now(), so it’s more about having a consistent stable reality. Even if there’s drift that’s fine. Just need the timestamp to be stable
- … Some of the reasons we’re seeing this is code written a long time ago that’s using a mix of those different APIs, resulting in us seeing the drift in actual measurement
- … Chrome’s navStart and timeOrigin are the same. Other browsers see larger drift
- … Open issue for Firefox, none for Safari
- Sean: can check the Firefox bug
- Nic: Was able to repro in an older version. May have been fixed
- … May not be an issue for other folks
- Nic to follow-up with Safari on bug: https://bugs.webkit.org/show_bug.cgi?id=258572
- Ian: Had 2 different mitigations in NEL, but it turned out that subdomains were able to send success reports when they shouldn’t have.
- … An attacker who could hijack DNS reports, and also inject a NEL policy, could set a policy with a success fraction. Then, when the DNS issue is fixed:
- 1. All reports *including success reports* are "downgraded" to DNS errors, because of the IP change, and
- 2. Subdomain policies are only allowed to trigger DNS error reports, so
- 3. The success report is delivered, masked as a DNS error report.
- … Fixed in the spec, still an implementation issue
- Ian: This came up as a result of a bad reference, because trailers were removed from the Fetch spec.
- … NEL’s processing algorithms currently get the headers from headers and trailers
- … This was never actually implemented anywhere
- … Also, because trailers aren’t supported elsewhere, that’s fine
- … option 1 - remove trailers from NEL spec and then implement request and response headers
- … option 2 - remove request headers and response headers
- … option 3 - implement with full trailer support
- Neil: On response headers, I wasn’t aware that it wasn’t implemented
- … thought of using it to denote which HTTP edge service was active for a particular network report
- … serve based on geography, but that changes over time
- … Would love to not have to correlate NEL reports with time to figure out which edge servers were used for a particular request
- Ian: Definitely makes it easier to justify the work
- … bumped into that feature accidentally, would be a good use case for us
- … Not sure which response header needs to be specified to indicate an edge server
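For illustration, a NEL policy that asks for a response header in reports could be built as in the sketch below. The `report_to`, `max_age`, `failure_fraction` and `response_headers` members come from the NEL spec; the reporting group name `network-errors` and the header name `X-Edge-Node` are hypothetical, and per the discussion above the header-capture part may not be implemented in browsers.

```typescript
// Shape of a NEL policy as carried in the NEL response header (JSON).
interface NelPolicy {
  report_to: string;
  max_age: number;
  include_subdomains?: boolean;
  success_fraction?: number;
  failure_fraction?: number;
  request_headers?: string[];
  response_headers?: string[];
}

// Serialize the policy into the value of the NEL response header.
function nelHeaderValue(policy: NelPolicy): string {
  return JSON.stringify(policy);
}

const policy: NelPolicy = {
  report_to: "network-errors",       // hypothetical reporting group name
  max_age: 86400,                    // one day, in seconds
  failure_fraction: 1.0,             // report every failure
  response_headers: ["X-Edge-Node"], // hypothetical header naming the edge service
};

const headerValue = nelHeaderValue(policy);
```

The server would send `headerValue` as the value of the `NEL` response header; which header actually identifies an edge server is deployment-specific and was left open in the discussion.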
- Yoav: Trailer support was removed from Fetch, but Server-Timing has specified trailer support (implemented in Firefox).
- … Worthwhile to see what the situation is there
- … Spec has trailer support with some implementation backing (in Firefox)
- Ian: Not sure if Lucas is on the call, but had mentioned Noam had removed that section last year
- Yoav: Maybe we fixed the spec problem but made Firefox non-compliant in the process? Not sure
- … I know Fastly were keen on trailer support for Server-Timing
- … If implemented and used in the wild we should have it be part of the spec
- … Though for Fetch a single implementation is not enough, and it would have to be monkey patched
- Ian: Mentioned in HTTP spec
- Patrick: On headers for in-house vs. CDN: is server IP reliable? It’s available for a lot more failure conditions than headers would be. Headers would only work for 4xx HTTP codes, where the edge was reachable but the origin wasn’t. Server IP should be there for anything but DNS.
- Neil: Ideal world would have both. It would be marginally easier if both were available.
- … One of the things that cropped up is we see a lot of abandoned events, if we run our own CDN vs. commercial CDN, and want to know more about what’s causing that
- Patrick: Useful in general for finding which edge node, e.g. Anycast situation, but the kinds of errors NEL is able to report skew heavily towards not being able to reach the server or edge.
- Neil: Our NEL data is the other way, abandoned is a large chunk. ~90% abandoned and unknown. They’re tricky to take action on.
- Patrick: For abandoned you don’t get headers, maybe? Started to respond and didn’t finish responding? More likely didn’t even get to respond.
- … Could be valuable to have both when debugging since they’re both hard to figure out
- Ian: Sounds like some support for adding response headers to NEL
- Neil: For us specifically, request headers isn’t useful, response headers are
- Ian: As far as trailers I can’t tell if they’re useful, only in Firefox (which doesn’t have NEL), I suspect it’s best to just pull that out of the spec text for now
- Yoav: Unless there is actual implementation interest to implement it, it should just be removed from the spec
- … If and when there’s interest, we can just add it back
- Ian: Should all these features be configurable independently of each other? Seems like a good idea. Unclear where that configuration goes
- … No current special headers for each one of them. So we could have a special reporting name for each of these?
- … Or maybe something else for reporting configuration in general
- Neil: Might be worthwhile to do something similar to what we’re talking about at NEL filtering, where you filter reports you don’t want to include
- … The vast majority of reports we get are deprecations, which are being ignored
- … so could avoid receiving them
- Ian: The Report-To header defines the endpoint and other headers define what goes into the reports
- … putting the filter on the reporting headers makes sense. Not sure that the same is true on the reports
- Neil: There isn’t a deprecation reporting header though
- Andy: Looking at sampling recently, some overlap there. Zero sampling could handle opt-outs entirely
- Yoav: I’d love to come back to ignoring deprecation reports.
- … Deprecation reports are telling you that you’ll have breakage in X months, and ignoring them means you will have that breakage then
- Neil: Org challenges where engineering teams are detached from our reports, but may have their own ways of tracking this
- Yoav: No way to translate those reports you’re collecting into alerts on their end
- Neil: Not automatically. Mechanism is I raise an alert with our level1 response team, and they can raise with appropriate team, but that may be noisy.
- … We foot the bill for the header deprecation reports as well.
- … Would be nice to be able to control