Participants
Alex Christensen, Maxime Villancourt, Yoav Weiss, Todd Reifsteck, Benjamin De Kosnik, Andrew Comminos, Will Hawkins, Ryosuke Niwa, Phil Walton, Tim Dresser, Nicolás Peña, Nic Jansma, Eric Lawrence
Admin
Next meeting: Aug 1st 11am PST
Gargantua, the Working Group dashboard project - Philippe
- We have a lot of data, but you have to know where to look
- Gargantua is about exposing this data
- Needs
- Public page for the rest of the world to see what we’re working on
- A page that’s useful for us: GH issues, spec status
- For the chairs - with more details
- Plan to encourage folks at the Hackathon to optimize this
- We can play with it with an API key. W3C accounts can also create their own API key
- Will continue iterating on this; please surface any data needs
- Nicolás: way to see issues?
- Philippe: yes, with a 24H delay, because we crawl GH every night, to avoid hitting limits
- Todd: Wanted to show numbers and link to GH, rather than displaying everything
- Yoav: possible to generate on the server?
- Philippe: yeah, talked to our system folks
- Todd: Looks really good. Only question is how fast can we have it?
- Philippe: Code is on GH: https://github.com/w3c/gargantua
- Created own framework because it lazy-loads the data, in order to avoid overwhelming the browser
- Next steps - Hackathon before TPAC
- All the tool folks will be there
- Andrew: Idea was informally floated when we talked about adding gzip to JS profiling
- Folks suggested breaking out the compression format
- An interface for compression algorithms: gzip, brotli, zstd
- Use-cases for complex apps as well as analytics vendors
- FB saw large reliability improvements from client-side compression
- WASM was too large and not as fast as native. Requires data copying
- Also UAs already ship this, so why not
- Async
- Uniform interface, not codec specific
- Avoid extra allocations and copies
- Throttle for low CPU devices - in control of the developer
- Ben: Way to query what compression algorithms are available?
- Andrew: yeah should be feature detectable. getCompressorType or something like that makes sense.
- Andrew: Proposed API includes entry points through Streams and ArrayBuffer. To avoid extra allocations, the ArrayBuffer input is transferred out of the script’s control, so we could reuse the same input allocation, which is useful regardless of codec.
- Philippe: Are you aware of past proposals on WICG?
- Andrew: didn’t know but will look into them.
- Philippe: also, did you think about video compression as well?
- Andrew: We could… not familiar with WebRTC, but this could be leveraged for something like that
- Todd: The dependency here is that not all APIs currently accept streams; they’d need to support them to benefit from this
- Andrew: There’s also an ArrayBuffer entry point. We might also add a DOMString entry point in the future
- Todd: Want to avoid web sites doing wasteful copies. Streams help devs do the right thing.
- Andrew: Right. That was the motivation behind making the ArrayBuffer transferable. Devs would need to deliberately copy to shoot themselves in the foot.
- Todd: Like it conceptually, but concerned it will cause footguns in the real world.
- Andrew: Maybe we should name it to make it explicit that the ArrayBuffer is transferable.
- Ryosuke: Primary use case is to send this back to the server, why isn’t this an option in Fetch?
- Andrew: Wanted to make it general purpose. There are cases where you want to compress localStorage, etc. Streams make it very easy to compose, also into Fetch.
- Ryosuke: But now everything is single threaded, you have to go back to the main thread
- Andrew: Not necessarily. You may need a sync spot on the main thread to spawn it off.
- Yoav: why?
- Andrew: Actually, readable streams should enable passing compressed output to fetch() without going through the main thread (see the first sketch at the end of this section)
- Ryosuke: Oh, so in the example you’re waiting, but it’s not really needed.
- Andrew: Yeah, we’d only need to await in the ArrayBuffer case
- Ryosuke: still better to do this in Fetch
- Yoav: We have use-cases to have that separately - enables flexibility on client side storage as well as mixing payloads that require different algorithms
- Ryosuke: I see your point. But how does the decoder know how to decode? Headers?
- Andrew: The server will have to interpret it and decode on the server side. No headers involved. So the server would need to always expect certain codecs, or insert headers manually. Doesn’t seem far-fetched that you’d want to always assume compressed data.
- Ryosuke: That assumes you’re writing your own servers. Standard servers won’t help you; you’d need to deal with decoding manually.
- Yoav: Sure, but no standard server right now decompresses the payload (e.g. if you use content-encoding on the upload).
- Nic: You can convince Apache, but it’s not on by default
- Yoav: Yeah, so it requires some negotiation between the client and the server. Whereas here if the developer knows the backend collecting the data, handling the payload is their responsibility.
- Nic: Want to add my support for this. We use a lot of CPU and ship code to compress RT and UT data. The browser can do that better: either a stream-based approach or a Fetch option
- Todd: What Ryosuke is saying is that servers will eventually support this, so there’s value in exploring it. No reason for that not to be the default.
- Andrew: There’s value in exploring Fetch integration as a separate topic. Useful to have both.
- Todd: Useful to have the ability to compress separately. Maybe also useful to have a standardized, reusable string that goes with the data and tells the decoder how it was compressed, regardless of whether the decoder is on the server, or the data sits in localStorage across version changes, etc. (see the second sketch at the end of this section)
- Yoav: that seems useful
- Andrew: Sounds good. Are there many use cases where one would vary the compression format?
- Yoav: There are cases where you can invest more processing power in some parts of the stream (e.g. brotli 11), but other parts you want to compress on the fly. But could be tackled in separate streams that are combined.
- Andrew: Yeah, making it general enough should be good enough
- Ryosuke: some compression algorithms may be more efficient than others. A single algo does not cover everything. Maybe need something like Web Media where there’s a preferred algorithm based on compression efficiency, power efficiency.
- Andrew: Sounds good. May want to provide abstractions across algorithms, or to have each codec take its own “bag of flags”.
- Yoav: bag of flags is probably better, as there are many knobs involved beyond compression level. E.g. in order to implement SSH with gzip you need specific flushing parameters.
- Yoav: Also, dictionaries. Brotli and gzip can have external dictionaries that can help reduce compressed data for specific formats and use cases. Downloading a dictionary can help upload transfers.
- Andrew: Definitely thought of external dictionaries. Thought it would be tackled by “bag of flags”. Not sure how to persist them.
- Yoav: Yeah, I thought of user land persistency.
- Andrew: Anything beyond “bag of flags” for this use-case?
- Yoav: Not sure. It’d be a large bag...
- Phil: For the RUM/analytics case, making it async would be a burden and not usable in beforeunload. An option to Fetch would be better along with keep-alive, letting the browser do that work off-main-thread.
- Yoav: There’s a real extensible web play here, we need to provide the primitives *and* integrate them into fetch()
- Ben: around compression levels, we would prefer to pick the algorithm and its parameters.
- Andrew: bag of flags it is!
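For reference, a rough sketch of how the stream-based flow discussed above could look. Everything here is an assumption for illustration: the Compressor constructor, the codec strings, and the getCompressorType() helper are placeholder names from the discussion, not a settled API, and streaming request bodies in fetch() are themselves a dependency (per Todd’s point above).

```ts
// Hypothetical sketch only: Compressor and getCompressorType are placeholder
// names taken from the discussion above, not a shipped or settled API.
declare function getCompressorType(format: string): boolean;
declare class Compressor {
  constructor(format: 'gzip' | 'brotli' | 'zstd');
  readonly writable: WritableStream<Uint8Array>;
  readonly readable: ReadableStream<Uint8Array>;
}

async function uploadCompressed(payload: Uint8Array, url: string): Promise<Response> {
  // Feature detection (Ben's question): fall back to raw bytes if the codec
  // isn't available in this UA.
  if (!getCompressorType('gzip')) {
    return fetch(url, { method: 'POST', body: payload });
  }

  // Pipe the payload through the compressor. Because the output is a
  // ReadableStream, it can be handed to fetch() as the request body without
  // the compressed bytes going back through the main thread (the point
  // Ryosuke and Andrew discussed above). Assumes the UA supports streaming
  // request bodies.
  const compressor = new Compressor('gzip');
  const body = new Blob([payload]).stream().pipeThrough(compressor);
  return fetch(url, { method: 'POST', body });
}
```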
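Similarly, a rough sketch of how a per-codec “bag of flags” and the standardized format string Todd suggested might fit together. All interfaces, option names, and the compress() entry point below are assumptions, not proposal text.

```ts
// All names and option knobs below are assumptions for illustration only.
interface GzipOptions   { level?: number; flush?: 'none' | 'sync' }    // e.g. the flushing knobs Yoav mentioned
interface BrotliOptions { quality?: number; dictionary?: Uint8Array }  // external dictionary as one flag

type CompressRequest =
  | { format: 'gzip';   options?: GzipOptions }
  | { format: 'brotli'; options?: BrotliOptions };

// The standardized, reusable string travels with the bytes so a later decoder
// (a collection server, or the page itself reading localStorage back after a
// version change) knows how the payload was compressed.
interface CompressedRecord {
  coding: 'gzip' | 'br';
  bytes: Uint8Array;
}

// Stand-in for whatever the eventual entry point is (stream pipe, ArrayBuffer, etc.).
declare function compress(input: Uint8Array, req: CompressRequest): Promise<Uint8Array>;

async function compressToRecord(input: Uint8Array, req: CompressRequest): Promise<CompressedRecord> {
  const bytes = await compress(input, req);
  return { coding: req.format === 'brotli' ? 'br' : 'gzip', bytes };
}
```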
Element Timing and text aggregation - Nicolás
- High level problem: ET wants to expose “important” text content, but since text nodes are not elements, we need to specify which element each text node belongs to (see the sketch at the end of this section)
- Considered several approaches and ended up with aggregating a text node to the nearest containing block ancestor
- Considered a notion of depth, but it’s arbitrary: links increase depth, yet shouldn’t have that impact
- Top level elements: high maintenance
- Phrasing content: could work, but harder to implement, so containing block was chosen
- Todd: notions are standardized, right?
- Tim: Yeah, also devs know block-level elements way more than phrasing content
- Will: If a dev wants to learn something on a specific text, can they wrap it and annotate it? Is that the idea?
- Nicolás: Wrapping it in a block-level element would change layout
- Tim: we should think about that, that’s a great question
- Will: want to see if devs can specify an override
- Yoav: Would a span work?
- Tim: currently it won’t work, we tried to avoid reporting things multiple times, but maybe we should. We need to figure it out
- Nicolás: I’d think that the problem is often the other way around. Text nodes are generally very small. Not sure if this is a real problem, but worth looking into it
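A rough sketch of how the containing-block aggregation discussed above could surface to developers. The elementtiming attribute and the 'element' entry type follow the Element Timing proposal; the markup and the identifier value are made up for illustration.

```ts
// Markup for illustration:
//
//   <div elementtiming="hero-text">
//     A long paragraph of text with an inline <span>phrasing element</span>
//     in the middle.
//   </div>
//
// Under the containing-block rule, both text nodes aggregate to the <div>
// (their nearest containing block ancestor), not to the inline <span>, so a
// single entry is reported for the div's text.

interface ElementTimingEntry extends PerformanceEntry {
  identifier: string; // value of the elementtiming attribute, e.g. "hero-text"
}

const observer = new PerformanceObserver((list) => {
  for (const entry of list.getEntries() as ElementTimingEntry[]) {
    console.log(entry.entryType, entry.identifier, entry.startTime);
  }
});
observer.observe({ type: 'element', buffered: true });
```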