Nic Jansma, Mike Henniger, Yo'av Moshe, Yoav Weiss, Patrick Meenan, Alex Christensen, Giacomo Zecchini, Alon Kochba, Amiya Gupta, Carine, Sean Feng, Michal Mocny, Benjamin De Kosnik, Abin Paul, John Engebretson, Marcel Duran, Timo Tijhof, Steven Bougon, Ian Clelland, Tim Kadlec, Andrew Galloni
Yo'av: Working for Cloudflare, joined as part of the Zaraz acquisition
... Talking about something we started called Managed Components
... Managed Components are a way to redefine how third-party tools work online (and not online)
... How third-party script tags work
... <script> tag into HTML
... Performance: Another server we need to connect to, the script can do anything
... Also a problem in terms of security
... Can collect things it's not supposed to collect and could cause Privacy issues
... URL itself could include identifying information, collected as part of analytics tool
... You can't know, as the data leaves the browser and goes to a remote server
... With Zaraz we tried solving this by reverse-engineering ~50 tools to figure out what they're doing and what information they're extracting from the browser, then replicating that server-side via Cloudflare Workers.
... Collected that information once. If you have avg ~21 different tools all collecting information (page, title, screen res), instead of each of them running JS and sending a request, we collect once and manipulate on the server
... This worked until we joined Cloudflare, when we moved from 100 users to 9000 websites running Zaraz
... As we move to more websites we hear requests for support of more tools
... We realized we needed another way to do this (instead of reverse-engineering everything), shift the power dynamics and ask vendors to write better tools
... For existing scripts, even before you start, you're already asking for another request from the browser to fetch the script
... Managed Components:
... Benefits include:
... Today each tool has their own event collection API
... With this, each component hooks into the events they care about and the tool dispatches
... Can define specific routes, and have different behavior for each route
... SSR embeds, e.g. a Twitter embed on articles: all of this can happen while the response is being sent to the browser. Appears to be first party. No layout shifts, since it's not loading async later.
... Client events is standardizing events like scroll, view, click. Cross-browser and cross-platform. Could run on a mobile or desktop application.
... Pre-page rendering actions, because they run on server, you know about the page view as part of the request from the browser instead of later when the browser loads the analytics and sends a beacon
... Tool to manage consent better
... Example of component
... On page view, event gets dispatched and the callback fires a beacon
... Runs zero JavaScript on website
... Server-side capabilities
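A minimal sketch of the component described above, assuming a simplified manager. The names `SketchManager`, `sendBeacon`, `exampleAnalyticsComponent` and the `analytics.example` URL are hypothetical and are not taken from the published Managed Components API; beacons are recorded in memory rather than sent.

```typescript
// Hypothetical, simplified shapes: not the published Managed Components API.
interface Client {
  url: string;
  title: string;
}

interface PageviewEvent {
  client: Client;
}

type Listener = (event: PageviewEvent) => void;

// Minimal stand-in for a component manager running on the server.
class SketchManager {
  private listeners = new Map<string, Listener[]>();
  // Outgoing beacons are recorded here instead of being sent over the network.
  readonly sentBeacons: string[] = [];

  addEventListener(type: string, listener: Listener): void {
    const existing = this.listeners.get(type) ?? [];
    this.listeners.set(type, [...existing, listener]);
  }

  // Called by the manager when it sees a page request; no browser JS involved.
  dispatch(type: string, event: PageviewEvent): void {
    for (const listener of this.listeners.get(type) ?? []) {
      listener(event);
    }
  }

  sendBeacon(url: string): void {
    this.sentBeacons.push(url);
  }
}

// The component hooks into the event it cares about and fires a beacon.
function exampleAnalyticsComponent(manager: SketchManager): void {
  manager.addEventListener("pageview", (event) => {
    const params = new URLSearchParams({
      page: event.client.url,
      title: event.client.title,
    });
    manager.sendBeacon(`https://analytics.example/collect?${params.toString()}`);
  });
}

const manager = new SketchManager();
exampleAnalyticsComponent(manager);
manager.dispatch("pageview", {
  client: { url: "https://site.example/article", title: "Article" },
});
```

In this sketch the page view is known from the request itself, so the beacon is produced without any script running on the website.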
... Managed Components require some type of runtime (manager)
... Manager API:
... Manager is the environment in which this gets executed
... Enforces user configuration:
... Could be implemented in different ways:
... Working on an open format and runtimes, with other vendors:
... Pushing forward WebCM, our open-source reference implementation of a Component Manager. Works as a proxy, and you can run Managed Components on your site now.
... We are taking 10 different components in Zaraz and open-sourcing them too
... FB analytics, LinkedIn, Twitter, etc
... Vision:
... Scanned top sites in US, ~21 third parties, Lighthouse score was reduced by 41% when third-party tools were included
Yoav: You were talking to third-party providers about integration. Did you get any pushback on lack of access to Client Storage, or is that part of permissions you're allowing some scripts to access
Yo'av: localStorage, cookies, etc?
Yoav: Yes
Yo'av: There's a client object, that represents a visitor to the website. You can get/set on that object. We're not revealing cookies directly, but a key-value storage. More transparent to the tool itself.
... Can also get information by another tool, e.g. you can get the Google Analytics identifier
... In general, the approach is that there's a fine line; we abstract some of those browser APIs, often because they're meant for backward compatibility, and for the sake of cross-platform
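One way to picture the client object's key-value storage, as a rough sketch only. The class name `SketchClient`, the per-tool namespacing and the `allowCrossToolReads` flag are assumptions made for illustration; the discussion only states that components get/set on a client object and that one tool can read another tool's value.

```typescript
// Hypothetical sketch: names and the namespacing scheme are assumptions.
class SketchClient {
  // The manager decides where this lives (cookie, server-side store, etc.).
  private store = new Map<string, string>();
  private allowCrossToolReads: boolean;

  constructor(allowCrossToolReads: boolean) {
    this.allowCrossToolReads = allowCrossToolReads;
  }

  // Each tool writes under its own namespace; it never touches cookies directly.
  set(tool: string, key: string, value: string): void {
    this.store.set(`${tool}:${key}`, value);
  }

  // A tool can always read its own keys. Reading another tool's key
  // (e.g. an analytics identifier) only works if the site allows it.
  get(requestingTool: string, ownerTool: string, key: string): string | undefined {
    if (requestingTool !== ownerTool && !this.allowCrossToolReads) {
      return undefined;
    }
    return this.store.get(`${ownerTool}:${key}`);
  }
}
```

Because every read and write goes through the manager, the site owner can see which tool stored which key, which is harder to do when scripts write cookies directly.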
Nic: Thinking how this can be applied to a RUM analytics provider, trying to measure interactions inside the page. How does something that needs to run inside the page get the data?
Yo'av: The component would put an event listener on a thing, the manager's client-side code collects the data and sends it to the manager, which dispatches it to the component
Nic: Complexities in adapting existing tools, probably. Tools have their own unique way of collecting data. Part of the challenge in providing a compat layer
Yo'av: Definitely a challenge, we try to keep it similar to how it happens in the browser
Yoav: If a component needs direct access to the DOM e.g. to read DOM structure or sizes, is that something that's not possible?
Yo'av: Good question, we don't have a specific answer yet. If we hear the request once, from one vendor, maybe they need to request higher root permission. But if we hear the request more than once, we have to figure out how to do something together.
... With session recording tools, you need to have the root permission to run JavaScript in the page. They work better as a managed tool, the overall JavaScript the browser executes is less.
Yoav: From my perspective this is interesting for multiple reasons, I agree with the premise that the way 3P are loaded today is interesting for security and performance. Interesting to standardize the API you're working on. Maybe this Working Group could be a good fit, or the Server-Side Web Interoperability CG (WinterCG).
... Ideally eventually standardizing the API would be interesting.
... The other angle of this is that we'll find out 3P will need a new API, maybe there's a new web API that the manager could handle those registrations and execute that on the client side in a way that's better than what we have today
Alex: Seems very ambitious but interesting
... If I was an analytics provider that loads scripts with a <script> tag, has full permission and gets the information that I want, what incentive do I have to use this new thing?
Yo'av: We're seeing interest coming from vendors, because they're seeing interest from their customers. For analytics, they're not making money from actual data, but from their customers paying for the data.
... Managed Components just gives you more control over your information
... Vendor can tell from their customers that there's a better way of running their tool, that it's the fastest possible, more secure, private
Patrick: Timing wise, it's a good time as browsers start doing more privacy partitioning of cookies, network stacks, IP anonymization etc. Are there controls on the module/serving side as far as what gets sent to the origin? For example, Safari/Firefox are starting to do IP anonymizing proxies to protect IP and other information. In theory, private information that the browser was trying to block could be going out to third parties.
Yo'av: This is the Component Manager's responsibility. For IP addresses, let's assume the real IP address gets revealed to the manager (run by first-party). The manager can say anonymize IPs before forwarding to the different tools. Depends on each manager, outside of the specs.
... Tools are expecting specific schema for each event, up to the manager to fulfill the schema.
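As an illustration of a manager anonymizing IPs before forwarding, here is a small sketch. Zeroing the last IPv4 octet is just one common approach chosen for the example, and `ToolEvent`, `prepareForTool` and `anonymizeIpv4` are hypothetical names; as noted above, the actual behavior depends on each manager and is outside the spec.

```typescript
// Zero the last octet of an IPv4 address.
// Anything that is not a well-formed IPv4 address (including IPv6)
// yields undefined, so it is dropped rather than forwarded as-is.
function anonymizeIpv4(ip: string): string | undefined {
  const octets = ip.split(".");
  const valid =
    octets.length === 4 &&
    octets.every((o) => /^\d{1,3}$/.test(o) && Number(o) <= 255);
  if (!valid) return undefined;
  return [...octets.slice(0, 3), "0"].join(".");
}

// Hypothetical event shape that a manager forwards to a tool.
interface ToolEvent {
  tool: string;
  url: string;
  ip?: string;
}

// The manager applies the site's configuration before the tool sees the event.
function prepareForTool(event: ToolEvent, anonymizeIp: boolean): ToolEvent {
  if (!anonymizeIp || event.ip === undefined) return event;
  return { ...event, ip: anonymizeIpv4(event.ip) };
}
```

For example, with anonymization enabled an event carrying `198.51.100.7` would be forwarded with `198.51.100.0`, while the manager itself (run by the first party) still saw the real address.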
Patrick: Feels like an area that might be ripe for standardization, before it becomes a battle
Yoav: I could see browsers anonymizing all requests they're sending so even the manager isn't aware of the IP
Timo: Also touches a bit on the GDPR cookies, does this manager know what the client said to include and not include?
Yo'av: We are working on integration with some consent management tools. We know what tools are writing cookies. At this point we're not doing anything more automatically. Could be something the Manager takes care of?
... Components could provide hints on what's required when it's run
Amiya: When reporting RUM numbers to leadership, one factor is that browser launch can be a significant factor that affects many webpages
... Page A doesn't get browser launches, whereas Page B does (email links, start page, etc.); Page B can be reported as much worse because the browser is also launching
... Page and browser are competing for the same resources
... Proposal is NavigationType could be extended to also have useragent_launch in addition to navigate, reload, back_forward, prerender
... Can then factor out launches if needed. Apples to apples is just for Navigates.
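A sketch of how a RUM pipeline might factor out launches, assuming the proposed value were available. `useragent_launch` is the value proposed in this discussion, not an existing standard value, and `RumSample`, `percentile` and `p75ByType` are hypothetical names used only for the example.

```typescript
// "useragent_launch" is the proposed addition; the other four values exist today.
type NavType =
  | "navigate"
  | "reload"
  | "back_forward"
  | "prerender"
  | "useragent_launch";

// Hypothetical shape of one collected RUM sample.
interface RumSample {
  navigationType: NavType;
  loadTimeMs: number;
}

// Nearest-rank percentile; returns undefined for an empty input.
function percentile(values: number[], p: number): number | undefined {
  if (values.length === 0) return undefined;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// P75 computed over samples of a single navigation type,
// so launches can be reported separately from plain navigations.
function p75ByType(samples: RumSample[], type: NavType): number | undefined {
  const times = samples
    .filter((s) => s.navigationType === type)
    .map((s) => s.loadTimeMs);
  return percentile(times, 75);
}
```

Comparing `p75ByType(samples, "navigate")` across pages would give the apples-to-apples view, while `p75ByType(samples, "useragent_launch")` isolates the launch cost.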
Nic: Do you have an example of the extent to which browser launches affect it?
Amiya: In P75/P90, it can be a number of seconds difference
Yoav: Do you have a sense of what it would mean to specify this? Right now the existing ones map easily to HTML concepts, but we probably don't have something specific for launch
Amiya: We have an Origin Trial in Edge
Yoav: Does it include opening a new browser window? What if it's re-using a browser process?
Amiya: Not if it's sharing the same process, but if it's a new profile, etc
... PWAs is another scenario where the launch is its own app
Yoav: Interesting to talk to HTML folks to see if this is something we can easily wire up, or worst-case hand-wave around it
Patrick: Is there a differentiation between the launch (from command-line or intent) and user opens new browser and immediately clicks bookmark while contending with background start
Amiya: The intent is for the second case not to be counted as a useragent_launch. There could still be contention, but it's not included in the launch.
Timo: Most CI, would that affect it?
Patrick: For WPT, it waits for the browser to be idle before launching the navigation
Yoav: The goal of RUM collection is not to collect it from the lab, so I'd consider this being an edge-case
Patrick: The other place is things like Chrome Custom Tabs as a replacement for Web App Views, and how they fit into this model, as you can pre-warm the browser before navigating.
... Knowing your page was loaded as a launch from web view, and you're the first one to "warm up" the tab is probably useful data.
Abin: Give direct signal from browser about which resources are blocking
... Adding a new renderBlockingStatus to ResourceTiming object
... "blocking" or "non-blocking"
... Previous work:
... Additional previous work:
... Needs a few spec changes:
... Would it make sense to change to a boolean?
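To make the renderBlockingStatus proposal concrete, here is a sketch of how a RUM script might consume it. `ResourceEntryLike` is a hypothetical structural type standing in for ResourceTiming entries; the field is optional because not every browser would expose it.

```typescript
// Minimal structural stand-in for a ResourceTiming entry.
// renderBlockingStatus is the attribute proposed in this discussion.
interface ResourceEntryLike {
  name: string;
  duration: number;
  renderBlockingStatus?: "blocking" | "non-blocking";
}

// Keep only the resources the browser reported as render-blocking.
function renderBlockingResources(
  entries: ResourceEntryLike[],
): ResourceEntryLike[] {
  return entries.filter((e) => e.renderBlockingStatus === "blocking");
}

// Total time spent fetching render-blocking resources (a rough signal only:
// overlapping fetches are summed, not merged).
function totalBlockingDuration(entries: ResourceEntryLike[]): number {
  return renderBlockingResources(entries).reduce(
    (sum, e) => sum + e.duration,
    0,
  );
}
```

In a browser that exposes the attribute, the input would presumably come from `performance.getEntriesByType("resource")`; entries without the field are simply treated as not blocking here.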
Nic: Shared example in Chrome traces with 4-5 different states. Why not go with 5 sets of states?
Yoav: Fetch boolean was added for the blocking="render" attribute, and only the very first value represents render-blocking resources, where the others are parser blocking or running execution in the middle of other things, which isn't strictly render blocking
... Doesn't take a value that prevents browser from rendering unless it's blocked doing other work
... Preference to align on boolean value. Previous values represented other forms of blocking.
... WebPageTest, which uses these values, only translates them into a boolean. They didn't find a use for the 5 enums, they just use blocking vs. non-blocking
Nic: Seems like a very straightforward way of exposing useful information and as a RUM vendor it's something I would like to pick up