Generic model for causal attribution (non-sampling based) #89

patrickhulce · 2020-11-18T22:16:20Z

Several issues and previous design documents have demonstrated a need for developers to be able to identify the work responsible for a long task in order to fix it. Proposals thus far have mostly focused on what script is currently being executed, as opposed to what script is ultimately responsible for the long task occurring in the first place. I'd like to propose a generic model for attribution based on causality instead of sampling.

I discussed this proposal and the difference between these approaches at a WebPerfWG 2020 TPAC session (Recording, Slides).

Brief Summary of Benefits:

- Provides unique insight not already available via the JS Sampling API.
- More intuitive starting point for developers to investigate on pages with varied authorship (easily identifies third-party sources).
- Does not require heavy _intra_task bookkeeping.
- Proven track record for matching developer intuition in the Lighthouse project.

Very Rough Implementation Description:

Terminology:

- initiating invocation: a specific invocation of a web API that schedules a new task
  - Examples:
    - setTimeout
    - fetch
    - addEventListener
- causal task: the task that was ultimately responsible for another task's existence

The causal task of any given task is the result of traversing up the tree of initiating invocations until a task is reached that was the initial evaluation of a script resource with a URL.

- Generate a numeric identifier for each main-thread task and initiating invocation.
- Maintain a map of initiating invocation ID to causal task ID.
- When a future task runs as a result of an initiating invocation, associate any new initiating invocations it makes with the same causal task ID as the current task.
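
The bookkeeping described above can be sketched in a few lines of TypeScript. This is only a toy model under stated assumptions, not how any browser implements or would implement it: `CausalTracker` and all of its method names are hypothetical, tasks are simulated as plain callbacks, and the "instrumented API" is represented by an explicit `recordInitiatingInvocation()` call.

```typescript
// Illustrative sketch only: CausalTracker and its methods are hypothetical
// names, not part of any browser API or of the proposal text itself.
type TaskId = number;
type InvocationId = number;

class CausalTracker {
  private nextId = 1;
  // Initiating invocation ID -> causal task ID.
  private causalTaskByInvocation = new Map<InvocationId, TaskId>();
  // Causal task ID -> URL of the script whose initial evaluation it was.
  private scriptUrlByCausalTask = new Map<TaskId, string>();
  // Causal task ID of the task that is currently running, if any.
  private currentCausalTask: TaskId | undefined = undefined;

  // Initial evaluation of a script: the task is its own causal task.
  runScriptEvaluation(url: string, body: () => void): TaskId {
    const taskId = this.nextId++;
    this.scriptUrlByCausalTask.set(taskId, url);
    this.runAs(taskId, body);
    return taskId;
  }

  // Called where an instrumented API (setTimeout, fetch, addEventListener)
  // schedules future work; it inherits the causal task of the running task.
  recordInitiatingInvocation(): InvocationId {
    if (this.currentCausalTask === undefined) {
      throw new Error("initiating invocation outside of a tracked task");
    }
    const invocationId = this.nextId++;
    this.causalTaskByInvocation.set(invocationId, this.currentCausalTask);
    return invocationId;
  }

  // Runs a task that was scheduled by an earlier initiating invocation.
  runScheduledTask(invocationId: InvocationId, body: () => void): TaskId {
    const causalTask = this.causalTaskByInvocation.get(invocationId);
    if (causalTask === undefined) {
      throw new Error(`unknown initiating invocation ${invocationId}`);
    }
    const taskId = this.nextId++;
    this.runAs(causalTask, body);
    return taskId;
  }

  // URL of the script ultimately responsible for an initiating invocation.
  scriptUrlFor(invocationId: InvocationId): string | undefined {
    const causalTask = this.causalTaskByInvocation.get(invocationId);
    return causalTask === undefined
      ? undefined
      : this.scriptUrlByCausalTask.get(causalTask);
  }

  // Sets the current causal task for the duration of body, then restores it.
  private runAs(causalTask: TaskId, body: () => void): void {
    const previous = this.currentCausalTask;
    this.currentCausalTask = causalTask;
    try {
      body();
    } finally {
      this.currentCausalTask = previous;
    }
  }
}

// Usage: a script's timer callback issues a fetch; both trace back to the script.
const tracker = new CausalTracker();
let timerInvocation = 0;
let fetchInvocation = 0;
tracker.runScriptEvaluation("https://third-party.example/widget.js", () => {
  timerInvocation = tracker.recordInitiatingInvocation(); // e.g. setTimeout
});
tracker.runScheduledTask(timerInvocation, () => {
  fetchInvocation = tracker.recordInitiatingInvocation(); // e.g. fetch
});
const attributedUrl = tracker.scriptUrlFor(fetchInvocation);
```

In this toy run, `attributedUrl` resolves to the widget script's URL even though the fetch was issued from a timer callback, which is the distinction between causal and currently-executing attribution. The example URL is made up for illustration.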

Questions

How can the Lighthouse project or I personally help support this effort? :)

npm1 · 2020-11-20T14:58:21Z

Several questions come to mind:

- If we wanted to expose this to RUM, what would be a sensible security model to do so?
- Is it possible to compute this efficiently? (I see you mention no heavy intra_task bookkeeping, not sure I follow).
- Do we have customers lined up to try this? It sounds like this feature is complex enough that it may benefit from an Origin Trial in Chrome to ensure we get the right API shape and implementation.

Thanks for the great talk, by the way! It's exciting to see some novel ideas on the long-standing problem of longtasks attribution.

patrickhulce · 2020-11-20T17:39:49Z

> If we wanted to expose this to RUM, what would be a sensible security model to do so?

The brief justification for why this doesn't expose new information is that one could feasibly create pages that optionally include/exclude scripts from other origins and observe the delta in long tasks to identify which script caused which long task. The fact that this is burdensome is the problem developers have today (one must run expensive A/B testing in order to learn this information).

> Is it possible to compute this efficiently? (I see you mention no heavy intra_task bookkeeping, not sure I follow).

The intra-task comment is meant to contrast the overhead of this approach with that of a sampling approach, where the overhead of implementing a sampling profiler in a JS engine is non-trivial. With this approach, only a select few JS APIs that already kick out to browser scheduling require lightweight instrumentation. A single unsigned int (maybe a long?) needs to be kept per top-level task until a chain is resolved, but I imagine allowances should be made to evict tasks in the far past to maintain a low memory impact. The listener component to this is probably the heaviest part.
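
To make the memory point concrete, here is a minimal sketch of a size-bounded map from initiating invocation ID to causal task ID that evicts its oldest entry once a cap is exceeded. The class name `BoundedCausalMap`, the cap, and the oldest-first eviction policy are all assumptions made for illustration; the comment above only says that old tasks should be evictable, not how.

```typescript
// Illustrative sketch only: BoundedCausalMap is a hypothetical helper, and
// oldest-first eviction is one possible policy, not one named in this thread.
class BoundedCausalMap {
  private readonly maxEntries: number;
  // A JS Map iterates in insertion order, so the first key is the oldest.
  private entries = new Map<number, number>();

  constructor(maxEntries: number) {
    this.maxEntries = maxEntries;
  }

  // Stores invocation ID -> causal task ID, evicting the oldest entry if full.
  set(invocationId: number, causalTaskId: number): void {
    // Delete first so that re-setting a key moves it to the newest position.
    this.entries.delete(invocationId);
    this.entries.set(invocationId, causalTaskId);
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) {
        this.entries.delete(oldest);
      }
    }
  }

  // Returns the causal task ID, or undefined if unknown or already evicted.
  get(invocationId: number): number | undefined {
    return this.entries.get(invocationId);
  }

  // Drops an entry once its chain is resolved and no more tasks can follow.
  resolve(invocationId: number): void {
    this.entries.delete(invocationId);
  }

  get size(): number {
    return this.entries.size;
  }
}

// Usage: with a cap of 2, inserting a third entry evicts the first one.
const bounded = new BoundedCausalMap(2);
bounded.set(1, 10);
bounded.set(2, 10);
bounded.set(3, 20);
```

Each entry is just a pair of integers, so the cost per outstanding invocation stays small; an evicted invocation would simply be reported without attribution.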

> Do we have customers lined up to try this? It sounds like this feature is complex enough that it may benefit from an Origin Trial in Chrome to ensure we get the right API shape and implementation.

I agree an origin trial makes the most sense. I don't know of any immediate customers but I can ask around.

npm1 · 2021-02-02T21:52:06Z

Just wanted to ping this to ask: are you aware of any potential customers for this data? Also adding @spanicker as this seems related to their problems of finding the 'FID culprit'.

patrickhulce · 2021-02-02T22:00:46Z

I am not, though based on casual conversation I imagine RUM perf monitoring solutions might be interested. @spanicker might have more leads on where to go first with the FID attribution overlap :) 🤞

omriariav · 2021-07-12T17:50:20Z

@npm1 as a 3rd party vendor, we will find this useful to easily isolate our TBT (long tasks) impact and optimize it; this goes for both ad hoc fixes and constant monitoring over field data that we collect. I hope it helps.

npm1 · 2021-12-22T16:38:53Z

Would it be possible to implement this with your task tracking idea @yoavweiss? How standardizable is that?

yoavweiss · 2021-12-27T09:30:54Z

> Would it be possible to implement this with your task tracking idea @yoavweiss ?

I believe so, but haven't prototyped that specifically.

> How standardizable is that

I'll need to think about it a bit, but in theory it seems like we could integrate with the event loop's task posting and keep track of ancestry there.

noamr · 2023-01-16T15:34:50Z

I'm currently prototyping this or something similar.

noamr · 2024-04-09T10:20:11Z

Closing this in favor of w3c/long-animation-frames

clelland assigned yoavweiss Jun 20, 2022

noamr self-assigned this Jan 16, 2023

This was referenced Feb 5, 2023

Can we find which script or function is taking longer than 50 ms? #28

Closed

Proposal: add more main thread activity information to PerformanceEventTiming entry w3c/event-timing#109

Closed

Long Animation Frame (LoAF) explainer #100

Merged

noamr mentioned this issue Feb 16, 2023

New proposal: long animation frames #103

Closed

w3c deleted a comment from cklim647 Jul 12, 2023

noamr closed this as completed Apr 9, 2024
