Similar to what criterion does, but I think a useful starting point would just be reporting a ±% change in times between runs, if the two runs are determined to differ significantly given the variance of each! (A rough sketch of that gate follows below.)
I imagine this is somewhat blocked on writing out the previous benchmark results somewhere they can be referenced first!
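To illustrate the gate mentioned above, here is a minimal sketch in plain Rust. None of this is Divan's actual API; the function names and the two-standard-error threshold are assumptions. The idea is simply: compute the percent change between the two runs' means, but only report it when the difference exceeds the combined standard error of both runs.

```rust
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

fn variance(xs: &[f64]) -> f64 {
    let m = mean(xs);
    xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (xs.len() - 1) as f64
}

/// Returns Some(percent change) if the two runs differ beyond their combined
/// standard error, or None if the difference is indistinguishable from noise.
fn percent_change(before: &[f64], after: &[f64]) -> Option<f64> {
    let (m0, m1) = (mean(before), mean(after));
    // Standard error of the difference of the two means (Welch-style).
    let se = (variance(before) / before.len() as f64
        + variance(after) / after.len() as f64)
        .sqrt();
    let diff = m1 - m0;
    // Rough ~95% gate: treat anything within 2 standard errors as noise.
    if diff.abs() > 2.0 * se {
        Some(100.0 * diff / m0)
    } else {
        None
    }
}

fn main() {
    // Placeholder timings (e.g. nanoseconds per iteration) for two runs.
    let before = [10.2, 10.4, 10.1, 10.3, 10.2];
    let after = [9.1, 9.0, 9.2, 9.1, 9.3];
    match percent_change(&before, &after) {
        Some(pct) => println!("{pct:+.1}% change"),
        None => println!("no significant change"),
    }
}
```

A real implementation would presumably use a proper t-test with Welch's degrees of freedom rather than a fixed 2-SE cutoff, but the shape of the check is the same.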
This is a bit of a nuanced issue. Currently, benchmarks don't have any statistics beyond min/max/median/mean time. But I would very much like to do proper statistical analysis across benchmark runs to determine whether a difference is distinguishable from random noise (i.e. statistically significant).
The way the Stabilizer folks went about it resulted in a normal distribution of results. But as an easy-to-pick-up userspace library, Divan doesn't have the same luxury of being an LLVM plugin. That said, benchmark times tend to follow a log-normal distribution, so perhaps we can make the same stats work from that?
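To make the log-normal idea concrete, here is a hedged sketch: if per-iteration times are roughly log-normal, their logarithms are roughly normal, so the usual normal-theory machinery can run on the log-transformed samples, and the between-run comparison falls out as a ratio of geometric means. The numbers and names below are illustrative only, not anything Divan does today.

```rust
// Mean of the log-transformed samples; exp() of this is the geometric mean.
fn log_mean(samples: &[f64]) -> f64 {
    samples.iter().map(|t| t.ln()).sum::<f64>() / samples.len() as f64
}

fn main() {
    let before = [10.0, 11.0, 9.5, 10.5, 30.0]; // note the single outlier
    let after = [9.0, 9.5, 8.8, 9.2, 9.1];

    // Geometric-mean ratio: exp(difference of log-means). This is much less
    // sensitive to multiplicative outliers than a ratio of arithmetic means.
    let ratio = (log_mean(&after) - log_mean(&before)).exp();
    println!(
        "geometric-mean ratio: {ratio:.3} ({:+.1}%)",
        100.0 * (ratio - 1.0)
    );
}
```

The same significance gate as above would then apply to the log-space means and variances instead of the raw ones.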
As per the intent of this issue, my plan once #10/#42 are complete is to report a ±% change from the previous run, based on information recorded in target/divan.
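As a sketch of the persistence half of that plan (the part #10/#42 would provide), here is one way prior results could be written under target/divan and diffed on the next run. The file path, the one-line-per-benchmark TSV format, and the helper names are all assumptions made for illustration; they are not what Divan actually writes.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical location for recorded results.
const RESULTS: &str = "target/divan/previous.tsv";

/// Load `name<TAB>median_ns` pairs from the previous run, if the file exists.
fn load_previous(path: &Path) -> HashMap<String, f64> {
    let mut out = HashMap::new();
    if let Ok(text) = fs::read_to_string(path) {
        for line in text.lines() {
            if let Some((name, ns)) = line.split_once('\t') {
                if let Ok(ns) = ns.parse::<f64>() {
                    out.insert(name.to_string(), ns);
                }
            }
        }
    }
    out
}

/// Overwrite the file with this run's results for the next comparison.
fn store_current(path: &Path, results: &[(&str, f64)]) -> io::Result<()> {
    fs::create_dir_all(path.parent().unwrap())?;
    let body: String = results
        .iter()
        .map(|(name, ns)| format!("{name}\t{ns}\n"))
        .collect();
    fs::write(path, body)
}

fn main() -> io::Result<()> {
    let path = Path::new(RESULTS);
    let previous = load_previous(path);

    // Placeholder numbers standing in for this run's measured medians.
    let current = [("parse_json", 1_250.0), ("serialize", 980.0)];

    for (name, ns) in &current {
        match previous.get(*name) {
            Some(prev) => {
                println!("{name}: {:+.1}% vs previous run", 100.0 * (ns - prev) / prev)
            }
            None => println!("{name}: no previous result"),
        }
    }
    store_current(path, &current)
}
```

Whatever format is actually chosen would also need to carry variances (or raw samples) so the significance check above has something to work with, not just point estimates.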