Similar to what criterion does, but I think a useful starting point would just be reporting a ±% change in times between runs, if the two runs are determined to differ significantly given the variance of each! (A rough sketch of that gate follows below.)
I imagine this is somewhat blocked on writing out the previous benchmark results somewhere they can be referenced first!
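To illustrate the gate mentioned above, here is a minimal sketch in plain Rust. None of this is Divan's actual API; the function names and the two-standard-error threshold are assumptions. The idea is simply: compute the percent change between the two runs' means, but only report it when the difference exceeds the combined standard error of both runs.

```rust
fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

fn variance(xs: &[f64]) -> f64 {
    let m = mean(xs);
    xs.iter().map(|x| (x - m).powi(2)).sum::<f64>() / (xs.len() - 1) as f64
}

/// Returns Some(percent change) if the two runs differ beyond their combined
/// standard error, or None if the difference is indistinguishable from noise.
fn percent_change(before: &[f64], after: &[f64]) -> Option<f64> {
    let (m0, m1) = (mean(before), mean(after));
    // Standard error of the difference of the two means (Welch-style).
    let se = (variance(before) / before.len() as f64
        + variance(after) / after.len() as f64)
        .sqrt();
    let diff = m1 - m0;
    // Rough ~95% gate: treat anything within 2 standard errors as noise.
    if diff.abs() > 2.0 * se {
        Some(100.0 * diff / m0)
    } else {
        None
    }
}

fn main() {
    // Placeholder timings (e.g. nanoseconds per iteration) for two runs.
    let before = [10.2, 10.4, 10.1, 10.3, 10.2];
    let after = [9.1, 9.0, 9.2, 9.1, 9.3];
    match percent_change(&before, &after) {
        Some(pct) => println!("{pct:+.1}% change"),
        None => println!("no significant change"),
    }
}
```

A real implementation would presumably use a proper t-test with Welch's degrees of freedom rather than a fixed 2-SE cutoff, but the shape of the check is the same.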
This is a bit of a nuanced issue. Currently, benchmarks don't have any statistics beyond min/max/median/mean time. But I would very much like to do proper statistical analysis across benchmark runs to determine whether a difference is distinguishable from random noise (i.e. statistically significant).
The way the Stabilizer folks went about it resulted in a normal distribution of results. But as an easy-to-pick-up userspace library, Divan doesn't have the same luxury of being an LLVM plugin. That said, benchmark times tend to follow a log-normal distribution, so perhaps we can make the same stats work from that?
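To make the log-normal idea concrete, here is a hedged sketch: if per-iteration times are roughly log-normal, their logarithms are roughly normal, so the usual normal-theory machinery can run on the log-transformed samples, and the between-run comparison falls out as a ratio of geometric means. The numbers and names below are illustrative only, not anything Divan does today.

```rust
// Mean of the log-transformed samples; exp() of this is the geometric mean.
fn log_mean(samples: &[f64]) -> f64 {
    samples.iter().map(|t| t.ln()).sum::<f64>() / samples.len() as f64
}

fn main() {
    let before = [10.0, 11.0, 9.5, 10.5, 30.0]; // note the single outlier
    let after = [9.0, 9.5, 8.8, 9.2, 9.1];

    // Geometric-mean ratio: exp(difference of log-means). This is much less
    // sensitive to multiplicative outliers than a ratio of arithmetic means.
    let ratio = (log_mean(&after) - log_mean(&before)).exp();
    println!(
        "geometric-mean ratio: {ratio:.3} ({:+.1}%)",
        100.0 * (ratio - 1.0)
    );
}
```

The same significance gate as above would then apply to the log-space means and variances instead of the raw ones.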
As per the intent of this issue, my plan once #10/#42 are complete is to report a ±% change from the previous run, based on information recorded in target/divan.
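As a sketch of the persistence half of that plan (the part #10/#42 would provide), here is one way prior results could be written under target/divan and diffed on the next run. The file path, the one-line-per-benchmark TSV format, and the helper names are all assumptions made for illustration; they are not what Divan actually writes.

```rust
use std::collections::HashMap;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical location for recorded results.
const RESULTS: &str = "target/divan/previous.tsv";

/// Load `name<TAB>median_ns` pairs from the previous run, if the file exists.
fn load_previous(path: &Path) -> HashMap<String, f64> {
    let mut out = HashMap::new();
    if let Ok(text) = fs::read_to_string(path) {
        for line in text.lines() {
            if let Some((name, ns)) = line.split_once('\t') {
                if let Ok(ns) = ns.parse::<f64>() {
                    out.insert(name.to_string(), ns);
                }
            }
        }
    }
    out
}

/// Overwrite the file with this run's results for the next comparison.
fn store_current(path: &Path, results: &[(&str, f64)]) -> io::Result<()> {
    fs::create_dir_all(path.parent().unwrap())?;
    let body: String = results
        .iter()
        .map(|(name, ns)| format!("{name}\t{ns}\n"))
        .collect();
    fs::write(path, body)
}

fn main() -> io::Result<()> {
    let path = Path::new(RESULTS);
    let previous = load_previous(path);

    // Placeholder numbers standing in for this run's measured medians.
    let current = [("parse_json", 1_250.0), ("serialize", 980.0)];

    for (name, ns) in &current {
        match previous.get(*name) {
            Some(prev) => {
                println!("{name}: {:+.1}% vs previous run", 100.0 * (ns - prev) / prev)
            }
            None => println!("{name}: no previous result"),
        }
    }
    store_current(path, &current)
}
```

Whatever format is actually chosen would also need to carry variances (or raw samples) so the significance check above has something to work with, not just point estimates.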