remove two xors by setting the hash keys for unreachable squares to zero #5754
performance before:
3.6714 +- 0.20% Gcycles
3.6620 +- 0.12% Gcycles
3.6704 +- 0.26% Gcycles
3.6602 +- 0.27% Gcycles
3.6799 +- 0.37% Gcycles
after:
3.6540 +- 0.30% Gcycles
3.6388 +- 0.25% Gcycles
3.6557 +- 0.17% Gcycles
3.6449 +- 0.15% Gcycles
3.6460 +- 0.26% Gcycles
(every line is a different `profile-build` and shows the number of cycles needed for `./stockfish bench`, measured with `perf stat -r 10`)
speedup seems significant, probably around .5%
Come to think of it, I'm struggling to explain how removing those two instructions (plus maybe a Anyway, it might be better to evaluate the PR under the assumption that it's performance-neutral. I would still argue that it can be viewed as a simplification.
Maybe the speedup isn't large but it's clear simply by inspection that removing extra operations can't be anything other than a speedup. Congrats!
Okay, there is something weird going on... hence the CI failures (which lack some output, separate issue to fix). If you run the following on this PR
you will get the following, this doesn't happen for master.
Stacktrace
Ah, I see what that is. There are places like Line 388 in c76c179 where the psq tables are "misused". Let me think how to fix this.
performance before (c76c179):
3.6714 +- 0.20% Gcycles
3.6620 +- 0.12% Gcycles
3.6704 +- 0.26% Gcycles
3.6602 +- 0.27% Gcycles
3.6799 +- 0.37% Gcycles
this pr:
3.6540 +- 0.30% Gcycles
3.6388 +- 0.25% Gcycles
3.6557 +- 0.17% Gcycles
3.6449 +- 0.15% Gcycles
3.6460 +- 0.26% Gcycles
Every line is a different `profile-build` and shows the number of cycles needed for `./stockfish bench`, measured with `perf stat -r 10`. This is on an Intel Meteor Lake P-core.
Speedup seems significant, probably around .5%
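The "around .5%" estimate can be checked by averaging the five runs on each side. The sketch below is only a worked piece of arithmetic on the numbers quoted above; the helper names `mean` and `reduction_percent` are made up for illustration and are not part of the PR.

```cpp
#include <array>
#include <numeric>

// Cycle counts in Gcycles, copied from the measurements above.
constexpr std::array<double, 5> kBefore = {3.6714, 3.6620, 3.6704, 3.6602, 3.6799};
constexpr std::array<double, 5> kAfter  = {3.6540, 3.6388, 3.6557, 3.6449, 3.6460};

// Arithmetic mean of the five profile-builds (hypothetical helper).
double mean(const std::array<double, 5>& runs) {
    return std::accumulate(runs.begin(), runs.end(), 0.0) / runs.size();
}

// Relative reduction in cycles, in percent (hypothetical helper).
double reduction_percent(double before, double after) {
    return 100.0 * (before - after) / before;
}
```

Calling `reduction_percent(mean(kBefore), mean(kAfter))` gives a little under 0.6%, which is in line with the estimate. The per-run noise of 0.12% to 0.37% is of a similar order, so this is a rough figure rather than a precise one.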
I'd argue that this is a simplification?
I could avoid setting the Zobrist key twice (first to a random number and then zeroing) but apparently changing the state of `rng` changes the bench.

bench: 999324
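A minimal sketch of the idea, under assumptions the thread does not spell out: that the "unreachable squares" are pawn squares on the first and last ranks, and that the removed xors sit in a promotion fix-up that runs after the generic "move the piece" key update. The `Prng` type, the table layout, and every function name here are hypothetical stand-ins, not the engine's actual code.

```cpp
#include <array>
#include <cstdint>

using Key = std::uint64_t;
constexpr int SQUARE_NB = 64;
using KeyTable = std::array<Key, SQUARE_NB>;

// Hypothetical stand-in generator (xorshift64*); the engine's own rng may differ.
struct Prng {
    std::uint64_t s;
    explicit Prng(std::uint64_t seed) : s(seed) {}
    Key next() {
        s ^= s >> 12;
        s ^= s << 25;
        s ^= s >> 27;
        return s * 2685821657736338717ULL;
    }
};

// Draw a key for every square first, then zero the unreachable ones.
// Drawing all 64 keeps the generator state identical to the unzeroed
// version, so every key generated afterwards is unchanged.
KeyTable make_keys(Prng& rng, bool zeroBackRanks) {
    KeyTable keys{};
    for (int sq = 0; sq < SQUARE_NB; ++sq)
        keys[sq] = rng.next();
    if (zeroBackRanks)  // assumption: squares 0..7 and 56..63 are unreachable for pawns
        for (int f = 0; f < 8; ++f)
            keys[f] = keys[56 + f] = 0;
    return keys;
}

// Generic key update for moving a piece from `from` to `to`.
Key move_piece_key(Key k, const KeyTable& piece, int from, int to) {
    return k ^ piece[from] ^ piece[to];
}

// Promotion fix-up that explicitly removes the pawn key at `to`.
Key promote_with_xor(Key k, const KeyTable& pawn, const KeyTable& promo, int to) {
    return k ^ pawn[to] ^ promo[to];
}

// Promotion fix-up with that xor dropped; correct only if pawn[to] == 0.
Key promote_without_xor(Key k, const KeyTable& promo, int to) {
    return k ^ promo[to];
}
```

Because x ^ 0 == x, the two fix-ups agree whenever the pawn key at the promotion square is zero, which is what lets the extra xor be deleted from the source. Zeroing after drawing, rather than skipping the draw, is the trade-off mentioned above: it sets some keys twice but leaves the `rng` state, and hence the bench, untouched.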