Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove two xors by setting the hash keys for unreachable squares to zero #5754

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ces42
Copy link
Contributor

@ces42 ces42 commented Jan 7, 2025

performance before (c76c179):
3.6714 +- 0.20% Gcycles
3.6620 +- 0.12% Gcycles
3.6704 +- 0.26% Gcycles
3.6602 +- 0.27% Gcycles
3.6799 +- 0.37% Gcycles

this pr:
3.6540 +- 0.30% Gcycles
3.6388 +- 0.25% Gcycles
3.6557 +- 0.17% Gcycles
3.6449 +- 0.15% Gcycles
3.6460 +- 0.26% Gcycles

every line is a different profile-build and shows the number of cycles needed for ./stockfish bench, measured with perf stat -r 10. This is on an intel Meteor Lake P-core.
speedup seems significant, probably around .5%

I'd argue that this is a simplification?
I could avoid setting the Zobrist key twice (first to a random number and then zeroing) but apparently changing the state of rng changes the bench.

bench: 999324

performance before:
3.6714 +- 0.20%  Gcycles
3.6620 +- 0.12%  Gcycles
3.6704 +- 0.26%  Gcycles
3.6602 +- 0.27%  Gcycles
3.6799 +- 0.37%  Gcycles

after:
3.6540 +- 0.30%  Gcycles
3.6388 +- 0.25%  Gcycles
3.6557 +- 0.17%  Gcycles
3.6449 +- 0.15%  Gcycles
3.6460 +- 0.26%  Gcycles

(every line is a different `profile-build` and shows the number of cycles needed for `./stockfish bench`, measured with `perf stat -r 10`)

speedup seems significant, probably around .5%
@ces42
Copy link
Contributor Author

ces42 commented Jan 7, 2025

Come to think of it, I'm struggling to explain how removing those two instructions (plus maybe a mov for accessing pawnKey, but it's accessed further down anyway) would explain a .5% speedup, given that perf record shows that SF only speds 3% of cycles in do_move(). Maybe changes in binary layout could explain this?

Anyway, it might be better to evaluate the PR under the assumption that it's performance-neutral. I would still argue that it can be viewed as a simplification.

@mstembera
Copy link
Contributor

mstembera commented Jan 7, 2025

Maybe the speedup isn't large but it's clear simply by inspection that removing extra operations can't be anything other than a speedup. Congrats!

@Disservin
Copy link
Member

Okay there is something weird going on... hence the CI failures (which lack some output, separate issue to fix)

If you run the following on this PR

uci
setoption name SyzygyPath value /home/max/Github/Stockfish/tests/syzygy
bench 128 1 8 default depth

you will get the following, this doesn't happen for master.

Position: 36/48 (8/8/8/8/5kp1/P7/8/1K1N4 w - - 0 1)
info string info string NNUE evaluation using nn-1c0000000000.nnue (133MiB, (22528, 3072, 15, 32, 1))
info string info string NNUE evaluation using nn-37f18f62d772.nnue (6MiB, (22528, 128, 15, 32, 1))
info depth 1 seldepth 2 multipv 1 score cp 93 nodes 11 nps 11000 hashfull 0 tbhits 0 time 1 pv b1c2
info depth 2 seldepth 2 multipv 1 score cp 93 nodes 22 nps 22000 hashfull 0 tbhits 0 time 1 pv b1c2
info depth 3 seldepth 2 multipv 1 score cp 93 nodes 32 nps 32000 hashfull 0 tbhits 0 time 1 pv b1c2
info depth 4 seldepth 2 multipv 1 score cp 93 nodes 43 nps 43000 hashfull 0 tbhits 0 time 1 pv b1c2
info depth 5 seldepth 3 multipv 1 score cp 93 nodes 63 nps 63000 hashfull 0 tbhits 0 time 1 pv b1c2 g4g3
malloc(): invalid size (unsorted)

Stacktrace

#6  0x00007ffff7aa7cfc in malloc_printerr (str=str@entry=0x7ffff7be5bc0 "malloc(): invalid size (unsorted)") at ./malloc/malloc.c:5664
#7  0x00007ffff7aab0dc in _int_malloc (av=av@entry=0x7fffe8000030, bytes=bytes@entry=1272) at ./malloc/malloc.c:4002
#8  0x00007ffff7aac139 in __GI___libc_malloc (bytes=1272) at ./malloc/malloc.c:3329
#9  0x00007ffff7df251c in operator new(unsigned long) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#10 0x000055555558aa8b in Stockfish::(anonymous namespace)::set_sizes(Stockfish::(anonymous namespace)::PairsData*, unsigned char*) [clone .part.0] ()
#11 0x00005555555ab817 in void* Stockfish::(anonymous namespace)::mapped<(Stockfish::(anonymous namespace)::TBType)0>(Stockfish::(anonymous namespace)::TBTable<(Stockfish::(anonymous namespace)::TBType)0>&, Stockfish::Position const&) ()
#12 0x00005555555ac3db in Stockfish::Tablebases::WDLScore Stockfish::(anonymous namespace)::search<false>(Stockfish::Position&, Stockfish::Tablebases::ProbeState*) ()
#13 0x00005555555ac16b in Stockfish::Tablebases::WDLScore Stockfish::(anonymous namespace)::search<false>(Stockfish::Position&, Stockfish::Tablebases::ProbeState*) ()
#14 0x00005555555af8a5 in int Stockfish::Search::Worker::search<(Stockfish::NodeType)0>(Stockfish::Position&, Stockfish::Search::Stack*, int, int, int, bool) ()
#15 0x00005555555ad095 in int Stockfish::Search::Worker::search<(Stockfish::NodeType)0>(Stockfish::Position&, Stockfish::Search::Stack*, int, int, int, bool) ()
#16 0x00005555555af4ce in int Stockfish::Search::Worker::search<(Stockfish::NodeType)0>(Stockfish::Position&, Stockfish::Search::Stack*, int, int, int, bool) ()
#17 0x00005555555bb0a8 in int Stockfish::Search::Worker::search<(Stockfish::NodeType)2>(Stockfish::Position&, Stockfish::Search::Stack*, int, int, int, bool) [clone .constprop.0] ()
#18 0x00005555555bd25f in Stockfish::Search::Worker::iterative_deepening() ()
#19 0x00005555555c08fb in Stockfish::Search::Worker::start_searching() [clone .part.0] ()
#20 0x0000555555593409 in Stockfish::Thread::idle_loop() ()
#21 0x000055555558fa52 in Stockfish::NativeThread::NativeThread<void (Stockfish::Thread::*)(), Stockfish::Thread*>(void (Stockfish::Thread::*&&)(), Stockfish::Thread*&&)::{lambda(void*)#1}::_FUN(void*) ()
#22 0x00007ffff7a9bac3 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#23 0x00007ffff7b2d850 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81

@ces42
Copy link
Contributor Author

ces42 commented Jan 7, 2025

Ah, I see what that is. There places like

st->materialKey ^= Zobrist::psq[pc][cnt];

where the psq tables are "misused". Let me think how to fix this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants