-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce dual NNUE evaluation #4910
Conversation
bench: 1449578
bench: 1380121
bench 1380121
bench 1440404
Bench: 1440404
Bench: 1216398
clang-format 17 needs to be run on this PR. (execution 7141862302 / attempt 1) |
Nice result. But there are some issues with this VVLTC testing (currently equivalent to 480/4.8). What if next week someone "simplifies away" the extra net using the standard procedure STC+LTC.... ? Also: can the same gain be achieved just by using a bigger base net? |
Congrats! I think we should also test just the parameter changes in search by themselves at the same VLTC to prove the gain isn't coming just from those alone. |
btw the makefile for net2 could be simplified a bit to something like this, to avoid the duplicated rules. define evaluate_network
@echo "Default net: $(nnuenet)"
@if [ "x$(curl_or_wget)" = "x" ]; then \
echo "Neither curl nor wget is installed. Install one of these tools unless the net has been downloaded manually"; \
fi
@if [ "x$(shasum_command)" = "x" ]; then \
echo "shasum / sha256sum not found, skipping net validation"; \
elif test -f "$(nnuenet)"; then \
if [ "$(nnuenet)" != "nn-"`$(shasum_command) $(nnuenet) | cut -c1-12`".nnue" ]; then \
echo "Removing invalid network"; rm -f $(nnuenet); \
fi; \
fi;
@for nnuedownloadurl in "$(nnuedownloadurl1)" "$(nnuedownloadurl2)"; do \
if test -f "$(nnuenet)"; then \
echo "$(nnuenet) available : OK"; break; \
else \
if [ "x$(curl_or_wget)" != "x" ]; then \
echo "Downloading $${nnuedownloadurl}"; $(curl_or_wget) $${nnuedownloadurl} > $(nnuenet);\
else \
echo "No net found and download not possible"; exit 1;\
fi; \
fi; \
if [ "x$(shasum_command)" != "x" ]; then \
if [ "$(nnuenet)" != "nn-"`$(shasum_command) $(nnuenet) | cut -c1-12`".nnue" ]; then \
echo "Removing failed download"; rm -f $(nnuenet); \
fi; \
fi; \
done
@if ! test -f "$(nnuenet)"; then \
echo "Failed to download $(nnuenet)."; \
fi;
@if [ "x$(shasum_command)" != "x" ]; then \
if [ "$(nnuenet)" = "nn-"`$(shasum_command) $(nnuenet) | cut -c1-12`".nnue" ]; then \
echo "Network validated"; break; \
fi; \
fi;
endef
define netvariables =
$(eval nnuenet := $(shell grep $(1) evaluate.h | grep define | sed 's/.*\(nn-[a-z0-9]\{12\}.nnue\).*/\1/'))
$(eval nnuedownloadurl1 := https://tests.stockfishchess.org/api/nn/$(nnuenet))
$(eval nnuedownloadurl2 := https://github.com/official-stockfish/networks/raw/master/$(nnuenet))
$(eval curl_or_wget := $(shell if hash curl 2>/dev/null; then echo "curl -skL"; elif hash wget 2>/dev/null; then echo "wget -qO-"; fi))
$(eval shasum_command := $(shell if hash shasum 2>/dev/null; then echo "shasum -a 256 "; elif hash sha256sum 2>/dev/null; then echo "sha256sum "; fi))
endef
# evaluation network (nnue)
net:
$(call netvariables, EvalFileDefaultNameBig)
$(call evaluate_network)
$(call netvariables, EvalFileDefaultNameSmall)
$(call evaluate_network) |
I've marked it as a draft for now since there are a couple of things to first resolve.
and probably some more points which I cant think of right now |
btw test is running here https://tests.stockfishchess.org/tests/view/65731ccbf09ce1261f12246e |
The SPSA tuning was done for 44k games at 120+1.2. https://tests.stockfishchess.org/tests/view/656ee2a76980e15f69c7767f. Note that the tune was originally done in combination with the recent dual NNUE idea (see #4910). VLTC: https://tests.stockfishchess.org/tests/view/65731ccbf09ce1261f12246e LLR: 2.95 (-2.94,2.94) <0.00,2.00> Total: 52806 W: 13069 L: 12760 D: 26977 Ptnml(0-2): 19, 5498, 15056, 5815, 15 VLTC SMP: https://tests.stockfishchess.org/tests/view/65740ffaf09ce1261f1239ba LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 27630 W: 6934 L: 6651 D: 14045 Ptnml(0-2): 1, 2643, 8243, 2928, 0 Estimated close to neutral at LTC: https://tests.stockfishchess.org/tests/view/6575485a8ec68176cf7d9423 Elo: -0.59 ± 1.8 (95%) LOS: 26.6% Total: 32060 W: 7859 L: 7913 D: 16288 Ptnml(0-2): 20, 3679, 8676, 3645, 10 nElo: -1.21 ± 3.8 (95%) PairsRatio: 0.99 closes #4912 Bench: 1283323
I think we can close this in the meantime, since the gainer likely came from #4912. |
The SPSA tuning was done for 44k games at 120+1.2. https://tests.stockfishchess.org/tests/view/656ee2a76980e15f69c7767f. Note that the tune was originally done in combination with the recent dual NNUE idea (see official-stockfish#4910). VLTC: https://tests.stockfishchess.org/tests/view/65731ccbf09ce1261f12246e LLR: 2.95 (-2.94,2.94) <0.00,2.00> Total: 52806 W: 13069 L: 12760 D: 26977 Ptnml(0-2): 19, 5498, 15056, 5815, 15 VLTC SMP: https://tests.stockfishchess.org/tests/view/65740ffaf09ce1261f1239ba LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 27630 W: 6934 L: 6651 D: 14045 Ptnml(0-2): 1, 2643, 8243, 2928, 0 Estimated close to neutral at LTC: https://tests.stockfishchess.org/tests/view/6575485a8ec68176cf7d9423 Elo: -0.59 ± 1.8 (95%) LOS: 26.6% Total: 32060 W: 7859 L: 7913 D: 16288 Ptnml(0-2): 20, 3679, 8676, 3645, 10 nElo: -1.21 ± 3.8 (95%) PairsRatio: 0.99 closes official-stockfish#4912 Bench: 1283323
The SPSA tuning was done for 44k games at 120+1.2. https://tests.stockfishchess.org/tests/view/656ee2a76980e15f69c7767f. Note that the tune was originally done in combination with the recent dual NNUE idea (see official-stockfish#4910). VLTC: https://tests.stockfishchess.org/tests/view/65731ccbf09ce1261f12246e LLR: 2.95 (-2.94,2.94) <0.00,2.00> Total: 52806 W: 13069 L: 12760 D: 26977 Ptnml(0-2): 19, 5498, 15056, 5815, 15 VLTC SMP: https://tests.stockfishchess.org/tests/view/65740ffaf09ce1261f1239ba LLR: 2.94 (-2.94,2.94) <0.50,2.50> Total: 27630 W: 6934 L: 6651 D: 14045 Ptnml(0-2): 1, 2643, 8243, 2928, 0 Estimated close to neutral at LTC: https://tests.stockfishchess.org/tests/view/6575485a8ec68176cf7d9423 Elo: -0.59 ± 1.8 (95%) LOS: 26.6% Total: 32060 W: 7859 L: 7913 D: 16288 Ptnml(0-2): 20, 3679, 8676, 3645, 10 nElo: -1.21 ± 3.8 (95%) PairsRatio: 0.99 closes official-stockfish#4912 Bench: 1283323
Alongside the currently used NNUE net, introduce a much smaller L1-256 net that is used when the material imbalance in the position is high; an improvement over the previously used simple eval. This is then combined with search tuning (SPSA, 44k games at 120+1.2).
For more details, see the original PR: #4898.
Acknowledgements: Many thanks to @mstembera & @linrock for writing the original implementation code, and @linrock for training the net used in this patch!
VLTC: https://tests.stockfishchess.org/tests/view/657075336980e15f69c7998f
LLR: 2.95 (-2.94,2.94) <0.00,2.00>
Total: 81342 W: 20137 L: 19788 D: 41417
Ptnml(0-2): 13, 8678, 22933, 9041, 6
VLTC SMP: https://tests.stockfishchess.org/tests/view/657288776980e15f69c7c7e7
LLR: 2.94 (-2.94,2.94) <0.50,2.50>
Total: 19796 W: 5038 L: 4772 D: 9986
Ptnml(0-2): 3, 1829, 5965, 2101, 0
Bench: 1216398