Stochastic optimization for parameters #774
In official-stockfish/Stockfish#2915 (comment) some interesting methods were proposed. Regarding the first one, after mentioning it to @nodchip, he told me about the implementation he used. I took his scripts and created a repo (https://github.com/unaiic/optimizer) where we can adapt them to SF and see how it goes. The scripts make use of Hyperopt, although we could also use Optuna; we should see what is best in this case. I think you could help with this :)
Let me explain what the script does. The script executes the following steps:
One point is that the Elo measurement doesn't need to be accurate, because TPE assumes that the samples (the measured Elo) contain noise. I measured Elo with only 48 games. Another point is that steps 2-6 can be done in parallel. Hyperopt supports parallel search using MongoDB or another database, so we could distribute tasks between fishtest nodes. There is one concern: if we measure Elo at STC, the engine will get weaker at LTC. This happened with a computer shogi engine. We should use LTC to measure Elo.
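For concreteness, here is a minimal sketch of that workflow with Hyperopt's TPE sampler (the parameter names and the measure_elo helper are hypothetical, not real Stockfish parameters):

```python
from hyperopt import fmin, tpe, hp, Trials

# Hypothetical helper: run a short cutechess match with the given parameters
# and return an Elo estimate (e.g. from 48 games, as described above).
def measure_elo(params, games=48):
    raise NotImplementedError("wire this up to cutechess-cli / fishtest workers")

# Illustrative search space; TPE tolerates noisy objective values.
space = {
    "futility_margin": hp.quniform("futility_margin", 150, 350, 5),
    "aspiration_delta": hp.quniform("aspiration_delta", 10, 30, 1),
}

def objective(params):
    return -measure_elo(params)  # hyperopt minimizes, so negate the Elo

trials = Trials()  # swap in hyperopt.mongoexp.MongoTrials for parallel search via MongoDB
best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=200, trials=trials)
print(best)
```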
See also: https://github.com/kiudee/chess-tuning-tools. It's what the Leela developers use, with good success. It generates very pretty diagrams.
@gonzalezjo Good one. I think we could also try it, compare it with the others, and we'll see how it goes...
@gonzalezjo I took a look at it and it seems that it tunes UCI options. We would then have to convert the (search) parameters to UCI options, right? Then take the resulting values from the tune and test them to see if they're good enough. Please correct me if I'm wrong :)
Correct. That said, it's not as bad as it sounds! The "TUNE" macro used for SPSA tuning does that for you, so the code is technically already written.
@gonzalezjo I tried a simple tune, just as a test, but it didn't work for me (white always disconnects). I tuned the first engine like this: T1,510,460,560,5,0.0020. I have two executables (named engine1 (the tuned one) and engine2, both compiled with 'make build ARCH=x86-64-modern'), the network file and an opening book in the same folder. I finally have the simple_tune.json file. It looks like this:
I think this is okay, but when I run 'tune local -c simple_tune.json' I get the following 'out.pgn':

[Event "?"]
{White disconnects} 0-1
[Event "?"]
{White disconnects} 0-1

And so on... What is wrong here? I think it may be something related to the TUNE macro, but I'm not sure. Thanks :)
@unaiic I found your issue here by accident and would like to help. I just recently converted my collection of scripts into a proper library and there might still be rough edges. Don't hesitate to report bugs to the issue tracker. What happens if you run the following from the same directory?

cutechess-cli -engine conf=engine1 tc=1 -engine conf=engine2 tc=1 -debug
@kiudee Fine, I'll report it. BTW, this is the output I get: Warning: Unknown engine configuration: "engine1"
Hm, is there an engines.json file with those engine configurations in the directory?
Yeah, my fault (I deleted them when I saw it didn't work). This is the actual output:
Ok, there are a few things you can try.
Edit: I could try to make the library a bit more robust by allowing the user to set the working directory.
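For reference, a minimal engines.json for cutechess-cli might look like the following (engine names and paths are illustrative; the net path can be fixed either here or through the tuner's config):

```json
[
    {
        "name": "engine1",
        "command": "./engine1",
        "workingDirectory": ".",
        "protocol": "uci"
    },
    {
        "name": "engine2",
        "command": "./engine2",
        "workingDirectory": ".",
        "protocol": "uci"
    }
]
```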
Okay, now it seems to be playing games. I understand that this behaviour (from log.txt) is normal?

2020-08-23 16:47:44,956 INFO Starting iteration 0
Yes, that is the expected output.
@kiudee Okay, thank you so much. I guess there is no need to report an issue then (it only needed the path to the net to be specified).
@nodchip @gonzalezjo I suppose we should also consider this option and see its results. It's a simple option with an easy setup. IMHO that's a good enough reason to test it out. Of course we also have @nodchip's implementation to work on and adapt to SF. What are your thoughts on this?
How about comparing the three methods, SPSA, Hyperopt and chess-tuning-tools? The key trade-off will be computer resources vs. Elo improvement.
@nodchip True. Computer resources are limited. The simple tune I started ~45 mins ago has just reached the 14th iteration (and I changed it to TC 5'+0.05s), which shows that it's rather difficult to compare them without external help.
I would be interested in hearing @kiudee's thoughts on his approach vs. Hyperopt and vs. SPSA, if he has the time and interest to share.
@gonzalezjo I can write down a few general considerations which hold without having done extensive experiments with all three approaches.

The biggest problem we have in optimizing several parameters at the same time is the curse of dimensionality. It is clear that the number of points we have to collect blows up exponentially in the number of (effective) parameters (we basically have to build a "wall" around the optimum). The noise resulting from chess games further inflates the number of iterations by a constant factor.

Now there are different ways to deal with this problem. With very strong regularity assumptions, you are able to optimize many parameters at the same time, but if those assumptions are violated you might only be able to find a local optimum (e.g. SPSA). What are the downsides? Since Hyperopt and chess-tuning-tools use a very flexible model, they eventually also suffer from the curse of dimensionality. If you want to optimize more than, say, 10 parameters, it will be difficult to model the target function in few iterations (< 2000). For Gaussian processes there are methods which are able to fit a lower-dimensional subspace, allowing them to generalize with fewer iterations, but they are not yet implemented.

What would be my recommendation?
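To make the exponential blow-up concrete, a back-of-the-envelope sketch (my numbers, purely illustrative):

```python
# Covering each parameter with only k grid points costs k**d evaluations
# in d dimensions; the game noise then multiplies every point by a
# constant factor of extra games on top.
k = 4
for d in (2, 5, 10):
    print(f"{d} parameters: {k ** d} points")
# 2 parameters: 16 points
# 5 parameters: 1024 points
# 10 parameters: 1048576 points
```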
Could we say then that there are two groups: 1) SPSA and 2) Hyperopt and chess-tuning-tools? If both methods from the second group behave similarly and Hyperopt doesn't add any relevant advantages (@nodchip might have more insights on this), we could try to implement chess-tuning-tools (as it is by far the easiest one) and compare it to SPSA. And taking into account what @kiudee said, we could even try to mix SPSA and chess-tuning-tools for big tunes and see how it goes. IMO we should implement it right on fishtest; otherwise we won't have enough resources to test these things out.
Thanks a lot for the explanation, kiudee!
Also let me know if you need any specific functionality implemented.
BTW, let me plug an advertisement for a framework I wrote a while ago, which allows methods from the nevergrad suite of optimizers to be used, right now hardwired to TBPSA: https://github.com/vondele/nevergrad4sf. I can't say it was hugely successful; the resources needed to fine-tune parameters in SF are just very large.
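As a rough sketch of what using nevergrad's TBPSA on this kind of problem looks like (parameter names and the objective stub are illustrative, not nevergrad4sf's actual code; TBPSA is a population-based method aimed at noisy objectives):

```python
import nevergrad as ng

# Hypothetical match runner: play a batch of games, return -Elo
# (nevergrad minimizes by default).
def negative_elo(futility_margin: float, aspiration_delta: float) -> float:
    raise NotImplementedError("run a cutechess batch here")

parametrization = ng.p.Instrumentation(
    futility_margin=ng.p.Scalar(lower=150, upper=350),
    aspiration_delta=ng.p.Scalar(lower=10, upper=30),
)
optimizer = ng.optimizers.TBPSA(parametrization=parametrization, budget=200, num_workers=4)
recommendation = optimizer.minimize(negative_elo)
print(recommendation.kwargs)
```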
I think the best way is to try each method one by one and compare the results.
I agree with nodchip. There is no data yet to definitively favor one method over another.
Okay, then I guess we should implement them in fishtest; otherwise we won't have the required resources to test them.
Just stumbled over this very interesting discussion by accident. Just in case you don't already know, @fsmosca implemented Optuna here: https://github.com/fsmosca/Optuna-Game-Parameter-Tuner. However, all these methods have one big disadvantage: they don't deal well with the very noisy evaluations we get when we pass game results. One way to deal with this could be to return the averaged values of the x best evaluated points instead of simply returning the single best one (see the sketch below). This looks like a very crude way, though... Another interesting approach I found here: https://facebookresearch.github.io/nevergrad/optimizers_ref.html#nevergrad.optimization.optimizerlib.ParametrizedOnePlusOne
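The averaging idea described above could be sketched like this (a minimal, generic version; not code from any of the tools mentioned):

```python
import numpy as np

def averaged_best(points, scores, k=10):
    """Instead of returning the single best-scoring point, average the
    parameter vectors of the k best ones to damp out lucky results."""
    order = np.argsort(scores)[::-1]                  # descending: higher is better
    top_k = np.asarray(points, dtype=float)[order[:k]]
    return top_k.mean(axis=0)

# usage: averaged_best(trial_params, trial_elos, k=10)
```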
@joergoster Both Optuna and chess-tuning-tools explicitly model the noise of the objective function (it was even the main motivation for forking off bayes-skopt to begin with). Could you clarify what you mean by that?
edit (to give more context):
@kiudee A match result can vary widely, as we all know, even when playing hundreds of games per match.
which you can find at the top of page 3 in this paper: https://arxiv.org/abs/1807.02811
Note: I haven't tried your chess-tuning-tools yet.
Ah okay, I see what you mean. Historically, Bayesian optimization was used mainly for computer experiments, where it is possible to set the random seeds such that each experiment is deterministic. In such a setting it makes sense to return the point which received the best score so far. I totally agree that this does not make any sense for chess (well, unless you plan to run thousands of games per iteration). I don't know exactly what the tuning tool based on Optuna is doing, but chess-tuning-tools returns the global optimum of the mean Gaussian process. In addition, for the output plots it also shows the "pessimistic optimum", which takes the current uncertainty into account (see here for more details).
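The distinction between the best observed point, the optimum of the GP mean, and the pessimistic optimum can be illustrated with a toy scikit-learn sketch (an analogy, not chess-tuning-tools' actual code):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

def true_elo(x):                 # unknown target with its peak at x = 6
    return -((x - 6.0) ** 2)

# Toy stand-in for noisy Elo measurements of a single parameter.
X = rng.uniform(0.0, 10.0, size=(50, 1))
y = true_elo(X[:, 0]) + rng.normal(0.0, 5.0, size=50)

# alpha is the assumed noise variance (sd 5 -> variance 25).
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=25.0).fit(X, y)

grid = np.linspace(0.0, 10.0, 1001).reshape(-1, 1)
mu, sigma = gp.predict(grid, return_std=True)

best_observed = X[np.argmax(y)][0]                    # the "lucky point" trap
mean_optimum = grid[np.argmax(mu)][0]                 # optimum of the GP mean
pessimistic = grid[np.argmax(mu - 1.96 * sigma)][0]   # uncertainty-penalized
print(best_observed, mean_optimum, pessimistic)
```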
@kiudee Thank you, very interesting. So chess-tuning-tools is next on my list now. ;-)
The default surrogate model (sampler) in Optuna is TPE; see optimize. Other samplers include grid and random search. The TPE sampler has some interesting parameters and methods too. The Gaussian prior is enabled by default.
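A minimal Optuna setup along those lines (the objective body is a placeholder; parameter names are illustrative):

```python
import optuna

def objective(trial):
    futility_margin = trial.suggest_int("futility_margin", 150, 350)
    aspiration_delta = trial.suggest_int("aspiration_delta", 10, 30)
    # In practice: play a match with these values and return an Elo estimate.
    raise NotImplementedError

# consider_prior=True is the Gaussian prior that is enabled by default.
sampler = optuna.samplers.TPESampler(consider_prior=True, n_startup_trials=10)
study = optuna.create_study(direction="maximize", sampler=sampler)
study.optimize(objective, n_trials=100)
print(study.best_params)  # the best *observed* trial; see the discussion below
```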
@fsmosca Do you know how Optuna selects the final point? Is it the best one tried so far, or the optimum of the surrogate model?
It looks like the param returned is the one that performed best in the trials.
Ok, then you should be very careful with that. The point which performed best in our setting could just be a very lucky point (especially the fewer games you run per point). Another problem is that the longer you run the algorithm, the more points will share the same match result.
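To put a number on how lucky a point can be, a rough back-of-the-envelope check (my own arithmetic; it ignores draws, which would shrink the error bar somewhat):

```python
import math

def elo_error_95(n_games):
    """Approximate 95% error bar of an Elo estimate from n games."""
    se_score = 0.5 / math.sqrt(n_games)            # std. error of the score fraction
    elo_per_score = 400.0 / (math.log(10) * 0.25)  # slope of the Elo curve at 50%
    return 1.96 * se_score * elo_per_score

print(round(elo_error_95(160)))  # ~54 Elo
```

With error bars of that size at 160 games per trial, the single best observed point is largely determined by noise rather than true strength.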
As you can see, this already happens with only 200 trials and 160 games per trial, as I reported here.
There are actually two methods of determining the best param and best value currently implemented in my tuner. The setup is test engine vs. base engine. At trial 0 the base engine will use the default param, and the test engine will always take the param from the optimizer.
It looks like method 2 needs more games per trial than method 1, since the base engine constantly uses the default param. Method 1 is dynamic; the param that wins might be lucky on a given trial, but it can get corrected in the next trial. We all know that these optimizations need more tuning games to be more reliable. See the sketch below for the two schemes.
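A rough sketch of the two schemes as described above (the ask/tell optimizer interface and play_match are hypothetical stand-ins, not the tuner's actual code):

```python
def play_match(test_param, base_param):
    raise NotImplementedError("return the test engine's score in [0, 1]")

def method_1(optimizer, default_param, n_trials):
    """Dynamic: the base engine adopts the param that wins each trial."""
    base_param = default_param
    for _ in range(n_trials):
        test_param = optimizer.ask()
        score = play_match(test_param, base_param)
        optimizer.tell(test_param, score)
        if score > 0.5:              # a lucky win here can still be
            base_param = test_param  # corrected in a later trial
    return base_param

def method_2(optimizer, default_param, n_trials):
    """Fixed anchor: the base engine always plays the default param,
    so each trial needs more games to denoise the comparison."""
    for _ in range(n_trials):
        test_param = optimizer.ask()
        optimizer.tell(test_param, play_match(test_param, default_param))
    return optimizer.best_param()
```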
@joergoster I assume you refer to the MPI aspect of it? That's unfortunate, but it's the best I could come up with to allow distributed optimization (like fishtest, but without writing a new fishtest). The real problem with tuning (at least for Stockfish-level engines) is that one needs hundreds of thousands of games, no matter how smart the optimizer is, and these games need to be at the relevant TC. The Elo differences are usually just too small to measure otherwise.
@vondele I do understand this, yet the average user doesn't do distributed optimization but simply runs locally on one computer.
@joergoster chess-tuning-tools does not support manual testing of points yet. This is on my to-do list, though.
untested, this might work (replace MPIPoolExecutor by ProcessPoolExecutor):

```diff
diff --git a/nevergrad4sf.py b/nevergrad4sf.py
index b5e6813..5bbefa7 100644
--- a/nevergrad4sf.py
+++ b/nevergrad4sf.py
@@ -19,9 +19,8 @@ import textwrap
 import nevergrad as ng
 from subprocess import Popen, PIPE
 from cutechess_batches import CutechessExecutorBatch, calc_stats
-from mpi4py import MPI
-from mpi4py.futures import MPIPoolExecutor
 from concurrent.futures import ThreadPoolExecutor
+from concurrent.futures import ProcessPoolExecutor


 def get_sf_parameters(stockfish_exe):
@@ -76,18 +75,6 @@ def ng4sf(
     games per batch, cutechess concurrency, and evaluation batch concurrency
     """

-    # ready to run with mpi
-    size = MPI.COMM_WORLD.Get_size()
-    print()
-    if size > 1:
-        print(
-            "Launched ... with %d mpi ranks (1 master, %d workers)." % (size, size - 1)
-        )
-        print(flush=True)
-    else:
-        sys.stderr.write("ng4sf needs to run under mpi with at least 2 MPI ranks.\n")
-        sys.exit(1)
-
     # print summary
     print("stockfish binary : ", stockfish)
     print("stockfish reference binary : ", stockfishRef)
@@ -128,7 +115,7 @@ def ng4sf(
         rounds=((games_per_batch + 1) // 2 + mpi_subbatches - 1) // mpi_subbatches,
         concurrency=cutechess_concurrency,
         batches=mpi_subbatches,
-        executor=MPIPoolExecutor(),
+        executor=ProcessPoolExecutor(),
     )

     restartFileName = "ng_restart.pkl"
```
@kiudee Got it up and running! Here is the result of a very quick first test run (5 rounds only).
@kiudee Would it make sense to successively increase the number of games, or will that make all of the calculations so far useless?
@joergoster I tried to answer that question here in the FAQ.
I have a couple of updates on the Optuna tuner, now with the skopt sampler from scikit-optimize using a GP model, along with different acquisition functions (acq_func) such as LCB, EI, PI and others. There are also explore and exploit factors on the selected acq_func that can be interesting to play with. These are the results so far at depth 6, up to 100 trials; scroll down to see the summary. Models such as GBRT (gradient-boosted regression trees) and ET (extra-trees regressor) are not yet in the current repo; I will add them later.
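For reference, selecting such a sampler in Optuna presumably looks like this (a sketch; the skopt_kwargs pass straight through to scikit-optimize, where kappa is the explore/exploit knob for LCB):

```python
import optuna
from optuna.integration import SkoptSampler

sampler = SkoptSampler(
    skopt_kwargs={
        "base_estimator": "GP",              # or "GBRT", "ET", "RF"
        "acq_func": "LCB",                   # or "EI", "PI"
        "acq_func_kwargs": {"kappa": 1.96},  # higher kappa = more exploration
    }
)
study = optuna.create_study(direction="maximize", sampler=sampler)
# study.optimize(objective, n_trials=100)  # objective as in the earlier TPE sketch
```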
@fsmosca Feel free to test optimized parameters on fishtest; I've found that to be an essential step. In particular, I find that parameters might improve at the TC (or depth) used in optimization, but fail at higher TC.
Thanks, will do that once I get a promising param at longer TC. Currently exploring different optimization algorithms.
I tried to implement that with a very interesting tuning. This is just a fixed-depth-6 test to see how it behaves.

Command line:
The param to be optimized:
Setup:

Nevergrad optimizer: I set "optimistic" as the default for noise handling.
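Assuming this refers to the ParametrizedOnePlusOne family linked earlier, the instantiation would look roughly like this (a hedged sketch, not the tuner's actual code):

```python
import nevergrad as ng
from nevergrad.optimization.optimizerlib import ParametrizedOnePlusOne

# "optimistic" noise handling re-samples promising candidates instead of
# trusting a single noisy match result.
OnePlusOne = ParametrizedOnePlusOne(noise_handling="optimistic")
optimizer = OnePlusOne(
    parametrization=ng.p.Scalar(lower=0.0, upper=100.0),  # one illustrative param
    budget=100,                                           # matching the run below
)
```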
Right at the top it is already matching the same params.

On later budgets it settles on targeting the same param values.

The best after a budget of 100:

Game test: the tuning was done at depth 6, so the game test was also done at depth 6. Nevergrad's param won.
@fsmosca Very interesting!
Lakas repo is on GitHub.
@fsmosca Awesome! So many possibilities to try now... :-)
@fsmosca I saw that Lakas requires Python 3.8.x; you could add a link to pyenv. See here for an example of setting up the latest Python: #778 (comment)
@ppigazzini Thanks for the info on pyenv; I didn't know about it, but I will take a look. I use PyCharm as an IDE, and I develop on Windows 10.
@fsmosca On Windows it's easy to install several Python versions; on Linux it's suggested to use pyenv and a virtual environment to keep the system Python clean.