Stochastic optimization for parameters #774

Open · unaiic opened this issue Aug 22, 2020 · 62 comments
Labels: enhancement, server (server side changes), worker update (code changes requiring a worker update)

unaiic commented Aug 22, 2020

official-stockfish/Stockfish#2915 (comment)

Some interesting methods were proposed there. Regarding the first one, after mentioning it to @nodchip, he told me about the implementation he used. I took his scripts and created a repo (https://github.com/unaiic/optimizer) where we can adapt them to SF and see how it goes. The scripts make use of Hyperopt, although we could also use Optuna; we should see which is best in this case. I think you could help with this :)

nodchip commented Aug 22, 2020

Let me explain what the script does. It executes the following steps:

  1. Read search parameters from a config file.
    • This config file is formatted as a header file with definitions for the search parameters.
    • Each search parameter has the following data.
      • name
      • default value
      • min value
      • max value
  2. Select a search parameter set.
    • With the default settings, the first 20 parameter sets are selected randomly, and the remaining sets are selected with the Tree of Parzen Estimators (TPE).
  3. Write the selected search parameters to a header file.
  4. Build an engine binary with the header file created in step 3.
  5. Measure the Elo difference between a baseline binary and the binary compiled in step 4.
  6. Return the Elo measured in step 5 to Hyperopt.
  7. Repeat steps 2-6 a specified number of times.
  8. Apply a Gaussian process to select the best search parameter set.

One point is that the Elo measurement doesn't need to be accurate, because TPE assumes that the samples (the measured Elo) contain noise. I measured Elo with only 48 games.

Another point is that steps 2-6 can be done in parallel. Hyperopt supports parallel search using MongoDB or another DB, so we could distribute tasks between fishtest nodes.

There is one concern: if we measure Elo at STC, the engine may end up weaker at LTC. This happened with computer shogi engines. We should use LTC to measure Elo.
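
For orientation, a minimal sketch of this loop with Hyperopt's TPE (the parameter names and the Elo-measuring helper are hypothetical stand-ins for what the actual scripts do, not the scripts themselves):

from hyperopt import Trials, fmin, hp, tpe

# Hypothetical search space mirroring the header-file entries (name, min, max).
space = {
    "T1": hp.quniform("T1", 460, 560, 1),
    "T2": hp.quniform("T2", 210, 236, 1),
}

def measure_elo(params, games=48):
    # Stub for steps 3-5: write the header, rebuild the engine and play a
    # short (noisy) match against the baseline binary.
    raise NotImplementedError

def objective(params):
    # Hyperopt minimizes, so return the negated Elo estimate (step 6).
    return -measure_elo(params)

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=200, trials=trials)
print(best)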

gonzalezjo commented Aug 22, 2020

See also: https://github.com/kiudee/chess-tuning-tools.

It's what the Leela developers use, with good success. It generates very pretty diagrams.

ppigazzini added the enhancement, server and worker update labels on Aug 22, 2020
unaiic commented Aug 22, 2020

@gonzalezjo Good one. I think we could also try it, compare it with the others and see how it goes...

unaiic commented Aug 22, 2020

@gonzalezjo I took a look at it and it seems that it tunes UCI options. We should then convert the parameters (from search) to UCI options I guess, right? Then take the resulting values from the tune and test them to see if they're good enough. Please correct me if I'm wrong :)

gonzalezjo commented Aug 22, 2020

@gonzalezjo I took a look at it and it seems that it tunes UCI options. We should then convert the parameters (from search) to UCI options I guess, right? Then take the resulting values from the tune and test them to see if they're good enough. Please correct me if I'm wrong :)

Correct. That said, it’s not as bad as it sounds! The “TUNE” macro used for SPSA tuning does that for you, so the code is technically already written.

unaiic commented Aug 23, 2020

@gonzalezjo I tried a simple tune, just as a test, but it didn't work for me (white always disconnects). I set up the first engine for tuning like this:

T1,510,460,560,5,0.0020
T2,223,210,236,1.3,0.0020

I have two executables (named engine1 (tuned) and engine2, both compiled with 'make build ARCH=x86-64-modern'), the network file and an opening book in the same folder. Finally, I have the simple_tune.json file, which looks like this:

 {
       "engines": [
           {
               "command": "engine1",
               "fixed_parameters": {
                   "Threads": 1
               }
           },
           {
               "command": "engine2",
               "fixed_parameters": {
                   "Threads": 1
               }
           }
       ],
       "parameter_ranges": {
           "T1": "Real(460, 560)",
           "T2": "Real(210, 236)"
       },
       "engine1_tc": "60+0.6",
       "engine2_tc": "60+0.6",
       "opening_file": "noob_3moves.pgn"
   }

I think this is okay, but when I run 'tune local -c simple_tune.json' I get the following 'out.pgn':

[Event "?"]
[Site "?"]
[Date "2020.08.23"]
[Round "1"]
[White "engine1"]
[Black "engine2"]
[Result "0-1"]
[FEN "rn1qkbnr/ppp1p1pp/4b3/1P1p1p2/8/B7/P1PPPPPP/RN1QKBNR w KQkq - 0 1"]
[PlyCount "0"]
[SetUp "1"]
[Termination "abandoned"]
[TimeControl "60+0.6"]

{White disconnects} 0-1

[Event "?"]
[Site "?"]
[Date "2020.08.23"]
[Round "1"]
[White "engine2"]
[Black "engine1"]
[Result "0-1"]
[FEN "rn1qkbnr/ppp1p1pp/4b3/1P1p1p2/8/B7/P1PPPPPP/RN1QKBNR w KQkq - 0 1"]
[PlyCount "0"]
[SetUp "1"]
[Termination "abandoned"]
[TimeControl "60+0.6"]

{White disconnects} 0-1

And so on...

What is wrong with this? I think it may be something related to the TUNE macro, but I'm not sure. Thanks :)

kiudee commented Aug 23, 2020

@unaiic I found your issue here by accident and would like to help. I just recently converted my collection of scripts into a proper library and there might still be rough edges. Don’t hesitate to report bugs to the Issue Tracker.
For the next release I am planning to allow more verbose output of cutechess-cli to be forwarded.

What happens if you run the following from the same directory?

cutechess-cli -engine conf=engine1 tc=1 -engine conf=engine2 tc=1 -debug

unaiic commented Aug 23, 2020

@kiudee Fine, I'll report it. BTW, this is the output I get:

Warning: Unknown engine configuration: "engine1"
Warning: Invalid value for option "-engine": "conf=engine1 tc=1"

kiudee commented Aug 23, 2020

Hm, is there an engines.json file in the folder? That should be created automatically by chess-tuning-tools.

unaiic commented Aug 23, 2020

Yeah, my fault (I deleted it when I saw things didn't work). This is the actual output:

7 >engine1(0): setoption name Threads value 1
8 >engine1(0): setoption name T1 value 523.6574654907396
8 >engine1(0): setoption name T2 value 221.84373184658273
8 >engine1(0): uci
8 >engine2(1): setoption name Threads value 1
8 >engine2(1): uci
9 <engine1(0): Stockfish 220820 by the Stockfish developers (see AUTHORS file)
9 <engine1(0): T1,510,460,560,5,0.0020
9 <engine1(0): T2,223,210,236,1.3,0.0020
9 <engine2(1): Stockfish 220820 by the Stockfish developers (see AUTHORS file)
152 <engine2(1): id name Stockfish 220820
152 <engine2(1): id author the Stockfish developers (see AUTHORS file)
152 <engine2(1): option name Debug Log File type string default 
152 <engine2(1): option name Contempt type spin default 24 min -100 max 100
152 <engine2(1): option name Analysis Contempt type combo default Both var Off var White var Black var Both
152 <engine2(1): option name Threads type spin default 1 min 1 max 512
152 <engine2(1): option name Hash type spin default 16 min 1 max 33554432
152 <engine2(1): option name Clear Hash type button
153 <engine2(1): option name Ponder type check default false
153 <engine2(1): option name MultiPV type spin default 1 min 1 max 500
153 <engine2(1): option name Skill Level type spin default 20 min 0 max 20
153 <engine2(1): option name Move Overhead type spin default 10 min 0 max 5000
153 <engine2(1): option name Slow Mover type spin default 100 min 10 max 1000
153 <engine2(1): option name nodestime type spin default 0 min 0 max 10000
153 <engine2(1): option name UCI_Chess960 type check default false
153 <engine2(1): option name UCI_AnalyseMode type check default false
153 <engine2(1): option name UCI_LimitStrength type check default false
153 <engine2(1): option name UCI_Elo type spin default 1350 min 1350 max 2850
153 <engine2(1): option name UCI_ShowWDL type check default false
153 <engine2(1): option name SyzygyPath type string default <empty>
153 <engine2(1): option name SyzygyProbeDepth type spin default 1 min 1 max 100
153 <engine2(1): option name Syzygy50MoveRule type check default true
153 <engine2(1): option name SyzygyProbeLimit type spin default 7 min 0 max 7
153 <engine2(1): option name Use NNUE type check default true
153 <engine2(1): option name EvalFile type string default nn-82215d0fd0df.nnue
153 <engine2(1): uciok
153 >engine2(1): isready
153 <engine2(1): readyok
153 <engine1(0): id name Stockfish 220820
153 <engine1(0): id author the Stockfish developers (see AUTHORS file)
153 <engine1(0): option name Debug Log File type string default 
153 <engine1(0): option name Contempt type spin default 24 min -100 max 100
153 <engine1(0): option name Analysis Contempt type combo default Both var Off var White var Black var Both
153 <engine1(0): option name Threads type spin default 1 min 1 max 512
153 <engine1(0): option name Hash type spin default 16 min 1 max 33554432
153 <engine1(0): option name Clear Hash type button
153 <engine1(0): option name Ponder type check default false
153 <engine1(0): option name MultiPV type spin default 1 min 1 max 500
153 <engine1(0): option name Skill Level type spin default 20 min 0 max 20
153 <engine1(0): option name Move Overhead type spin default 10 min 0 max 5000
153 <engine1(0): option name Slow Mover type spin default 100 min 10 max 1000
153 <engine1(0): option name nodestime type spin default 0 min 0 max 10000
153 <engine1(0): option name UCI_Chess960 type check default false
153 <engine1(0): option name UCI_AnalyseMode type check default false
153 <engine1(0): option name UCI_LimitStrength type check default false
153 <engine1(0): option name UCI_Elo type spin default 1350 min 1350 max 2850
153 <engine1(0): option name UCI_ShowWDL type check default false
154 <engine1(0): option name SyzygyPath type string default <empty>
154 <engine1(0): option name SyzygyProbeDepth type spin default 1 min 1 max 100
154 <engine1(0): option name Syzygy50MoveRule type check default true
154 <engine1(0): option name SyzygyProbeLimit type spin default 7 min 0 max 7
154 <engine1(0): option name Use NNUE type check default true
154 <engine1(0): option name EvalFile type string default nn-82215d0fd0df.nnue
154 <engine1(0): option name T1 type spin default 510 min 460 max 560
154 <engine1(0): option name T2 type spin default 223 min 210 max 236
154 <engine1(0): uciok
154 >engine1(0): isready
154 <engine1(0): readyok
Started game 1 of 1 (engine1 vs engine2)
154 >engine1(0): ucinewgame
154 >engine1(0): setoption name Ponder value false
154 >engine1(0): position startpos
154 >engine2(1): ucinewgame
154 >engine2(1): setoption name Ponder value false
154 >engine2(1): position startpos
154 >engine1(0): isready
156 <engine1(0): readyok
156 >engine1(0): go wtime 1000 btime 1000
157 <engine1(0): info string ERROR: NNUE evaluation used, but the network file nn-82215d0fd0df.nnue was not loaded successfully.
157 <engine1(0): info string ERROR: The UCI option EvalFile might need to specify the full path, including the directory/folder name, to the file.
157 <engine1(0): info string ERROR: The default net can be downloaded from: https://tests.stockfishchess.org/api/nn/nn-82215d0fd0df.nnue
157 <engine1(0): info string ERROR: If the UCI option Use NNUE is set to true, network evaluation parameters compatible with the program must be available.
157 <engine1(0): info string ERROR: The engine will be terminated now.
Terminating process of engine engine1(0)
159 >engine2(1): isready
159 <engine2(1): readyok
Elo difference: -inf +/- nan
Finished match
Finished game 1 (engine1 vs engine2): 0-1 {White disconnects}
Score of engine1 vs engine2: 0 - 1 - 0  [0.000] 1
Elo difference: -inf +/- nan
Finished match
159 >engine2(1): quit

kiudee commented Aug 23, 2020

Ok, there are a few things you can try.

  1. Add "EvalFile": "nn-9931db908a9b.nnue" or the full absolute path to your simple_tune.json file.
  2. Add "Use NNUE": "false" instead.

edit: I could try to make the library a bit more robust by allowing the user to set the working directory.

unaiic commented Aug 23, 2020

Okay, now it seems to be playing games. Am I right that this behaviour (from log.txt) is normal?

2020-08-23 16:47:44,956 INFO Starting iteration 0
2020-08-23 16:47:44,956 INFO Testing {'T1': 474.6329987400783, 'T2': 225.4754269784815}
2020-08-23 16:47:44,956 INFO Start experiment
2020-08-23 16:50:15,376 INFO Experiment finished (150.419198s elapsed).
2020-08-23 16:50:16,006 INFO Got score: -0.4660222762857485 +- 0.26871653930371303
2020-08-23 16:50:16,007 INFO Updating model
2020-08-23 16:50:16,007 INFO GP sampling finished (0.000237s)
2020-08-23 16:50:16,105 INFO Starting iteration 1
2020-08-23 16:50:16,106 INFO Testing {'T1': 499.14523211540876, 'T2': 210.65957941253217}
2020-08-23 16:50:16,107 INFO Start experiment

kiudee commented Aug 23, 2020

Yes, that is the expected output.

unaiic commented Aug 23, 2020

@kiudee Okay, thank you so much. I guess there is no need to report any issue then (it only needed the path to the net to be specified).
We'll continue to test things out and see how it goes :)

unaiic commented Aug 23, 2020

@nodchip @gonzalezjo I suppose we should also consider this option and see its results. It's simple to set up and work with, which IMHO is a good enough reason to test it out. Of course, we also have @nodchip's implementation to work on and adapt to SF. What are your thoughts on this?

nodchip commented Aug 23, 2020

How about comparing the three methods: SPSA, Hyperopt and chess-tuning-tools? The key trade-off will be computer resources vs. Elo improvement.

unaiic commented Aug 23, 2020

@nodchip True, computer resources are limited. The simple tune I started ~45 minutes ago has just begun its 14th iteration (and I changed the TC to 5'+0.05s), which shows that it will be rather difficult to compare them without external help.

@gonzalezjo

I would be interested in hearing @kiudee’s thoughts on his approach vs. Hyperopt and vs. SPSA, if he has the time and interest to share.

kiudee commented Aug 23, 2020

@gonzalezjo I can write down a few general considerations which hold even without having done extensive experiments with all three approaches.

The biggest problem we have in optimizing several parameters at the same time is the curse of dimensionality. It is clear that the number of points we have to collect blows up exponentially with the number of (effective) parameters (we basically have to build a "wall" around the optimum). The noise from chess game results additionally inflates the number of iterations by a constant factor.

Now there are different ways to deal with this problem. With very strong regularity assumptions, you are able to optimize many parameters at the same time, but if those assumptions are violated you might only be able to find a local optimum (e.g. SPSA).
Hyperopt and chess-tuning-tools use asymptotically non-parametric methods. That means that, in principle, if you run them long enough and keep exploring, they will model the optimization landscape perfectly and be able to find the true global optimum. Hyperopt uses a Tree of Parzen Estimators as the model, which has problems accurately extrapolating the function and the uncertainty to unknown regions. chess-tuning-tools uses the Gaussian process implementation of bayes-skopt, which allows it to accurately quantify the uncertainty at each point. In addition, we can use the properties of the Gaussian process to employ advanced optimization algorithms such as max-value entropy search (the default in chess-tuning-tools) and predictive variance reduction search.

What are the downsides? Since Hyperopt and chess-tuning-tools use a very flexible model, they eventually also suffer from the curse of dimensionality. If you want to optimize more than, say, 10 parameters, it will be difficult to model the target function in few iterations (< 2000). For Gaussian processes there are methods which fit a lower-dimensional subspace, allowing them to generalize with fewer iterations, but they are not implemented yet.
Another consideration is running time: the Gaussian process implementation is optimized to be accurate with few iterations, but its complexity is cubic in the number of iterations. Thus, if you collect over 2000 iterations, the overhead per iteration could be several minutes. That's why I would recommend increasing the number of games per iteration to make each data point less noisy, and to scale the approach that way.

What would be my recommendation?

  • For large numbers of parameters (>20): I don’t expect any algorithm to work well. I think it might even be hard to beat random search. SPSA could already work adequately.
  • For small numbers of parameters (<7): I would definitely use something like chess-tuning-tools here, since that’s what it’s optimized for. A compromise could also be to start with SPSA to optimize a large number of parameters and then switch to chess-tuning-tools for finetuning.
  • For intermediate numbers of parameters (≥7, ≤20): I really don’t know. More experimentation is needed.
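
To make the GP side of this concrete, here is a minimal sketch of Bayesian optimization over a noisy objective using scikit-optimize's gp_minimize. It only illustrates the general idea; chess-tuning-tools builds on its own bayes-skopt fork with different acquisition functions, and the match-playing helper below is a hypothetical stub.

from skopt import gp_minimize
from skopt.space import Real

def play_match(params, games=100):
    # Stub: play `games` games between the tuned and baseline engines and
    # return the (noisy) score of the tuned engine in [0, 1].
    raise NotImplementedError

def objective(x):
    t1, t2 = x
    score = play_match({"T1": t1, "T2": t2})
    return -(score - 0.5)  # minimize the negated score difference

result = gp_minimize(
    objective,
    dimensions=[Real(460, 560, name="T1"), Real(210, 236, name="T2")],
    n_calls=100,          # number of (noisy) evaluations
    noise="gaussian",     # tell the GP that observations are noisy
    random_state=0,
)
print(result.x, result.fun)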

unaiic commented Aug 23, 2020

Could we say then that there are two groups: 1) SPSA and 2) Hyperopt and chess-tuning-tools? If both methods from the second group behave similarly and Hyperopt doesn't add any relevant advantages (@nodchip might have more insights on this), we could try to implement chess-tuning-tools (as it is by far the easiest one) and compare it to SPSA. Taking into account what @kiudee said, we could even try to mix SPSA and chess-tuning-tools for big tunes and see how it goes. IMO we should implement it right in fishtest; otherwise we won't have enough resources to test these things out.

@gonzalezjo

Thanks a lot for the explanation, kiudee!

kiudee commented Aug 23, 2020

Also let me know if you need any specific functionality implemented.

vondele commented Aug 24, 2020

BTW, let me plug a framework I wrote a while ago, which allows methods from the nevergrad suite of optimizers to be used. Right now it is hardwired to TBPSA:

https://github.com/vondele/nevergrad4sf

I can't say it was hugely successful; the resources needed to fine-tune parameters in SF are just very large.

unaiic commented Aug 24, 2020

@kiudee @nodchip What are your thoughts on this option (compared to the others)?

nodchip commented Aug 24, 2020

I think that the best way is to try each method one by one, and compare the results.

kiudee commented Aug 24, 2020

I agree with nodchip. There is no data yet to definitively favor one method over another.

unaiic commented Aug 24, 2020

Okay, then I guess we should implement them in fishtest; otherwise we won't have the required resources to test them.

@joergoster

Just stumbled over this very interesting discussion accidentally.

Just in case you don't already know, @fsmosca implemented Optuna here https://github.com/fsmosca/Optuna-Game-Parameter-Tuner

However, all these methods have one big disadvantage: they don't deal with the very noisy evaluations we get when passing game results.
Although it is nowhere explicitly stated, to my knowledge they rely on deterministic results from the evaluated objective function.

One way to deal with this could be to return the averaged values of the x best evaluated points instead of simply returning the single best one. This looks like a very crude way, though ...

Another interesting approach I found here: https://facebookresearch.github.io/nevergrad/optimizers_ref.html#nevergrad.optimization.optimizerlib.ParametrizedOnePlusOne
Allowing the best point so far to be re-evaluated regularly seems like a very interesting and promising approach to me ...
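
For illustration, a minimal ask/tell sketch of that optimizer in nevergrad, using the "optimistic" noise handling described in the linked docs. The parameter names/ranges are taken from the earlier simple tune and the match-playing helper is a hypothetical stub.

import nevergrad as ng

def play_match(params, games=500):
    # Stub: play a match vs. the default engine and return a loss (1 - score).
    raise NotImplementedError

parametrization = ng.p.Dict(
    T1=ng.p.Scalar(init=510, lower=460, upper=560).set_integer_casting(),
    T2=ng.p.Scalar(init=223, lower=210, upper=236).set_integer_casting(),
)

optimizer = ng.optimizers.ParametrizedOnePlusOne(
    noise_handling="optimistic",  # regularly re-evaluate the optimistic best point
    mutation="gaussian",
)(parametrization=parametrization, budget=100)

for _ in range(100):
    candidate = optimizer.ask()
    optimizer.tell(candidate, play_match(candidate.value))

print(optimizer.provide_recommendation().value)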

kiudee commented Sep 21, 2020

@joergoster both Optuna and chess-tuning-tools explicitly model the noise of the objective function (it was even the main motivation why I forked off bayes-skopt to begin with). Could you clarify what you mean by that?

edit (To give more context):
In chess-tuning-tools the match results are used in a Dirichlet-Pentanomial model to estimate the expected Elo score and also the error of that estimate. That error is given to the model, which in addition will estimate the residual noise using a Gaussian distribution.
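
As a rough illustration of that idea (not chess-tuning-tools' actual code), one can put a Dirichlet posterior on the five game-pair outcomes, sample from it, and convert the sampled scores to Elo to get both an estimate and an error:

import numpy as np

def pentanomial_elo(pair_counts, n_samples=10000, prior=0.5, seed=0):
    # pair_counts: observed game-pair outcomes with pair scores 0, 0.5, 1, 1.5, 2.
    rng = np.random.default_rng(seed)
    per_game_score = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # pair score divided by 2
    probs = rng.dirichlet(np.asarray(pair_counts) + prior, size=n_samples)
    s = np.clip(probs @ per_game_score, 1e-6, 1 - 1e-6)     # sampled expected scores
    elo = -400.0 * np.log10(1.0 / s - 1.0)
    return elo.mean(), elo.std()

# Example: 48 game pairs, mostly drawn.
estimate, error = pentanomial_elo([2, 10, 24, 10, 2])
print(f"Elo estimate: {estimate:.1f} +/- {error:.1f}")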

@joergoster

@kiudee A match result can vary widely, as we all know, even when playing hundreds of games per match.
Yet all the optimizers I know simply return the best evaluated point, which didn't really work for me.

Return a solution: either the point evaluated with the largest f(x), or the point with the largest posterior mean.

which you can find on top of page 3 in this paper: https://arxiv.org/abs/1807.02811#

Note 1: I haven't tried your chess-tuning-tools yet.
Note 2: I'm only a layman, but I think I have some experience, as I have tried many different tuning methods over the last few years.

kiudee commented Sep 21, 2020

Ah okay, I see what you mean. Historically, Bayesian optimization was used mainly for computer experiments, where it is possible to set the random seeds such that each experiment is deterministic. In such a setting it makes sense to return the point which received the best score so far. I totally agree that this does not make any sense for chess (well, unless you plan to run thousands of games per iteration).

I don’t exactly know what the tuning tool based on Optuna is doing, but chess-tuning-tools is returning the global optimum of the mean Gaussian process. In addition, for the output plots it is also showing the "pessimistic optimum" which takes the current uncertainty into account (see here for more details).
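
A small sketch of the difference between the two reporting strategies, using scikit-learn's GP regressor as a stand-in surrogate (illustrative only, not chess-tuning-tools' internals):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(460, 560, size=(60, 1))            # evaluated parameter values
true_elo = -0.001 * (X[:, 0] - 520) ** 2           # unknown smooth landscape, optimum at 520
y = true_elo + rng.normal(0, 0.5, size=60)         # noisy match results

# "Best observed" point: can easily be a lucky sample.
best_observed = X[np.argmax(y), 0]

# Optimum of the surrogate posterior mean: averages out the noise.
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5) + WhiteKernel(), normalize_y=True)
gp.fit(X, y)
grid = np.linspace(460, 560, 1001).reshape(-1, 1)
best_predicted = grid[np.argmax(gp.predict(grid)), 0]

print(f"best observed: {best_observed:.1f}, optimum of posterior mean: {best_predicted:.1f}")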

@joergoster

@kiudee Thank you, very interesting. So chess-tuning-tools is next on my list now. ;-)

fsmosca commented Sep 23, 2020

I don’t exactly know what the tuning tool based on Optuna is doing, but chess-tuning-tools is returning the global optimum of the mean Gaussian process. In addition, for the output plots it is also showing the "pessimistic optimum" which takes the current uncertainty into account (see here for more details).

@kiudee

The default surrogate model (sampler) in Optuna is TPE. See optimize.

Other samplers include grid and random search.

The TPE sampler has some interesting parameters and methods too. The Gaussian prior is enabled by default.
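
A minimal Optuna sketch with the TPE sampler, for orientation (the match-playing objective is a hypothetical stub, not the actual tuner):

import optuna

def objective(trial):
    t1 = trial.suggest_int("T1", 460, 560)
    t2 = trial.suggest_int("T2", 210, 236)
    # Stub: play a short match with these values and return the score to maximize.
    raise NotImplementedError

sampler = optuna.samplers.TPESampler(
    n_startup_trials=20,   # random trials before TPE kicks in
    consider_prior=True,   # the Gaussian prior mentioned above (on by default)
)
study = optuna.create_study(sampler=sampler, direction="maximize")
study.optimize(objective, n_trials=100)
print(study.best_params, study.best_value)

Note that study.best_params is the best observed trial, which is exactly the caveat discussed in the next few comments.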

kiudee commented Sep 23, 2020

@fsmosca do you know how Optuna selects the final point? Is it the best one tried so far or the optimum of the surrogate model?

fsmosca commented Sep 23, 2020

It looks like the param returned is the one from the best-performing trial.

kiudee commented Sep 23, 2020

Ok, then you should be very careful with that. In our setting, the point which performed best could just be a very lucky point (especially with few games per point). Another problem is that the longer you run the algorithm, the more points will share the same match result.
Maybe you can find a way to extract the best predicted point from the model instead.

@joergoster

As you can see, this already happens with only 200 trials and 160 games per trial, as reported by me here.

fsmosca commented Sep 23, 2020

There are actually two methods for determining the best param and best value currently implemented in my tuner.

The setup is test engine vs. base engine. At trial 0 the base engine uses the default param, and the test engine always takes its param from the optimizer.

  1. Whenever the test engine defeats the base engine, the optimizer param used by the test engine becomes the best param. This best param is then used by the base engine in the next trial, while the test engine takes a new param from the optimizer. I adjust the actual match result before sending it to the optimizer, since the base engine keeps changing its param. There are a few discussions about this at optuna.
    I also put in a guard against lucky wins by an optimizer param: if the test engine defeats the base engine by only 0.54, or 54%, I don't update the best param or use it for the base engine. The test engine has to defeat the base engine by a score of 0.6 (this is settable) or more; only then is the best param updated and used by the base engine in the next trial. With this in place one can also use fewer games per trial.

  2. Whenever the test engine defeats the base engine, update the best param, but the base engine always uses the initial or default param, and the test engine takes a new param from the optimizer. The param that performs best against the base engine becomes the best param.

It looks like method 2 needs more games per trial than method 1, since the base engine always uses the default param. Method 1 is dynamic: the param that wins might just be lucky in a given trial, but it can get corrected in the next one.

We all know that these optimizations need more tuning games to be more reliable.
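
A compact sketch of the two update rules described above (the optimizer interface and helper names are hypothetical; the thresholds follow the description):

def run_trial(optimizer_param, base_param, games):
    # Stub: play a match and return the test engine's score in [0, 1].
    raise NotImplementedError

def tune(optimizer, default_param, n_trials=100, games=160,
         method=1, win_threshold=0.6):
    best_param = default_param
    for _ in range(n_trials):
        candidate = optimizer.suggest()                   # new param from the optimizer (hypothetical API)
        base = best_param if method == 1 else default_param
        score = run_trial(candidate, base, games)
        if method == 1:
            update = score >= win_threshold               # guard against lucky, narrow wins
        else:
            update = score > 0.5                          # simply defeat the default-param base
        if update:
            best_param = candidate
        optimizer.report(candidate, score)                # feed the (possibly adjusted) result back
    return best_param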

@joergoster

It looks like there is no easy way to re-evaluate points with Optuna. Too bad.

@vondele Tried to get nevergrad4sf to work under Windows and failed. Way too complicated, imho.

@kiudee Trying chess-tuning-tools next.

vondele commented Sep 26, 2020

@joergoster I assume you refer to the MPI aspect of it? That's unfortunate, but it's the best I could come up with to allow distributed optimization (like fishtest, but without writing a new fishtest).

The real problem with tuning (at least for Stockfish-level engines) is that one needs 100000s of games, no matter how smart the optimizer is, and these games need to be played at the relevant TC. The Elo differences are usually just too small to measure otherwise.
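
A back-of-the-envelope sketch of why: propagating the per-game score variance to Elo (assuming independent games, a ~50% score and a 50% draw ratio) gives roughly ±15 Elo at 1,000 games, ±5 at 10,000 and ±1.5 at 100,000.

import math

def elo_error_95(games, draw_ratio=0.5, score=0.5):
    # Per-game score variance for win/draw/loss outcomes at the given score and draw ratio.
    win = score - draw_ratio / 2
    var = win + draw_ratio / 4 - score ** 2
    se_score = math.sqrt(var / games)
    # Slope of Elo(s) = -400*log10(1/s - 1) at s = score.
    slope = 400.0 / (math.log(10) * score * (1 - score))
    return 1.96 * slope * se_score

for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} games -> +/- {elo_error_95(n):.1f} Elo (95% CI)")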

@joergoster

@vondele I do understand this, yet the average user doesn't do distributed optimization, but simply runs locally on one computer.
Maybe you can find some time to strip out the MPI stuff? I would also like to define the tunable parameters myself instead of relying on them being read from the executable.

kiudee commented Sep 26, 2020

@joergoster chess-tuning-tools does not support manual testing of points yet. This is on my to do list though.

vondele commented Sep 26, 2020

Untested, but this might work (replace MPIPoolExecutor with ProcessPoolExecutor):

diff --git a/nevergrad4sf.py b/nevergrad4sf.py
index b5e6813..5bbefa7 100644
--- a/nevergrad4sf.py
+++ b/nevergrad4sf.py
@@ -19,9 +19,8 @@ import textwrap
 import nevergrad as ng
 from subprocess import Popen, PIPE
 from cutechess_batches import CutechessExecutorBatch, calc_stats
-from mpi4py import MPI
-from mpi4py.futures import MPIPoolExecutor
 from concurrent.futures import ThreadPoolExecutor
+from concurrent.futures import ProcessPoolExecutor
 
 
 def get_sf_parameters(stockfish_exe):
@@ -76,18 +75,6 @@ def ng4sf(
       games per batch, cutechess concurrency, and evaluation batch concurrency
     """
 
-    # ready to run with mpi
-    size = MPI.COMM_WORLD.Get_size()
-    print()
-    if size > 1:
-        print(
-            "Launched ... with %d mpi ranks (1 master, %d workers)." % (size, size - 1)
-        )
-        print(flush=True)
-    else:
-        sys.stderr.write("ng4sf needs to run under mpi with at least 2 MPI ranks.\n")
-        sys.exit(1)
-
     # print summary
     print("stockfish binary                          : ", stockfish)
     print("stockfish reference binary                : ", stockfishRef)
@@ -128,7 +115,7 @@ def ng4sf(
         rounds=((games_per_batch + 1) // 2 + mpi_subbatches - 1) // mpi_subbatches,
         concurrency=cutechess_concurrency,
         batches=mpi_subbatches,
-        executor=MPIPoolExecutor(),
+        executor=ProcessPoolExecutor(),
     )
     restartFileName = "ng_restart.pkl"
 

@joergoster

@kiudee Got it up and running!
Reading the configuration from a json file is a very nice solution.

Here is the result of a very quick first test run (5 rounds only).

2020-09-26 17:00:56,535 INFO     Saving a plot to plots\20200926-170032-50.png.
2020-09-26 17:00:56,544 INFO     Testing {'KingAttackWeights[2]': 66, 'KingAttackWeights[3]': 200, 'KingAttackWeights[4]': 14, 'KingAttackWeights[5]': 9}
2020-09-26 17:00:56,545 INFO     Start experiment
2020-09-26 17:01:17,978 INFO     Experiment finished (21.431533s elapsed).
2020-09-26 17:01:18,420 INFO     Got score: 0.7043650362227257 +- 0.7975918092189588
2020-09-26 17:01:18,420 INFO     Updating model
2020-09-26 17:02:26,737 INFO     GP sampling finished (68.315932s)
2020-09-26 17:02:26,742 INFO     Starting iteration 51
2020-09-26 17:02:35,888 INFO     Current optimum:
{'KingAttackWeights[2]': 14, 'KingAttackWeights[3]': 8, 'KingAttackWeights[4]': 0, 'KingAttackWeights[5]': 8}
2020-09-26 17:02:35,889 INFO     Estimated value: -0.914 +- 0.6213
2020-09-26 17:02:35,889 INFO     90.0% confidence interval of the value: (-1.9359, 0.108)
2020-09-26 17:02:36,055 INFO     90.0% confidence intervals of the parameters:
Parameter             Lower bound  Upper bound
----------------------------------------------
KingAttackWeights[2]            2          128
KingAttackWeights[3]            3          189
KingAttackWeights[4]            0          113
KingAttackWeights[5]            0          120

@joergoster

@kiudee Would it make sense to successively increase the number of games, or will this make all of the calculations so far useless?

kiudee commented Sep 27, 2020

@joergoster I tried to answer the question here in the FAQ.

fsmosca commented Oct 3, 2020

I have a couple of updates on the Optuna tuner: it now supports the skopt sampler from scikit-optimize with a GP model, along with different acquisition functions (acq_func) such as LCB, EI, PI and others. There are also explore and exploit factors for the selected acq_func that can be interesting to play with.

These are the results so far at depth 6, up to 100 trials. Scroll down to see the summary.

Models such as GBRT (gradient-boosted regression trees) and ET (extra-trees regressor) are not yet in the current repo; I will add them later.
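
For orientation, this is roughly how a GP-based skopt sampler can be plugged into Optuna (a hedged sketch; the tuner's exact settings may differ). kappa and xi are the explore/exploit knobs for LCB and EI/PI respectively:

import optuna
from optuna.integration import SkoptSampler

sampler = SkoptSampler(
    skopt_kwargs={
        "base_estimator": "GP",              # Gaussian process surrogate
        "acq_func": "LCB",                   # or "EI", "PI", ...
        "acq_func_kwargs": {"kappa": 1.96},  # larger kappa -> more exploration
    }
)
study = optuna.create_study(sampler=sampler, direction="maximize")
# study.optimize(objective, n_trials=100)    # objective as in the earlier sketch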

vondele commented Oct 3, 2020

@fsmosca feel free to test optimized parameters on fishtest; I've found that to be an essential step. In particular, I find that parameters may improve at the TC (or depth) used in optimization, but fail at higher TC.

fsmosca commented Oct 4, 2020

Thanks, I will do that once I get a promising param at longer TC. Currently I am exploring different optimization algorithms.

fsmosca commented Oct 6, 2020

@joergoster

Another interesting approach I found here: https://facebookresearch.github.io/nevergrad/optimizers_ref.html#nevergrad.optimization.optimizerlib.ParametrizedOnePlusOne
Allowing the best point so far to be re-evaluated regularly seems like a very interesting and promising approach to me ...

I tried to implement that, with a very interesting tuning run as a result. This is just a fixed depth 6 test to see how it behaves.

Command line

lakas.py --engine ./engines/stockfish-modern/stockfish.exe --budget 100 --concurrency 6 --games-per-budget 500 --input-param "{'RazorMargin': {'init':527, 'lower':250, 'upper':650}, 'FutMargin': {'init':227, 'lower':50, 'upper':350}}" --base-time-sec 30 --opening-file ./start_opening/ogpt_chess_startpos.epd --depth 6 --optimizer oneplusone

The param to be optimized

2020-10-06 18:11:57,175 | INFO  | input param: OrderedDict([('FutMargin', {'init': 227, 'lower': 50, 'upper': 350}), ('RazorMargin', {'init': 527, 'lower': 250, 'upper': 650})])

Setup

2020-10-06 18:11:57,175 | INFO  | total budget: 100
2020-10-06 18:11:57,175 | INFO  | games_per_budget: 500
2020-10-06 18:11:57,175 | INFO  | tuning match move control: base_time_sec: 30, inc_time_sec: 0.05, depth=6

Nevergrad optimizer

I set optimistic as the default for noise handling:
“optimistic”: the best optimistic point is reevaluated regularly, optimism in front of uncertainty

2020-10-06 18:11:57,175 | INFO | optimizer: oneplusone, noise_handling=optimistic, mutation=gaussian, crossover=False

Right from the start, the recommended param matches the default param:

2020-10-06 18:11:57,175 | INFO  | budget: 1
2020-10-06 18:11:57,175 | INFO  | recommended param: option.FutMargin=227 option.RazorMargin=527 
2020-10-06 18:11:57,175 | INFO  | default param: option.FutMargin=227 option.RazorMargin=527 
2020-10-06 18:12:31,868 | INFO  | actual result: 0.501 @500 games, minimized result: 0.499, pov: recommended_param

On later budgets

By the later budgets, it has settled on the param option.FutMargin=350 option.RazorMargin=250:

2020-10-06 19:00:34,937 | INFO  | budget: 88
2020-10-06 19:00:34,937 | INFO  | recommended param: option.FutMargin=350 option.RazorMargin=250 
2020-10-06 19:00:34,937 | INFO  | default param: option.FutMargin=227 option.RazorMargin=527 
2020-10-06 19:01:08,227 | INFO  | actual result: 0.515 @500 games, minimized result: 0.485, pov: recommended_param

2020-10-06 19:01:08,242 | INFO  | budget: 89
2020-10-06 19:01:08,242 | INFO  | recommended param: option.FutMargin=350 option.RazorMargin=250 
2020-10-06 19:01:08,242 | INFO  | default param: option.FutMargin=227 option.RazorMargin=527 
2020-10-06 19:01:41,470 | INFO  | actual result: 0.503 @500 games, minimized result: 0.497, pov: recommended_param

2020-10-06 19:01:41,470 | INFO  | budget: 90
2020-10-06 19:01:41,470 | INFO  | recommended param: option.FutMargin=350 option.RazorMargin=250 
2020-10-06 19:01:41,470 | INFO  | default param: option.FutMargin=227 option.RazorMargin=527 
2020-10-06 19:02:14,804 | INFO  | actual result: 0.487 @500 games, minimized result: 0.513, pov: recommended_param

2020-10-06 19:02:14,820 | INFO  | budget: 91
2020-10-06 19:02:14,820 | INFO  | recommended param: option.FutMargin=350 option.RazorMargin=250 
2020-10-06 19:02:14,820 | INFO  | default param: option.FutMargin=227 option.RazorMargin=527 
2020-10-06 19:02:48,579 | INFO  | actual result: 0.501 @500 games, minimized result: 0.499, pov: recommended_param

2020-10-06 19:02:48,579 | INFO  | budget: 92
2020-10-06 19:02:48,579 | INFO  | recommended param: option.FutMargin=350 option.RazorMargin=250 
2020-10-06 19:02:48,579 | INFO  | default param: option.FutMargin=227 option.RazorMargin=527 
2020-10-06 19:03:22,992 | INFO  | actual result: 0.496 @500 games, minimized result: 0.504, pov: recommended_param

2020-10-06 19:03:22,992 | INFO  | budget: 93
2020-10-06 19:03:22,992 | INFO  | recommended param: option.FutMargin=350 option.RazorMargin=250 
2020-10-06 19:03:23,008 | INFO  | default param: option.FutMargin=227 option.RazorMargin=527 
2020-10-06 19:03:57,556 | INFO  | actual result: 0.509 @500 games, minimized result: 0.491, pov: recommended_param

2020-10-06 19:03:57,556 | INFO  | budget: 94
2020-10-06 19:03:57,556 | INFO  | recommended param: option.FutMargin=350 option.RazorMargin=250 
2020-10-06 19:03:57,556 | INFO  | default param: option.FutMargin=227 option.RazorMargin=527 
2020-10-06 19:04:30,800 | INFO  | actual result: 0.514 @500 games, minimized result: 0.486, pov: recommended_param

2020-10-06 19:04:30,808 | INFO  | budget: 95
2020-10-06 19:04:30,808 | INFO  | recommended param: option.FutMargin=350 option.RazorMargin=250 
2020-10-06 19:04:30,809 | INFO  | default param: option.FutMargin=227 option.RazorMargin=527 
2020-10-06 19:05:05,036 | INFO  | actual result: 0.502 @500 games, minimized result: 0.498, pov: recommended_param

2020-10-06 19:05:05,051 | INFO  | budget: 96
2020-10-06 19:05:05,051 | INFO  | recommended param: option.FutMargin=350 option.RazorMargin=250 
2020-10-06 19:05:05,051 | INFO  | default param: option.FutMargin=227 option.RazorMargin=527 
2020-10-06 19:05:38,169 | INFO  | actual result: 0.498 @500 games, minimized result: 0.502, pov: recommended_param

2020-10-06 19:05:38,184 | INFO  | budget: 97
2020-10-06 19:05:38,184 | INFO  | recommended param: option.FutMargin=350 option.RazorMargin=250 
2020-10-06 19:05:38,184 | INFO  | default param: option.FutMargin=227 option.RazorMargin=527 
2020-10-06 19:06:12,279 | INFO  | actual result: 0.513 @500 games, minimized result: 0.487, pov: recommended_param

2020-10-06 19:06:12,295 | INFO  | budget: 98
2020-10-06 19:06:12,295 | INFO  | recommended param: option.FutMargin=350 option.RazorMargin=250 
2020-10-06 19:06:12,295 | INFO  | default param: option.FutMargin=227 option.RazorMargin=527 
2020-10-06 19:06:46,631 | INFO  | actual result: 0.545 @500 games, minimized result: 0.45499999999999996, pov: recommended_param

2020-10-06 19:06:46,638 | INFO  | budget: 99
2020-10-06 19:06:46,638 | INFO  | recommended param: option.FutMargin=350 option.RazorMargin=250 
2020-10-06 19:06:46,639 | INFO  | default param: option.FutMargin=227 option.RazorMargin=527 
2020-10-06 19:07:21,183 | INFO  | actual result: 0.505 @500 games, minimized result: 0.495, pov: recommended_param

2020-10-06 19:07:21,199 | INFO  | budget: 100
2020-10-06 19:07:21,199 | INFO  | recommended param: option.FutMargin=350 option.RazorMargin=250 
2020-10-06 19:07:21,199 | INFO  | default param: option.FutMargin=227 option.RazorMargin=527 
2020-10-06 19:07:55,097 | INFO  | actual result: 0.516 @500 games, minimized result: 0.484, pov: recommended_param

The best param after the budget of 100:

2020-10-06 19:07:55,113 | INFO | best_param: {'FutMargin': 350.0, 'RazorMargin': 250.0}

Game test

The tuning was done at depth 6, so the game test is also done at depth 6.

Nevergrad's param won.

Score of sf_ng_d6 vs sf_default: 4412 - 4097 - 1491  [0.516] 10000
...      sf_ng_d6 playing White: 2236 - 1999 - 765  [0.524] 5000
...      sf_ng_d6 playing Black: 2176 - 2098 - 726  [0.508] 5000
...      White vs Black: 4334 - 4175 - 1491  [0.508] 10000
Elo difference: 10.9 +/- 6.3, LOS: 100.0 %, DrawRatio: 14.9 %
Finished match

@joergoster

@fsmosca Very interesting!

fsmosca commented Oct 10, 2020

Lakas repo on GitHub.

@joergoster

@fsmosca Awesome! So many possibilities to try now ... :-)

@ppigazzini

@fsmosca I saw that Lakas requires Python 3.8.x; you could add a link to pyenv. See here an example of setting up the latest Python: #778 (comment)

fsmosca commented Oct 12, 2020

@ppigazzini Thanks for the info on pyenv. I didn't know about it, but I will take a look.

I use PyCharm as my IDE, and I develop on Windows 10.

@ppigazzini

@fsmosca on Windows it's easy to install several Python versions; on Linux it's suggested to use pyenv and a virtual environment to keep the system Python clean.
