
Use Q=0 for self-play following AZ paper behavior while keeping FPU reduction tuning #350

Closed
wants to merge 1 commit

Conversation

@Mardak (Contributor) commented Sep 11, 2018

Fix #344. As documented in the issue (#344 (comment)), neither lc0 nor lczero has ever used Q=0, diverging from the learning behavior of AZ.

Looking even further back, before lczero there was leela-zero, where Q/FPU was originally set to 1.1 (win rate on a [0, 1] scale) based on gcp's experience with leela. So it seems that at no point in leela-related history has Q=0 been used when generating self-play games for training.

https://github.com/gcp/leela-zero/blob/2f7463d2cfba1b4617b3bd73bbdf3e1f52382429/UCTNode.cpp#L291
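To make the difference concrete, here is a minimal sketch (not lc0's or leela-zero's actual code; the struct, enum, and parameter names are illustrative) of where the FPU choice enters PUCT child selection. The only thing that changes between Q=0, a parent-Q reduction, and leela-zero's old 1.1 value is the Q assumed for a child with zero visits:

```cpp
#include <cmath>

struct Child {
  float policy_prior;  // P(s, a) from the network
  int visits;          // N(s, a)
  float total_value;   // W(s, a), accumulated value of this child's subtree
};

enum class FpuMode { kAbsoluteZero, kParentQReduction, kFixedOptimistic };

// Q assumed for a child that has never been visited (first play urgency).
float FirstPlayUrgency(FpuMode mode, float parent_q, float fpu_reduction) {
  switch (mode) {
    case FpuMode::kAbsoluteZero:     return 0.0f;                      // AZ-paper reading: Q = 0
    case FpuMode::kParentQReduction: return parent_q - fpu_reduction;  // lc0-style FPU reduction
    case FpuMode::kFixedOptimistic:  return 1.1f;                      // early leela-zero (on a [0, 1] win-rate scale)
  }
  return 0.0f;
}

// Standard PUCT score; the schemes above only differ while visits == 0.
float PuctScore(const Child& c, int parent_visits, float cpuct,
                FpuMode mode, float parent_q, float fpu_reduction) {
  float q = (c.visits > 0) ? c.total_value / c.visits
                           : FirstPlayUrgency(mode, parent_q, fpu_reduction);
  float u = cpuct * c.policy_prior *
            std::sqrt(static_cast<float>(parent_visits)) / (1 + c.visits);
  return q + u;
}
```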

@killerducky (Contributor) commented:

I think what we do now is an improvement over the paper, and there are other things I'd prefer to test before this.

@Mardak (Contributor, Author) commented Sep 11, 2018

For even more context, killerducky asked DeepMind to clarify FPU:
http://computer-go.org/pipermail/computer-go/2017-December/010550.html

Aja Huang responded:
http://computer-go.org/pipermail/computer-go/2017-December/010567.html

"All I can say is that first-play-urgency is not a significant technical detail, and that's why we didn't specify it in the paper."

One could infer that AGZ/AZ did not try different values for FPU/reduction and simply used Q=0 as documented in the AGZ paper, still getting good results for AZ chess. Additionally, assuming DeepMind also used Q=0 for match games, this search inefficiency was potentially overcome by simply using significantly more visits than the 800 used for self-play.

As I noted in the code comments, searching wider in losing positions and deeper in winning positions provides two different learning targets for future networks trained on those self-play games. The current behavior of using parent Q seems to be optimized for match strength, leading to just one learning target at the cost of being unable to learn some types of (tactical) moves.
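For reference, a minimal sketch of the root-only behavior this PR is aiming at (illustrative only, not the actual patch; the function and parameter names are assumptions): Q=0 as FPU at the root during self-play, so unvisited moves in losing positions still get explored, while non-root nodes keep the existing parent-Q-minus-reduction tuning.

```cpp
// Sketch: FPU used during self-play, depending on whether we are at the root.
float SelfPlayFpu(bool is_root_node, float parent_q, float fpu_reduction) {
  if (is_root_node) {
    return 0.0f;  // AZ-paper behavior at the root: unvisited moves start from Q = 0
  }
  return parent_q - fpu_reduction;  // keep the current FPU reduction tuning elsewhere
}
```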

@Tilps (Contributor) commented Sep 21, 2018

Out of curiosity I ran my 800 -> 8000 visit transition analysis with an FPU of 0. The most obvious difference from my previous analysis runs was that moves that get 0 visits at 800 were much more likely to get more visits at 8000. With our current FPU, moves that get 0 visits at 800 typically get 1 visit after 8000 (and also after about 4000, suggesting that even one visit is possibly excessive). With an FPU of 0, moves that get 0 visits at 800 get more than 3 visits on average at 8000, which is similar to how many visits 1-visit moves get, suggesting that a lot of moves being given 0 visits are not getting a fair go.
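(For anyone wanting to reproduce this, the aggregation is roughly the following; a simplified sketch where the struct and how the per-move visit counts are collected are assumptions, not the actual analysis script.)

```cpp
#include <iostream>
#include <vector>

struct MoveStats {
  int visits_at_800;   // visits this move received after 800 total playouts
  int visits_at_8000;  // visits this move received after 8000 total playouts
};

// Average visits at 8000 over all moves that had exactly `bucket` visits at 800.
double AverageLaterVisits(const std::vector<MoveStats>& moves, int bucket) {
  long total = 0;
  int count = 0;
  for (const auto& m : moves) {
    if (m.visits_at_800 == bucket) {
      total += m.visits_at_8000;
      ++count;
    }
  }
  return count ? static_cast<double>(total) / count : 0.0;
}

int main() {
  std::vector<MoveStats> moves = {{0, 1}, {0, 4}, {1, 3}, {2, 7}};
  std::cout << "0-visit moves average " << AverageLaterVisits(moves, 0)
            << " visits at 8000\n";
}
```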

@Mardak (Contributor, Author) commented Sep 21, 2018

Just to be clear about your analysis: did you use FPU 0 for all nodes or only for root nodes? Non-root FPU favoring wider search ends up skewing the otherwise accurate NN+search average action value towards losing.

@Tilps (Contributor) commented Sep 21, 2018

Hmm, it looks like I may have indeed accidentally applied it to all nodes; will retry!

@Tilps (Contributor) commented Sep 22, 2018

Tested again with 0 only for the root node: same outcome of 0-visit moves mapping to more than 3 visits at 8000, maybe even worse than when using 0 all the time, although my new dataset isn't especially big yet, so the difference is quite possibly in the noise.

@Mardak (Contributor, Author) commented Dec 7, 2018

DeepMind initialized to loss instead of draw:

http://talkchess.com/forum3/viewtopic.php?f=2&t=69175&start=70#p781765

@Mardak Mardak closed this Dec 7, 2018
@Mardak Mardak deleted the noise-q0 branch March 20, 2019 16:38