-
Notifications
You must be signed in to change notification settings - Fork 528
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Certainty propagation negabound [WIP] #487
Certainty propagation negabound [WIP] #487
Conversation
efficient propagation of certainty, two-fold draw scoring, mate display and more. =1 suitable for training =2 for play Currently negabound search depth is one. Improves play in positions with many certain positions (nrear endgame TBs, mates). Sees repetitions faster and scores positions more accurately.
I was thinking about testing for you with egtb, but... |
High level comment: I would make |
Will do. But lets first wait for some "serious" bugs (hopefully none). Then i would propose: |
It's probably due to the slow NPS. go infinite from startpos
vs v19 much faster.
I also tried
and it still only got up to 7000 nps. This is on 2080 ti. In the match I used fp16 for both and the difference was much greater. about 3-5x worse nps it seemed. |
…ersion. Increasing threads (e.g. 4 or 6) will get to masters speed now. Further speed fixes (move generator) possible....
Yes, i found and fixed one speed problem. Get Certainty Parameter was not cached, and reading it via the normal Get was extremely slow (researched the options dictionary every time). That improved random backend performance by 2x-3x. Now with using more threads (try 4-6 and adjust batchsize accordingly) you get to masters speed. But more speed fixes are planned (move generator) to get the same speed with less threads, too. |
…with lto, this yields a speed up by 30-50% in backend=random. In order to fully use CP please use 4 threads+. Changed default temporarily to 4 threads with this commit, to collect more scaling data.
Does this do single legal move extension, like your fork? |
@gonzalezjo Yes, but slightly differently. It extends single legal moves, but only for terminal detection. The experimental branch also extended before fetching the NN eval. I found both roughly equal in testing with the former having the advantage to better fit the new negabound search paradigm and more elegant in implementation (less edge cases to worry about). |
…ds instant play of certain winning moves and avoidance of loosing moves regardless of visits. CP=3 now adds advanced pruning.
src/chess/uciloop.cc
Outdated
if (info.hashfull >= 0) res += " hashfull " + std::to_string(info.hashfull); | ||
if (info.nps >= 0) res += " nps " + std::to_string(info.nps); | ||
if (info.tb_hits >= 0) res += " tbhits " + std::to_string(info.tb_hits); | ||
if (info.certain >= 0) res += " certain " + std::to_string(info.certain); | ||
if (info.bounds >= 0) res += " bounds " + std::to_string(info.bounds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might be breaking UCI protocol by adding non-standard keywords?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its common practice to add keywords, they should not really matter. But its more that i looked what other engines are doing - some are also outputting bounds and other info like this. Does it break a specific gui? The uci specification is rather vague on this. It works in arena, cutechess, chessbase... What gui gives you problems?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not aware of a specific GUI, but I remember "winrate" being removed from lczero for this reason: glinscott/leela-chess#339.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I've not seen any custom keywords in uci protocols. I only saw them after string
(like we do with verbose move stats). I think it's better to remove them and output to the log only.
Also files in chess/
subdirectory are attempted to be engine-independent, and adding custom MCTS fields there couples components.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will remove these counters.
2 threads each
cp gets 4 threads
|
One thing I noticed in Josh's stream is that whenever it does the "counting 2-fold as a draw" thing, it spikes the eval graph down to zero even though the other engine is obviously not going to 3-fold when it thinks it is winning. Maybe it should instead report the best non 2-fold score to UCI in that case. |
Yes, in that sense it behaves like most traditional engines. But actually to implement something like you propose is possible. Is that sound and works in all edge cases? Is that what SF does? |
Note: currently please disable smart pruning. Time management logic might need some adjustment to CP |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First portion of comments. Didn't look into the main part of the change yet.
src/chess/board.cc
Outdated
@@ -186,6 +186,7 @@ BitBoard ChessBoard::en_passant() const { return pawns_ - pawns(); } | |||
|
|||
MoveList ChessBoard::GeneratePseudolegalMoves() const { | |||
MoveList result; | |||
result.reserve(60); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment why it is needed and why 60 is picked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It slighty speeded up move generation in profiling. Number is a guestimate - 20 was to little, and 100 seemingly to much. This is not based on rigerous testing.
src/chess/callbacks.h
Outdated
@@ -71,6 +71,9 @@ struct ThinkingInfo { | |||
optional<int> score; | |||
// Number of successful TB probes (not the same as playouts ending in TB hit). | |||
int tb_hits = -1; | |||
// Number of nodes made certain and nodes bounded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Period in the end of the sentence.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed the counters.
src/chess/uciloop.cc
Outdated
@@ -57,6 +57,7 @@ const std::unordered_map<std::string, std::unordered_set<std::string>> | |||
{{"stop"}, {}}, | |||
{{"ponderhit"}, {}}, | |||
{{"quit"}, {}}, | |||
{{"stats"}, {}}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stats is a non-standard UCI command. Instead of changing UCI interface/parser for every possible custom command, it's better to add:
virtual bool CustomCmd(const std::string& cmd, const std::string& params) { return false; }
into class UciLoop
and make it return true if cmd
is something handled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed that command. I had a talk with borg about this some time ago, and the conclusion was that we might implement a "go nodes 0" command that does the same thing. As to not burden this rather large PR with even more details, I removed the stats command and all references.
src/chess/uciloop.cc
Outdated
if (info.score) res += " score cp " + std::to_string(*info.score); | ||
// Mate display. If abs(score) > 20000 its a certain win (or loss). | ||
// Length to mate (or tbwin) is then (+ or -) abs(score)-20000. | ||
if (info.score) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not nice at all to reuse info.score
for checkmate display. Add info.mate
for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I know. Will fix. Will do this last, when everything else is cleaned up. We would still need an indicator if info.score or info.mate is valid when processing the info structure. Ideas?
src/chess/uciloop.cc
Outdated
if (info.hashfull >= 0) res += " hashfull " + std::to_string(info.hashfull); | ||
if (info.nps >= 0) res += " nps " + std::to_string(info.nps); | ||
if (info.tb_hits >= 0) res += " tbhits " + std::to_string(info.tb_hits); | ||
if (info.certain >= 0) res += " certain " + std::to_string(info.certain); | ||
if (info.bounds >= 0) res += " bounds " + std::to_string(info.bounds); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I've not seen any custom keywords in uci protocols. I only saw them after string
(like we do with verbose move stats). I think it's better to remove them and output to the log only.
Also files in chess/
subdirectory are attempted to be engine-independent, and adding custom MCTS fields there couples components.
return; | ||
} | ||
|
||
if ((history_.Last().GetRepetitions() >= 1) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Twofold repetition is not a draw.
Twofold repetition is useful and people generally like it, we should not remove it.
If it really adds some (>5) Elo, we may consider making it a flag though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know about this. The whole point about CP is to let leela find repetitions earlier. Under a "best play" assumption this is as zero as you can get to address this problem.
But to clarify: this change does not terminate the game if a two-fold arises, it just scores it as if its a draw. Its not making two-folds terminals, just certain. This means that leela will not waste and dilute visits to otherwise identical positionss
It is responsible for some of the +10 gain of elo, but i really don't know how many. But this is the one thing, that should compound really well in training. I have numerous real world examples from CCCC and TCEC where this finds the correct path to a win, while normal leela stumbles into a perpetual. Sometimes it sees the draw 16 ply earlier (two-fold plus CP) then normal leela. If we assume an effective branching factor of two in chess, we might just need 2^-16 of the training games we otherwise would need for leela to understand these positions. These are all guestimates and no gurantees can be given of course. But it seems like a good try.
The only downside is that you get 0.00 evals if leela sees it can repeat. If the opponent then deviates it will return to the normal eval. Most other engines do this too. But, if you turn CP off, you get normal 3-fold behavior. So I would hate to waste another option on this.
src/mcts/node.h
Outdated
void MakeTerminal(GameResult result); | ||
void MakeCertain(GameResult result, CertaintyTrigger trigger); | ||
void MakeCertain(float q, CertaintyTrigger trigger); | ||
void SetEQ(float eq); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is EQ? Add comments to all public functions.
src/mcts/node.h
Outdated
@@ -225,7 +270,8 @@ class Node { | |||
EdgeList edges_; | |||
|
|||
// 8 byte fields. | |||
// Pointer to a parent node. nullptr for the root. | |||
// Pointer to a parent node. nullptr for the root of tree, | |||
// root of tree != search->root_node_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a false statement. root of tree == search->root_node_
in startpos.
Better something like: "Note that root of the tree is game's startpos and not root of the search."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was meant to say that equality does not hold always, which the previous comment implied. But, I see your point and my comment is also ambigious. I wil clarify.
src/mcts/node.h
Outdated
// N-related getters, from Node (if exists). | ||
uint32_t GetN() const { return node_ ? node_->GetN() : 0; } | ||
int GetNStarted() const { return node_ ? node_->GetNStarted() : 0; } | ||
uint32_t GetNInFlight() const { return node_ ? node_->GetNInFlight() : 0; } | ||
|
||
// Whether the node is known to be terminal. | ||
bool IsTerminal() const { return node_ ? node_->IsTerminal() : false; } | ||
//bool IsTerminal() const { return node_ ? node_->IsTerminal() : false; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented line (uncomment or delete).
src/mcts/params.cc
Outdated
const OptionId SearchParams::kCertaintyPropagationId{ | ||
"certainty-propagation", "CertaintyPropagation", | ||
"Enables level of certainty propagation: " | ||
"Setting 1 is fully compatible with training " |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make it a ChoiceOption instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second part, still most of search.cc is not reviewed.
src/mcts/node.cc
Outdated
for (const auto& child : Edges()) visits += child.GetN(); | ||
n_ = visits; | ||
} | ||
n_in_flight_ = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's better to make it assert, as it's expected that it is 0.
float Node::GetVisitedPolicy() const { return visited_policy_; } | ||
|
||
float Node::GetQ() const { | ||
if (parent_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like relying much on a parent, as it was a temporary optimization only.
Is Q conceptually edge's or node's property?
If edge's: Why do we need to check "if edge is certain" in a node rather than edge?
If node's: Why do we store that it's certain in an edge rather than a node?
Node should not go to parent to get some information about itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented it as to be as minimal invasive to the current codebase. You are correct that overall it would be "nicer" to move from a node centric style to a edge.node or even edge centric style. I left the parts of the code as is, if they checked for terminal/certain status by node. I would propose changing this only after CP is tested and with the complete codebase and future changes in mind.
So while I feel that there is currently a shift towards a more edge centric view with your new detachment scheme which might store N in the edge as well, and also even the original A0 seems rather edge centric), I would still make that a seperate PR sometime later.
src/mcts/node.cc
Outdated
Edge* Node::GetEdgeToNode(const Node* node) const { | ||
assert(node->parent_ == this); | ||
assert(node->index_ < edges_.size()); | ||
return &edges_[node->index_]; | ||
} | ||
|
||
Edge* Node::GetOwnEdge() const { return GetParent()->GetEdgeToNode(this); } | ||
Edge* Node::GetOwnEdge() const { | ||
if (GetParent()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curly braces around the bodies of conditional statement, as it's not one-line statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do.
src/mcts/node.h
Outdated
void MakeCertain(GameResult result, CertaintyTrigger trigger); | ||
void MakeCertain(float q, CertaintyTrigger trigger); | ||
void SetEQ(float eq); | ||
void ClearEdge() { certainty_state_ = 0; }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Poor name. It only clears certainty, not entire edge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any ideas for a name. Maybe ClearEdgeCertaintyStatus ?
src/mcts/node.cc
Outdated
@@ -358,6 +426,8 @@ void NodeTree::MakeMove(Move move) { | |||
current_head_->ReleaseChildrenExceptOne(new_head); | |||
current_head_ = | |||
new_head ? new_head : current_head_->CreateSingleChildNode(move); | |||
if (current_head_->GetParent()) current_head_->GetOwnEdge()->ClearEdge(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need that? Why would something need to look somewhere above search head?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sometimes we can prove bounds for root. Lets say root is lowerbounded -> we know that we have at least a draw, but might have something better. In these cases we do not need to search moves from root that are upperbounded (at maximum a draw). As we store all information about bounds/certainty in the edge leading to the node, I fetch it there. The only time you do not have that info is in the startposition of chess and if you provide a fen. But in gameplay you usually do. Also if we proved that a later move has bounds, and we finally arrive at that position it would be a pity to loose that info. ClearEdge clears everything except for the bounds (another good reason to rename ClearEdge).
src/selfplay/loop.cc
Outdated
float perct = -1, elo = 99999; | ||
float los = 99999; | ||
if ((winp1 + loosep1 + draws) > 0) | ||
perct = (((float)draws) / 2 + winp1) / (winp1 + loosep1 + draws); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
braces around body of conditional operator if it's on a separate line.
use static_cast<float>()
instead of C-style casts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do.
src/selfplay/loop.cc
Outdated
<< "%"; | ||
res += " Win: " + oss.str(); | ||
} | ||
if (elo < 99998) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are those magic constants? I don't understand what's going on here, but looks like a hack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its just for the case where no elo can be computed (its preinitialized to 99999). There are a number of cases where that can happen (all wins, all losses, no games etc.). One could of course check for all these cases, but that would also look bloated. The number is arbitrary.
src/syzygy/syzygy.cc
Outdated
@@ -1618,7 +1618,7 @@ int SyzygyTablebase::probe_dtz(const Position& pos, ProbeState* result) { | |||
// Use the DTZ tables to rank root moves. | |||
// | |||
// A return value false indicates that not all probes were successful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Outdated comment.
But actually, return value was intended to return error and now it just is let go?..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is strange. My file has the correct comment:
// A return value 0 indicates that not all probes were successful.
// Otherwise best rank is returned, with rank 1 = draw.```
src/mcts/search.cc
Outdated
if (params_.GetCertaintyPropagation() > 0) { | ||
if (edge.IsCertain() && edge.GetEQ() != 0.0f) | ||
uci_info.score = | ||
edge.GetEQ() * (20000 + ((uci_info.pv.size() + 1) / 2) + 1 + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I commented otherwise, add proper mate reporting in uci_info, without need for 20000s.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In his defense, in cases where the mate is provable, but not all variations have been checked (e.g., tablebase wins) it is normal chess engine behavior to output a score of 100 or 300 + static eval, or an otherwise similarly inflated eval. On the other hand, if all variations have been checked, I agree that a mate score should be properly reported via uci_info. The bigger issue here is the description of reporting M500 if a Syzygy win is found, instead of resorting to this behavior, which can lead to improper tournament adjudications.
edge.GetEQ() * (20000 + ((uci_info.pv.size() + 1) / 2) + 1 + | ||
(edge.IsPropagatedTBHit() ? 1000 : 0)); | ||
else if (root_syzygy_rank_) { | ||
int sign = (root_syzygy_rank_ - 1 > 0) - (root_syzygy_rank_ - 1 < 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May worth to make sign a separate function. Either in utils/misc.h
or just in this file above in an anonymous namespace.
- exposed depth parameter (0 is no-look-ahead) - only two modes CP=1 for training and CP=2 for play Todo: - change option from int to choiceoption - use info.mate to communicate mate scores
I wrote about this on another issue, but it seems to fit best with CP. Mate distance pruning: If PV ends in mate in x ply, no expansion should happen beyond x ply because all scores will be worse than mate in x. Not sure how this interacts with MCTS, whether the mates need to be proven or not to be worthwhile. Hopefully not but who knows.... anyway thought I would mention it here as this seemed the most appropriate place for it. |
That is among the things CP does. As soon as a win (mate or tb win) is found, its backpropagated one ply up as a loss for the other side, then the whole branch will no longer be visited beyond the backpropagated certain node. |
@Videodr0me what @jhellis3 is suggesting is slightly different. His is that once the root is proven mate, a search for ways to decrease mate distance should be started that never goes to a higher depth than the lowest depth found mate. |
With mate distance pruning you don't visit any nodes (regardless of branch), beyond x ply when you have a mate in x. Whether the mate needs to be proven, or if it is sufficient for it to just be the PV, I do not know. But certainly if the PV results in a mate for either side one should expect the opponent to attempt to deviate at an earlier point. |
I think you do require the mate to be proven. Since we use heuristic functions, there are plenty of times you need to visit an alternative line beyond the depth of a mate to accurately evaluate it. |
@Videodr0me Idea wrt 8. Would it make sense to have a check focused lookahead that tries to find perpets? |
The lookahead does this, as soon as a side can force a perpet (if in search horizon) it has proven bounds of being at least a draw, This propagates in the tree, and is used to correct scores if for example a different branch is sampled where the others sides NN eval is positive. In this case (as we proved we have at least a draw) that score is corrected to 0.00 at the point in the tree where the draw can be forced. Already with 2 move draw scoring and a one-ply-lookahead (the default) it leads to CP finding draws much much earlier, often 16 ply earlier than master. It only translates to +10 elo (newer testing suggests even better results, but lets not get over optimistic) because its rare that this makes a different in the game result, but it sometimes does. Especially in many of the positions where leela threw away a full point or could not hold a draw (yes it sees its own perpetuals much better too, and often saves otherwise lost games) in CCCC or TCEC it would have made a difference. So if you bump up certainty-propagation-depth to higher values, it will find more perpets (and mates) in that horizon and that will most likely lead to more proven bounds. If the move-generator gets optimized and we have a hash table in place i would like to bump depth up to 3, which should be sufficient. Currently that taxes the CPU to much. |
I am not quite sure i fully understand what you mean. Leelas PV is not like a/bs PVs, it is just a sequence of the "most likely" moves. If it ends in a mate, that does not mean, that the mate is forced. Even with CP a PV could end in a mate which is "unforced". CP of course would know that it is no proven mate, and not display a mate score. UCT averages all Qs, so even if the most-likely sequence of moves ends in a terminal mate, that does not imply that the score of that branch is winning. Of course this is only a transient possibility as soon UCT will explore moves avoiding the unforced mate more often, and the sequence of most likely moves will change. Does this somehow address what you meant? |
Not exactly, the original observation is, as previously mentioned, that in the case you do have a proven mate (it is my understanding here this is one of the features of CP: displays mate scores if proven mate (length is average length of mate when first proven)), then certainly there is no point in expanding any node beyond X ply. In alpha/beta, for example, you just instantly return (typically alpha) when the search ply is > X and thus do not search anything beyond that point. Basically, all this is doing is making the search for shorter mates much more efficient once a certain mate has been found. This is just free search efficiency gains with proven mates. The other part is that for MCTS, the uncertain state of mate may not matter in determining the most efficient use of nodes. The score of the mate in the PV doesn't really matter for this observation, just that it is mate and is the PV. If the PV is followed, certainly the game is terminal when the mate appears on the board. Thus, if the mate is to not happen, the PV must change and that divergence must come sometime before the mate is played. So since you know that to be the case, it may make the most sense to spend prioritize node expansion on nodes before that ply. I do not know if this part will work (result in a strength increase), but it seems worth looking into to me... ymmv. |
Ok now i understand what you originally meant. You are talking about finding shorter and shorter mates once a mate is found. The current version of CP does not search past the mate, in fact if a proven mate is found in timed (or fixed node play) the move is played immediately. In infinite mode (multi-pv) i opted to no longer search a poven mating move and rather search the other moves. Basically resting on the assumption that the user in multi-pv would rather like to know if the other moves are winning as well, instead of finding a "shorter" win for the main move. In non-multi-pv current behaviour does not find shorter mates but instead would go for mates that are more in line with prior policy. These might even be longer but at least to the NN are more "natural". I have not actually encountered this in practice though. With that being said, one could easily modify CP to find shorter and shorter mates if one really needs that, but it would entail storing depth-to-mate information in each node. Either by somehow overloading Q or a seperate depth entry. As we generally did not want to blow up the memory footprint of the node to much, i avoided that. But it could be implemented easisly in a future version.
Again, i might misunderstand, but this is what CP does, it propagates the certainty of a branch upward in the tree, so already proven sidelines are no longer sampled. |
I think you should be able to just use your already existing flags for CP (since you prefer a proven mate over other lines)... and the shorter mates only will be a natural consequence, as any longer mates would not receive any more visits. Something like: or for extendNode: && !(certainMate && currentNodePly > PvPly) |
The problem is that there are 2 different plys to consider. the ply of the current move, and the ply of the longest mating line. The first we have, the 2nd we don't. |
It is just the length of the pv. I know this used to be reported for depth, not sure if it is still there. |
Currently if a forced mate is found in a branch, that branch is a proven win and that branch is no longer searched at all. So the additonal condition currentNodePly > PvPly is not necessary. This makes sense deeper in the tree as you want to allocate visits efficiently on other sub-branches and not waste visits on finding shorter mates in a sub-branch that turns out to be irrelevant. If you already propagated the mate to root, then you could think about schemes to continue searching for shorter mates. Then your proposal makes sense, and it was probably meant as such in the first place. Just one thing to consider is that the PV at the time the mate was proven is neither guranteed to be the longest nor is it the shortest, its just the PV that got the most visits leading to mate at the moment the last sub-branch got proven. So it could be seen as a policy weighted average distance to mate, but even that is technically not entirely correct. So while in practice taking it as a upper bound will probably find shorter mates more quickly then just searching without a ply limit. That limit might sometimes be to low, to gurantee finding a shorter mate (for example if it is lower than the actual proven mate and the shorter mate is inbetween the real length of the proven mate and this bound). But as a heuristic, i agree that in a future refinement this might be a easy way to find shorter mates. One could argue - and i am not really of any definite opinion on this - that a "natural" mate (in terms of the NN policy) is probably easier to understand for humans then an "artificial" shorter mate with a lot of double !! moves. |
Ok, but It is not just for the branch, but for the entire tree. |
Hmm, thats dangerous. Again, maybe i am still misunderstanding, but there are two cases. (2) The certain win propagated to root, now we know there is a winning move and the opponent can't avoid it. Now you have several alternarives: You can stop searching and just play the win or you can continue searching with in order to maybe find shorter mates. If you want to continue searching in scenario 2, there is a problem in that a ply limit derived from length of pv is not exact nor is it an upper or lower bound. It is just the path of most visits at the exact moment when the last sub-branch of the mate was proven. As a heuristic this might work in many cases, but it might also miss shorter mates. |
You can find the implementation Stockfish uses at line 610 of search.cpp here: https://github.com/official-stockfish/Stockfish/blob/master/src/search.cpp Perhaps that will be of more use than my explanatory skills, which are clearly lacking. |
@Videodr0me Are you still around? |
I am still here, just had no time recently. Will try to slice it up into several commits to finally get it into master, but time is sparse currently. |
…r certain losses over terminal losses.
In #788 @MelleKoning linked a CP game where leela does not go for mate in one. I do believe there might be a bug in pr487 as it goes for non-mating move e7e8q while PR700 plays the mating move d8h8 immediately. Let's compare pr700 with cp on and Off vs pr487 using net 32991 with settings: To get to the position of 77th move use command:
To get to the 78th move, append moves 3 most visited moves for go nodes 800 on 77th move:
pr700 cp on:
pr487 cp on:
So here leelas with no CP and with basic CP both find the mating move without any trouble since the move has highest prior. CP proved the mate after the first batch. Let's do the same for the position in 78th move.
pr700 cp on:
pr487 cp on:
Again same story, cp off and basic cp on both go for the mating move e8h8. PR487 is certain of the mate but the mate it sees is not the shortest and happily plays a non-mating move. I believe the same keeps going on for the remaining of the game. |
Just to verify the bug, I set a 2 game match CP vs CP_full using the first 76 moves of above game as a book. The result was that CP as white mates in first move while CP_full drags the game up to 248th move showing mate score for the all the moves from 77th to 248th. So CP sees the mate in all positions but does not follow the mating sequence. |
Thanks for testing! Looks like basic CP in pr700 does the job while look ahead still has its issues. |
Certainty Propagation (finalized).
Should improve play (and visit distribution in training), in particular in positions transitioning into TBs, mates, forced repetitions etc. The parameter --certainty-propagation (also as UCI) can be set to true or false (CP).
CP=false:
CP=true
Parameter "--certainty-propagation-depth" (CPD) also as UCI:
Tested in gauntlets, self-play, with and without TB - yields consistently approx. >+10 Elo. The main effect is expected from training as it should through the feedback loop (fingers crossed) correct many of leelas endgame misevals, This effect should compound in training and reduce the number of required training games to handle these endings by a large factor.
Test result of final version:
This is with the new scaling of c in CPUCT as introduced by DM in their published paper. It seems to fit well with CP. Without the new scaling the test of the most recent version was at +20.