Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Certainty propagation negabound [WIP] #487

Conversation

Videodr0me
Copy link
Collaborator

@Videodr0me Videodr0me commented Nov 3, 2018

Certainty Propagation (finalized).

Should improve play (and visit distribution in training), in particular in positions transitioning into TBs, mates, forced repetitions etc. The parameter --certainty-propagation (also as UCI) can be set to true or false (CP).

CP=false:

  • disabled

CP=true

  • two-fold draw scoring
  • propagates certainty/bounds (from terminals, TBs, two-folds etc.) up the tree.
  • displays mate scores if proven mate (length is average length of mate when first proven)
  • tracks if TB hit was involved in proving the win, if so displays mate in 1000 + distance to tb hit.
  • in root filtered syzygy territory, display mate in 500 instead of leelas eval.
  • Correcting out-of-proven-bounds children
  • play certain wins at root immediatly and avoid certain losses regardless of visits.

Parameter "--certainty-propagation-depth" (CPD) also as UCI:

  • CPD is the depth of look-ahead-search (range 0..4, defaults to 1)
  • e.g. CPD=0 is no look-ahead-search, and CPD=1 is one-ply lookahead.
  • On fast GPUs (> 1080ti) threads need to be increased from 2 to 4. Generally 2 threads per GPU + 1 should suffice.

Tested in gauntlets, self-play, with and without TB - yields consistently approx. >+10 Elo. The main effect is expected from training as it should through the feedback loop (fingers crossed) correct many of leelas endgame misevals, This effect should compound in training and reduce the number of required training games to handle these endings by a large factor.

Test result of final version:

Rank Name Elo +/- Games Score Draws
1 lc0CP 81 23 294 61.4% 63.6%
2 lc0 56 24 294 58.0% 62.9%
3 stockfish_18082021_x64 31 26 294 54.4% 56.5%
4 xiphos-0.4-w64-nopopcnt -180 30 294 26.2% 43.5%

This is with the new scaling of c in CPUCT as introduced by DM in their published paper. It seems to fit well with CP. Without the new scaling the test of the most recent version was at +20.

efficient propagation of certainty, two-fold draw scoring, mate display and more.
=1 suitable for training
=2 for play
Currently negabound search depth is one.
Improves play in positions with many certain positions (nrear endgame TBs, mates).
Sees repetitions faster and scores positions more accurately.
@jjoshua2
Copy link
Contributor

jjoshua2 commented Nov 3, 2018

I was thinking about testing for you with egtb, but...
AppVeyor was unable to build non-mergeable pull request

@borg323
Copy link
Member

borg323 commented Nov 4, 2018

High level comment: I would make --certainty-propagation a ChoiceOption with descriptive names.

@Videodr0me
Copy link
Collaborator Author

High level comment: I would make --certainty-propagation a ChoiceOption with descriptive names.

Will do. But lets first wait for some "serious" bugs (hopefully none). Then i would propose:
--certainty-propagation=training
--certainty-propagation=play
and maybe (if needed):
--certainty-propagation=experimental

@jjoshua2
Copy link
Contributor

jjoshua2 commented Nov 4, 2018

Score of lc0certainty vs lc0v19 11248: 0 - 5 - 11 [0.344]
Elo difference: -112.33 +/- 89.85

It's probably due to the slow NPS. go infinite from startpos

info depth 9 seldepth 28 time 18892 nodes 92237 score cp 25 hashfull 300 nps 4882

vs v19 much faster.

info depth 11 seldepth 29 time 16817 nodes 253991 score cp 22 hashfull 703 nps 15103 tbhits 0 pv e2e4 c7c5 b1c3 d7d6 g1f3 g8f6 d2d4 c5d4 f3d4 g7g6 f2f4 b8c6 d4c6 b7c6 e4e5 f6d7 e5d6 e7d6 d1d4 d7f6 c1e3
info depth 11 seldepth 29 time 21831 nodes 349433 score cp 22 hashfull 915 nps 16006

I also tried

setoption name CertaintyPropagation value 0
go infinite

and it still only got up to 7000 nps. This is on 2080 ti. In the match I used fp16 for both and the difference was much greater. about 3-5x worse nps it seemed.

…ersion. Increasing threads (e.g. 4 or 6) will get to masters speed now. Further speed fixes (move generator) possible....
@Videodr0me
Copy link
Collaborator Author

Videodr0me commented Nov 4, 2018

Yes, i found and fixed one speed problem. Get Certainty Parameter was not cached, and reading it via the normal Get was extremely slow (researched the options dictionary every time). That improved random backend performance by 2x-3x. Now with using more threads (try 4-6 and adjust batchsize accordingly) you get to masters speed. But more speed fixes are planned (move generator) to get the same speed with less threads, too.

…with lto, this yields a speed up by 30-50% in backend=random. In order to fully use CP please use 4 threads+. Changed default temporarily to 4 threads with this commit, to collect more scaling data.
@gonzalezjo
Copy link

Does this do single legal move extension, like your fork?

@Videodr0me
Copy link
Collaborator Author

@gonzalezjo Yes, but slightly differently. It extends single legal moves, but only for terminal detection. The experimental branch also extended before fetching the NN eval. I found both roughly equal in testing with the former having the advantage to better fit the new negabound search paradigm and more elegant in implementation (less edge cases to worry about).

…ds instant play of certain winning moves and avoidance of loosing moves regardless of visits. CP=3 now adds advanced pruning.
if (info.hashfull >= 0) res += " hashfull " + std::to_string(info.hashfull);
if (info.nps >= 0) res += " nps " + std::to_string(info.nps);
if (info.tb_hits >= 0) res += " tbhits " + std::to_string(info.tb_hits);
if (info.certain >= 0) res += " certain " + std::to_string(info.certain);
if (info.bounds >= 0) res += " bounds " + std::to_string(info.bounds);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might be breaking UCI protocol by adding non-standard keywords?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its common practice to add keywords, they should not really matter. But its more that i looked what other engines are doing - some are also outputting bounds and other info like this. Does it break a specific gui? The uci specification is rather vague on this. It works in arena, cutechess, chessbase... What gui gives you problems?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not aware of a specific GUI, but I remember "winrate" being removed from lczero for this reason: glinscott/leela-chess#339.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I've not seen any custom keywords in uci protocols. I only saw them after string (like we do with verbose move stats). I think it's better to remove them and output to the log only.

Also files in chess/ subdirectory are attempted to be engine-independent, and adding custom MCTS fields there couples components.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will remove these counters.

@jjoshua2
Copy link
Contributor

jjoshua2 commented Nov 8, 2018

2 threads each

Score of lc0certainty vs lc0v19 11248: 2 - 8 - 26 [0.417]
Elo difference: -58.45 +/- 58.89

cp gets 4 threads

Score of lc0certainty vs lc0v19 11248: 8 - 6 - 35 [0.520]
Elo difference: 14.19 +/- 52.25

@cwbriscoe
Copy link
Contributor

One thing I noticed in Josh's stream is that whenever it does the "counting 2-fold as a draw" thing, it spikes the eval graph down to zero even though the other engine is obviously not going to 3-fold when it thinks it is winning. Maybe it should instead report the best non 2-fold score to UCI in that case.

@Videodr0me
Copy link
Collaborator Author

One thing I noticed in Josh's stream is that whenever it does the "counting 2-fold as a draw" thing, it spikes the eval graph down to zero even though the other engine is obviously not going to 3-fold when it thinks it is winning. Maybe it should instead report the best non 2-fold score to UCI in that case.

Yes, in that sense it behaves like most traditional engines. But actually to implement something like you propose is possible. Is that sound and works in all edge cases? Is that what SF does?

@Videodr0me
Copy link
Collaborator Author

Note: currently please disable smart pruning. Time management logic might need some adjustment to CP

Copy link
Member

@mooskagh mooskagh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First portion of comments. Didn't look into the main part of the change yet.

@@ -186,6 +186,7 @@ BitBoard ChessBoard::en_passant() const { return pawns_ - pawns(); }

MoveList ChessBoard::GeneratePseudolegalMoves() const {
MoveList result;
result.reserve(60);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment why it is needed and why 60 is picked.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It slighty speeded up move generation in profiling. Number is a guestimate - 20 was to little, and 100 seemingly to much. This is not based on rigerous testing.

@@ -71,6 +71,9 @@ struct ThinkingInfo {
optional<int> score;
// Number of successful TB probes (not the same as playouts ending in TB hit).
int tb_hits = -1;
// Number of nodes made certain and nodes bounded
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Period in the end of the sentence.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed the counters.

@@ -57,6 +57,7 @@ const std::unordered_map<std::string, std::unordered_set<std::string>>
{{"stop"}, {}},
{{"ponderhit"}, {}},
{{"quit"}, {}},
{{"stats"}, {}},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stats is a non-standard UCI command. Instead of changing UCI interface/parser for every possible custom command, it's better to add:

virtual bool CustomCmd(const std::string& cmd, const std::string& params) { return false; }

into class UciLoop and make it return true if cmd is something handled.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed that command. I had a talk with borg about this some time ago, and the conclusion was that we might implement a "go nodes 0" command that does the same thing. As to not burden this rather large PR with even more details, I removed the stats command and all references.

if (info.score) res += " score cp " + std::to_string(*info.score);
// Mate display. If abs(score) > 20000 its a certain win (or loss).
// Length to mate (or tbwin) is then (+ or -) abs(score)-20000.
if (info.score) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not nice at all to reuse info.score for checkmate display. Add info.mate for that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I know. Will fix. Will do this last, when everything else is cleaned up. We would still need an indicator if info.score or info.mate is valid when processing the info structure. Ideas?

if (info.hashfull >= 0) res += " hashfull " + std::to_string(info.hashfull);
if (info.nps >= 0) res += " nps " + std::to_string(info.nps);
if (info.tb_hits >= 0) res += " tbhits " + std::to_string(info.tb_hits);
if (info.certain >= 0) res += " certain " + std::to_string(info.certain);
if (info.bounds >= 0) res += " bounds " + std::to_string(info.bounds);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I've not seen any custom keywords in uci protocols. I only saw them after string (like we do with verbose move stats). I think it's better to remove them and output to the log only.

Also files in chess/ subdirectory are attempted to be engine-independent, and adding custom MCTS fields there couples components.

return;
}

if ((history_.Last().GetRepetitions() >= 1) &&
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Twofold repetition is not a draw.
Twofold repetition is useful and people generally like it, we should not remove it.
If it really adds some (>5) Elo, we may consider making it a flag though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know about this. The whole point about CP is to let leela find repetitions earlier. Under a "best play" assumption this is as zero as you can get to address this problem.

But to clarify: this change does not terminate the game if a two-fold arises, it just scores it as if its a draw. Its not making two-folds terminals, just certain. This means that leela will not waste and dilute visits to otherwise identical positionss

It is responsible for some of the +10 gain of elo, but i really don't know how many. But this is the one thing, that should compound really well in training. I have numerous real world examples from CCCC and TCEC where this finds the correct path to a win, while normal leela stumbles into a perpetual. Sometimes it sees the draw 16 ply earlier (two-fold plus CP) then normal leela. If we assume an effective branching factor of two in chess, we might just need 2^-16 of the training games we otherwise would need for leela to understand these positions. These are all guestimates and no gurantees can be given of course. But it seems like a good try.

The only downside is that you get 0.00 evals if leela sees it can repeat. If the opponent then deviates it will return to the normal eval. Most other engines do this too. But, if you turn CP off, you get normal 3-fold behavior. So I would hate to waste another option on this.

src/mcts/node.h Outdated
void MakeTerminal(GameResult result);
void MakeCertain(GameResult result, CertaintyTrigger trigger);
void MakeCertain(float q, CertaintyTrigger trigger);
void SetEQ(float eq);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is EQ? Add comments to all public functions.

src/mcts/node.h Outdated
@@ -225,7 +270,8 @@ class Node {
EdgeList edges_;

// 8 byte fields.
// Pointer to a parent node. nullptr for the root.
// Pointer to a parent node. nullptr for the root of tree,
// root of tree != search->root_node_.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a false statement. root of tree == search->root_node_ in startpos.
Better something like: "Note that root of the tree is game's startpos and not root of the search."

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was meant to say that equality does not hold always, which the previous comment implied. But, I see your point and my comment is also ambigious. I wil clarify.

src/mcts/node.h Outdated
// N-related getters, from Node (if exists).
uint32_t GetN() const { return node_ ? node_->GetN() : 0; }
int GetNStarted() const { return node_ ? node_->GetNStarted() : 0; }
uint32_t GetNInFlight() const { return node_ ? node_->GetNInFlight() : 0; }

// Whether the node is known to be terminal.
bool IsTerminal() const { return node_ ? node_->IsTerminal() : false; }
//bool IsTerminal() const { return node_ ? node_->IsTerminal() : false; }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented line (uncomment or delete).

const OptionId SearchParams::kCertaintyPropagationId{
"certainty-propagation", "CertaintyPropagation",
"Enables level of certainty propagation: "
"Setting 1 is fully compatible with training "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make it a ChoiceOption instead.

Copy link
Member

@mooskagh mooskagh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second part, still most of search.cc is not reviewed.

src/mcts/node.cc Outdated
for (const auto& child : Edges()) visits += child.GetN();
n_ = visits;
}
n_in_flight_ = 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's better to make it assert, as it's expected that it is 0.

float Node::GetVisitedPolicy() const { return visited_policy_; }

float Node::GetQ() const {
if (parent_) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like relying much on a parent, as it was a temporary optimization only.
Is Q conceptually edge's or node's property?
If edge's: Why do we need to check "if edge is certain" in a node rather than edge?
If node's: Why do we store that it's certain in an edge rather than a node?

Node should not go to parent to get some information about itself.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented it as to be as minimal invasive to the current codebase. You are correct that overall it would be "nicer" to move from a node centric style to a edge.node or even edge centric style. I left the parts of the code as is, if they checked for terminal/certain status by node. I would propose changing this only after CP is tested and with the complete codebase and future changes in mind.

So while I feel that there is currently a shift towards a more edge centric view with your new detachment scheme which might store N in the edge as well, and also even the original A0 seems rather edge centric), I would still make that a seperate PR sometime later.

src/mcts/node.cc Outdated
Edge* Node::GetEdgeToNode(const Node* node) const {
assert(node->parent_ == this);
assert(node->index_ < edges_.size());
return &edges_[node->index_];
}

Edge* Node::GetOwnEdge() const { return GetParent()->GetEdgeToNode(this); }
Edge* Node::GetOwnEdge() const {
if (GetParent())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curly braces around the bodies of conditional statement, as it's not one-line statement.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

src/mcts/node.h Outdated
void MakeCertain(GameResult result, CertaintyTrigger trigger);
void MakeCertain(float q, CertaintyTrigger trigger);
void SetEQ(float eq);
void ClearEdge() { certainty_state_ = 0; };
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Poor name. It only clears certainty, not entire edge.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any ideas for a name. Maybe ClearEdgeCertaintyStatus ?

src/mcts/node.cc Outdated
@@ -358,6 +426,8 @@ void NodeTree::MakeMove(Move move) {
current_head_->ReleaseChildrenExceptOne(new_head);
current_head_ =
new_head ? new_head : current_head_->CreateSingleChildNode(move);
if (current_head_->GetParent()) current_head_->GetOwnEdge()->ClearEdge();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need that? Why would something need to look somewhere above search head?

Copy link
Collaborator Author

@Videodr0me Videodr0me Dec 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sometimes we can prove bounds for root. Lets say root is lowerbounded -> we know that we have at least a draw, but might have something better. In these cases we do not need to search moves from root that are upperbounded (at maximum a draw). As we store all information about bounds/certainty in the edge leading to the node, I fetch it there. The only time you do not have that info is in the startposition of chess and if you provide a fen. But in gameplay you usually do. Also if we proved that a later move has bounds, and we finally arrive at that position it would be a pity to loose that info. ClearEdge clears everything except for the bounds (another good reason to rename ClearEdge).

float perct = -1, elo = 99999;
float los = 99999;
if ((winp1 + loosep1 + draws) > 0)
perct = (((float)draws) / 2 + winp1) / (winp1 + loosep1 + draws);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

braces around body of conditional operator if it's on a separate line.
use static_cast<float>() instead of C-style casts.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will do.

<< "%";
res += " Win: " + oss.str();
}
if (elo < 99998) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are those magic constants? I don't understand what's going on here, but looks like a hack.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its just for the case where no elo can be computed (its preinitialized to 99999). There are a number of cases where that can happen (all wins, all losses, no games etc.). One could of course check for all these cases, but that would also look bloated. The number is arbitrary.

@@ -1618,7 +1618,7 @@ int SyzygyTablebase::probe_dtz(const Position& pos, ProbeState* result) {
// Use the DTZ tables to rank root moves.
//
// A return value false indicates that not all probes were successful.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outdated comment.
But actually, return value was intended to return error and now it just is let go?..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is strange. My file has the correct comment:

// A return value 0 indicates that not all probes were successful.
// Otherwise best rank is returned, with rank 1 = draw.```

if (params_.GetCertaintyPropagation() > 0) {
if (edge.IsCertain() && edge.GetEQ() != 0.0f)
uci_info.score =
edge.GetEQ() * (20000 + ((uci_info.pv.size() + 1) / 2) + 1 +
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I commented otherwise, add proper mate reporting in uci_info, without need for 20000s.

Copy link

@gonzalezjo gonzalezjo Dec 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In his defense, in cases where the mate is provable, but not all variations have been checked (e.g., tablebase wins) it is normal chess engine behavior to output a score of 100 or 300 + static eval, or an otherwise similarly inflated eval. On the other hand, if all variations have been checked, I agree that a mate score should be properly reported via uci_info. The bigger issue here is the description of reporting M500 if a Syzygy win is found, instead of resorting to this behavior, which can lead to improper tournament adjudications.

edge.GetEQ() * (20000 + ((uci_info.pv.size() + 1) / 2) + 1 +
(edge.IsPropagatedTBHit() ? 1000 : 0));
else if (root_syzygy_rank_) {
int sign = (root_syzygy_rank_ - 1 > 0) - (root_syzygy_rank_ - 1 < 0);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May worth to make sign a separate function. Either in utils/misc.h or just in this file above in an anonymous namespace.

- exposed depth parameter (0 is no-look-ahead)
- only two modes CP=1 for training and CP=2 for play
Todo:
- change option from int to choiceoption
- use info.mate to communicate mate scores
@jhellis3
Copy link

I wrote about this on another issue, but it seems to fit best with CP.

Mate distance pruning: If PV ends in mate in x ply, no expansion should happen beyond x ply because all scores will be worse than mate in x. Not sure how this interacts with MCTS, whether the mates need to be proven or not to be worthwhile. Hopefully not but who knows.... anyway thought I would mention it here as this seemed the most appropriate place for it.

@Videodr0me
Copy link
Collaborator Author

I wrote about this on another issue, but it seems to fit best with CP.

Mate distance pruning: If PV ends in mate in x ply, no expansion should happen beyond x ply because all scores will be worse than mate in x. Not sure how this interacts with MCTS, whether the mates need to be proven or not to be worthwhile. Hopefully not but who knows.... anyway thought I would mention it here as this seemed the most appropriate place for it.

That is among the things CP does. As soon as a win (mate or tb win) is found, its backpropagated one ply up as a loss for the other side, then the whole branch will no longer be visited beyond the backpropagated certain node.

@oscardssmith
Copy link
Contributor

@Videodr0me what @jhellis3 is suggesting is slightly different. His is that once the root is proven mate, a search for ways to decrease mate distance should be started that never goes to a higher depth than the lowest depth found mate.

@jhellis3
Copy link

then the whole branch will no longer be visited beyond the backpropagated certain node.

With mate distance pruning you don't visit any nodes (regardless of branch), beyond x ply when you have a mate in x. Whether the mate needs to be proven, or if it is sufficient for it to just be the PV, I do not know. But certainly if the PV results in a mate for either side one should expect the opponent to attempt to deviate at an earlier point.

@oscardssmith
Copy link
Contributor

I think you do require the mate to be proven. Since we use heuristic functions, there are plenty of times you need to visit an alternative line beyond the depth of a mate to accurately evaluate it.

@oscardssmith
Copy link
Contributor

oscardssmith commented Dec 16, 2018

@Videodr0me Idea wrt 8. Would it make sense to have a check focused lookahead that tries to find perpets?
if we assume that there are an average of 8 ways to get out of check, and 2 ways to re-initiate check, that's only 4096 positions to check for depth 6 search (assuming none of the checks have immediate bad effects, in reality, most would probably fail by depth 1).
To reduce the cost, you could only trigger it if you thought it was the last time you were going to visit the node, if another node nearby was being CP'd, or if the last 2 moves in the tree (or history) have been checks. This should mean yield a CP'd draw most of the times it was called.

@Videodr0me
Copy link
Collaborator Author

Videodr0me commented Dec 23, 2018

@Videodr0me Idea wrt 8. Would it make sense to have a check focused lookahead that tries to find perpets?
if we assume that there are an average of 8 ways to get out of check, and 2 ways to re-initiate check, that's only 4096 positions to check for depth 6 search (assuming none of the checks have immediate bad effects, in reality, most would probably fail by depth 1).
To reduce the cost, you could only trigger it if you thought it was the last time you were going to visit the node, if another node nearby was being CP'd, or if the last 2 moves in the tree (or history) have been checks. This should mean yield a CP'd draw most of the times it was called.

The lookahead does this, as soon as a side can force a perpet (if in search horizon) it has proven bounds of being at least a draw, This propagates in the tree, and is used to correct scores if for example a different branch is sampled where the others sides NN eval is positive. In this case (as we proved we have at least a draw) that score is corrected to 0.00 at the point in the tree where the draw can be forced. Already with 2 move draw scoring and a one-ply-lookahead (the default) it leads to CP finding draws much much earlier, often 16 ply earlier than master. It only translates to +10 elo (newer testing suggests even better results, but lets not get over optimistic) because its rare that this makes a different in the game result, but it sometimes does. Especially in many of the positions where leela threw away a full point or could not hold a draw (yes it sees its own perpetuals much better too, and often saves otherwise lost games) in CCCC or TCEC it would have made a difference.

So if you bump up certainty-propagation-depth to higher values, it will find more perpets (and mates) in that horizon and that will most likely lead to more proven bounds. If the move-generator gets optimized and we have a hash table in place i would like to bump depth up to 3, which should be sufficient. Currently that taxes the CPU to much.

@Videodr0me
Copy link
Collaborator Author

With mate distance pruning you don't visit any nodes (regardless of branch), beyond x ply when you have a mate in x. Whether the mate needs to be proven, or if it is sufficient for it to just be the PV, I do not know. But certainly if the PV results in a mate for either side one should expect the opponent to attempt to deviate at an earlier point.

I am not quite sure i fully understand what you mean. Leelas PV is not like a/bs PVs, it is just a sequence of the "most likely" moves. If it ends in a mate, that does not mean, that the mate is forced. Even with CP a PV could end in a mate which is "unforced". CP of course would know that it is no proven mate, and not display a mate score. UCT averages all Qs, so even if the most-likely sequence of moves ends in a terminal mate, that does not imply that the score of that branch is winning. Of course this is only a transient possibility as soon UCT will explore moves avoiding the unforced mate more often, and the sequence of most likely moves will change. Does this somehow address what you meant?

@jhellis3
Copy link

jhellis3 commented Dec 23, 2018

Not exactly, the original observation is, as previously mentioned, that in the case you do have a proven mate (it is my understanding here this is one of the features of CP: displays mate scores if proven mate (length is average length of mate when first proven)), then certainly there is no point in expanding any node beyond X ply. In alpha/beta, for example, you just instantly return (typically alpha) when the search ply is > X and thus do not search anything beyond that point. Basically, all this is doing is making the search for shorter mates much more efficient once a certain mate has been found. This is just free search efficiency gains with proven mates.

The other part is that for MCTS, the uncertain state of mate may not matter in determining the most efficient use of nodes. The score of the mate in the PV doesn't really matter for this observation, just that it is mate and is the PV. If the PV is followed, certainly the game is terminal when the mate appears on the board. Thus, if the mate is to not happen, the PV must change and that divergence must come sometime before the mate is played. So since you know that to be the case, it may make the most sense to spend prioritize node expansion on nodes before that ply. I do not know if this part will work (result in a strength increase), but it seems worth looking into to me... ymmv.

@Videodr0me
Copy link
Collaborator Author

Not exactly, the original observation is, as previously mentioned, that in the case you do have a proven mate (it is my understanding here this is one of the features of CP: displays mate scores if proven mate (length is average length of mate when first proven)), then certainly there is no point in expanding any node beyond X ply. In alpha/beta, for example, you just instantly return (typically alpha) when the search ply is > X and thus do not search anything beyond that point. Basically, all this is doing is making the search for shorter mates much more efficient once a certain mate has been found. This is just free search efficiency gains with proven mates.

Ok now i understand what you originally meant. You are talking about finding shorter and shorter mates once a mate is found. The current version of CP does not search past the mate, in fact if a proven mate is found in timed (or fixed node play) the move is played immediately. In infinite mode (multi-pv) i opted to no longer search a poven mating move and rather search the other moves. Basically resting on the assumption that the user in multi-pv would rather like to know if the other moves are winning as well, instead of finding a "shorter" win for the main move. In non-multi-pv current behaviour does not find shorter mates but instead would go for mates that are more in line with prior policy. These might even be longer but at least to the NN are more "natural". I have not actually encountered this in practice though.

With that being said, one could easily modify CP to find shorter and shorter mates if one really needs that, but it would entail storing depth-to-mate information in each node. Either by somehow overloading Q or a seperate depth entry. As we generally did not want to blow up the memory footprint of the node to much, i avoided that. But it could be implemented easisly in a future version.

The other part is that for MCTS, the uncertain state of mate may not matter in determining the most efficient use of nodes. The score of the mate in the PV doesn't really matter for this observation, just that it is mate and is the PV. If the PV is followed, certainly the game is terminal when the mate appears on the board. Thus, if the mate is to not happen, the PV must change and that divergence must come sometime before the mate is played. So since you know that to be the case, it may make the most sense to spend prioritize node expansion on nodes before that ply. I do not know if this part will work (result in a strength increase), but it seems worth looking into to me... ymmv.

Again, i might misunderstand, but this is what CP does, it propagates the certainty of a branch upward in the tree, so already proven sidelines are no longer sampled.

@jhellis3
Copy link

jhellis3 commented Dec 24, 2018

With that being said, one could easily modify CP to find shorter and shorter mates if one really needs that, but it would entail storing depth-to-mate information in each node.

I think you should be able to just use your already existing flags for CP (since you prefer a proven mate over other lines)... and the shorter mates only will be a natural consequence, as any longer mates would not receive any more visits.

Something like:
if (certainMate && currentNodePly > PvPly )
skipNode;

or for extendNode:

&& !(certainMate && currentNodePly > PvPly)

@oscardssmith
Copy link
Contributor

The problem is that there are 2 different plys to consider. the ply of the current move, and the ply of the longest mating line. The first we have, the 2nd we don't.

@jhellis3
Copy link

It is just the length of the pv. I know this used to be reported for depth, not sure if it is still there.

@Videodr0me
Copy link
Collaborator Author

Videodr0me commented Dec 29, 2018

It is just the length of the pv. I know this used to be reported for depth, not sure if it is still there.

Something like:
if (certainMate && currentNodePly > PvPly )
skipNode;

or for extendNode:

&& !(certainMate && currentNodePly > PvPly)

Currently if a forced mate is found in a branch, that branch is a proven win and that branch is no longer searched at all. So the additonal condition currentNodePly > PvPly is not necessary. This makes sense deeper in the tree as you want to allocate visits efficiently on other sub-branches and not waste visits on finding shorter mates in a sub-branch that turns out to be irrelevant.

If you already propagated the mate to root, then you could think about schemes to continue searching for shorter mates. Then your proposal makes sense, and it was probably meant as such in the first place.

Just one thing to consider is that the PV at the time the mate was proven is neither guranteed to be the longest nor is it the shortest, its just the PV that got the most visits leading to mate at the moment the last sub-branch got proven. So it could be seen as a policy weighted average distance to mate, but even that is technically not entirely correct. So while in practice taking it as a upper bound will probably find shorter mates more quickly then just searching without a ply limit. That limit might sometimes be to low, to gurantee finding a shorter mate (for example if it is lower than the actual proven mate and the shorter mate is inbetween the real length of the proven mate and this bound). But as a heuristic, i agree that in a future refinement this might be a easy way to find shorter mates.

One could argue - and i am not really of any definite opinion on this - that a "natural" mate (in terms of the NN policy) is probably easier to understand for humans then an "artificial" shorter mate with a lot of double !! moves.

@jhellis3
Copy link

Currently if a forced mate is found in a branch, that branch is a proven win and that branch is no longer searched at all. So the additonal condition currentNodePly > PvPly is not necessary.

Ok, but It is not just for the branch, but for the entire tree.

@Videodr0me
Copy link
Collaborator Author

Videodr0me commented Dec 30, 2018

Currently if a forced mate is found in a branch, that branch is a proven win and that branch is no longer searched at all. So the additonal condition currentNodePly > PvPly is not necessary.

Ok, but It is not just for the branch, but for the entire tree.

Hmm, thats dangerous. Again, maybe i am still misunderstanding, but there are two cases.
(1) A proven win did not propagate to root. Only a sub-branch is proven to be a win deep somewhere in the tree. In this case you can't really enforce any ply limit on the tree, as you then could not differentiate between moves avoiding the mate. It is very likely that the choice among moves that avoid the mate can only be made by also looking at deeper lines exceeding this ply limit.

(2) The certain win propagated to root, now we know there is a winning move and the opponent can't avoid it. Now you have several alternarives: You can stop searching and just play the win or you can continue searching with in order to maybe find shorter mates.

If you want to continue searching in scenario 2, there is a problem in that a ply limit derived from length of pv is not exact nor is it an upper or lower bound. It is just the path of most visits at the exact moment when the last sub-branch of the mate was proven. As a heuristic this might work in many cases, but it might also miss shorter mates.

@jhellis3
Copy link

You can find the implementation Stockfish uses at line 610 of search.cpp here: https://github.com/official-stockfish/Stockfish/blob/master/src/search.cpp

Perhaps that will be of more use than my explanatory skills, which are clearly lacking.

@gonzalezjo
Copy link

@Videodr0me Are you still around?

@Videodr0me
Copy link
Collaborator Author

I am still here, just had no time recently. Will try to slice it up into several commits to finally get it into master, but time is sparse currently.

@jkormu
Copy link

jkormu commented Mar 9, 2019

In #788 @MelleKoning linked a CP game where leela does not go for mate in one.
Link to the game https://lichess.org/i0Cxcq3y. The trolling starts in move 77 where mate in one is missed for the first time.
image

I do believe there might be a bug in pr487 as it goes for non-mating move e7e8q while PR700 plays the mating move d8h8 immediately.

Let's compare pr700 with cp on and Off vs pr487 using net 32991 with settings:
pr700 cp off: --verbose-move-stats --threads=1 --minibatch-size=32
pr700 cp on, pr487 cp on: --verbose-move-stats --threads=1 --minibatch-size=32 --certainty-propagation
in positions of 77th and 78th move that are both mates in one.

To get to the position of 77th move use command:

position startpos moves b1c3 d7d5 e2e4 d5d4 c3e2 c7c5 e2g3 b8c6 f1c4 g8f6 d2d3 b7b5 c4b3 e7e6 g1h3 d8b6 e1g1 h7h5 h3g5 h5h4 g3e2 f8e7 h2h3 f6d7 g5f3 c8b7 a2a4 a7a6 c1f4 e7f6 g1h1 c6a5 b3a2 b5a4 f3d2 e8g8 f4h2 a8c8 a2c4 b7c6 e2f4 b6b4 c4a2 c8e8 f4h5 f6d8 h2d6 g7g6 f2f4 g6h5 f4f5 e6e5 g2g3 d7f6 g3h4 g8h8 f1g1 f8g8 a2f7 g8g1 h1g1 b4b7 f7e8 c6e8 d6e5 c5c4 g1h1 a5c6 e5d6 c4d3 c2d3 d8c7 d6c7 b7c7 d1g1 c6e5 g1g3 c7e7 a1g1 e8c6 g1g2 a6a5 g3f4 c6e8 f4h6 f6h7 h6b6 e5c6 d2f3 e7d6 b6b7 d6d7 b7d7 e8d7 g2g6 c6b4 f3e5 h7f8 g6b6 h8g8 h1g2 d7e8 b6b8 e8c6 g2f3 g8g7 b8d8 c6b5 d8d4 g7f6 d4d8 f6e5 d3d4 e5f6 d8f8 f6g7 f8b8 b5c4 d4d5 g7f7 d5d6 b4c6 b8c8 c4b5 f3f4 c6b4 f4g5 b4d3 c8c7 f7g8 g5g6 d3f4 g6f6 a4a3 b2a3 g8h8 d6d7 b5d7 c7d7 h8g8 e4e5 f4e2 e5e6 e2f4 e6e7 f4d5 d7d5 g8h7 d5d8 a5a4 f6f7 h7h6

To get to the 78th move, append moves e7e8q h6h7.

3 most visited moves for go nodes 800 on 77th move:
pr700 cp off:

info string d8c8  (1690) N:      21 (+ 0) (P:  3.23%) (Q:  0.99112) (U: 0.11177) (Q+U:  1.10290) (V:  0.9840)  C:00000000
info string d8e8  (1691) N:      23 (+ 0) (P:  3.51%) (Q:  0.99215) (U: 0.11131) (Q+U:  1.10347) (V:  0.9891)  C:00000000
info string d8h8  (1694) N:     332 (+ 0) (P: 41.26%) (Q:  1.00000) (U: 0.09428) (Q+U:  1.09428) (V:  1.0000)  C:01001111
bestmove d8h8

pr700 cp on:

info string f7e8  (1543) N:       0 (+ 0) (P:  3.56%) (Q:  0.22914) (U: 0.60502) (Q+U:  0.83416) (V:  -.----)  C:00000000
info string d8d1  (1670) N:       0 (+ 0) (P:  4.65%) (Q:  0.22914) (U: 0.79036) (Q+U:  1.01950) (V:  -.----)  C:00000000
info string d8h8  (1694) N:      32 (+ 0) (P: 41.26%) (Q:  1.00000) (U: 0.21240) (Q+U:  1.21240) (V:  1.0000)  C:01001111
bestmove d8h8

pr487 cp on:

info string d8d1  (1670) N:       0 (+ 0) (P:  4.65%) (Q:  0.85788) (U: 0.79036) (Q+U:  1.64824) (V:  -.----)  C:00001000
info string d8h8  (1694) N:       0 (+ 0) (P: 41.26%) (Q:  0.85788) (U: 7.00932) (Q+U:  7.86720) (V:  1.0000)  C:01001110
info string e7e8q (1828) N:      32 (+ 0) (P:  1.40%) (Q:  1.00000) (U: 0.00721) (Q+U:  1.00722) (V:  1.0000)  C:01001110
bestmove e7e8q

So here leelas with no CP and with basic CP both find the mating move without any trouble since the move has highest prior. CP proved the mate after the first batch.
pr487 seems to also have found a proven mate after first batch but the mate is not mate in one.

Let's do the same for the position in 78th move.
pr700 cp off:

info string f5f6  (1076) N:      46 (+ 0) (P:  5.39%) (Q:  0.95422) (U: 0.09474) (Q+U:  1.04897) (V:  0.9915)  C:00000000
info string e8g8  (1718) N:      67 (+ 0) (P:  5.38%) (Q:  0.98479) (U: 0.06530) (Q+U:  1.05009) (V:  0.9952)  C:00000000
info string e8h8  (1719) N:     246 (+ 0) (P: 16.04%) (Q:  1.00000) (U: 0.05361) (Q+U:  1.05361) (V:  1.0000)  C:01001111
bestmove e8h8

pr700 cp on:

info string e8e7  (1710) N:       0 (+ 0) (P:  7.87%) (Q:  0.51935) (U: 1.33758) (Q+U:  1.85693) (V:  -.----)  C:00000000
info string e8f8  (1717) N:       0 (+ 0) (P:  8.52%) (Q:  0.51935) (U: 1.44801) (Q+U:  1.96736) (V:  -.----)  C:00000000
info string e8h8  (1719) N:      32 (+ 0) (P: 16.04%) (Q:  1.00000) (U: 0.08257) (Q+U:  1.08257) (V:  1.0000)  C:01001111
bestmove e8h8

pr487 cp on:

info string e8f8  (1717) N:       0 (+ 0) (P:  8.52%) (Q:  0.72125) (U: 1.44801) (Q+U:  2.16926) (V:  0.0000)  C:00001110
info string e8h8  (1719) N:       0 (+ 0) (P: 16.04%) (Q:  0.72125) (U: 2.72493) (Q+U:  3.44618) (V:  1.0000)  C:01001110
info string f5f6  (1076) N:      32 (+ 0) (P:  5.39%) (Q:  1.00000) (U: 0.02777) (Q+U:  1.02777) (V:  1.0000)  C:01001110
bestmove f5f6

Again same story, cp off and basic cp on both go for the mating move e8h8. PR487 is certain of the mate but the mate it sees is not the shortest and happily plays a non-mating move. I believe the same keeps going on for the remaining of the game.

@jkormu
Copy link

jkormu commented Mar 9, 2019

Just to verify the bug, I set a 2 game match CP vs CP_full using the first 76 moves of above game as a book.
The pgns: https://pastebin.com/8Y0hRg68

The result was that CP as white mates in first move while CP_full drags the game up to 248th move showing mate score for the all the moves from 77th to 248th. So CP sees the mate in all positions but does not follow the mating sequence.

@MelleKoning
Copy link
Contributor

Thanks for testing! Looks like basic CP in pr700 does the job while look ahead still has its issues.

@Naphthalin
Copy link
Contributor

As with the newer certainty propagation PRs #800, #700 and others, this is probably mostly obsolete as well now given that MLH is in v0.25, so I recommend closing this PR.

@mooskagh mooskagh closed this Apr 28, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.