-
Notifications
You must be signed in to change notification settings - Fork 2
/
TODO
260 lines (196 loc) · 8.77 KB
/
TODO
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
worker stuff
- PS1='\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '
- cd /tmp; wget -N --no-check-certificate https://dropbox.ewalds.ca/worker.sh; screen -d -m sh worker.sh; logout
garbage collection on pns
- if you remove a whole bunch of nodes, a backup will remove its values, changing the shape of the tree...
Features for weighted random rollouts:
- pattern - empty, white, black, offboard - 4^6 = 4096, minus symmetry = ~300
- distance to last move
- distance to second last move
- how many edges - 0,1,2
- how many corners - 0,1
- group size
- distance to win
- forms vc
Board printing
- mark the last move (bold colour?)
- move to board.cpp ?
- make a generalized version for outputing values for empty cells
Add a small amount of random knowledge, or shuffle children order to avoid bias towards early moves
Move iterator already computes xy(move), so might as well save it, and return a MoveValid instead
Continue playing past the end of a rollout until the losing player makes a winning formation. Use this as an estimate of the winning advantage
Test correlation between each cell to statistically determine virtual connections and important areas
Test local replies in rollouts based on patterns
- if an important pattern is found local to the last move, make it, otherwise play globally.
Test
- dynamic widening
- combinations of knowledge?
- weighted random with knowledge?
weighted random
- better weighted random based on new features?
- different weights for each player?
player_solved claims an empty board is a win by black, needs the backup to be rewritten, probably should copy Player::PlayerUCT::do_backup
compact tree memory manager
- move old unused elements of the freelist to the front so they get used first? or is it better that they're at the back?
- store a Chunk pointer in each data block, then traverse the freelist instead of the memory
- remove elements that are outside the generation size, compact the rest
- compact Chunks that have old unused segments
- might be faster, but has an extra pointer...
Store nodes in a hash table instead of tree
- how to garbage collect?
- how to keep tree between moves?
- use a persistent db instead?
- use an array of hash tables, resizing one at a time so little extra memory is needed
Build a large persistent tree in some database
- save and reload heavy Nodes between games
- tokyo or kyoto cabinet? leveldb?
lambda DFPN http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-387.pdf
Proof number search player
- choose by amount of work spent in a node
Count nodes expanded per move
- how to deal with garbage collection?
- how to deal with multi-threading/pondering?
tree checkpointing to deal with a crash?
detect ties
- if not using lbdists, use few flood fill instead of 24
- if edges/corners can reach each other, no need to run extra flood fills
- keep bits for corners and edges for each colour
- if crossing empty or your own, propagate connections, otherwise don't
- how to go backwards?
- start a flood fill from each corner/edge
- mark each edge/corner position as reached by the other flood fills
- skip if it's already reached
PV doesn't take win_or_draw, loss_or_draw into account
final move selection doesn't choose good moves for proven draws
if not keeptree, and the number of non-losing moves at the root is 1, just make that move
- what if the last move is also a loss? should choose hardest loss
any ideas to handle ladders?
- always generate threats?
reuse (partial?) proof trees
- if a move is backed up as a win, it is likely a win under its siblings too
- give it a bonus to its equivalent cousins
- can we try to prove siblings by reusing the proof tree?
- store which sibling was used to prove the current tree, to avoid having to store it
Test the accuracy of the rollouts
- do a tree search to get a value of a position
- run 1000 rollouts without a tree search to get the value of the position
- compare the two for many positions
- compare at various stages in a game
- test different rollout policies to minimize error
- look at error distribution
- error and absolute error
Test accuracy of knowledge
- do a tree search with exploration to get the value of a position and the order of moves
- try various knowledge combinations to see how they order moves
- consider equal knowledge value when determining distance
move ordering by knowledge
- lower bound on distance to edges/corners
- maybe based on number of paths? handles virtual connections
rework swap
- rave doesn't give swap any time, so useless as a searchable move
- swap can't be changed after ponder is enabled
- waste of time anyway
- crashes when swap isn't a valid move
- new way:
- only handle in the opening book?
- maximize over min(exp, 1-exp) on empty board
- swap if exp < 0.5 after first move
do proven nodes need
- rave updates?
- to be considered in move choice?
- can this be improved by sorting nodes:
- solved win
- unsolved, sorted by x,y
- solved ties
- solved losses
- how to do this multi-threaded?
- do sorting during garbage collection?
detect dead areas, areas where your opponent has limited you to one corner or two sides and no rings
- negative knowledge?
- really dead (completely enclosed) vs almost/temporarily dead (enclosed with VCs)?
- still useful to block opponent wins?
- may be related to lower bound on win?
do 2 ply search at node initialization if certain conditions are met
- 1 corner and group size >= board size - 1
- 2 edges and group size >= board size + 1
- ring if group size >= 5
- how to interact with lower bound?
build a linked list of empty moves so the move iterator can skip filled moves
- probably not worth it since the iterator is used so rarely
- could build a doubly linked list since memory in this area is cheap
- more expensive in the solver than the player
final move selection
- what to do when the node with the most work is solved as a loss, but there is an unsolved node with little experience
- might be an easy win for the opponent
- take extra time if possible
- fast wins
- slow loss/ties
multi-thread
- different parameters (ie ravefactor) for each player thread?
- useful for testing parameters at different parts of the game
- solver threads should use a TT instead of using the tree?
share the tree between the player and solver
- add phi/delta to UCTNode
- need way of knowing whether the node has been initialized by the player or solver
- how to deal with areas of the tree that are initialized by the other?
- rave updates for parts of the tree that aren't yet initialized?
- phi/delta as knowledge again?
- thread safety?
- run multiple different solvers? dfpnsab/pnsab
- knowledge for phi/delta
prior knowledge
- criticality
- copy knowledge from proof tree
- avoid clumping? penalty for being near many of your own stones
- lower bound needs virtual connection bonus / penalty for crossing opponents VC
rollout policy
- avoid attacking virtual connections
- local reply?
- locality?
- connections?
Time control
- quiecense search - if the highest winrate != most played, keep simulating until it is
- if only one move doesn't lose, stop simulating and just make that move
- good idea with total time per game, but not per move
Pros of MCTS:
- very general heuristic
- doesn't get lost
- uses previous experience
- easy parallelization
Cons of MCTS:
- slow
- memory heavy
- magic and witchcraft relates to both the exploration factor, as well as developing good simulation policies/patterns.
Pros of PNS:
- game independent
- works well when branching factor is important
- independent of position in the tree, so fast
- different
Cons of PNS:
- linear decay instead of exponential decay
- gets lost since branching factor is very short sighted
- doesn't use much knowledge or past experience
PNS based:
- Add heuristic knowledge:
- random rollouts as knowledge?
- Keep knowledge for future move selection
- expected win value of a node, then minimize (delta*(1-value))
- take amount of work into account in move selection
- maximize the expected increase of: 1/p2 + 1/d2
MCTS based:
- maintain phi/delta, and add them as an exploration criteria to the UCT formula
- Add thresholds so updates don't need to go all the way to the root
- change updates so it's not a complete weighted average
- do minimax backup
- weighted average ignoring solved nodes
- hard, since it isn't clear what to do when multiple threads are down a node
- weighted average over the best 50% of work
Hybrid:
- have a few threads start with MCTS, then switch to PNS at some point:
- once the branching factor is not uniform amoung siblings
- once the node has fewer than some amount of simulations
- once valuation is polarized (ie 60%+ or 40%-)
- have a few threads watching a work queue
- share a tree
- some threads doing each, both expanding and proving nodes
- threads switch between proof and player based on success