TODO


worker stuff
- PS1='\[\033[01;32m\]\u@\h\[\033[00m\]:\[\033[01;34m\]\w\[\033[00m\]\$ '
- cd /tmp; wget -N --no-check-certificate https://dropbox.ewalds.ca/worker.sh; screen -d -m sh worker.sh; logout


garbage collection on pns
- if you remove a whole bunch of nodes, a backup will remove its values, changing the shape of the tree...


Features for weighted random rollouts:
- pattern - empty, white, black, offboard - 4^6 = 4096, minus symmetry = ~300
- distance to last move
- distance to second last move
- how many edges - 0,1,2
- how many corners - 0,1
- group size
- distance to win
- forms vc

Board printing
- mark the last move (bold colour?)
- move to board.cpp ?
- make a generalized version for outputing values for empty cells

Add a small amount of random knowledge, or shuffle children order to avoid bias towards early moves

Move iterator already computes xy(move), so might as well save it, and return a MoveValid instead

Continue playing past the end of a rollout until the losing player makes a winning formation. Use this as an estimate of the winning advantage

Test correlation between each cell to statistically determine virtual connections and important areas

Test local replies in rollouts based on patterns
- if an important pattern is found local to the last move, make it, otherwise play globally.

Test
- dynamic widening
- combinations of knowledge?
- weighted random with knowledge?


weighted random
- better weighted random based on new features?
- different weights for each player?


player_solved claims an empty board is a win by black, needs the backup to be rewritten, probably should copy Player::PlayerUCT::do_backup


compact tree memory manager
- move old unused elements of the freelist to the front so they get used first? or is it better that they're at the back?
- store a Chunk pointer in each data block, then traverse the freelist instead of the memory
  - remove elements that are outside the generation size, compact the rest
  - compact Chunks that have old unused segments
  - might be faster, but has an extra pointer...


Store nodes in a hash table instead of tree
- how to garbage collect?
- how to keep tree between moves?
- use a persistent db instead?
- use an array of hash tables, resizing one at a time so little extra memory is needed

Build a large persistent tree in some database
- save and reload heavy Nodes between games
- tokyo or kyoto cabinet? leveldb?


lambda DFPN http://www.aaai.org/Papers/IJCAI/2007/IJCAI07-387.pdf

Proof number search player
- choose by amount of work spent in a node

Count nodes expanded per move
- how to deal with garbage collection?
- how to deal with multi-threading/pondering?

tree checkpointing to deal with a crash?

detect ties
- if not using lbdists, use few flood fill instead of 24
  - if edges/corners can reach each other, no need to run extra flood fills
- keep bits for corners and edges for each colour
- if crossing empty or your own, propagate connections, otherwise don't
- how to go backwards?
- start a flood fill from each corner/edge
  - mark each edge/corner position as reached by the other flood fills
  - skip if it's already reached

PV doesn't take win_or_draw, loss_or_draw into account
final move selection doesn't choose good moves for proven draws

if not keeptree, and the number of non-losing moves at the root is 1, just make that move
- what if the last move is also a loss? should choose hardest loss

any ideas to handle ladders?
- always generate threats?

reuse (partial?) proof trees
- if a move is backed up as a win, it is likely a win under its siblings too
- give it a bonus to its equivalent cousins
- can we try to prove siblings by reusing the proof tree?
- store which sibling was used to prove the current tree, to avoid having to store it

Test the accuracy of the rollouts
- do a tree search to get a value of a position
- run 1000 rollouts without a tree search to get the value of the position
- compare the two for many positions
- compare at various stages in a game
- test different rollout policies to minimize error
- look at error distribution
- error and absolute error

Test accuracy of knowledge
- do a tree search with exploration to get the value of a position and the order of moves
- try various knowledge combinations to see how they order moves
- consider equal knowledge value when determining distance


move ordering by knowledge
- lower bound on distance to edges/corners
- maybe based on number of paths? handles virtual connections


rework swap
- rave doesn't give swap any time, so useless as a searchable move
- swap can't be changed after ponder is enabled
- waste of time anyway
- crashes when swap isn't a valid move
- new way:
  - only handle in the opening book?
  - maximize over min(exp, 1-exp) on empty board
  - swap if exp < 0.5 after first move

do proven nodes need
- rave updates?
- to be considered in move choice?
- can this be improved by sorting nodes:
  - solved win
  - unsolved, sorted by x,y
  - solved ties
  - solved losses
- how to do this multi-threaded?
  - do sorting during garbage collection?

detect dead areas, areas where your opponent has limited you to one corner or two sides and no rings
- negative knowledge?
- really dead (completely enclosed) vs almost/temporarily dead (enclosed with VCs)?
- still useful to block opponent wins?
- may be related to lower bound on win?

do 2 ply search at node initialization if certain conditions are met
- 1 corner and group size >= board size - 1
- 2 edges and group size >= board size + 1
- ring if group size >= 5
- how to interact with lower bound?

build a linked list of empty moves so the move iterator can skip filled moves
- probably not worth it since the iterator is used so rarely
- could build a doubly linked list since memory in this area is cheap
- more expensive in the solver than the player

final move selection
- what to do when the node with the most work is solved as a loss, but there is an unsolved node with little experience
  - might be an easy win for the opponent
  - take extra time if possible
- fast wins
- slow loss/ties


multi-thread
- different parameters (ie ravefactor) for each player thread?
  - useful for testing parameters at different parts of the game
- solver threads should use a TT instead of using the tree?


share the tree between the player and solver
- add phi/delta to UCTNode
- need way of knowing whether the node has been initialized by the player or solver
- how to deal with areas of the tree that are initialized by the other?
- rave updates for parts of the tree that aren't yet initialized?
- phi/delta as knowledge again?
- thread safety?
- run multiple different solvers? dfpnsab/pnsab
- knowledge for phi/delta


prior knowledge
- criticality
- copy knowledge from proof tree
- avoid clumping? penalty for being near many of your own stones
- lower bound needs virtual connection bonus / penalty for crossing opponents VC

rollout policy
- avoid attacking virtual connections
- local reply?
- locality?
- connections?


Time control
- quiecense search - if the highest winrate != most played, keep simulating until it is
- if only one move doesn't lose, stop simulating and just make that move
  - good idea with total time per game, but not per move


Pros of MCTS:
- very general heuristic
- doesn't get lost
- uses previous experience
- easy parallelization

Cons of MCTS:
- slow
- memory heavy
- magic and witchcraft relates to both the exploration factor, as well as developing good simulation policies/patterns.

Pros of PNS:
- game independent
- works well when branching factor is important
- independent of position in the tree, so fast
- different

Cons of PNS:
- linear decay instead of exponential decay
- gets lost since branching factor is very short sighted
- doesn't use much knowledge or past experience


PNS based:
- Add heuristic knowledge:
  - random rollouts as knowledge?
- Keep knowledge for future move selection
- expected win value of a node, then minimize (delta*(1-value))
- take amount of work into account in move selection
- maximize the expected increase of: 1/p2 + 1/d2

MCTS based:
- maintain phi/delta, and add them as an exploration criteria to the UCT formula
- Add thresholds so updates don't need to go all the way to the root
- change updates so it's not a complete weighted average
  - do minimax backup
  - weighted average ignoring solved nodes
    - hard, since it isn't clear what to do when multiple threads are down a node
  - weighted average over the best 50% of work

Hybrid:
- have a few threads start with MCTS, then switch to PNS at some point:
  - once the branching factor is not uniform amoung siblings
  - once the node has fewer than some amount of simulations
  - once valuation is polarized (ie 60%+ or 40%-)
- have a few threads watching a work queue
- share a tree
  - some threads doing each, both expanding and proving nodes
  - threads switch between proof and player based on success