File | Description
---|---
mcts.py | Holds the MonteCarloTree class, responsible for the underlying data structures and algorithms that run the big algorithm in the room: Monte Carlo Tree Search.
politer-selfplay.py | Runs the policy iteration loop and executes each self-play episode, all for the purpose of training and updating the neural network.
This is our best understanding of the training pipeline, based on the AlphaGo Zero paper.
- We initialize the neural network parameters randomly. Then, at each iteration, we generate 25,000 self-play games. These games are pushed into a training database queue with a maximum capacity of 500,000 games: once the queue is full, each new game is appended and the oldest game is removed. (A minimal sketch of this replay queue follows this list.)
- Asynchronously, the neural network trains on mini-batches of 2,048 timesteps sampled from the most recent 500,000 games. We do this for 1,000 training steps and then compare the resulting checkpoint against the current best neural network, i.e. the network currently used to generate the data (at the start, the randomly initialized network). (See the training-loop sketch after this list.)
- If the checkpoint's win rate is greater than 55%, we replace the current best neural network with the checkpoint and use it from then on to generate new self-play data, clearing the queue so the new data starts cleanly. If the win rate is 55% or lower, we keep the current best network and re-evaluate every 1,000 training steps (the 2,000th checkpoint, the 3,000th, and so on). (See the evaluation sketch after this list.)
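A minimal sketch of the bounded replay queue described in the first bullet, assuming games are plain Python objects; `collections.deque` with `maxlen` gives the append-new, drop-oldest behavior described above (`MAX_GAMES` and `add_selfplay_games` are names chosen for illustration).

```python
from collections import deque

# Bounded replay queue: once 500,000 games are stored, appending a new
# game silently drops the oldest one from the opposite end.
MAX_GAMES = 500_000
replay_queue = deque(maxlen=MAX_GAMES)

def add_selfplay_games(games):
    """Append newly generated self-play games; the oldest games fall out
    automatically once the queue is at capacity."""
    for game in games:
        replay_queue.append(game)
```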
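A sketch of the asynchronous training side under the same assumptions; `game.positions`, `optimizer.step(...)`, and `train_net.checkpoint()` are hypothetical stand-ins for whatever the real training code uses, and sampling is uniform over stored games and their timesteps.

```python
import random

BATCH_SIZE = 2_048            # timesteps per mini-batch
STEPS_PER_CHECKPOINT = 1_000  # training steps between evaluations

def sample_batch(replay_queue):
    """Sample 2,048 (state, policy_target, value_target) timesteps
    uniformly from the games currently held in the replay queue."""
    batch = []
    while len(batch) < BATCH_SIZE:
        game = random.choice(replay_queue)           # pick a stored game
        batch.append(random.choice(game.positions))  # pick one timestep from it
    return batch

def train_until_checkpoint(train_net, replay_queue, optimizer):
    """Run 1,000 optimization steps on sampled mini-batches, then
    return a snapshot of the network to evaluate."""
    for _ in range(STEPS_PER_CHECKPOINT):
        batch = sample_batch(replay_queue)
        optimizer.step(train_net, batch)  # placeholder for the actual update
    return train_net.checkpoint()         # hypothetical weight snapshot
```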
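A sketch of the evaluation gate from the last bullet; `play_eval_game` (returning 1 if the checkpoint wins, 0 otherwise) and the 400-game match length are assumptions, not something stated above.

```python
WIN_RATE_THRESHOLD = 0.55
EVAL_GAMES = 400  # assumed number of head-to-head evaluation games

def evaluate_and_maybe_promote(checkpoint_net, best_net, replay_queue):
    """Pit the new checkpoint against the current best network; if the
    checkpoint wins more than 55% of the games, it becomes the new best
    and the replay queue is cleared so future data comes only from it."""
    wins = sum(play_eval_game(checkpoint_net, best_net)  # hypothetical helper
               for _ in range(EVAL_GAMES))
    if wins / EVAL_GAMES > WIN_RATE_THRESHOLD:
        best_net = checkpoint_net  # promote the checkpoint
        replay_queue.clear()       # start generating data cleanly
    return best_net
```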
Note: the neural network used to generate self-play data stays fixed while that data is being generated. We therefore keep two copies: one network that generates the training data and another that is purely being trained.
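To make the two-copy setup concrete, here is one way it could be wired up (the `initialize_random_network` constructor is hypothetical): the best network stays fixed and only serves self-play, while a deep copy of it is the one the optimizer updates.

```python
import copy

# Two copies of the same architecture: `best_net` generates self-play data
# and is never updated during generation; `train_net` is the copy that the
# optimizer actually modifies.
best_net = initialize_random_network()  # hypothetical constructor
train_net = copy.deepcopy(best_net)
```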