```mermaid
graph LR;
A[Conv Block] --> B[Conv Block]
B --> C[FF block]
C --> D[Value head]
C --> G[Policy head]
```
Where:
- Conv Block
  - Conv: 4n in, 4xn out, 5x5 convolution with stride 1, padding 2
  - Activation: SELU
  - MaxPool: 2x2 max pooling
  - Dropout: 0.1
- FF block
  - LazyLinear: 256 out, SELU activation and dropout
- Value head
  - Linear: 256 in, 64 out
  - Activation: SELU
  - Dropout: 0.5
  - Linear: 64 in, 1 out
  - Activation: Tanh
- Policy head
  - Linear: 256 in, 128 out
  - Activation: SELU
  - Linear: 128 in, 128 out
  - Activation: SELU
  - Linear: 128 in, 64 out
  - Activation: SELU
  - Dropout: 0.5
  - Linear: 64 in, 8 out
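
Read as a PyTorch module, the architecture above can be sketched roughly as follows. This is a minimal sketch, not the repository's actual code: the class name `PolicyValueNet`, the channel multiplier `n`, the interpretation of "4n in, 4xn out" as 4·n channels in and out of each conv block, and the dropout probability of the FF block (left unspecified in the description) are assumptions.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # 5x5 convolution (stride 1, padding 2) -> SELU -> 2x2 max pooling -> dropout 0.1
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=5, stride=1, padding=2),
        nn.SELU(),
        nn.MaxPool2d(2),
        nn.Dropout(0.1),
    )

class PolicyValueNet(nn.Module):
    def __init__(self, n):
        super().__init__()
        # Two conv blocks feeding a shared feed-forward block
        self.conv = nn.Sequential(conv_block(4 * n, 4 * n), conv_block(4 * n, 4 * n))
        # FF block: LazyLinear to 256 features, SELU, dropout (probability assumed here)
        self.ff = nn.Sequential(nn.Flatten(), nn.LazyLinear(256), nn.SELU(), nn.Dropout(0.1))
        # Value head: 256 -> 64 -> 1, Tanh-bounded output
        self.value_head = nn.Sequential(
            nn.Linear(256, 64), nn.SELU(), nn.Dropout(0.5),
            nn.Linear(64, 1), nn.Tanh(),
        )
        # Policy head: 256 -> 128 -> 128 -> 64 -> 8
        self.policy_head = nn.Sequential(
            nn.Linear(256, 128), nn.SELU(),
            nn.Linear(128, 128), nn.SELU(),
            nn.Linear(128, 64), nn.SELU(), nn.Dropout(0.5),
            nn.Linear(64, 8),
        )

    def forward(self, x):
        shared = self.ff(self.conv(x))
        return self.policy_head(shared), self.value_head(shared)
```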
In order to train the model, the following sequence of steps is applied (a sketch of the loop follows this list):
- For each episode:
  - Create two agents and randomly choose one to start.
  - Play the game until it is over.
  - Record the choices of each player.
  - The winner receives a positive score, the loser a negative score, and a draw results in a score of 0.
- Run around 50 episodes in parallel and record the results.
- Train the model on the recorded results.
- Repeat the process.
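
A single self-play episode along these lines might look as follows. The `make_game` and `make_agent` constructors and their `is_over`/`observe`/`choose`/`step`/`score` methods are placeholders for illustration, not the repository's actual API.

```python
import random

def play_episode(make_game, make_agent):
    """Play one self-play game and return (state, action, score) tuples for both players."""
    game = make_game()
    agents = [make_agent(), make_agent()]
    current = random.randrange(2)                 # randomly choose which agent starts
    history = {0: [], 1: []}
    while not game.is_over():
        state = game.observe(current)
        action = agents[current].choose(state)
        history[current].append((state, action))  # record each player's choices
        game.step(action)
        current = 1 - current
    # Winner gets a positive score, loser a negative score, draws score 0
    return [(s, a, game.score(p)) for p in (0, 1) for (s, a) in history[p]]
```

The trainer then collects around 50 such episodes (in parallel), trains the model on the pooled tuples, and repeats.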
- To train the model, run the `trainer_main.py` file. If you want to continue from a saved model, use the `load` option to load it.
- To test the model in an actual game, use the normal main file. You can use the `load` option to load the latest checkpoint of the model.
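
The `load` option presumably restores a saved checkpoint. With PyTorch that is typically done along these lines; the checkpoint path below is an assumption, not necessarily the file name this project uses.

```python
import torch

CKPT_PATH = "checkpoints/latest.pt"   # assumed path; the project may store checkpoints elsewhere

def save_checkpoint(model, path=CKPT_PATH):
    torch.save(model.state_dict(), path)

def load_checkpoint(model, path=CKPT_PATH):
    # Restore the latest saved weights into an already-constructed model
    model.load_state_dict(torch.load(path, map_location="cpu"))
    model.eval()
    return model
```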