BASE/IQL_S3 v20221124
Cryolite edited this page Dec 5, 2022 · 2 revisions
- Type: Transformer encoder layers (the same network structure as the one used for BERT-Base)
- Dimension: 768
- # of heads: 12
- Dimension of feedforward networks: 3072
- # of layers: 12
- Activation function: GELU
- Dropout rate in training: 0.1
- Initialization: Transferred from the trained encoder of BASE/IQL_S3 v20221003
- Type: Single-layer position-wise feedforward network
- Dimension: 3072
- Activation function: GELU
- Dropout rate in training: 0.1
- Initialization: Transferred from the trained decoder of BASE/IQL_S3 v20221003
- Type: Transformer encoder layers (the same network structure as the one used for BERT-Base)
- Dimension: 768
- # of heads: 12
- Dimension of feedforward networks: 3072
- # of layers: 12
- Activation function: GELU
- Dropout rate in training: 0.1
- Initialization: Transferred from the trained encoder of BASE/IQL_S3 v20221003
- Type: Dueling network with two single-layer position-wise feedforward networks
- Dimension: 3072
- Activation function: GELU
- Dropout rate in training: 0.1
- Initialization: Transferred from the trained decoder of BASE/IQL_S3 v20221003
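The page says only that this decoder is a dueling network built from two feedforward networks; it does not say how the two outputs are combined. The sketch below assumes the common mean-subtracted aggregation, Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a'). The function name and the example numbers are hypothetical.

```python
from statistics import fmean


def dueling_q_values(state_value, advantages):
    """Combine a scalar state value V(s) with per-action advantages A(s, a).

    Assumed aggregation (not confirmed by this page):
        Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
    """
    mean_advantage = fmean(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# Hypothetical outputs of the value stream and the advantage stream.
q_values = dueling_q_values(2.0, [1.0, -1.0, 3.0])
print(q_values)
```

Subtracting the mean advantage makes the average of the Q values equal V(s), which keeps the two streams identifiable.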
- Type: Implicit Q-learning (IQL)
- Reward: Change in grading points over one game, computed as for a Saint 3 player in the Jade room
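Implicit Q-learning fits its value function by expectile regression against the Q function, which is where the expectile τ listed among the hyperparameters on this page comes in. As a minimal sketch, the per-sample loss is |τ - 1(u < 0)| · u² with u = Q(s, a) - V(s). The function name and the sample numbers below are hypothetical, and this is not taken from the project's training code.

```python
def expectile_loss(q_target, value, tau=0.9):
    """Asymmetric squared error used for the value function in IQL.

    u = q_target - value; the weight is tau when u >= 0 and (1 - tau) otherwise.
    """
    diff = q_target - value
    weight = tau if diff >= 0.0 else 1.0 - tau
    return weight * diff * diff


# V underestimates the target: penalized with weight tau.
under = expectile_loss(q_target=1.0, value=0.0)
# V overestimates the target: penalized with weight 1 - tau.
over = expectile_loss(q_target=0.0, value=1.0)
print(under, over)
```

With τ = 0.90, underestimating the target costs nine times as much as overestimating it by the same amount, so V is pushed toward the upper expectile of the Q values seen in the data.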
Crawled Game Records v202007_202109
200,000,000 samples drawn at random from the crawled game records and then shuffled.
- Discount factor (γ): 1.0
- Expectile (τ): 0.90
- Soft update (Polyak averaging) rate of target networks (α): 0.1
- Optimizer: LAMB
- Learning rate: 0.001
- ε: 1.0e-6
- Batch size: 131072
- # of training epochs: N/A
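The soft update rate α = 0.1 listed above controls how quickly the target networks track the online networks. The sketch below assumes the convention target ← (1 - α) · target + α · online; the page names only the rate, not the convention. Parameters are modeled as plain lists of floats, and all names are hypothetical.

```python
def soft_update(target_params, online_params, alpha=0.1):
    """Polyak averaging: move each target parameter a fraction alpha toward the online one."""
    return [
        (1.0 - alpha) * t + alpha * o
        for t, o in zip(target_params, online_params)
    ]


# Hypothetical parameter vectors for a target network and an online network.
target = [0.0, 10.0]
online = [1.0, 0.0]
for _ in range(2):
    target = soft_update(target, online)
print(target)
```

Under this convention, the gap between target and online parameters shrinks by a factor of 0.9 per update while the online parameters stay fixed.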
(TODO)
Quantitative Comparison with BASE/BC_H13 v20220210 as the Baseline
(TODO)