Commit ab2aafd

splion-360, sekyondaMeta, and svekars authored
Update hyper params and set seeds (#3384)
* Update hyper params and set seeds
* Updated hyper params
* Added commented paragraph

---------

Co-authored-by: sekyondaMeta <[email protected]>
Co-authored-by: Svetlana Karslioglu <[email protected]>
1 parent 2c4c99d commit ab2aafd

1 file changed, +23 -3 lines changed

intermediate_source/reinforcement_q_learning.py

Lines changed: 23 additions & 3 deletions
@@ -92,6 +92,24 @@
 )
 
 
+# To ensure reproducibility during training, you can fix the random seeds
+# by uncommenting the lines below. This makes the results consistent across
+# runs, which is helpful for debugging or comparing different approaches.
+#
+# That said, allowing randomness can be beneficial in practice, as it lets
+# the model explore different training trajectories.
+
+
+# seed = 42
+# random.seed(seed)
+# torch.manual_seed(seed)
+# env.reset(seed=seed)
+# env.action_space.seed(seed)
+# env.observation_space.seed(seed)
+# if torch.cuda.is_available():
+#     torch.cuda.manual_seed(seed)
+
+
 ######################################################################
 # Replay Memory
 # -------------
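
The commented-out seeding lines in this hunk could be wrapped in a small helper, roughly as sketched below. This is not part of the commit: `seed_everything` is a hypothetical name, and the sketch assumes `env` follows the Gymnasium interface used in the diff (`reset(seed=...)`, `action_space.seed`, `observation_space.seed`). PyTorch is imported lazily so the sketch still runs where it is not installed.

```python
import random


def seed_everything(seed, env=None):
    """Hypothetical helper: seed Python, PyTorch (if installed), and an optional env."""
    random.seed(seed)

    try:
        import torch  # optional; skipped when PyTorch is not available
    except ImportError:
        torch = None
    if torch is not None:
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed(seed)

    if env is not None:
        # Assumes a Gymnasium-style environment, as in the tutorial.
        env.reset(seed=seed)
        env.action_space.seed(seed)
        env.observation_space.seed(seed)


# Re-seeding with the same value reproduces the same random draws.
seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
print(first == second)  # True
```

Calling the helper once before training (with the tutorial's `env` passed in) would mirror the uncommented version of the lines above. Leaving it out keeps the run-to-run variation that the commit comment describes as sometimes useful.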
@@ -253,13 +271,15 @@ def forward(self, x):
 # EPS_DECAY controls the rate of exponential decay of epsilon, higher means a slower decay
 # TAU is the update rate of the target network
 # LR is the learning rate of the ``AdamW`` optimizer
+
 BATCH_SIZE = 128
 GAMMA = 0.99
 EPS_START = 0.9
-EPS_END = 0.05
-EPS_DECAY = 1000
+EPS_END = 0.01
+EPS_DECAY = 2500
 TAU = 0.005
-LR = 1e-4
+LR = 3e-4
+
 
 # Get number of actions from gym action space
 n_actions = env.action_space.n
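
To see what the `EPS_END` and `EPS_DECAY` change does to exploration, the sketch below compares the old and new schedules. It assumes epsilon follows the usual exponential form `EPS_END + (EPS_START - EPS_END) * exp(-step / EPS_DECAY)`, which matches the comment that a higher `EPS_DECAY` means a slower decay. The function name `epsilon` and the chosen step values are illustrative and not taken from the commit.

```python
import math


def epsilon(step, eps_start, eps_end, eps_decay):
    """Exploration rate after `step` steps under an assumed exponential schedule."""
    return eps_end + (eps_start - eps_end) * math.exp(-step / eps_decay)


# Values before and after this commit.
OLD = dict(eps_start=0.9, eps_end=0.05, eps_decay=1000)
NEW = dict(eps_start=0.9, eps_end=0.01, eps_decay=2500)

for step in (0, 1000, 2500, 10000):
    old_eps = epsilon(step, **OLD)
    new_eps = epsilon(step, **NEW)
    print(f"step={step:>6}  old={old_eps:.3f}  new={new_eps:.3f}")
```

Under this assumption, the new values keep the agent exploring longer early on (at step 1000, epsilon is about 0.61 instead of about 0.36) while settling at a lower floor of 0.01 once the decay has run its course.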
