I obtained optimal hyperparameters for training CartPole-v1 from RL Zoo3 and created a minimal example demonstrating the performance of my CartPole agent. As per the official docs, the agent should reach a score of 500 for a successful episode. Unfortunately, the score doesn't rise above 300.
Here is my code -
```python
import gymnasium as gym
import numpy as np
import torch
from stable_baselines3 import DQN
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.callbacks import BaseCallback
from torch.utils.tensorboard import SummaryWriter
import os


def set_seed(seed):
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.backends.cudnn.deterministic = True


class TensorBoardCallback(BaseCallback):
    def __init__(self, log_dir):
        super().__init__()
        self.writer = SummaryWriter(log_dir=log_dir)
        self.episode_rewards = []
        self.current_episode_reward = 0

    def _on_step(self):
        self.current_episode_reward += self.locals['rewards'][0]
        if self.locals['dones'][0]:
            self.episode_rewards.append(self.current_episode_reward)
            self.writer.add_scalar('train/episode_reward', self.current_episode_reward, self.num_timesteps)
            self.current_episode_reward = 0
            if len(self.episode_rewards) >= 100:
                avg_reward = sum(self.episode_rewards[-100:]) / 100
                self.writer.add_scalar('train/average_reward', avg_reward, self.num_timesteps)
        return True

    def on_training_end(self):
        self.writer.close()


# Set up logging directory
log_dir = "tensorboard_logs"
os.makedirs(log_dir, exist_ok=True)

# Set seed for reproducibility
seed = 42
set_seed(seed)

# Create environment
env = gym.make("CartPole-v1")
env = DummyVecEnv([lambda: env])

# Create model with hyperparameters from RL Zoo3
model = DQN(
    policy="MlpPolicy",
    env=env,
    learning_rate=2.3e-3,
    batch_size=64,
    buffer_size=100000,
    learning_starts=1000,
    gamma=0.99,
    target_update_interval=10,
    train_freq=256,
    gradient_steps=128,
    exploration_fraction=0.16,
    exploration_final_eps=0.04,
    policy_kwargs=dict(net_arch=[256, 256]),
    verbose=1,
    tensorboard_log=log_dir,
    seed=seed,
)

# Create callback
tb_callback = TensorBoardCallback(log_dir)

# Train the model
total_timesteps = 50000
model.learn(total_timesteps=total_timesteps, callback=tb_callback)

print("Training completed. You can view the results using TensorBoard.")
print(f"Run the following command in your terminal: tensorboard --logdir {log_dir}")

env.close()
```
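In case it is useful, here is a minimal sketch of how the trained agent could be evaluated separately with a deterministic policy; it assumes the `model` from the script above and introduces a new `eval_env` just for this snippet:

```python
# Minimal sketch: measure the episode return of the trained policy with
# deterministic actions, separate from the exploratory rollouts during training.
# `model` is the DQN instance trained above; `eval_env` is new to this snippet.
import gymnasium as gym
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.monitor import Monitor

eval_env = Monitor(gym.make("CartPole-v1"))
mean_reward, std_reward = evaluate_policy(
    model, eval_env, n_eval_episodes=20, deterministic=True
)
print(f"Deterministic evaluation: {mean_reward:.1f} +/- {std_reward:.1f}")
eval_env.close()
```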
Here is the final result -
Perhaps I am using RL Zoo3 incorrectly? In any case, I would truly appreciate any and all help with this.
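To rule out a transcription mistake on my side, a rough way to print the hyperparameters that RL Zoo3 actually ships for CartPole-v1 could look like the sketch below. It assumes the installed `rl_zoo3` package bundles its `hyperparams/dqn.yml` file alongside the module, which may not hold for every version:

```python
# Rough sketch: print the DQN hyperparameters RL Zoo3 ships for CartPole-v1.
# Assumes the installed rl_zoo3 package includes its hyperparams YAML files;
# the exact path may differ between versions.
import os
import yaml  # PyYAML
import rl_zoo3

hyperparams_path = os.path.join(os.path.dirname(rl_zoo3.__file__), "hyperparams", "dqn.yml")
with open(hyperparams_path) as f:
    all_hyperparams = yaml.safe_load(f)

print(all_hyperparams["CartPole-v1"])
```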
Checklist

- [x] I have checked that there is no similar issue in the repo