Why is my SB3 DQN agent unable to learn CartPole-v1 despite using optimal hyperparameters from RLZoo3? #472

Open
Deepakgthomas opened this issue Oct 13, 2024 · 2 comments
Labels: documentation (Improvements or additions to documentation)

Comments


Deepakgthomas commented Oct 13, 2024

📚 Documentation

I obtained the optimal hyperparameters for training CartPole-v1 from RLZoo3 and created a minimal example demonstrating the performance of my CartPole agent. As per the official docs, the agent should reach the maximum score of 500 for a successful episode. Unfortunately, the score doesn't rise above 300.

Here is my code -

import gymnasium as gym
import numpy as np
import torch
from stable_baselines3 import DQN
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.callbacks import BaseCallback
from torch.utils.tensorboard import SummaryWriter
import os

def set_seed(seed):
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.backends.cudnn.deterministic = True

class TensorBoardCallback(BaseCallback):
    """Logs the per-episode reward and a 100-episode moving average to TensorBoard."""

    def __init__(self, log_dir):
        super().__init__()
        self.writer = SummaryWriter(log_dir=log_dir)
        self.episode_rewards = []
        self.current_episode_reward = 0

    def _on_step(self):
        # Accumulate the reward of the single (vectorized) environment
        self.current_episode_reward += self.locals['rewards'][0]

        if self.locals['dones'][0]:
            # Episode finished: log its total reward against the current timestep
            self.episode_rewards.append(self.current_episode_reward)
            self.writer.add_scalar('train/episode_reward', self.current_episode_reward, self.num_timesteps)
            self.current_episode_reward = 0

            # Moving average over the last 100 episodes
            if len(self.episode_rewards) >= 100:
                avg_reward = sum(self.episode_rewards[-100:]) / 100
                self.writer.add_scalar('train/average_reward', avg_reward, self.num_timesteps)

        return True

    def _on_training_end(self):
        self.writer.close()

# Set up logging directory
log_dir = "tensorboard_logs"
os.makedirs(log_dir, exist_ok=True)

# Set seed for reproducibility
seed = 42
set_seed(seed)

# Create environment
env = gym.make("CartPole-v1")
env = DummyVecEnv([lambda: env])

# Create model with hyperparameters from rlzoo3
model = DQN(
    policy="MlpPolicy",
    env=env,
    learning_rate=2.3e-3,
    batch_size=64,
    buffer_size=100000,
    learning_starts=1000,
    gamma=0.99,
    target_update_interval=10,
    train_freq=256,
    gradient_steps=128,
    exploration_fraction=0.16,
    exploration_final_eps=0.04,
    policy_kwargs=dict(net_arch=[256, 256]),
    verbose=1,
    tensorboard_log=log_dir,
    seed=seed
)

# Create callback
tb_callback = TensorBoardCallback(log_dir)

# Train the model
total_timesteps = 50000
model.learn(total_timesteps=total_timesteps, callback=tb_callback)

print("Training completed. You can view the results using TensorBoard.")
print(f"Run the following command in your terminal: tensorboard --logdir {log_dir}")

env.close()

Here is the final result -

[screenshot: TensorBoard training reward curve]

Perhaps I am using RLZoo3 incorrectly? Anyway, I would truly appreciate any and all help with this.

Checklist

  • I have checked that there is no similar issue in the repo
Deepakgthomas added the documentation (Improvements or additions to documentation) label on Oct 13, 2024
araffin commented Oct 14, 2024

Unfortunately, the score doesn't rise above 300.

Are you talking about the training reward (average over many episodes) or about the final performance using the (quasi)-deterministic policy?

How many runs did you do?
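
For the first question, here is a minimal sketch of how the final policy could be checked with a deterministic evaluation, separate from the exploratory training rollouts (evaluate_policy comes from SB3; n_eval_episodes=20 is just an illustrative value):

from stable_baselines3.common.evaluation import evaluate_policy

# Evaluate the trained agent with the greedy (deterministic) policy,
# independent of the epsilon-greedy exploration used during training.
mean_reward, std_reward = evaluate_policy(
    model, model.get_env(), n_eval_episodes=20, deterministic=True
)
print(f"Deterministic evaluation: {mean_reward:.1f} +/- {std_reward:.1f}")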

Did you try using the RL Zoo:
python -m rl_zoo3.train --algo dqn --env CartPole-v1 --eval-freq 10000 -P

A simple solution is to also increase the training budget.
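
For example, in the script above you could simply raise the budget (150000 timesteps is just an illustrative value, not a tuned one):

# Same model and callback as above, trained for a larger budget than 50k steps
total_timesteps = 150_000
model.learn(total_timesteps=total_timesteps, callback=tb_callback)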
