Why is my SB3 DQN agent unable to learn CartPole-v1 despite using optimal hyperparameters from RLZoo3? #472

Open
Deepakgthomas opened this issue Oct 13, 2024 · 2 comments
Labels: documentation (Improvements or additions to documentation)

Comments


Deepakgthomas commented Oct 13, 2024

📚 Documentation

I obtained the optimal hyperparameters for training CartPole-v1 from RLZoo3 and created a minimal example demonstrating the performance of my CartPole agent. As per the official docs, the agent should reach the maximum score of 500 for a successful episode. Unfortunately, the score doesn't rise above 300.

Here is my code -

import gymnasium as gym
import numpy as np
import torch
from stable_baselines3 import DQN
from stable_baselines3.common.vec_env import DummyVecEnv
from stable_baselines3.common.callbacks import BaseCallback
from torch.utils.tensorboard import SummaryWriter
import os

def set_seed(seed):
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.backends.cudnn.deterministic = True

class TensorBoardCallback(BaseCallback):
    """Logs the per-episode reward and a 100-episode moving average to TensorBoard."""

    def __init__(self, log_dir):
        super().__init__()
        self.writer = SummaryWriter(log_dir=log_dir)
        self.episode_rewards = []
        self.current_episode_reward = 0

    def _on_step(self):
        # Accumulate the reward of the single (vectorized) environment
        self.current_episode_reward += self.locals['rewards'][0]

        if self.locals['dones'][0]:
            # Episode finished: log its total reward against the current timestep
            self.episode_rewards.append(self.current_episode_reward)
            self.writer.add_scalar('train/episode_reward', self.current_episode_reward, self.num_timesteps)
            self.current_episode_reward = 0

            # Moving average over the last 100 episodes
            if len(self.episode_rewards) >= 100:
                avg_reward = sum(self.episode_rewards[-100:]) / 100
                self.writer.add_scalar('train/average_reward', avg_reward, self.num_timesteps)

        return True

    def _on_training_end(self):
        self.writer.close()

# Set up logging directory
log_dir = "tensorboard_logs"
os.makedirs(log_dir, exist_ok=True)

# Set seed for reproducibility
seed = 42
set_seed(seed)

# Create environment
env = gym.make("CartPole-v1")
env = DummyVecEnv([lambda: env])

# Create model with hyperparameters from rlzoo3
model = DQN(
    policy="MlpPolicy",
    env=env,
    learning_rate=2.3e-3,
    batch_size=64,
    buffer_size=100000,
    learning_starts=1000,
    gamma=0.99,
    target_update_interval=10,
    train_freq=256,
    gradient_steps=128,
    exploration_fraction=0.16,
    exploration_final_eps=0.04,
    policy_kwargs=dict(net_arch=[256, 256]),
    verbose=1,
    tensorboard_log=log_dir,
    seed=seed
)

# Create callback
tb_callback = TensorBoardCallback(log_dir)

# Train the model
total_timesteps = 50000
model.learn(total_timesteps=total_timesteps, callback=tb_callback)

print("Training completed. You can view the results using TensorBoard.")
print(f"Run the following command in your terminal: tensorboard --logdir {log_dir}")

env.close()

Here is the final result -

[screenshot: TensorBoard training reward curve]

Perhaps I am using RLZoo3 incorrectly? Anyway, I would truly appreciate any and all help with this.

Checklist

  • I have checked that there is no similar issue in the repo
Deepakgthomas added the documentation (Improvements or additions to documentation) label on Oct 13, 2024
araffin commented Oct 14, 2024

Unfortunately, the score doesn't rise above 300.

Are you talking about the training reward (average over many episodes) or about the final performance using the (quasi)-deterministic policy?

How many runs did you do?
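
For the first question, here is a minimal sketch of how the final policy could be checked with a deterministic evaluation, separate from the exploratory training rollouts (evaluate_policy comes from SB3; n_eval_episodes=20 is just an illustrative value):

from stable_baselines3.common.evaluation import evaluate_policy

# Evaluate the trained agent with the greedy (deterministic) policy,
# independent of the epsilon-greedy exploration used during training.
mean_reward, std_reward = evaluate_policy(
    model, model.get_env(), n_eval_episodes=20, deterministic=True
)
print(f"Deterministic evaluation: {mean_reward:.1f} +/- {std_reward:.1f}")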

Did you try using the RL Zoo:
python -m rl_zoo3.train --algo dqn --env CartPole-v1 --eval-freq 10000 -P

A simple solution is to also increase the training budget.
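
For example, in the script above you could simply raise the budget (150000 timesteps is just an illustrative value, not a tuned one):

# Same model and callback as above, trained for a larger budget than 50k steps
total_timesteps = 150_000
model.learn(total_timesteps=total_timesteps, callback=tb_callback)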
