Add TD3 and SAC support for multiple envs #481

noahfarr · 2024-08-27T20:31:47Z

Description

TD3 and SAC currently don't support running multiple environments. It's easy to add: introduce a num_envs parameter and pass it to both the environment creation and the replay buffer initialization.
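
As a rough illustration of the idea (not the code in this PR), the sketch below threads a `num_envs` argument through to environment creation and to a replay buffer. `ToyEnv`, `VecReplayBuffer`, and `collect` are hypothetical stand-ins written with the standard library only. Dividing the buffer capacity by `num_envs` is an assumption made here so that total memory stays roughly constant.

```python
import random
from dataclasses import dataclass


@dataclass
class Args:
    num_envs: int = 1        # number of parallel environments
    buffer_size: int = 1000  # total transitions the buffer may hold


class ToyEnv:
    """Hypothetical stand-in for a real environment."""

    def __init__(self, seed: int):
        self.rng = random.Random(seed)

    def step(self, action: float):
        obs = self.rng.random()
        reward = -abs(action - obs)
        return obs, reward


class VecReplayBuffer:
    """Ring buffer in which each row stores one transition per environment."""

    def __init__(self, buffer_size: int, num_envs: int):
        # Assumption: shrink the row count so capacity stays ~buffer_size.
        self.rows = max(buffer_size // num_envs, 1)
        self.num_envs = num_envs
        self.storage = [None] * self.rows
        self.pos = 0
        self.full = False

    def add(self, transitions):
        assert len(transitions) == self.num_envs
        self.storage[self.pos] = list(transitions)
        self.pos = (self.pos + 1) % self.rows
        self.full = self.full or self.pos == 0

    def __len__(self):
        filled_rows = self.rows if self.full else self.pos
        return filled_rows * self.num_envs


def collect(args: Args, steps: int) -> VecReplayBuffer:
    # num_envs is passed to both env creation and buffer initialization.
    envs = [ToyEnv(seed=i) for i in range(args.num_envs)]
    buffer = VecReplayBuffer(args.buffer_size, args.num_envs)
    for _ in range(steps):
        buffer.add([env.step(0.5) for env in envs])
    return buffer
```

With `num_envs=4`, ten loop iterations store forty transitions, which is why the meaning of `total_timesteps` needs to be agreed on when more than one environment is used.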

Types of changes

Bug fix
New feature
New algorithm
Documentation

Checklist:

I've read the CONTRIBUTION guide (required).
I have ensured pre-commit run --all-files passes (required).
I have updated the tests accordingly (if applicable).
I have updated the documentation and previewed the changes via mkdocs serve.
- I have explained note-worthy implementation details.
- I have explained the logged metrics.
- I have added links to the original paper and related papers.

If you need to run benchmark experiments for performance-impacting changes:

I have contacted @vwxyzjn to obtain access to the openrlbenchmark W&B team.
I have used the benchmark utility to submit the tracked experiments to the openrlbenchmark/cleanrl W&B project, optionally with --capture_video.
I have performed RLops with python -m openrlbenchmark.rlops.
- For new feature or bug fix:
  - I have used the RLops utility to understand the performance impact of the changes and confirmed there is no regression.
- For new algorithm:
  - I have created a table comparing my results against those from reputable sources (i.e., the original paper or other reference implementation).
- I have added the learning curves generated by the python -m openrlbenchmark.rlops utility to the documentation.
- I have added links to the tracked experiments in W&B, generated by python -m openrlbenchmark.rlops ....your_args... --report, to the documentation.

vercel · 2024-08-27T20:31:52Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
cleanrl	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Aug 28, 2024 6:10pm

pseudo-rnd-thoughts · 2024-08-28T09:46:46Z

cleanrl/td3_continuous_action.py

@@ -47,6 +47,8 @@ class Args:
    """total timesteps of the experiments"""
    learning_rate: float = 3e-4
    """the learning rate of the optimizer"""
+    num_envs: int = 2


I would default this to 1

noahfarr · Aug 28, 2024

Sure. It is probably also worth discussing how to handle total_timesteps with multiple environments.

noahfarr · Aug 28, 2024

@pseudo-rnd-thoughts It seems like sb3 handles it like this:
Let's say total_timesteps is 100_000.
Then they actually run 100_000 * num_envs environment steps, because num_envs steps are executed for each timestep.
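
To make the question concrete, here is a minimal sketch of the two conventions a training loop could follow. Both function names are hypothetical and taken from neither CleanRL nor sb3. Which convention a given library actually uses is best confirmed against its source.

```python
def count_vector_steps(total_timesteps: int, num_envs: int) -> tuple[int, int]:
    """Convention A: one loop iteration counts as one timestep,
    so the number of environment transitions scales with num_envs."""
    loop_iterations = total_timesteps
    env_transitions = total_timesteps * num_envs
    return loop_iterations, env_transitions


def count_env_transitions(total_timesteps: int, num_envs: int) -> tuple[int, int]:
    """Convention B: total_timesteps is a budget of individual transitions,
    so the loop runs ceil(total_timesteps / num_envs) iterations."""
    loop_iterations = -(-total_timesteps // num_envs)  # ceiling division
    env_transitions = loop_iterations * num_envs
    return loop_iterations, env_transitions
```

Under convention A, `total_timesteps=100_000` with `num_envs=4` collects 400_000 transitions. Under convention B the same setting runs 25_000 iterations and collects 100_000 transitions, which keeps sample budgets comparable across different values of `num_envs`.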

noahfarr added 2 commits August 27, 2024 22:28

Add support for multiple envs

36d1d3a

Run pre-commit

d348e9f

vercel bot deployed to Preview August 27, 2024 20:32 View deployment

pseudo-rnd-thoughts reviewed Aug 28, 2024

Default num_envs to 1

180e302

vercel bot deployed to Preview August 28, 2024 18:02 View deployment

Run pre-commit

3f0cd76

vercel bot deployed to Preview August 28, 2024 18:04 View deployment

noahfarr changed the title ~~Add TD3 support for multiple envs~~ Add TD3 and SAC support for multiple envs Aug 28, 2024

Add multi env support for sac_continuous_action

bc29059

vercel bot deployed to Preview August 28, 2024 18:10 View deployment
