Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes setting the device from CLI in the RL training scripts #1013

Merged
merged 2 commits into from
Sep 24, 2024

Conversation

amrmousa144
Copy link
Contributor

@amrmousa144 amrmousa144 commented Sep 21, 2024

Description

This pull request fixes the issue where the device (CPU or CUDA) is not set correctly when using the --device argument in Hydra-configured scripts like rsl_rl/train.py and skrl/train.py. The bug caused the scripts to always default to cuda:0, even when cpu or a specific CUDA device (e.g., cuda:1) was selected.

The fix adds the following line to ensure that the selected device is properly set in env_cfg before initializing the environment with gym.make():

env_cfg.sim.device = args_cli.device

Fixes #1012

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Screenshots

Before:

  • skrl/train, when running the script with --device cpu, it defaults to cuda:0.
  • rsl_rl/train.py, the script freezes at [INFO]: Starting the simulation. This may take a few seconds. Please wait....

After:

  • Both scripts run correctly on the specified device (e.g., cpu or cuda:1) without defaulting to cuda:0 or freezing.

Checklist

  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@Mayankm96 Mayankm96 changed the title Fix setting incorrect device in env_cfg Fixes setting the device from CLI in the RL training scripts Sep 23, 2024
@@ -120,7 +120,8 @@ def main(env_cfg: ManagerBasedRLEnvCfg | DirectRLEnvCfg | DirectMARLEnvCfg, agen
# set the environment seed
# note: certain randomizations occur in the environment initialization so we set the seed here
env_cfg.seed = args_cli.seed if args_cli.seed is not None else agent_cfg["seed"]

env_cfg.sim.device = args_cli.device
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move it up to before if args_cli.distributed
and replace with env_cfg.sim.device = args_cli.device if args_cli.device is not None else env_cfg.sim.device

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need to remove L139-142 (multi-gpu training config). It is a duplication of L-110-111.

@Dhoeller19
Copy link
Contributor

Thanks a lot for the fix!
Can you add the fix for rl_games as well please?

Co-authored-by: David Hoeller <[email protected]>
Signed-off-by: Mayank Mittal <[email protected]>
@Mayankm96
Copy link
Contributor

Since it is a critical fix, I am merging this right now. Will push the remaining changes directly.

@Mayankm96 Mayankm96 merged commit 0b26ae8 into isaac-sim:main Sep 24, 2024
0 of 2 checks passed
Mayankm96 pushed a commit that referenced this pull request Sep 24, 2024
This pull request fixes the issue where the device (`CPU` or `CUDA`) is
not set correctly when using the `--device` argument in Hydra-configured
scripts like `rsl_rl/train.py` and `skrl/train.py`. The bug caused the
scripts to always default to `cuda:0`, even when `cpu` or a specific
CUDA device (e.g., `cuda:1`) was selected.

The fix adds the following line to ensure that the selected device is
properly set in `env_cfg` before initializing the environment with
`gym.make()`:

```python
env_cfg.sim.device = args_cli.device
```

Fixes #1012

- Bug fix (non-breaking change which fixes an issue)

Before:
- skrl/train, when running the script with --device cpu, it defaults to
cuda:0.
- rsl_rl/train.py, the script freezes at `[INFO]: Starting the
simulation. This may take a few seconds. Please wait....`

After:
- Both scripts run correctly on the specified device (e.g., cpu or
cuda:1) without defaulting to cuda:0 or freezing.

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there
iamdrfly pushed a commit to iamdrfly/IsaacLab that referenced this pull request Nov 21, 2024
…im#1013)

This pull request fixes the issue where the device (`CPU` or `CUDA`) is
not set correctly when using the `--device` argument in Hydra-configured
scripts like `rsl_rl/train.py` and `skrl/train.py`. The bug caused the
scripts to always default to `cuda:0`, even when `cpu` or a specific
CUDA device (e.g., `cuda:1`) was selected.

The fix adds the following line to ensure that the selected device is
properly set in `env_cfg` before initializing the environment with
`gym.make()`:

```python
env_cfg.sim.device = args_cli.device
```

Fixes isaac-sim#1012

- Bug fix (non-breaking change which fixes an issue)

Before:
- skrl/train, when running the script with --device cpu, it defaults to
cuda:0.
- rsl_rl/train.py, the script freezes at `[INFO]: Starting the
simulation. This may take a few seconds. Please wait....`

After:
- Both scripts run correctly on the specified device (e.g., cpu or
cuda:1) without defaulting to cuda:0 or freezing.

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with
`./isaaclab.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my
feature works
- [ ] I have updated the changelog and the corresponding version in the
extension's `config/extension.toml` file
- [x] I have added my name to the `CONTRIBUTORS.md` or my name already
exists there
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug Report] Incorrect device selection with Hydra configurations
3 participants