[Proposal] RewardManager cannot handle multiple critics #220
Comments
PR is open: #221
Thank you for bringing this up, and for the MR with it. Does it also make sense to do the same for other MDP signals? Once that is done, we should also be able to incorporate multi-agent learning setups. @Dhoeller19 This is akin to the discussions we had on generalizing the managers further.
…saac-sim#220)

# Description

This MR does the following:

* Changes the way environment and agent configs are handled. These are now moved to their respective `env` folder.
* Replaces the YAML file with configclasses for the rsl_rl library.
* Adds examples of other robots for the locomotion-velocity task: anymal-b, unitree-a1, and anymal-d (placeholder).

The new environment folder structure is as follows:

```
my_task (folder)
|- mdp (folder where all user's task-specific terms are stored)
|- config
   |- robot_name_1 (folder)
      |- robot_specific_env_cfg.py
      |- agents (folder)
         |- rsl_rl_cfg.py
         |- sb3_cfg.py
         |- ....
      |- __init__.py (register robot-specific tasks)
   |- robot_name_2 (folder)
|- my_task_env_cfg.py (base env config)
|- my_task_env.py (base env -- optional -- should normally use RLEnv)
```

Fixes isaac-sim#142

## Type of change

- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./orbit.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file

---------

Co-authored-by: Nikita Rudin <[email protected]>
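For context on the "YAML → configclasses" point in the description above, here is a minimal sketch of what an agent configclass under this layout might look like. The field names are illustrative, not the exact ones from this MR; it assumes Orbit's `configclass` utility:

```python
# Illustrative agents/rsl_rl_cfg.py under the new layout (hypothetical fields).
from omni.isaac.orbit.utils import configclass


@configclass
class MyTaskPPORunnerCfg:
    """Replaces the old YAML file for the rsl_rl runner."""

    seed: int = 42
    num_steps_per_env: int = 24  # rollout length per environment
    max_iterations: int = 1500   # number of training iterations
    experiment_name: str = "my_task"
```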
I guess for the actions and commands it might make sense to do this, for the multi-agent setups you mention. Since these are parallel efforts to the multi-critic work, though, I'd appreciate it if we did not bulk up this issue with multi-agent things on top of the multi-critic changes. EDIT: Or, what I'm mostly trying to say is that I would prefer not to delay #221 due to the multi-agent work.
This issue was moved to a discussion. You can continue the conversation there.
Proposal
Currently, the reward manager assumes that there is a single reward signal, made up of multiple reward terms. However, multi-critic scenarios (e.g. constrained MDPs, or CMDPs) require multiple reward signals, one per critic.
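For context, in a CMDP the policy maximizes expected return subject to one or more expected-cost constraints, and each cost is typically estimated by its own critic. A standard statement of the objective:

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t} r_t\right]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t} c_t\right] \le d
```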
I suggest generalizing the reward manager to support multiple reward groups, akin to how the observation manager handles observation groups.
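To make the suggestion concrete, here is a rough sketch of a grouped reward configuration, mirroring the observation manager's group pattern. All names are hypothetical, not the Orbit API:

```python
# Hypothetical sketch: reward terms nested inside named groups, one group per
# critic, mirroring how observation terms are nested inside observation groups.
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class RewardTermCfg:
    func: Callable      # term function evaluated on the environment
    weight: float = 1.0


@dataclass
class RewardGroupsCfg:
    # "reward" feeds the task critic; "cost" feeds a CMDP cost critic.
    reward: Dict[str, RewardTermCfg] = field(default_factory=dict)
    cost: Dict[str, RewardTermCfg] = field(default_factory=dict)
```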
Motivation
Allows for multi-critic setups.
Pitch
Allow multiple reward groups, each with its own reward terms; see the sketch below.
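As a minimal runnable sketch of the behavior this would enable (again, hypothetical names, not the Orbit API): the manager evaluates each group separately and returns one reward tensor per group, so each critic gets its own signal.

```python
# Minimal sketch of a group-aware reward manager (hypothetical API).
import torch


class GroupedRewardManager:
    """Sums weighted terms per group and returns one tensor per group."""

    def __init__(self, groups: dict, num_envs: int):
        # groups: {group_name: {term_name: (func, weight)}}
        self.groups = groups
        self.num_envs = num_envs

    def compute(self, env=None) -> dict:
        out = {}
        for group_name, terms in self.groups.items():
            total = torch.zeros(self.num_envs)
            for _, (func, weight) in terms.items():
                total += weight * func(env)
            out[group_name] = total
        return out


# Toy usage: a task reward group and a CMDP cost group, one critic each.
manager = GroupedRewardManager(
    groups={
        "reward": {"alive": (lambda env: torch.ones(4), 1.0)},
        "cost": {"joint_torque": (lambda env: torch.rand(4), 0.1)},
    },
    num_envs=4,
)
print(manager.compute())  # {'reward': tensor([...]), 'cost': tensor([...])}
```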
Alternatives
Create a multi-reward manager, or a dedicated manager per signal type (e.g. a "cost manager" for CMDPs).
Additional context
I'll open a PR.