[Proposal] RewardManager cannot handle multiple critics #220
Comments
PR is open: #221
Thank you for bringing this up, and for the MR with it. Does it also make sense to do the same for other MDP signals? Once that is done, we should also be able to incorporate multi-agent learning setups. @Dhoeller19 This is akin to the discussions we had on generalizing the managers further.
…saac-sim#220)

# Description

This MR does the following:

* Changes the way environment and agent configs are handled. These are now moved to their respective `env` folder.
* Replaces the YAML file with configclasses for the rsl_rl library.
* Adds examples of other robots for the locomotion-velocity task: anymal-b, unitree-a1, and anymal-d (placeholder).

The new environment folder structure is as follows:

```
my_task (folder)
|- mdp (folder where all user's task-specific terms are stored)
|- config
   |- robot_name_1 (folder)
      |- robot_specific_env_cfg.py
      |- agents (folder)
         |- rsl_rl_cfg.py
         |- sb3_cfg.py
         |- ....
      |- __init__.py (register robot-specific tasks)
   |- robot_name_2 (folder)
|- my_task_env_cfg.py (base env config)
|- my_task_env.py (base env -- optional -- should normally use RLEnv)
```

Fixes isaac-sim#142

## Type of change

- New feature (non-breaking change which adds functionality)
- Breaking change (fix or feature that would cause existing functionality to not work as expected)

## Checklist

- [x] I have run the [`pre-commit` checks](https://pre-commit.com/) with `./orbit.sh --format`
- [ ] I have made corresponding changes to the documentation
- [x] My changes generate no new warnings
- [ ] I have added tests that prove my fix is effective or that my feature works
- [x] I have updated the changelog and the corresponding version in the extension's `config/extension.toml` file

---------

Co-authored-by: Nikita Rudin <[email protected]>
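For context on the "YAML → configclasses" point in the description above, here is a minimal sketch of what an agent configclass under this layout might look like. The field names are illustrative, not the exact ones from this MR; it assumes Orbit's `configclass` utility:

```python
# Illustrative agents/rsl_rl_cfg.py under the new layout (hypothetical fields).
from omni.isaac.orbit.utils import configclass


@configclass
class MyTaskPPORunnerCfg:
    """Replaces the old YAML file for the rsl_rl runner."""

    seed: int = 42
    num_steps_per_env: int = 24  # rollout length per environment
    max_iterations: int = 1500   # number of training iterations
    experiment_name: str = "my_task"
```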
I guess for the actions and commands it might make sense to do this, for the multi-agent setups you mention. Since these are parallel efforts to the multi-critic work, though, I'd appreciate it if we did not bulk up this issue with multi-agent things on top of the multi-critic changes. EDIT: Or, what I'm mostly trying to say is that I would prefer not to delay #221 due to the multi-agent work.
This issue was moved to a discussion. You can continue the conversation there.
Proposal
Currently, the reward manager assumes that there is a single reward signal, made up of multiple reward terms. However, multi-critic scenarios (e.g. constrained MDPs, or CMDPs) require multiple reward signals, one per critic.
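For context, in a CMDP the policy maximizes expected return subject to one or more expected-cost constraints, and each cost is typically estimated by its own critic. A standard statement of the objective:

```latex
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t} r_t\right]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t} c_t\right] \le d
```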
I suggest generalizing the reward manager to support multiple reward groups, akin to how the observation manager handles observation groups.
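To make the suggestion concrete, here is a rough sketch of a grouped reward configuration, mirroring the observation manager's group pattern. All names are hypothetical, not the Orbit API:

```python
# Hypothetical sketch: reward terms nested inside named groups, one group per
# critic, mirroring how observation terms are nested inside observation groups.
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class RewardTermCfg:
    func: Callable      # term function evaluated on the environment
    weight: float = 1.0


@dataclass
class RewardGroupsCfg:
    # "reward" feeds the task critic; "cost" feeds a CMDP cost critic.
    reward: Dict[str, RewardTermCfg] = field(default_factory=dict)
    cost: Dict[str, RewardTermCfg] = field(default_factory=dict)
```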
Motivation
Allows for multi-critic setups.
Pitch
Allow multiple reward groups, each with its own reward terms; see the sketch below.
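As a minimal runnable sketch of the behavior this would enable (again, hypothetical names, not the Orbit API): the manager evaluates each group separately and returns one reward tensor per group, so each critic gets its own signal.

```python
# Minimal sketch of a group-aware reward manager (hypothetical API).
import torch


class GroupedRewardManager:
    """Sums weighted terms per group and returns one tensor per group."""

    def __init__(self, groups: dict, num_envs: int):
        # groups: {group_name: {term_name: (func, weight)}}
        self.groups = groups
        self.num_envs = num_envs

    def compute(self, env=None) -> dict:
        out = {}
        for group_name, terms in self.groups.items():
            total = torch.zeros(self.num_envs)
            for _, (func, weight) in terms.items():
                total += weight * func(env)
            out[group_name] = total
        return out


# Toy usage: a task reward group and a CMDP cost group, one critic each.
manager = GroupedRewardManager(
    groups={
        "reward": {"alive": (lambda env: torch.ones(4), 1.0)},
        "cost": {"joint_torque": (lambda env: torch.rand(4), 0.1)},
    },
    num_envs=4,
)
print(manager.compute())  # {'reward': tensor([...]), 'cost': tensor([...])}
```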
Alternatives
Create a multi-reward manager, or a dedicated manager per signal type (e.g. a "cost manager" for CMDPs).
Additional context
I'll open a PR.