Adds reward groups to the reward manager #729
base: main
Conversation
Thanks for porting and cleaning this up.
""" | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
""" | |
""" | |
pass | |
|
||
For backwards compatibility, the reward manager also supports the old configuration format without groups.
Better to keep the note in a separate directive.
Suggested change:
-For backwards compatibility, the reward manager also supports the old configuration format without
-groups.
+.. note::
+    For backwards compatibility, the reward manager also supports the old configuration format without
+    groups. In this case, the :meth:`compute` returns a tensor instead of a dictionary of tensors.
@@ -126,98 +178,164 @@ def compute(self, dt: float) -> torch.Tensor:
            dt: The time-step interval of the environment.

        Returns:
-           The net reward signal of shape (num_envs,).
+           A dictionary containing the net reward signal of shape (num_envs,) for each group.
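A short usage sketch of the new return type, assuming two groups named robot_1 and robot_2 were configured (the names are illustrative, not from this PR):

    # Inside a ManagerBasedRLEnv step, with grouped reward terms configured.
    rewards = env.reward_manager.compute(dt=env.step_dt)
    # 'rewards' is a dict mapping each group name to a (num_envs,) tensor.
    robot_1_reward = rewards["robot_1"]
    robot_2_reward = rewards["robot_2"]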
We return a tensor when the user doesn't set things up. We should throw proper deprecation warnings to the user so that they adapt.
Both the docstring here in the return and the class docs need to be fixed.
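A minimal sketch of what that deprecation path could look like inside compute; the _uses_default_group flag and _compute_group helper are hypothetical names for illustration, not part of this PR's diff:

    import warnings

    # Sketch of a RewardManager.compute method body; torch and DEFAULT_GROUP_NAME
    # are assumed to be available in the module.
    def compute(self, dt: float) -> dict[str, torch.Tensor] | torch.Tensor:
        rewards = {name: self._compute_group(name, dt) for name in self._group_term_names}
        if self._uses_default_group:
            # Old-style configuration without groups: keep returning a bare tensor,
            # but warn so users migrate to the grouped format.
            warnings.warn(
                "Reward configurations without groups are deprecated. "
                "Define reward groups to receive a dictionary of tensors instead.",
                DeprecationWarning,
            )
            return rewards[DEFAULT_GROUP_NAME]
        return rewards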
    def set_term_cfg(self, term_name: str, cfg: RewardTermCfg, group_name: str = DEFAULT_GROUP_NAME):
        """Sets the configuration of the specified term into the manager.

        Args:
            group_name: The name of the reward group.
            term_name: The name of the reward term.
            cfg: The configuration for the reward term.
Ordering of the args doesn't match the function signature.
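A sketch of the docstring with the Args listed in signature order; the "Defaults to" sentence is an illustrative addition:

    def set_term_cfg(self, term_name: str, cfg: RewardTermCfg, group_name: str = DEFAULT_GROUP_NAME):
        """Sets the configuration of the specified term into the manager.

        Args:
            term_name: The name of the reward term.
            cfg: The configuration for the reward term.
            group_name: The name of the reward group. Defaults to :attr:`DEFAULT_GROUP_NAME`.
        """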
Minor comments related to breaking changes. Also need to adapt the unit tests.
            group_name: The name of the reward group.
            term_name: The name of the reward term.
Same here.
-    def active_terms(self) -> list[str]:
-        """Name of active reward terms."""
-        return self._term_names
+    def active_terms(self) -> dict[str, list[str]]:
Breaking change? Should this still return a list when there is only one group?
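One possible compatibility shim, assuming a _group_term_names dict keyed by group name (a sketch, not this PR's implementation):

    @property
    def active_terms(self) -> dict[str, list[str]] | list[str]:
        """Names of active reward terms, keyed by group (flat list for ungrouped configs)."""
        if set(self._group_term_names) == {DEFAULT_GROUP_NAME}:
            # Only the implicit default group exists: preserve the old list return type.
            return self._group_term_names[DEFAULT_GROUP_NAME]
        return self._group_term_names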
if TYPE_CHECKING:
    from omni.isaac.lab.envs import ManagerBasedRLEnv


DEFAULT_GROUP_NAME = "reward"
If this is only for internal usage in the class then I suggest marking it with an underscore.
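For example, if it is not meant to be part of the public API (an assumption; users may still need it to address grouped terms):

    # Module-private default group name used when no explicit groups are configured.
    _DEFAULT_GROUP_NAME = "reward"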
Description
This MR ports the code from @lorenwel and adds the concept of reward groups to the reward manager. In the manager-based workflow, reward terms can now be grouped together so that the computations for different groups are done independently. This is useful in multi-agent scenarios, where different rewards have to be computed for different agents.
The reward manager still supports the previous case where no groups are defined.
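A hedged sketch of how grouped reward terms might be declared in a manager-based environment config; the nesting-based grouping shown here is an assumption about this PR's API, and the term functions are standard ones from omni.isaac.lab.envs.mdp:

    import omni.isaac.lab.envs.mdp as mdp
    from omni.isaac.lab.managers import RewardTermCfg as RewTerm
    from omni.isaac.lab.utils import configclass


    @configclass
    class RobotRewardsCfg:
        # Terms for one agent; each group is computed independently of the others.
        alive = RewTerm(func=mdp.is_alive, weight=1.0)
        action_rate = RewTerm(func=mdp.action_rate_l2, weight=-0.01)


    @configclass
    class RewardsCfg:
        # Each nested group config becomes one reward group in the manager.
        robot_1 = RobotRewardsCfg()
        robot_2 = RobotRewardsCfg()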
Type of change
Checklist

I have run the pre-commit checks with ./isaaclab.sh --format
I have updated the changelog and the corresponding version in the extension's config/extension.toml file
I have added my name to CONTRIBUTORS.md or my name already exists there