
Adds reward groups to the reward manager #729

Open
wants to merge 2 commits into main

Conversation

Dhoeller19
Collaborator

Description

This PR ports the code from @lorenwel and adds the concept of reward groups. In the manager-based workflow, reward terms can now be grouped together, and the computations for different groups are done independently. This is useful in multi-agent scenarios, where different rewards must be computed for different agents.

The reward manager still supports the previous case where no groups are defined.
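As a hedged illustration of the idea (the class names, term names, and grouped-config layout below are hypothetical sketches, not taken from this PR's diff), a grouped reward configuration might look like nested config classes, one attribute per group:

```python
# Hypothetical sketch: each attribute of RewardsCfg is one reward group,
# and the manager would compute each group's weighted sum independently.
# RewardTermCfg here is a stand-in, not the actual Isaac Lab class.
from dataclasses import dataclass, field


@dataclass
class RewardTermCfg:
    weight: float = 1.0


@dataclass
class AgentARewardsCfg:
    # terms computed for the first agent
    progress: RewardTermCfg = field(default_factory=lambda: RewardTermCfg(weight=1.0))
    action_rate: RewardTermCfg = field(default_factory=lambda: RewardTermCfg(weight=-0.01))


@dataclass
class AgentBRewardsCfg:
    # terms computed independently for the second agent
    progress: RewardTermCfg = field(default_factory=lambda: RewardTermCfg(weight=0.5))


@dataclass
class RewardsCfg:
    # two groups -> compute() would return one reward tensor per group
    agent_a: AgentARewardsCfg = field(default_factory=AgentARewardsCfg)
    agent_b: AgentBRewardsCfg = field(default_factory=AgentBRewardsCfg)
```

With such a layout, each agent's policy can consume only its own group's reward signal.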

Type of change

  • New feature (non-breaking change which adds functionality)

Checklist

  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@lorenwel lorenwel mentioned this pull request Jul 25, 2024
@lorenwel
Contributor

Thanks for porting and cleaning this up.
I checked this PR against our internal up-to-date version of #221 and, judging by the git diff, there doesn't seem to be a functional difference.
All good from my side.

Comment on lines +267 to +269
"""


Contributor


Suggested change
  """
+ pass

Comment on lines 45 to +48

For backwards compatibility, the reward manager also supports the old configuration format without
groups.

Contributor

@Mayankm96 Mayankm96 Jul 26, 2024


Better to keep the note in a separate directive.

Suggested change
- For backwards compatibility, the reward manager also supports the old configuration format without
- groups.
+ .. note::
+     For backwards compatibility, the reward manager also supports the old configuration format without
+     groups. In this case, :meth:`compute` returns a tensor instead of a dictionary of tensors.

@@ -126,98 +178,164 @@ def compute(self, dt: float) -> torch.Tensor:
dt: The time-step interval of the environment.

Returns:
The net reward signal of shape (num_envs,).
A dictionary containing the net reward signal of shape (num_envs,) for each group.
Contributor


We return a tensor when the user doesn't set up groups. We should throw proper deprecation warnings so that users adapt.

Both the return docstring here and the class docstring need to be fixed.
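A minimal sketch of the suggested deprecation path (the class, attribute names, and method body below are assumptions for illustration, not the PR's actual implementation):

```python
# Sketch: compute() returns a dict of per-group reward tensors, but falls
# back to a plain tensor (with a DeprecationWarning) for legacy, group-less
# configs. Attribute names like _legacy_config are hypothetical.
import warnings

import torch


class RewardManagerSketch:
    def __init__(self, num_envs: int, group_names: list[str], legacy: bool):
        self.num_envs = num_envs
        self._group_names = group_names
        self._legacy_config = legacy

    def compute(self, dt: float):
        # accumulate weighted terms per group (zeros as a placeholder here)
        rewards = {name: torch.zeros(self.num_envs) for name in self._group_names}
        if self._legacy_config:
            warnings.warn(
                "Reward terms without groups are deprecated; wrap terms in a "
                "group so that compute() returns a dictionary of tensors.",
                DeprecationWarning,
            )
            # backwards compatibility: single default group -> plain tensor
            return rewards["reward"]
        return rewards
```

This keeps old configs running while signaling the upcoming return-type change.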

Comment on lines +228 to 235
def set_term_cfg(self, term_name: str, cfg: RewardTermCfg, group_name: str = DEFAULT_GROUP_NAME):
"""Sets the configuration of the specified term into the manager.

Args:
group_name: The name of the reward group.
term_name: The name of the reward term.
cfg: The configuration for the reward term.

Contributor


The ordering of the args doesn't match the signature.

Contributor

@Mayankm96 Mayankm96 left a comment


Minor comments related to breaking changes. Also need to adapt the unit tests.

Comment on lines +251 to 252
group_name: The name of the reward group.
term_name: The name of the reward term.
Contributor


Same here.

def active_terms(self) -> list[str]:
"""Name of active reward terms."""
return self._term_names
def active_terms(self) -> dict[str, list[str]]:
Contributor


Breaking? This should return a list if it is only one group?
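One backwards-compatible option for the reviewer's concern (a sketch only, assuming the manager stores per-group term names in a `_group_term_names` dict keyed by group name; the class and attribute names are hypothetical):

```python
# Sketch: active_terms degrades to a flat list when only the default
# group exists, preserving the old single-group behavior.
class ActiveTermsSketch:
    _DEFAULT_GROUP = "reward"  # underscore-prefixed per the review suggestion

    def __init__(self, group_term_names: dict[str, list[str]]):
        self._group_term_names = group_term_names

    @property
    def active_terms(self):
        # legacy behavior: a single default group returns a plain list
        if set(self._group_term_names) == {self._DEFAULT_GROUP}:
            return self._group_term_names[self._DEFAULT_GROUP]
        return self._group_term_names
```

The trade-off is a union return type, which is why pairing it with a deprecation warning may be preferable to keeping it forever.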


if TYPE_CHECKING:
from omni.isaac.lab.envs import ManagerBasedRLEnv


DEFAULT_GROUP_NAME = "reward"
Contributor


If this is only for internal usage in the class, I suggest marking it with a leading underscore.

@Mayankm96 Mayankm96 added the enhancement New feature or request label Aug 2, 2024
@pascal-roth pascal-roth added the dev team Issue or pull request created by the dev team label Aug 28, 2024