Adds reward groups to the reward manager #729
base: main
Conversation
Thanks for porting and cleaning this up.
""" | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
""" | |
""" | |
pass | |
|
||
For backwards compatibility, the reward manager also supports the old configuration format without groups.
Better to keep the note in a separate directive.
Suggested change:
-For backwards compatibility, the reward manager also supports the old configuration format without
-groups.
+.. note::
+    For backwards compatibility, the reward manager also supports the old configuration format without
+    groups. In this case, the :meth:`compute` returns a tensor instead of a dictionary of tensors.
@@ -126,98 +178,164 @@ def compute(self, dt: float) -> torch.Tensor:
            dt: The time-step interval of the environment.

        Returns:
-           The net reward signal of shape (num_envs,).
+           A dictionary containing the net reward signal of shape (num_envs,) for each group.
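A short usage sketch of the new return type, assuming two groups named robot_1 and robot_2 were configured (the names are illustrative, not from this PR):

    # Inside a ManagerBasedRLEnv step, with grouped reward terms configured.
    rewards = env.reward_manager.compute(dt=env.step_dt)
    # 'rewards' is a dict mapping each group name to a (num_envs,) tensor.
    robot_1_reward = rewards["robot_1"]
    robot_2_reward = rewards["robot_2"]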
We return a tensor when the user doesn't set things up. We should throw proper deprecation warnings to the user so that they adapt.
Both the docstring here in the return and the class docs need to be fixed.
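A minimal sketch of what that deprecation path could look like inside compute; the _uses_default_group flag and _compute_group helper are hypothetical names for illustration, not part of this PR's diff:

    import warnings

    # Sketch of a RewardManager.compute method body; torch and DEFAULT_GROUP_NAME
    # are assumed to be available in the module.
    def compute(self, dt: float) -> dict[str, torch.Tensor] | torch.Tensor:
        rewards = {name: self._compute_group(name, dt) for name in self._group_term_names}
        if self._uses_default_group:
            # Old-style configuration without groups: keep returning a bare tensor,
            # but warn so users migrate to the grouped format.
            warnings.warn(
                "Reward configurations without groups are deprecated. "
                "Define reward groups to receive a dictionary of tensors instead.",
                DeprecationWarning,
            )
            return rewards[DEFAULT_GROUP_NAME]
        return rewards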
    def set_term_cfg(self, term_name: str, cfg: RewardTermCfg, group_name: str = DEFAULT_GROUP_NAME):
        """Sets the configuration of the specified term into the manager.

        Args:
            group_name: The name of the reward group.
            term_name: The name of the reward term.
            cfg: The configuration for the reward term.
Ordering of the args doesn't match the function signature.
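A sketch of the docstring with the Args listed in signature order; the "Defaults to" sentence is an illustrative addition:

    def set_term_cfg(self, term_name: str, cfg: RewardTermCfg, group_name: str = DEFAULT_GROUP_NAME):
        """Sets the configuration of the specified term into the manager.

        Args:
            term_name: The name of the reward term.
            cfg: The configuration for the reward term.
            group_name: The name of the reward group. Defaults to :attr:`DEFAULT_GROUP_NAME`.
        """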
Minor comments related to breaking changes. Also need to adapt the unit tests.
            group_name: The name of the reward group.
            term_name: The name of the reward term.
Same here.
-    def active_terms(self) -> list[str]:
-        """Name of active reward terms."""
-        return self._term_names
+    def active_terms(self) -> dict[str, list[str]]:
Breaking change? Should this still return a list when there is only one group?
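One possible compatibility shim, assuming a _group_term_names dict keyed by group name (a sketch, not this PR's implementation):

    @property
    def active_terms(self) -> dict[str, list[str]] | list[str]:
        """Names of active reward terms, keyed by group (flat list for ungrouped configs)."""
        if set(self._group_term_names) == {DEFAULT_GROUP_NAME}:
            # Only the implicit default group exists: preserve the old list return type.
            return self._group_term_names[DEFAULT_GROUP_NAME]
        return self._group_term_names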
if TYPE_CHECKING:
    from omni.isaac.lab.envs import ManagerBasedRLEnv


DEFAULT_GROUP_NAME = "reward"
If this is only for internal usage in the class then I suggest marking it with an underscore.
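For example, if it is not meant to be part of the public API (an assumption; users may still need it to address grouped terms):

    # Module-private default group name used when no explicit groups are configured.
    _DEFAULT_GROUP_NAME = "reward"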
Description
This MR ports the code from @lorenwel and adds the concept of reward groups to the reward manager. In the manager-based workflow, reward terms can now be grouped together so that the computations for different groups are done independently. This is useful in multi-agent scenarios, where different rewards have to be computed for different agents.
The reward manager still supports the previous case where no groups are defined.
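A hedged sketch of how grouped reward terms might be declared in a manager-based environment config; the nesting-based grouping shown here is an assumption about this PR's API, and the term functions are standard ones from omni.isaac.lab.envs.mdp:

    import omni.isaac.lab.envs.mdp as mdp
    from omni.isaac.lab.managers import RewardTermCfg as RewTerm
    from omni.isaac.lab.utils import configclass


    @configclass
    class RobotRewardsCfg:
        # Terms for one agent; each group is computed independently of the others.
        alive = RewTerm(func=mdp.is_alive, weight=1.0)
        action_rate = RewTerm(func=mdp.action_rate_l2, weight=-0.01)


    @configclass
    class RewardsCfg:
        # Each nested group config becomes one reward group in the manager.
        robot_1 = RobotRewardsCfg()
        robot_2 = RobotRewardsCfg()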
Type of change
Checklist

I have run the pre-commit checks with ./isaaclab.sh --format
I have updated the changelog and the corresponding version in the extension's config/extension.toml file
I have added my name to CONTRIBUTORS.md or my name already exists there