Generalize rewards manager #221

lorenwel · 2024-01-29T16:50:35Z

Description

Generalizes the reward manager to support multiple "reward groups", each made of multiple terms, instead of a single group of terms.
It is backwards compatible with the previous setup if only reward terms (and no groups) are added to the config.
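To illustrate the backwards-compatibility rule, here is a minimal sketch of how a config could be normalized into groups. It is not the code in this PR: `RewardTerm`, `parse_groups`, and the default group name `"reward"` are hypothetical stand-ins for the real config classes, and plain Python lists stand in for per-environment tensors.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


# Hypothetical stand-in for a reward term config: a function returning one
# reward value per environment, plus a scalar weight.
@dataclass
class RewardTerm:
    func: Callable[[], List[float]]
    weight: float


# Assumed name of the implicit group used for flat (legacy) configs.
DEFAULT_GROUP = "reward"

TermOrGroup = Union[RewardTerm, Dict[str, RewardTerm]]


def parse_groups(cfg: Dict[str, TermOrGroup]) -> Dict[str, Dict[str, RewardTerm]]:
    """Normalize a config into {group_name: {term_name: term}}."""
    if all(isinstance(v, RewardTerm) for v in cfg.values()):
        # Legacy layout: only terms, so wrap them in a single default group.
        return {DEFAULT_GROUP: dict(cfg)}
    if any(isinstance(v, RewardTerm) for v in cfg.values()):
        # Top-level terms mixed with groups: refuse rather than guess.
        raise ValueError("config mixes top-level reward terms and reward groups")
    return {name: dict(group) for name, group in cfg.items()}
```

A flat config such as `{"lin_vel": ..., "torque": ...}` becomes a single `"reward"` group, so existing task configs keep working unchanged; rejecting mixed layouts is a design choice of this sketch.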

Fixes #220

Type of change

New feature (non-breaking change which adds functionality)
This change requires a documentation update (maybe, not sure)

Checklist

I have run the pre-commit checks with ./orbit.sh --format
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
I have run all the tests with ./orbit.sh --test and they pass
I have updated the changelog and the corresponding version in the extension's config/extension.toml file
I have added my name to the CONTRIBUTORS.md or my name already exists there

Mayankm96 · 2024-01-31T10:40:38Z

Thanks for this MR. I will review it in the coming days and add feedback. We probably also want to do this for the termination manager.

The main blocking question is whether, with this change, we expect the Env class to return a dict of rewards, or whether it should still return a single tensor and put all the miscellaneous groups into extras. The latter seems closer to the Gymnasium definitions.
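For comparison, the Gymnasium-style alternative could look roughly like the sketch below: the env keeps a single reward output and moves every other group into extras. The function `split_rewards`, the `"rewards"` extras key, and the use of lists in place of tensors are all hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple


def split_rewards(
    group_rewards: Dict[str, List[float]], main_group: str
) -> Tuple[List[float], Dict[str, Dict[str, List[float]]]]:
    """Return (reward, extras) so that step() keeps a single reward output."""
    if main_group not in group_rewards:
        raise KeyError(f"unknown main reward group: {main_group!r}")
    reward = group_rewards[main_group]
    # Every other group travels in extras under a hypothetical "rewards" key.
    others = {k: v for k, v in group_rewards.items() if k != main_group}
    return reward, {"rewards": others}
```

Note that the caller has to name one `main_group`, which is exactly the choice that becomes ambiguous when every group is equally important to the learner.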

lorenwel · 2024-02-07T10:22:23Z

As currently implemented, it returns a single Tensor if there are no reward groups and a dict of tensors when there are multiple.
I chose this because it might be hard to decide which group is the "main reward". In the case of a CMDP with a single reward and a single cost this might be easy, but for things like multi-critic, where you have a dedicated critic network per reward term, it is unclear how the rewards should be split up.
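The return-type rule described above can be sketched as follows. This is a hypothetical illustration, not the PR's implementation: `RewardTerm`, `compute`, and the default group name `"reward"` are assumptions, lists replace tensors, and any time-step scaling the real manager may apply is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union


@dataclass
class RewardTerm:
    func: Callable[[], List[float]]  # one value per environment
    weight: float


DEFAULT_GROUP = "reward"  # assumed name of the implicit legacy group

Rewards = Union[List[float], Dict[str, List[float]]]


def compute(groups: Dict[str, Dict[str, RewardTerm]], num_envs: int) -> Rewards:
    """Sum weighted terms per group; return a dict only if groups are used."""
    totals: Dict[str, List[float]] = {}
    for group_name, terms in groups.items():
        total = [0.0] * num_envs
        for term in terms.values():
            values = term.func()
            for i in range(num_envs):
                total[i] += term.weight * values[i]
        totals[group_name] = total
    # No user-defined groups: keep the old single-tensor return type.
    if list(totals) == [DEFAULT_GROUP]:
        return totals[DEFAULT_GROUP]
    return totals
```

With only the default group, callers get the same single value as before; with, for example, `"task"` and `"cost"` groups, they get one entry per group, which a multi-critic learner can route to separate value heads.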

I'm not sure how much downstream complexity this would add for the multi-critic case in learning frameworks other than rsl_rl. There, it was a fairly straightforward modification to the OnPolicyRunner to move all Tensors in the dict to GPU/CPU (plus some logging changes).
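The kind of downstream change mentioned above mostly amounts to handling "one tensor or a dict of tensors" uniformly. A minimal sketch, assuming a hypothetical helper `map_rewards` (not part of rsl_rl) and using plain lists so it runs without torch:

```python
from typing import Callable, Dict, List, TypeVar, Union

T = TypeVar("T")
U = TypeVar("U")


def map_rewards(rewards: Union[T, Dict[str, T]], fn: Callable[[T], U]) -> Union[U, Dict[str, U]]:
    """Apply fn to a single reward or to every entry of a reward dict."""
    if isinstance(rewards, dict):
        return {name: fn(value) for name, value in rewards.items()}
    return fn(rewards)


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


def log_reward_means(rewards: Union[List[float], Dict[str, List[float]]]) -> Dict[str, float]:
    """Build per-group logging entries (key names are illustrative)."""
    means = map_rewards(rewards, mean)
    if isinstance(means, dict):
        return {f"reward/{name}": m for name, m in means.items()}
    return {"reward": means}
```

In a torch-based runner the same helper could move rewards between devices, e.g. `map_rewards(rewards, lambda t: t.to(device))`.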

Generally, I find it a bit confusing when "major" information for learning is put into the "extras" dictionary, because it then suddenly becomes a core part of the interface, depending on the situation.
It's already like that with the observations, which I find confusing, but that's a separate issue.

lorenwel · 2024-06-07T11:05:54Z

@Fe1777 Thank you for the updated PR description.
@Mayankm96 Any update on whether you are still interested in this and, if so, whether the architecture is fine? I also haven't received a response in #220 since January.

We've been using this internally since then and it works fine for us.
I'm happy to resolve the conflicts if you are generally fine with the architecture.

lorenwel · 2024-07-25T15:22:29Z

Superseded by #729

lorenwel added 4 commits January 23, 2024 19:28

Generalized reward manager to multiple groups (286061c)
Changed default reward group name (8d121b2)
Small fix in reward manager and formatting (5149166)
Merge branch 'main' into feature/generalize_rewards (9c8116f)

lorenwel mentioned this pull request Jan 29, 2024

[Proposal] RewardManager cannot handle multiple critics #220 (Closed)

Add contributor and format (ea5055d)

Mayankm96 added the "enhancement" (New feature or request) label Jan 29, 2024

Mayankm96 assigned lorenwel Feb 12, 2024

Mayankm96 force-pushed the main branch 3 times, most recently from 3e7a470 to cfcabba, March 24, 2024 22:35

Mayankm96 force-pushed the main branch from 1ecb2d1 to 7cc56c3, April 22, 2024 21:34

Dhoeller19 unassigned lorenwel Jun 18, 2024

Mayankm96 force-pushed the main branch from 512710c to ffec353, June 25, 2024 22:32

Mayankm96 mentioned this pull request Jul 8, 2024

How to include different robots of differents DOFs in the scene ? #657 (Open)

lorenwel closed this Jul 25, 2024

lorenwel mentioned this pull request Jul 25, 2024

Adds reward groups to the reward manager #729 (Open)
