Replies: 3 comments
- PR is open: #221
- Thank you for bringing this up, and for the MR with it. Does it also make sense to do the same for the other MDP signals? Once that is done, we should also be able to incorporate multi-agent learning setups. @Dhoeller19 this is akin to the discussions we had on generalizing the managers further.
- For the actions and commands it might make sense to do this, given the multi-agent setups you mention. Since those are efforts parallel to the multi-critic work, though, I'd appreciate it if we did not bulk up this issue with multi-agent concerns on top of the multi-critic changes. EDIT: What I'm mostly trying to say is that I would prefer not to delay #221 over multi-agent work.
Proposal
Currently, the reward manager assumes there is a single reward, made up of multiple reward terms. However, multi-critic scenarios (e.g. constrained MDPs, CMDPs) require, as the name implies, multiple critics, each with its own reward signal.
I suggest generalizing the reward manager to support multiple groups of reward terms, akin to how the observation manager handles observation groups.
Motivation
Allows for multi-critic setups.
Pitch
Allow multiple groups of rewards, each with its own reward terms.
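To make the pitch concrete, here is a minimal sketch of what a grouped reward manager could look like. This is a hypothetical illustration, not Isaac Lab's actual API: the names `RewardTermCfg`, `RewardGroupCfg`, and `GroupedRewardManager` are assumptions chosen to mirror the observation-group pattern, and the term functions are stand-ins.

```python
# Hypothetical sketch: a reward manager that computes one scalar reward per
# named group, mirroring how the observation manager handles observation
# groups. Not Isaac Lab's real API; all names here are illustrative.
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class RewardTermCfg:
    func: Callable[..., float]  # term function, evaluated each step
    weight: float = 1.0


@dataclass
class RewardGroupCfg:
    terms: Dict[str, RewardTermCfg] = field(default_factory=dict)


class GroupedRewardManager:
    """Computes a separate weighted-sum reward for each group (e.g. one per critic)."""

    def __init__(self, groups: Dict[str, RewardGroupCfg]):
        self.groups = groups

    def compute(self, env) -> Dict[str, float]:
        # One scalar per group: the weighted sum of that group's terms.
        return {
            name: sum(cfg.weight * cfg.func(env) for cfg in group.terms.values())
            for name, group in self.groups.items()
        }


# Usage: a "reward" group for the main critic and a "cost" group for a
# CMDP-style cost critic. The term functions here are dummies.
groups = {
    "reward": RewardGroupCfg(terms={
        "forward_velocity": RewardTermCfg(func=lambda env: 1.0, weight=2.0),
    }),
    "cost": RewardGroupCfg(terms={
        "joint_torque_limit": RewardTermCfg(func=lambda env: 1.0, weight=0.1),
    }),
}
manager = GroupedRewardManager(groups)
print(manager.compute(env=None))  # {'reward': 2.0, 'cost': 0.1}
```

With this shape, the single-reward case is just a manager with one group, which keeps the change backward compatible in spirit.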
Alternatives
Create a dedicated multi-reward manager, or a separate manager per critic type (e.g. a "cost manager" for CMDPs).
Additional context
I'll open a PR.