Hello,
following up on DLR-RM/stable-baselines3#1624, @SimRey and I would like to implement Hybrid PPO in this library. This is the paper that introduced it.
Motivation
@SimRey had to implement it for his thesis, so we thought it would be nice to have in this library, especially as it is one of the few (and most popular) options when both discrete and continuous actions are necessary. A number of problems in chemical engineering are starting to explore reinforcement learning as a solution approach, and in many cases both discrete and continuous actions are needed.
Pitch
Implement a new algorithm (subclass of PPO or MaskablePPO) with corresponding network architecture that outputs both discrete and continuous actions.
Alternatives
The Hybrid PPO algorithm needs to perform two backward passes per update: one on the discrete-action loss (with the weights of the continuous-action part of the net frozen), and one on the continuous-action loss (with the weights of the discrete-action part of the net frozen).
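A minimal PyTorch sketch of the alternating-update mechanics described above. The module names and the toy squared-output losses are illustrative stand-ins (not the paper's losses or SB3's API); the point is only the freeze/backward/unfreeze pattern:

```python
import torch
import torch.nn as nn

# Illustrative stand-ins for the two action branches of the policy network.
torch.manual_seed(0)
discrete_head = nn.Linear(8, 3)    # e.g. logits over discrete actions
continuous_head = nn.Linear(8, 2)  # e.g. means of continuous actions
optimizer = torch.optim.Adam(
    list(discrete_head.parameters()) + list(continuous_head.parameters()), lr=1e-3
)

def set_requires_grad(module: nn.Module, flag: bool) -> None:
    """Freeze or unfreeze every parameter of a module."""
    for p in module.parameters():
        p.requires_grad_(flag)

obs = torch.randn(16, 8)
cont_snapshot = continuous_head.weight.detach().clone()

# Backward pass 1: discrete loss, continuous branch frozen.
set_requires_grad(continuous_head, False)
loss_discrete = discrete_head(obs).pow(2).mean()  # placeholder for the PPO clipped loss
optimizer.zero_grad()
loss_discrete.backward()
optimizer.step()  # Adam skips parameters that received no gradient
set_requires_grad(continuous_head, True)

# The frozen branch is untouched by the first update.
assert torch.equal(continuous_head.weight, cont_snapshot)

# Backward pass 2: continuous loss, discrete branch frozen.
set_requires_grad(discrete_head, False)
loss_continuous = continuous_head(obs).pow(2).mean()  # placeholder loss
optimizer.zero_grad()
loss_continuous.backward()
optimizer.step()
set_requires_grad(discrete_head, True)
```

In practice the shared trunk complicates this a little (both losses flow through it), which is one reason overriding the `train()` method of a subclass seems necessary rather than reusing PPO's single backward pass.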
To me it looks like the only option is to subclass PPO or MaskablePPO and override some methods.
The network architecture also needs two action outputs, similar to an actor-critic setup but with a different meaning for each output (and the "critic" part could have a dimension greater than 1).
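A rough sketch of such an architecture, assuming a shared trunk, one discrete and one continuous actor head, and a critic head whose output dimension can exceed 1. All layer sizes and names here are hypothetical, not taken from the paper or SB3:

```python
import torch
import torch.nn as nn

class HybridActorCritic(nn.Module):
    """Shared trunk feeding a discrete head, a continuous head, and a critic head.
    Shapes and sizes are illustrative assumptions only."""

    def __init__(self, obs_dim: int, n_discrete: int, cont_dim: int, value_dim: int = 1):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh())
        self.discrete_head = nn.Linear(64, n_discrete)      # logits over discrete actions
        self.continuous_mean = nn.Linear(64, cont_dim)      # Gaussian means
        self.log_std = nn.Parameter(torch.zeros(cont_dim))  # state-independent log-std
        self.value_head = nn.Linear(64, value_dim)          # critic output may be > 1-dim

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        discrete_dist = torch.distributions.Categorical(logits=self.discrete_head(h))
        continuous_dist = torch.distributions.Normal(
            self.continuous_mean(h), self.log_std.exp()
        )
        return discrete_dist, continuous_dist, self.value_head(h)

net = HybridActorCritic(obs_dim=8, n_discrete=3, cont_dim=2, value_dim=2)
cat_dist, normal_dist, values = net(torch.randn(16, 8))
```

Sampling one action of each kind per step would then combine `cat_dist.sample()` and `normal_dist.sample()` into the hybrid action the environment expects.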
Additional context
No response
Checklist
- [x] I have checked that there is no similar issue in the repo
- [x] If I'm requesting a new feature, I have proposed alternatives