I'm currently working on a project with my team, developing a MaskablePPO reinforcement learning model with a MultiDiscrete action space. Since our action space is really large, we wanted to split it into independent sub-actions. However, we've run into a challenge: our model needs action masking that handles dependent actions. That is, selecting one sub-action may invalidate some choices for the others, and our current setup doesn't support this because it only allows each sub-action to be masked independently.

So, briefly: with a Discrete action space we can handle these dependencies, but the action space becomes too large to manage. If we instead split the action into a MultiDiscrete action space, we can't handle dependent actions. For example, if I choose 1 as my first sub-action ([1, _, _]), the second and third sub-actions should only be allowed to be 2 or 3. So [1,2,3] is valid, while [1,1,3] is not and should be masked.
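To make the difference concrete, here is a small sketch in plain Python/NumPy (no sb3-contrib needed). It assumes three sub-actions with 4 choices each and a hypothetical `is_valid` rule encoding the example above; treating every first choice other than 1 as unrestricted is our assumption. As far as we understand, MaskablePPO in sb3-contrib expects a MultiDiscrete mask to be a flat boolean array of length `sum(nvec)` (the per-dimension masks concatenated), whereas a Discrete mask needs one entry per joint action, i.e. `prod(nvec)` entries.

```python
import itertools

import numpy as np

# Hypothetical sizes: three sub-actions, each with 4 choices (0..3).
NVEC = (4, 4, 4)


def is_valid(action):
    """Hypothetical dependency rule: if the first sub-action is 1,
    the second and third must each be 2 or 3; otherwise anything goes."""
    a0, a1, a2 = action
    if a0 == 1:
        return a1 in (2, 3) and a2 in (2, 3)
    return True


ALL_ACTIONS = list(itertools.product(*(range(n) for n in NVEC)))

# Discrete approach: one mask entry per joint action (prod(NVEC) entries).
joint_mask = np.zeros(int(np.prod(NVEC)), dtype=bool)
for action in ALL_ACTIONS:
    joint_mask[np.ravel_multi_index(action, NVEC)] = is_valid(action)

# MultiDiscrete approach: one independent boolean vector per sub-action.
# The tightest such mask that still keeps every valid joint action allows
# value v in dimension d iff some valid joint action uses v there.
per_dim_masks = [np.zeros(n, dtype=bool) for n in NVEC]
for action in ALL_ACTIONS:
    if is_valid(action):
        for d, v in enumerate(action):
            per_dim_masks[d][v] = True

# Flat layout: per-dimension masks concatenated, length sum(NVEC).
flat_multidiscrete_mask = np.concatenate(per_dim_masks)


def allowed_by_per_dim(action):
    """True if every sub-action passes its own independent mask."""
    return all(per_dim_masks[d][v] for d, v in enumerate(action))


print(joint_mask.size, flat_multidiscrete_mask.size)  # 64 12
print(joint_mask[np.ravel_multi_index((1, 1, 3), NVEC)])  # False
print(allowed_by_per_dim((1, 1, 3)))  # True: the invalid action slips through
```

Because the MultiDiscrete mask is fixed before any sub-action is sampled, the masks for the second and third sub-actions cannot depend on what the first one turns out to be; the best they can do is the union over all valid actions, which here lets [1,1,3] through. The joint Discrete mask excludes it exactly, but its size grows multiplicatively with every sub-action added.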
I'm keen to find efficient solutions or workarounds for this issue. I'd appreciate any suggestions or advice. I'm also open to discussing this further if you're interested in collaborating. Thank you for your time!
Hi, we've been facing a very similar situation in our own use case. Maybe someone should start looking into this. I'm also open to collaborating on it.