Dependent Actions in MultiDiscrete Action Space #242

bbarisbaturay · 2024-05-01T18:52:01Z

❓ Question

I'm working on a project with my team, developing a MaskablePPO reinforcement learning model with a MultiDiscrete action space. Because our action space is very large, we wanted to split it into independent sub-actions. However, we've run into a challenge: our model needs action masking that handles dependent actions. That is, selecting one sub-action can invalidate choices for the others, which our current setup doesn't support because it only allows each dimension to be masked independently.

In short, with a Discrete action space we can handle these dependencies, but the action space becomes too large to manage. If we split the actions into a MultiDiscrete action space, we can't handle dependencies such as the following: if I choose 1 as my first action ([1, _, _]), the second and third actions should only be allowed to be 2 or 3. So [1,2,3] is valid, while [1,1,3] is not and should be masked.
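To make the trade-off concrete, here is a minimal sketch of the flattened Discrete variant for the toy example above. It assumes three sub-actions with values 1 to 3 and a hypothetical rule (generalized from the single example given) that later sub-actions may not repeat the first one. The names `is_valid`, `flat_action_mask`, and `decode` are made up for illustration and are not part of any library.

```python
import itertools

import numpy as np

# Assumed toy setup: three sub-actions, each taking a value in {1, 2, 3}.
CHOICES = (1, 2, 3)
N_DIMS = 3

# Every joint action, enumerated once. The position in this list is the
# index of the corresponding action in the flattened Discrete space.
JOINT_ACTIONS = list(itertools.product(CHOICES, repeat=N_DIMS))


def is_valid(joint_action):
    """Hypothetical rule: later sub-actions may not repeat the first one."""
    first = joint_action[0]
    return all(value != first for value in joint_action[1:])


def flat_action_mask():
    """Boolean mask over the flat space; True marks a valid joint action."""
    return np.array([is_valid(a) for a in JOINT_ACTIONS], dtype=bool)


def decode(flat_index):
    """Map a flat Discrete index back to the per-dimension values."""
    return JOINT_ACTIONS[flat_index]


mask = flat_action_mask()
```

The mask has one entry per joint action, so its length is the product of the sub-action sizes (27 here). That product is what makes the Discrete formulation grow so quickly as dimensions are added.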

I'm keen to find efficient solutions or workarounds for this issue. I'd appreciate any suggestions or advice. I'm also open to discussing this further if you're interested in collaborating. Thank you for your time!

Checklist

I have read the documentation
I have checked that there is no similar issue in the repo
If code there is, it is minimal and working
If code there is, it is formatted using the markdown code blocks for both code and stack traces.

araffin · 2024-05-06T07:17:38Z

Hello,
I don't have much to add; the Discrete solution looks like a good option.
Another one would be to modify the distribution used by SB3.
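For illustration only, one way a distribution could account for dependencies is to sample the sub-actions one at a time and rebuild the mask for each dimension from the choices already made. The sketch below uses plain NumPy and does not reflect how SB3 or sb3-contrib implement their distributions. The dependency rule, the 0-based indexing, and all function names are assumptions.

```python
import numpy as np

N_CHOICES = 3  # assumed: each sub-action has 3 options, indexed 0..2
N_DIMS = 3     # assumed: three sub-actions


def conditional_mask(dim, chosen):
    """Hypothetical rule: later sub-actions may not repeat the first one."""
    mask = np.ones(N_CHOICES, dtype=bool)
    if dim > 0:
        mask[chosen[0]] = False
    return mask


def masked_softmax(logits, mask):
    """Softmax restricted to valid entries; invalid entries get probability 0."""
    masked = np.where(mask, logits, -np.inf)
    masked = masked - masked.max()  # numerical stability
    exp = np.exp(masked)
    return exp / exp.sum()


def sample_joint_action(logits, rng):
    """Sample sub-actions in order, masking each one on the earlier choices.

    logits has shape (N_DIMS, N_CHOICES). Returns the joint action and the
    sum of the per-dimension log-probabilities under the conditional masks.
    """
    chosen = []
    log_prob = 0.0
    for dim in range(N_DIMS):
        probs = masked_softmax(logits[dim], conditional_mask(dim, chosen))
        choice = int(rng.choice(N_CHOICES, p=probs))
        log_prob += float(np.log(probs[choice]))
        chosen.append(choice)
    return chosen, log_prob


rng = np.random.default_rng(0)
action, log_prob = sample_joint_action(np.zeros((N_DIMS, N_CHOICES)), rng)
```

With this kind of scheme, the log-probabilities used during training would have to be computed with the same conditional masks that were applied when sampling, so the masks (or the information needed to rebuild them) would need to be available at update time.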

Otherwise, maybe @vwxyzjn has an idea?

BaiYu-Code · 2024-07-10T12:03:05Z

Hi, have you found the solution? I have the same problem.

akbaig · 2024-08-11T21:45:42Z

Hi, we faced a similar situation in our use case too. Maybe someone should start looking into this. I'm also open to collaborating on it.

BaiYu-Code · 2024-08-26T11:35:24Z

I directly modified the code in sb3_contrib/common/maskable/distributions.py to handle the dependent actions problem.

dbasso98 · 2024-09-09T09:02:53Z

Any chance @bbarisbaturay could share a code snippet of the dependent action masking process?

bbarisbaturay added the "question" label (Further information is requested) on May 1, 2024
