Dependent Actions in MultiDiscrete Action Space #242

Open
4 tasks done
bbarisbaturay opened this issue May 1, 2024 · 5 comments
Labels
question Further information is requested

Comments

@bbarisbaturay

bbarisbaturay commented May 1, 2024

❓ Question

I'm working on a project with my team, developing a MaskablePPO reinforcement learning model with a MultiDiscrete action space. Because our action space is very large, we wanted to split it into independent sub-actions. However, we've run into a challenge: our model needs action masking that handles dependent actions. That is, selecting one sub-action may invalidate choices for the others, which our current setup doesn't support because it only allows independent, per-dimension masking.

In short, with a Discrete action space we can handle these dependencies, but the action space becomes too large to manage. With a MultiDiscrete action space, we can't express dependencies such as: if I choose 1 as my first action ([1, _, _]), the second and third actions should only be allowed to be 2 or 3. So [1,2,3] is valid, while [1,1,3] is not and should be masked.

I'm keen to find efficient solutions or workarounds for this issue. I'd appreciate any suggestions or advice. I'm also open to discussing this further if you're interested in collaborating. Thank you for your time!
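To make the limitation concrete, here is a minimal sketch of why one independent mask per dimension cannot express this kind of rule. It assumes three sub-actions with values 1 to 3 and a hypothetical rule `is_valid` (later sub-actions may not repeat the first one's value), which is one reading of the [1,2,3] versus [1,1,3] example above. The names are illustrative and not part of sb3_contrib.

```python
from itertools import product

N_VALUES = 3  # each sub-action takes a value in {1, 2, 3}
N_DIMS = 3    # three sub-actions, as in the [1,2,3] example


def is_valid(action):
    """Hypothetical rule: later sub-actions may not repeat the first one's value."""
    first = action[0]
    return all(a != first for a in action[1:])


all_actions = list(product(range(1, N_VALUES + 1), repeat=N_DIMS))
valid_actions = [a for a in all_actions if is_valid(a)]

# Tightest independent mask that still permits every valid joint action:
# a value is allowed in dimension d if some valid action uses it there.
per_dim_allowed = [sorted({a[d] for a in valid_actions}) for d in range(N_DIMS)]

# Joint actions that such a per-dimension mask lets through.
allowed_by_independent_mask = list(product(*per_dim_allowed))
leaked = [a for a in allowed_by_independent_mask if not is_valid(a)]

print(len(all_actions), len(valid_actions), len(leaked))
```

Under this assumed rule, only 12 of the 27 joint actions are valid, yet any per-dimension mask that keeps all 12 also admits the other 15, including (1, 1, 3). The valid set is not a Cartesian product, so the mask for later dimensions has to depend on what was chosen earlier.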

Checklist

@bbarisbaturay bbarisbaturay added the question Further information is requested label May 1, 2024
@araffin
Member

araffin commented May 6, 2024

Hello,
I don't have much to say; the discrete solution looks like a good option.
Another one would be to modify the distribution used by SB3.

Otherwise, maybe @vwxyzjn has an idea?
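For readers weighing the discrete option, a rough sketch of the flattening follows. It assumes 0-based sub-action values (so the [1,2,3] example becomes (0, 1, 2)), a hypothetical `NVEC` of three dimensions with three values each, and the same illustrative `is_valid` rule; none of these names come from SB3. The boolean list from `flat_action_mask` is the kind of mask an environment could return from its `action_masks()` method for a flat Discrete space, though that wiring is not shown here.

```python
from math import prod

NVEC = (3, 3, 3)  # hypothetical sub-action sizes; values are 0-based


def is_valid(action):
    """Hypothetical rule: later sub-actions may not repeat the first one's value."""
    first = action[0]
    return all(a != first for a in action[1:])


def encode(action, nvec=NVEC):
    """Map a joint action tuple to a single flat index (mixed-radix encoding)."""
    index = 0
    for a, n in zip(action, nvec):
        index = index * n + a
    return index


def decode(index, nvec=NVEC):
    """Inverse of encode: recover the joint action tuple from a flat index."""
    action = []
    for n in reversed(nvec):
        index, a = divmod(index, n)
        action.append(a)
    return tuple(reversed(action))


def flat_action_mask(nvec=NVEC):
    """One boolean per flat action; True means the joint action is allowed."""
    return [is_valid(decode(i, nvec)) for i in range(prod(nvec))]


mask = flat_action_mask()
print(sum(mask), "of", len(mask), "flat actions are valid")
```

The cost is the one raised in the question: the flat space has `prod(nvec)` entries, so it grows multiplicatively with each added dimension. If the rule never depends on the environment state, one could also enumerate only the valid joint actions and index those, which shrinks the space but does not remove the growth.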

@BaiYu-Code

Hi, have you found the solution? I have the same problem.

@akbaig

akbaig commented Aug 11, 2024

Hi, we were facing a similar situation for our particular use case. Maybe someone should start looking into this. I'm also open to collaborating on this.

@BaiYu-Code

I directly modified the code in sb3_contrib/common/maskable/distributions.py to handle dependent actions.
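The modified code was not posted in this thread, so the following is only a sketch of one way a distribution could handle dependent sub-actions: sample the dimensions in order and rebuild the mask for each dimension from the values already chosen. It is plain Python, not sb3_contrib code. `mask_for`, `masked_probs`, and `sample_dependent` are hypothetical names, and the rule (later sub-actions may not repeat the first one's value, 0-based) is an assumption taken from the example in the question.

```python
import math
import random


def masked_probs(logits, mask):
    """Softmax over the allowed entries only; masked entries get probability 0."""
    if not any(mask):
        raise ValueError("mask leaves no valid choice")
    top = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - top) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def mask_for(prefix, n_values):
    """Hypothetical rule: once the first sub-action is chosen, its value is excluded."""
    if not prefix:
        return [True] * n_values
    return [v != prefix[0] for v in range(n_values)]


def sample_dependent(logits_per_dim, rng):
    """Sample sub-actions left to right; return the joint action and its log-probability."""
    action, log_prob = [], 0.0
    for logits in logits_per_dim:
        probs = masked_probs(logits, mask_for(action, len(logits)))
        choice = rng.choices(range(len(logits)), weights=probs, k=1)[0]
        action.append(choice)
        # The joint log-probability is the sum of the per-step masked log-probabilities.
        log_prob += math.log(probs[choice])
    return tuple(action), log_prob


rng = random.Random(0)
uniform_logits = [[0.0, 0.0, 0.0] for _ in range(3)]
samples = [sample_dependent(uniform_logits, rng) for _ in range(200)]
```

Because each step renormalizes over the values still allowed, the log-probability used for training has to be computed with the same sequential masks that were applied at sampling time. In this sketch the logits for later dimensions do not depend on earlier choices; only the masks do.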

@dbasso98

dbasso98 commented Sep 9, 2024

Any chance @bbarisbaturay could share a code snippet of the dependent action masking process?

5 participants