Use NN with better sequential modeling ability #5

Open
crizCraig opened this issue Feb 10, 2020 · 0 comments
Labels
enhancement New feature or request good first issue Good for newcomers help wanted Extra attention is needed


crizCraig commented Feb 10, 2020

We currently use an MLP for the actor and critic networks in PPO. This should be fine as long as the environment is fully observable, but it scales poorly when the number of agents in the scene varies, because an MLP expects a fixed-size input. See this TODO for context.
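
To illustrate the variable-agent problem, here is a minimal sketch (pure Python, not this project's code) of how a recurrent encoder can fold any number of per-agent feature vectors into a fixed-size vector that an actor or critic head could consume. The names `make_rnn_params` and `encode_agents`, the feature layout, and the sizes are all hypothetical, and the weights are random rather than trained.

```python
import math
import random


def make_rnn_params(input_size, hidden_size, seed=0):
    """Random weights for a plain tanh RNN cell (illustrative, untrained)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(hidden_size)
    w_in = [[rng.uniform(-scale, scale) for _ in range(input_size)]
            for _ in range(hidden_size)]
    w_h = [[rng.uniform(-scale, scale) for _ in range(hidden_size)]
           for _ in range(hidden_size)]
    bias = [0.0] * hidden_size
    return w_in, w_h, bias


def encode_agents(agent_features, params):
    """Fold a variable-length list of per-agent vectors into one hidden state."""
    w_in, w_h, bias = params
    hidden = [0.0] * len(bias)
    for x in agent_features:  # one recurrent step per agent
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_h[j], hidden))
                + bias[j]
            )
            for j in range(len(bias))
        ]
    return hidden


# Hypothetical 4-dimensional per-agent features (e.g. relative position and velocity).
params = make_rnn_params(input_size=4, hidden_size=8)
two_agents = [[0.1, 0.0, 1.0, -0.5], [0.3, 0.2, 0.8, 0.1]]
five_agents = two_agents + [[0.0, 0.0, 0.0, 0.0]] * 3

# The encoding has the same width regardless of how many agents are present.
print(len(encode_agents(two_agents, params)), len(encode_agents(five_agents, params)))
# prints: 8 8
```

Note that a recurrent encoding like this depends on the order in which agents are fed in, so a consistent ordering (for example by distance, which is an assumption here) would be needed in practice.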

We will also eventually need to introduce partial observability of other agents to simulate the occlusion that real-world vehicles experience, so a NN capable of some form of memory or sequential modeling will be necessary. OpenAI used LSTMs with PPO for Dota 2, so there may be some useful information on doing that. Transformers also appear to handle long-term dependencies and sequential modeling more efficiently, and have been successfully applied to RL in DeepMind's GTrXL work.
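
As a rough idea of what the LSTM option might look like after a move to PyTorch, here is a minimal sketch of a recurrent actor-critic that carries its hidden state across timesteps. It is not OpenAI's Dota 2 architecture and not this project's code: the class name `RecurrentActorCritic`, the observation size, and the discrete action space are assumptions made for illustration, and the PPO loss and rollout storage are omitted.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical


class RecurrentActorCritic(nn.Module):
    """Shared LSTM trunk with separate policy and value heads (illustrative)."""

    def __init__(self, obs_size, hidden_size, num_actions):
        super().__init__()
        self.lstm = nn.LSTM(obs_size, hidden_size, batch_first=True)
        self.policy_head = nn.Linear(hidden_size, num_actions)
        self.value_head = nn.Linear(hidden_size, 1)

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_size); state: (h, c) tuple or None
        features, state = self.lstm(obs_seq, state)
        dist = Categorical(logits=self.policy_head(features))
        value = self.value_head(features).squeeze(-1)
        return dist, value, state


# Hypothetical sizes; a discrete action space is assumed only to keep this short.
model = RecurrentActorCritic(obs_size=6, hidden_size=32, num_actions=3)

state = None  # the LSTM starts from a zero state at the beginning of an episode
for _ in range(5):
    obs = torch.randn(1, 1, 6)              # one environment, one timestep
    dist, value, state = model(obs, state)  # state carries memory forward
    action = dist.sample()
```

The hidden state returned at each step is what lets the policy remember agents that were visible earlier but are currently occluded. It would need to be reset at episode boundaries and stored alongside rollouts for training.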

Note: We should move to PyTorch first to make NN modifications much more tractable, fun, sane, pleasurable, easy, fast, etc.

@crizCraig crizCraig added enhancement New feature or request help wanted Extra attention is needed good first issue Good for newcomers labels Feb 10, 2020