Use NN with better sequential modeling ability #5

crizCraig · 2020-02-10T20:10:03Z

We currently use an MLP for the actor and critic networks in PPO. This should be fine as long as the environment is fully observable, but it scales poorly when the number of agents in the scene varies. See this TODO for context.
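To make the scaling issue concrete, here is a minimal sketch in plain Python. It is not the project's actual observation code: the per-agent feature layout (`AGENT_FEATURES`), the helper names, and the `max_agents` cap are all hypothetical. It shows that a flattened observation changes length with the agent count, so a fixed-input MLP needs padding to some assumed maximum.

```python
from typing import List, Sequence

# Hypothetical per-agent features (e.g. x, y, speed, heading); not from the repo.
AGENT_FEATURES = 4


def flatten_observation(agents: Sequence[Sequence[float]]) -> List[float]:
    """Concatenate per-agent features; the length grows with the agent count."""
    return [value for agent in agents for value in agent]


def pad_observation(agents: Sequence[Sequence[float]], max_agents: int) -> List[float]:
    """Zero-pad to a fixed length so a fixed-input MLP could consume it."""
    if len(agents) > max_agents:
        raise ValueError("more agents than the network was sized for")
    flat = flatten_observation(agents)
    return flat + [0.0] * ((max_agents - len(agents)) * AGENT_FEATURES)


two_agents = [[0.0, 0.0, 5.0, 0.1], [3.0, 1.0, 4.0, -0.2]]
three_agents = two_agents + [[6.0, -1.0, 3.5, 0.0]]

# The flat input width differs between scenes, which an MLP cannot accept.
print(len(flatten_observation(two_agents)), len(flatten_observation(three_agents)))
# Padding fixes the width, but the cap must be chosen up front.
print(len(pad_observation(two_agents, max_agents=8)))
```

Padding works only up to the chosen cap and spends most of the input on zeros when few agents are present, which is one way to read the scaling concern above.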

Also, we will eventually need to introduce partial observability of other agents to simulate the occlusion that real-world vehicles experience, so a NN capable of some form of memory / sequential modeling will be necessary. OpenAI used LSTMs with PPO for Dota 2, so there may be some information available on doing that. Transformers also seem to handle long-term dependencies and sequential modeling more efficiently, and have been successfully applied to RL in DeepMind's GTrXL work.

Note: We should move to PyTorch first to make NN modifications much more tractable, fun, sane, pleasurable, easy, fast, etc.
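As a rough starting point for the LSTM option, here is a minimal sketch of a recurrent actor-critic in PyTorch. It is not the project's network: the class name `RecurrentActorCritic`, the layer sizes, the shared encoder, and the continuous Gaussian action head are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class RecurrentActorCritic(nn.Module):
    """Hypothetical LSTM actor-critic; sizes and heads are illustrative only."""

    def __init__(self, obs_dim: int, action_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Linear(obs_dim, hidden_dim)
        # batch_first=True means inputs are shaped (batch, time, features).
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.policy_mean = nn.Linear(hidden_dim, action_dim)
        # State-independent log std, assuming continuous actions.
        self.log_std = nn.Parameter(torch.zeros(action_dim))
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, obs, state=None):
        # obs: (batch, time, obs_dim); state: optional (h, c) from the last call.
        x = torch.tanh(self.encoder(obs))
        x, state = self.lstm(x, state)
        dist = torch.distributions.Normal(self.policy_mean(x), self.log_std.exp())
        value = self.value_head(x).squeeze(-1)
        return dist, value, state


# Step-by-step rollout: carry the LSTM state between environment steps.
model = RecurrentActorCritic(obs_dim=8, action_dim=2)
obs = torch.zeros(1, 1, 8)  # one environment, one time step
state = None  # None lets nn.LSTM start from a zero state
for _ in range(3):
    dist, value, state = model(obs, state)
    action = dist.sample()
```

Compared with the MLP, the main extra bookkeeping for PPO is the recurrent state: it has to be carried across steps during rollouts, reset at episode boundaries, and handled consistently when sequences are replayed for the update.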

crizCraig added the enhancement (New feature or request), help wanted (Extra attention is needed), and good first issue (Good for newcomers) labels on Feb 10, 2020
