Use NN with better sequential modeling ability #5
Labels: enhancement, good first issue, help wanted
We currently use an MLP for the actor and critic networks in PPO. This should be fine as long as the environment is fully observable, but an MLP takes a fixed-size input, so it scales poorly when the number of agents in the scene varies. See this TODO for context.
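To make the fixed-size-input problem concrete, here is a minimal, framework-free sketch of one possible workaround: pooling per-agent features into a vector whose length does not depend on the agent count. This is only an illustration, not what the repo does. The function name `pool_agents` and the feature layout are hypothetical, and a real solution would more likely learn the per-agent encoding (or use attention) rather than hand-pool raw features.

```python
from typing import List, Sequence


def pool_agents(agent_features: Sequence[Sequence[float]], feature_dim: int) -> List[float]:
    """Collapse a variable-length list of per-agent feature vectors into one
    fixed-size vector: [per-feature means, per-feature maxes, agent count].

    The output length is always 2 * feature_dim + 1, so it can feed an MLP
    regardless of how many agents are in the scene.
    """
    n = len(agent_features)
    if n == 0:
        # No other agents visible: return an all-zero summary (count is 0).
        return [0.0] * (2 * feature_dim + 1)
    for feats in agent_features:
        if len(feats) != feature_dim:
            raise ValueError("every agent must have feature_dim features")
    # zip(*rows) iterates column-wise, i.e. one feature across all agents.
    means = [sum(col) / n for col in zip(*agent_features)]
    maxes = [max(col) for col in zip(*agent_features)]
    return means + maxes + [float(n)]


# Two agents or three agents both produce a vector of length 5 here.
two = pool_agents([[1.0, 2.0], [3.0, 4.0]], feature_dim=2)
three = pool_agents([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], feature_dim=2)
```

Mean and max are symmetric, so the result does not depend on the order in which agents are listed. The trade-off is that pooling throws away pairwise structure between agents, which is part of why attention-based models are attractive here.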
Also, we will eventually need to introduce partial observability of other agents to simulate the occlusion that real-world vehicles experience, so a NN capable of some type of memory / sequential modeling will be necessary. OpenAI used LSTMs with PPO for Dota 2, so there may be some info on doing that. Transformers also seem to handle long-term dependencies and sequential modeling more efficiently, and have been successfully applied to RL in DeepMind's GTrXL work.
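As a rough idea of what an LSTM-based actor-critic could look like, here is a minimal PyTorch sketch. Everything in it is an assumption for illustration: the class name `RecurrentActorCritic`, the layer sizes, the shared encoder, and the discrete action space are all made up, and it leaves out the PPO loss and the handling of hidden state across rollout and episode boundaries.

```python
import torch
from torch import nn
from torch.distributions import Categorical


class RecurrentActorCritic(nn.Module):
    """Shared LSTM trunk with separate policy and value heads (hypothetical layout)."""

    def __init__(self, obs_dim: int, num_actions: int, hidden_dim: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.Tanh())
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(hidden_dim, num_actions)
        self.value_head = nn.Linear(hidden_dim, 1)

    def forward(self, obs_seq, state=None):
        # obs_seq has shape (batch, time, obs_dim).
        x = self.encoder(obs_seq)
        # state is the (h, c) tuple carrying memory between calls; None means zeros.
        x, state = self.lstm(x, state)
        dist = Categorical(logits=self.policy_head(x))
        values = self.value_head(x).squeeze(-1)  # (batch, time)
        return dist, values, state


# Acting one step at a time: feed the returned state back in so the
# network can remember agents that are currently occluded.
model = RecurrentActorCritic(obs_dim=8, num_actions=3)
state = None
for _ in range(5):
    obs = torch.zeros(1, 1, 8)  # placeholder observation
    dist, value, state = model(obs, state)
    action = dist.sample()
```

The same module can be run over whole sequences during training by passing a `(batch, time, obs_dim)` tensor, which is the main design reason for using `batch_first=True` here.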
Note: We should move to PyTorch first to make NN modifications much more tractable, fun, sane, pleasurable, easy, fast, etc.