This repository implements different Deep Reinforcement Learning Agents for the pysc2 learning environment as described in the DeepMind StarCraft II paper.
We provide implementations for:
- Advantage Actor Critic (A2C) based on A3C https://arxiv.org/abs/1602.01783
- Fully Connected Policy
- Convolutional LSTM Policy https://arxiv.org/abs/1506.04214
- Proximal Policy Optimization (PPO) https://arxiv.org/abs/1707.06347
- FeUdal Networks (FuN) https://arxiv.org/abs/1703.01161
This repository is part of a student research project which was conducted at the Autonomous Systems Labs, TU Darmstadt by Daniel Palenicek, Marcel Hussing, and Simon Meister.
The repository was originally located at simonmeister/pysc2-rl-agents but has moved to this new location.
The following gives a brief explanation of what we have implemented in this repository. For more detailed information, check out the reports.
We have adapted and implemented the FeUdal Networks algorithm for hierarchical reinforcement learning on StarCraft II. To be compatible with StarCraft II, we account for the spatial state and action space, as opposed to the original publication, which targets Atari.
We implemented these baseline agents to learn the PySC2 mini games. While PPO can currently only train a FullyConvolutional policy, A2C can additionally train a ConvolutionalLSTM policy.
We document our implementation and results in more depth in the following reports:
- Daniel Palenicek, Marcel Hussing, Simon Meister (Apr. 2018): Deep Reinforcement Learning for StarCraft II
- Daniel Palenicek, Marcel Hussing (Sep. 2018): Adapting Feudal Networks for StarCraft II
- Python 3
- pysc2 (tested with v1.2)
- TensorFlow (tested with 1.4.0)
- StarCraft II and mini games (see below or pysc2)
pip install numpy tensorflow-gpu pysc2==1.2
- Install StarCraft II. On Linux, use 3.16.1 and unzip the package into your home directory.
- Download the mini games and extract them to your `~/StarCraftII/Maps/` directory.
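For example, on Linux the two archives can be extracted as shown below. This is only a sketch: the archive file names are placeholders for whatever you downloaded, and the Linux packages are protected by the password published alongside the Blizzard EULA.

```
# Sketch only: archive names are placeholders, adjust to your downloads.
unzip -P iagreetotheeula SC2.3.16.1.zip -d ~/
unzip mini_games.zip -d ~/StarCraftII/Maps/
```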
Quickstart: `python run.py <experiment-id>` will run training with default settings for Fully Connected A2C. To evaluate after training, run `python run.py <experiment-id> --eval`.

The implementation enables highly configurable experiments via command line args. To see the full documentation, run `python run.py --help`.
The most important flags to add to the `python run.py <experiment-id>` command include:

- `--agent`: choose between A2C, PPO and FeUdal
- `--policy`: choose the topology of the policy network (not all agents are compatible with every network)
- `--map`: choose the mini game to train on
- `--vis`: visualize the agent
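For example, a training run combining several of these flags could look like the line below. The flag values are illustrative assumptions, not an exhaustive reference; `python run.py --help` lists the accepted options.

```
# Illustrative invocation; see `python run.py --help` for the accepted flag values.
python run.py my_beacon_experiment --agent a2c --policy fully_conv --map MoveToBeacon --vis
```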
Summaries are written to `out/summary/<experiment_name>` and model checkpoints are written to `out/models/<experiment_name>`.
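Training progress can be monitored with TensorBoard by pointing it at the summary directory (assuming the default `out/` output location):

```
tensorboard --logdir out/summary
```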
For fast training, a GPU is recommended. We ran our experiments on Titan X Pascal and GTX 1080 Ti GPUs.
On the mini games, we report the following results as the best mean score:
| Map | FC | ConvLSTM | PPO | FuN | DeepMind |
| --- | --- | --- | --- | --- | --- |
| MoveToBeacon | 26 | 26 | 26 | 26 | 26 |
| CollectMineralShards | 97 | 93 | - | - | 103 |
| FindAndDefeatZerglings | 45 | - | - | - | 45 |
| DefeatRoaches | - | - | - | - | 100 |
| DefeatZerglingsAndBanelings | 68 | - | - | - | 62 |
| CollectMineralsAndGas | - | - | - | - | 3978 |
| BuildMarines | - | - | - | - | 3 |
In the following we show plots for the score over episodes.
Note that the DeepMind mean scores are their best individual scores after 100 runs for each game, where the initial learning rate was randomly sampled for each run. We use a constant initial learning rate for a much smaller number of runs due to limited hardware.
This project is licensed under the MIT License (refer to the LICENSE file for details).
The code in `rl/environment.py` is based on OpenAI baselines, with adaptations from sc2aibot. Some of the code in `rl/agents/a2c/runner.py` is loosely based on sc2aibot. The Convolutional LSTM Cell implementation is taken from carlthome. The FeUdal Networks implementation is inspired by dmakian.