Commit: Updated README and added .nojekyll file for github pages to work properly
Showing 2 changed files with 19 additions and 17 deletions.
README.md
@@ -1,13 +1,12 @@
# Coach

[](https://circleci.com/gh/IntelAI/coach-aws)
[](https://github.com/NervanaSystems/coach/blob/master/LICENSE)
[](https://nervanasystems.github.io/coach/)
[](https://doi.org/10.5281/zenodo.1134898)

<p align="center"><img src="img/coach_logo.png" alt="Coach Logo" width="200"/></p>

-Coach is a python reinforcement learning research framework containing implementation of many state-of-the-art algorithms.
+Coach is a python reinforcement learning framework containing implementation of many state-of-the-art algorithms.

It exposes a set of easy-to-use APIs for experimenting with new RL algorithms, and allows simple integration of new environments to solve.
Basic RL components (algorithms, environments, neural network architectures, exploration policies, ...) are well decoupled, so that extending and reusing existing components is fairly painless.

@@ -28,7 +27,8 @@ coach -p CartPole_DQN -r
Blog posts from the Intel® AI website:
* [Release 0.8.0](https://ai.intel.com/reinforcement-learning-coach-intel/) (initial release)
* [Release 0.9.0](https://ai.intel.com/reinforcement-learning-coach-carla-qr-dqn/)
-* [Release 0.10.0](https://ai.intel.com/introducing-reinforcement-learning-coach-0-10-0/) (current release)
+* [Release 0.10.0](https://ai.intel.com/introducing-reinforcement-learning-coach-0-10-0/)
+* Release 0.11 (current release)

Contacting the Coach development team is also possible through the email [[email protected]]([email protected])

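The hunk header above carries the README's one-line training command. Below is a minimal sketch of that workflow; the `rl-coach` pip package name is an assumption for this release, while the `coach -p CartPole_DQN -r` invocation is taken from the hunk header itself:

```
# Install Coach (package name assumed) and run the CartPole_DQN preset
# with rendering enabled, as referenced in the hunk header above.
pip install rl-coach
coach -p CartPole_DQN -r
```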
@@ -149,7 +149,7 @@ For example:
coach -r -p Atari_NEC -lvl pong
```

-There are several types of agents that can benefit from running them in a distrbitued fashion with multiple workers in parallel. Each worker interacts with its own copy of the environment but updates a shared network, which improves the data collection speed and the stability of the learning process.
+There are several types of agents that can benefit from running them in a distributed fashion with multiple workers in parallel. Each worker interacts with its own copy of the environment but updates a shared network, which improves the data collection speed and the stability of the learning process.
To specify the number of workers to run, use the `-n` flag.

For example:

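A hedged sketch of a multi-worker run using the `-n` flag described above; the `CartPole_A3C` preset name and the worker count are assumptions, not taken from this diff:

```
# Run an actor-critic style preset (name assumed) with 8 parallel workers,
# each interacting with its own copy of the environment while updating a
# shared network.
coach -p CartPole_A3C -n 8
```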
@@ -164,6 +164,11 @@ It is easy to create new presets for different levels or environments by followi

More usage examples can be found [here](https://nervanasystems.github.io/coach/usage/index.html).

+### Distributed Multi-Node Coach
+
+As of release 0.11 Coach supports horizontal scaling for training RL agents on multiple nodes. In release 0.11 this was tested on the ClippedPPO and DQN agents.
+For usage instructions please refer to the documentation [here](https://nervanasystems.github.io/coach/dist_usage.html)
+
### Running Coach Dashboard (Visualization)
Training an agent to solve an environment can be tricky, at times.

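A minimal sketch of pairing a training run with the `dashboard` command that appears in the surrounding hunk headers; the preset name is reused from the README's earlier example, and it is assumed the run writes its experiment signals to the default location the dashboard reads from:

```
# Train an agent, then launch Coach Dashboard to visualize the stored
# training signals (experiment storage location assumed to be the default).
coach -p CartPole_DQN
dashboard
```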
@@ -186,15 +191,14 @@ dashboard

* *OpenAI Gym:*

-Installed by default by Coach's installer. The version used by Coach is 0.10.5.
+Installed by default by Coach's installer

* *ViZDoom:*

Follow the instructions described in the ViZDoom repository -

https://github.com/mwydmuch/ViZDoom

-The version currently used by Coach is 1.1.4.
Additionally, Coach assumes that the environment variable VIZDOOM_ROOT points to the ViZDoom installation directory.

* *Roboschool:*

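The ViZDoom note above says Coach reads the VIZDOOM_ROOT environment variable; a minimal sketch of setting it, where the path is only a placeholder:

```
# Point Coach at a local ViZDoom installation; replace the placeholder
# path with the actual ViZDoom directory.
export VIZDOOM_ROOT=/path/to/ViZDoom
```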
@@ -231,15 +235,11 @@ dashboard

https://github.com/deepmind/pysc2

-The version used by Coach is 2.0.1

* *DeepMind Control Suite:*

Follow the instructions described in the DeepMind Control Suite repository -

https://github.com/deepmind/dm_control

-The version used by Coach is 0.0.0


## Supported Algorithms

@@ -257,23 +257,25 @@ dashboard
* [Persistent Advantage Learning (PAL)](https://arxiv.org/abs/1512.04860) ([code](rl_coach/agents/pal_agent.py))
* [Categorical Deep Q Network (C51)](https://arxiv.org/abs/1707.06887) ([code](rl_coach/agents/categorical_dqn_agent.py))
* [Quantile Regression Deep Q Network (QR-DQN)](https://arxiv.org/pdf/1710.10044v1.pdf) ([code](rl_coach/agents/qr_dqn_agent.py))
-* [N-Step Q Learning](https://arxiv.org/abs/1602.01783) | **Distributed** ([code](rl_coach/agents/n_step_q_agent.py))
+* [N-Step Q Learning](https://arxiv.org/abs/1602.01783) | **Multi Worker Single Node** ([code](rl_coach/agents/n_step_q_agent.py))
* [Neural Episodic Control (NEC)](https://arxiv.org/abs/1703.01988) ([code](rl_coach/agents/nec_agent.py))
-* [Normalized Advantage Functions (NAF)](https://arxiv.org/abs/1603.00748.pdf) | **Distributed** ([code](rl_coach/agents/naf_agent.py))
+* [Normalized Advantage Functions (NAF)](https://arxiv.org/abs/1603.00748.pdf) | **Multi Worker Single Node** ([code](rl_coach/agents/naf_agent.py))
* [Rainbow](https://arxiv.org/abs/1710.02298) ([code](rl_coach/agents/rainbow_dqn_agent.py))

### Policy Optimization Agents
-* [Policy Gradients (PG)](http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf) | **Distributed** ([code](rl_coach/agents/policy_gradients_agent.py))
-* [Asynchronous Advantage Actor-Critic (A3C)](https://arxiv.org/abs/1602.01783) | **Distributed** ([code](rl_coach/agents/actor_critic_agent.py))
-* [Deep Deterministic Policy Gradients (DDPG)](https://arxiv.org/abs/1509.02971) | **Distributed** ([code](rl_coach/agents/ddpg_agent.py))
+* [Policy Gradients (PG)](http://www-anw.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf) | **Multi Worker Single Node** ([code](rl_coach/agents/policy_gradients_agent.py))
+* [Asynchronous Advantage Actor-Critic (A3C)](https://arxiv.org/abs/1602.01783) | **Multi Worker Single Node** ([code](rl_coach/agents/actor_critic_agent.py))
+* [Deep Deterministic Policy Gradients (DDPG)](https://arxiv.org/abs/1509.02971) | **Multi Worker Single Node** ([code](rl_coach/agents/ddpg_agent.py))
* [Proximal Policy Optimization (PPO)](https://arxiv.org/pdf/1707.06347.pdf) ([code](rl_coach/agents/ppo_agent.py))
-* [Clipped Proximal Policy Optimization (CPPO)](https://arxiv.org/pdf/1707.06347.pdf) | **Distributed** ([code](rl_coach/agents/clipped_ppo_agent.py))
+* [Clipped Proximal Policy Optimization (CPPO)](https://arxiv.org/pdf/1707.06347.pdf) | **Multi Worker Single Node** ([code](rl_coach/agents/clipped_ppo_agent.py))
* [Generalized Advantage Estimation (GAE)](https://arxiv.org/abs/1506.02438) ([code](rl_coach/agents/actor_critic_agent.py#L86))

### General Agents
-* [Direct Future Prediction (DFP)](https://arxiv.org/abs/1611.01779) | **Distributed** ([code](rl_coach/agents/dfp_agent.py))
+* [Direct Future Prediction (DFP)](https://arxiv.org/abs/1611.01779) | **Multi Worker Single Node** ([code](rl_coach/agents/dfp_agent.py))

### Imitation Learning Agents
* Behavioral Cloning (BC) ([code](rl_coach/agents/bc_agent.py))
* [Conditional Imitation Learning](https://arxiv.org/abs/1710.02410) ([code](rl_coach/agents/cil_agent.py))

### Hierarchical Reinforcement Learning Agents
* [Hierarchical Actor Critic (HAC)](https://arxiv.org/abs/1712.00948.pdf) ([code](rl_coach/agents/ddpg_hac_agent.py))