[//]: # (Image References)

[image1]: https://user-images.githubusercontent.com/10624937/42135602-b0335606-7d12-11e8-8689-dd1cf9fa11a9.gif "Trained Agents"
[image2]: https://user-images.githubusercontent.com/10624937/42386929-76f671f0-8106-11e8-9376-f17da2ae852e.png "Kernel"

# Deep Reinforcement Learning Nanodegree

![Trained Agents][image1]

This repository contains material related to Udacity's [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program.

## Table of Contents

### Tutorials

The tutorials lead you through implementing various algorithms in reinforcement learning. All of the code is in PyTorch (v0.4) and Python 3.

* [Dynamic Programming](https://github.com/udacity/deep-reinforcement-learning/tree/master/dynamic-programming): Implement Dynamic Programming algorithms such as Policy Evaluation, Policy Improvement, Policy Iteration, and Value Iteration.
* [Monte Carlo](https://github.com/udacity/deep-reinforcement-learning/tree/master/monte-carlo): Implement Monte Carlo methods for prediction and control.
* [Temporal-Difference](https://github.com/udacity/deep-reinforcement-learning/tree/master/temporal-difference): Implement Temporal-Difference methods such as Sarsa, Q-Learning, and Expected Sarsa.
* [Discretization](https://github.com/udacity/deep-reinforcement-learning/tree/master/discretization): Learn how to discretize continuous state spaces, and solve the Mountain Car environment.
* [Tile Coding](https://github.com/udacity/deep-reinforcement-learning/tree/master/tile-coding): Implement a method for discretizing continuous state spaces that enables better generalization.
* [Deep Q-Network](https://github.com/udacity/deep-reinforcement-learning/tree/master/dqn): Explore how to use a Deep Q-Network (DQN) to navigate a space vehicle without crashing.
* [Robotics](https://github.com/dusty-nv/jetson-reinforcement): Use a C++ API to train reinforcement learning agents in 3D virtual robotics simulations. (_External link_)
* [Hill Climbing](https://github.com/udacity/deep-reinforcement-learning/tree/master/hill-climbing): Use hill climbing with adaptive noise scaling to balance a pole on a moving cart.
* [Cross-Entropy Method](https://github.com/udacity/deep-reinforcement-learning/tree/master/cross-entropy): Use the cross-entropy method to train a car to navigate a steep hill.
* [REINFORCE](https://github.com/udacity/deep-reinforcement-learning/tree/master/reinforce): Learn how to use Monte Carlo Policy Gradients to solve a classic control task.
* **Proximal Policy Optimization**: Explore how to use Proximal Policy Optimization (PPO) to solve a classic reinforcement learning task. (_Coming soon!_)
* **Deep Deterministic Policy Gradients**: Explore how to use Deep Deterministic Policy Gradients (DDPG) with OpenAI Gym environments.
  * [Pendulum](https://github.com/udacity/deep-reinforcement-learning/tree/master/ddpg-pendulum): Use OpenAI Gym's Pendulum environment.
  * [BipedalWalker](https://github.com/udacity/deep-reinforcement-learning/tree/master/ddpg-bipedal): Use OpenAI Gym's BipedalWalker environment.
* [Finance](https://github.com/udacity/deep-reinforcement-learning/tree/master/finance): Train an agent to discover optimal trading strategies.

### Labs / Projects

The labs and projects can be found below. All of the projects use rich simulation environments from [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents). In the [Deep Reinforcement Learning Nanodegree](https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893) program, you will receive a review of your project. These reviews are meant to give you personalized feedback and to tell you what can be improved in your code.

* [The Taxi Problem](https://github.com/udacity/deep-reinforcement-learning/tree/master/lab-taxi): In this lab, you will train a taxi to pick up and drop off passengers.
* [Navigation](https://github.com/udacity/deep-reinforcement-learning/tree/master/p1_navigation): In the first project, you will train an agent to collect yellow bananas while avoiding blue bananas.
* [Continuous Control](https://github.com/udacity/deep-reinforcement-learning/tree/master/p2_continuous-control): In the second project, you will train a robotic arm to reach target locations.
* [Collaboration and Competition](https://github.com/udacity/deep-reinforcement-learning/tree/master/p3_collab-compet): In the third project, you will train a pair of agents to play tennis!

### Resources

* [Cheatsheet](https://github.com/udacity/deep-reinforcement-learning/blob/master/cheatsheet): You are encouraged to use [this PDF file](https://github.com/udacity/deep-reinforcement-learning/blob/master/cheatsheet/cheatsheet.pdf) to guide your study of reinforcement learning.

## OpenAI Gym Benchmarks

### Classic Control
- `Acrobot-v1` with [Tile Coding](https://github.com/udacity/deep-reinforcement-learning/blob/master/tile-coding/Tile_Coding_Solution.ipynb) and Q-Learning
- `CartPole-v0` with [Hill Climbing](https://github.com/udacity/deep-reinforcement-learning/blob/master/hill-climbing/Hill_Climbing.ipynb) | solved in 13 episodes
- `CartPole-v0` with [REINFORCE](https://github.com/udacity/deep-reinforcement-learning/blob/master/reinforce/REINFORCE.ipynb) | solved in 691 episodes
- `MountainCarContinuous-v0` with [Cross-Entropy Method](https://github.com/udacity/deep-reinforcement-learning/blob/master/cross-entropy/CEM.ipynb) | solved in 47 iterations
- `MountainCar-v0` with [Uniform-Grid Discretization](https://github.com/udacity/deep-reinforcement-learning/blob/master/discretization/Discretization_Solution.ipynb) and Q-Learning | solved in <50000 episodes
- `Pendulum-v0` with [Deep Deterministic Policy Gradients (DDPG)](https://github.com/udacity/deep-reinforcement-learning/blob/master/ddpg-pendulum/DDPG.ipynb)
### Box2D
- `BipedalWalker-v2` with [Deep Deterministic Policy Gradients (DDPG)](https://github.com/udacity/deep-reinforcement-learning/blob/master/ddpg-bipedal/DDPG.ipynb)
- `CarRacing-v0` with **Deep Q-Networks (DQN)** | _Coming soon!_
- `LunarLander-v2` with [Deep Q-Networks (DQN)](https://github.com/udacity/deep-reinforcement-learning/blob/master/dqn/solution/Deep_Q_Network_Solution.ipynb) | solved in 1504 episodes

### Toy Text
- `FrozenLake-v0` with [Dynamic Programming](https://github.com/udacity/deep-reinforcement-learning/blob/master/dynamic-programming/Dynamic_Programming_Solution.ipynb)
- `Blackjack-v0` with [Monte Carlo Methods](https://github.com/udacity/deep-reinforcement-learning/blob/master/monte-carlo/Monte_Carlo_Solution.ipynb)
- `CliffWalking-v0` with [Temporal-Difference Methods](https://github.com/udacity/deep-reinforcement-learning/blob/master/temporal-difference/Temporal_Difference_Solution.ipynb)

## Dependencies
To set up your Python environment to run the code in this repository, follow the instructions below.

0. Install Conda via the command line.
Anaconda is a distribution of Python that aims to simplify package management and deployment. In this repository, it is used mainly to create an isolated environment in which to install the deep learning packages and dependencies.
- [Official Guide](https://docs.anaconda.com/anaconda/install)
- [YouTube Guide for Mac](https://www.youtube.com/watch?v=oWVTO_69U4c&t=35s)

1. Create (and activate) a new environment with Python 3.6.

- __Linux__ or __Mac__:
```bash
conda create --name drlnd python=3.6
source activate drlnd
```
- __Windows__:
```bash
conda create --name drlnd python=3.6
activate drlnd
```
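
To confirm that the new environment is active, you can list your Conda environments and check the interpreter version (a quick sanity check; the active environment is marked with an asterisk):
```bash
# List all Conda environments; 'drlnd' should appear and be marked active.
conda env list

# The active interpreter should report Python 3.6.x.
python --version
```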

2. Follow the instructions in [this repository](https://github.com/openai/gym) to perform a minimal install of OpenAI Gym.
- Next, install the **classic control** environment group by following the instructions [here](https://github.com/openai/gym#classic-control) using `pip install gym[classic_control]`.
- Then, install the **box2d** environment group by following the instructions [here](https://github.com/openai/gym#box2d) using `pip install gym[box2d]`.
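
As a quick sanity check of the Gym install, you can instantiate one of the classic control environments from the command line (a minimal sketch; `CartPole-v0` is used here only as an example environment ID):
```bash
# Build a CartPole environment and print its observation and action spaces.
python -c "import gym; env = gym.make('CartPole-v0'); print(env.observation_space, env.action_space)"
```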

3. If you would like to run your projects on a CUDA-capable GPU, you can install the relevant CUDA and PyTorch packages by following [the instructions here](https://pytorch.org/get-started/locally/). **You can verify that PyTorch is using CUDA by running `torch.cuda.is_available()` in your Python script.**
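
For example, the following one-liner should print `True` when PyTorch was installed with CUDA support and a compatible GPU driver is available:
```bash
# Prints True if PyTorch can use the GPU, False otherwise.
python -c "import torch; print(torch.cuda.is_available())"
```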

4. Clone the repository (if you haven't already!), and navigate to the `python/` folder. Then, install several dependencies.
```bash
git clone https://github.com/udacity/deep-reinforcement-learning.git
cd deep-reinforcement-learning/python
# NOTE: You may need to comment out the pinned PyTorch version (0.4.0) in
# requirements.txt and manually install the latest PyTorch with `pip install torch`
# to avoid installation errors. (Manually installing 0.4.0 can cause Jupyter
# notebooks to hang.)
pip install .
```
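
A minimal sketch of the workaround described in the comment above, assuming the pin appears in `requirements.txt` as `torch==0.4.0` on its own line:
```bash
# Comment out the pinned PyTorch version (keeps a .bak backup of the file).
sed -i.bak 's/^torch==0.4.0/# torch==0.4.0/' requirements.txt

# Install the latest PyTorch manually, then the remaining dependencies.
pip install torch
pip install .
```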

5. Create an [IPython kernel](http://ipython.readthedocs.io/en/stable/install/kernel_install.html) for the `drlnd` environment.
```bash
python -m ipykernel install --user --name drlnd --display-name "drlnd"
```
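
You can verify that the kernel was registered by listing the available Jupyter kernels; `drlnd` should appear in the output:
```bash
# List registered Jupyter kernels.
jupyter kernelspec list
```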

6. Before running code in a notebook, change the kernel to match the `drlnd` environment by using the drop-down `Kernel` menu.

**NOTE:** If code cells do not execute as expected (e.g., they hang with no error message), check `python/requirements.txt` to ensure that the appropriate TensorFlow and PyTorch packages are installed.
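
One quick way to confirm which versions are actually importable from the `drlnd` environment (adjust the package names to whatever `requirements.txt` pins):
```bash
# Print the installed PyTorch and TensorFlow versions.
python -c "import torch, tensorflow; print('torch:', torch.__version__); print('tensorflow:', tensorflow.__version__)"
```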

![Kernel][image2]

## Want to learn more?

<p align="center">Come learn with us in the <a href="https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893">Deep Reinforcement Learning Nanodegree</a> program at Udacity!</p>

<p align="center"><a href="https://www.udacity.com/course/deep-reinforcement-learning-nanodegree--nd893">
<img width="503" height="133" src="https://user-images.githubusercontent.com/10624937/42135812-1829637e-7d16-11e8-9aa1-88056f23f51e.png"></a>
</p>