Learning to Throw with Reinforcement Learning

In this project, we train a humanoid robot to throw a ball using only reinforcement learning. The goal is for the robot to learn to throw the ball as fast or as far as possible.

Requirements

Usage

To train a model, use the following command.

python train.py [options for training] [options for specific algorithm]

During training, the process periodically saves a snapshot of the model to the file model-<num_steps>.pth, where <num_steps> is the total number of training steps taken so far. Snapshots are written to the directory given by --save-dir. To resume training from a saved model, pass the .pth file with --load-model.
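
For example, training can be resumed from an earlier snapshot (the snapshot path below is only illustrative):

python train.py --env OurHumanoidStand-v0 --model ppo --load-model experiments/ppo/model-1000000.pth --save-dir "experiments/ppo"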

To evaluate a saved model on a specific environment, use the following command.

python evaluate.py <saved_model.pth> [options for evaluation]

It saves a video of the evaluation runs in the directory given by --save-dir.
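
For example, to evaluate a saved snapshot on the throwing task (the paths and counts below are only illustrative):

python evaluate.py experiments/ppo/model-10000000.pth --env OurHumanoidThrow-v0 --num-eval 5 --save-dir "experiments/ppo/videos"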

Environment names

We implemented five environments for our humanoid to learn in:

  • OurHumanoidStand-v0
  • OurHumanoidHold-v0
  • OurHumanoidThrow-v0
  • OurHumanoidStandToHold-v0
  • OurHumanoidHoldToThrow-v0
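
The names follow the usual Gym registration convention. Assuming the project's environment package registers them on import (the module name in this sketch is an assumption), a minimal random-action rollout might look like this:

import gym
import envs  # hypothetical module assumed to register the OurHumanoid* environments

# Create one of the custom environments and run a short random-action rollout
# using the classic Gym API.
env = gym.make("OurHumanoidThrow-v0")
obs = env.reset()
total_reward = 0.0
for _ in range(100):
    action = env.action_space.sample()  # random action, just to exercise the interface
    obs, reward, done, info = env.step(action)
    total_reward += reward
    if done:
        break
env.close()
print("Return over rollout:", total_reward)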

Options for training

Option                                      Description
--env {env_name}                            The environment for training.
--model {"ddpg" or "ppo"}                   The training algorithm.
--num-envs {number}                         The number of training environments running in parallel.
--save-dir {dir}                            The directory to save model snapshots and log files.
--load-model {model.pth}                    Resume training from the given model snapshot.

Options for evaluation

Option                                      Description
--env {env_name}                            The environment for testing.
--num-eval {number}                         The number of evaluation runs to perform.
--save-dir {dir}                            The directory to save the evaluation videos.

Options for PPO

Option                                      Description
--lr {float number}                         The learning rate for training.
--use-linear-lr-decay                       Decay the learning rate linearly to 0 by the end of training (see the sketch below).
--model-hiddens {numbers, e.g. 500 500}     The sizes of the hidden layers.
--model-activation {"tanh" or "relu"}       Use tanh or ReLU as the activation function.
--num-steps {number}                        The number of steps in each training episode.
--num-env-steps {number}                    The total number of environment steps for training.
--save-interval {number}                    The number of steps between saved model snapshots.
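
As a rough illustration (not the exact code used in this repository), linear learning-rate decay scales the initial learning rate by the fraction of training remaining:

def linearly_decayed_lr(initial_lr, current_step, total_steps):
    """Linearly decay the learning rate from initial_lr to 0 over total_steps."""
    return initial_lr * (1.0 - current_step / float(total_steps))

# Example: with --lr 3e-4 and --num-env-steps 10000000, the learning rate
# halfway through training would be 1.5e-4.
print(linearly_decayed_lr(3e-4, 5000000, 10000000))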

Options for DDPG

Option                                      Description
--rate {float number}                       The learning rate.
--prate {float number}                      The learning rate for the policy network.

Examples

  • Train to stand stably using PPO
python train.py --env OurHumanoidStand-v0 --model ppo --num-envs 8 --lr 3e-4 --num-env-steps 10000000 --use-linear-lr-decay --num-steps 256 --model-hiddens 500 500 --model-activation relu --save-dir "experiments/ppo"
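
  • Train to hold the ball using DDPG (this example is only illustrative; the learning-rate values here are not tuned)
python train.py --env OurHumanoidHold-v0 --model ddpg --rate 1e-3 --prate 1e-4 --save-dir "experiments/ddpg"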

Code borrowed from
