This implementation follows the original paper with slightly different hyperparameters as described here.
To create a new MPC instance:
import gym
import torch
# EnvironmentModel, ModelTrainer, MPC and the rewards dict are provided by this package

# state and action dimensions of Pendulum-v0 (3-dimensional observation, 1-dimensional action)
state_dim, action_dim = 3, 1

# create a model of the environment dynamics
model = EnvironmentModel(state_dim, action_dim, [100, 100], [], predicts_delta=True, batch_norm=True).to('cpu')

# select a loss function that works on our model
loss = torch.nn.MSELoss()

# create a trainer for the model
trainer = ModelTrainer(model, loss_func=loss, weight_decay=[1.e-5, 1.e-5], epochs=5, lr=1.e-3, lr_decay=1., batch_size=25)

# plug everything into MPC
mpc = MPC(gym.make("Pendulum-v0"), rewards["Pendulum-v0"], model, trainer, trial_horizon=1000,
          device="cpu", warmup_trials=5, learning_trials=30,
          predict_horizon=12, render=5)

# start training
mpc.sim_train()
MPC is not built to learn rewards. This means that we can only learn environments for which a reward function has been implemented and registered in reward.py. The key of a reward function in the rewards dict must equal the name of the environment the reward function is used for. A hedged sketch of such a registration is shown below.
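For illustration, a reward function for Pendulum-v0 could be registered as sketched here. The function signature (batched states and actions as torch tensors) and the direct assignment into the rewards dict are assumptions; adapt them to whatever reward.py actually expects. The formula is the negative cost used by Pendulum-v0, whose observations are [cos(theta), sin(theta), theta_dot].

import torch

# hypothetical signature: batched states [..., 3] and actions [..., 1] as torch tensors
def pendulum_reward(states, actions):
    theta = torch.atan2(states[..., 1], states[..., 0])  # recover the angle from cos/sin
    theta_dot = states[..., 2]
    torque = actions[..., 0]
    # negative of the Pendulum-v0 cost: theta^2 + 0.1*theta_dot^2 + 0.001*torque^2
    return -(theta ** 2 + 0.1 * theta_dot ** 2 + 0.001 * torque ** 2)

# key must equal the environment name so MPC can look the function up
rewards["Pendulum-v0"] = pendulum_reward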
MPC currently does not support saving and loading of learnt models.
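Until that is supported, the learned dynamics model can be checkpointed manually with standard PyTorch utilities. This is only a sketch: it assumes EnvironmentModel behaves like a regular torch.nn.Module (the .to('cpu') call above suggests it does), and the file name is arbitrary.

import torch

# save the weights of the trained dynamics model
torch.save(model.state_dict(), "environment_model.pt")

# later: rebuild the model with the same constructor arguments and restore the weights
model = EnvironmentModel(state_dim, action_dim, [100, 100], [], predicts_delta=True, batch_norm=True)
model.load_state_dict(torch.load("environment_model.pt"))
model.eval()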