ddpg + her #7
Addendum: Even with only DDPG, the robot arm moves into said position, and it is very difficult for it to get out of that position again. I also implemented DDPG from scratch, validated it on the "Pendulum-v0" Gym sample environment, and tried it with my robot environment, but the result was similar. After around 10 optimization steps my cumulative reward begins to (slowly) diverge towards the negative.
Hi Stefan, I did a quick check and I didn't encounter any problem in training a model with DDPG. Can you also give more details on your training environment (i.e. observation shape, reward function, action shape, fixed / random goal, fixed / moving goal, action space) and DDPG hyperparameters?
By folding I mean that the robot goes into a configuration where all joint angles are at max or at min, and it can't get out of that configuration. I think it is best to give you a link to my repository. I adapted my GymEnv structure from your repository, so it is pretty similar. I stopped tracking your repository though, so I will stick with an older version. The init for the environment in the picture looks like this: id='ur5e_reacher-v5', The hyperparams look like this: ur5e_reacher-v5:
I don't have time for case-by-case troubleshooting, but I can suggest a few things:
I was wondering whether you have tried to train a model with ddpg + her.
I had some success training sac + her but with ddpg my arm "folds" itself by eventually setting q to joint limits.
If you have, maybe you could share some thoughts on it. Thanks in advance.
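To make the "folding" symptom measurable, one option is to log the joint positions per episode and count how often every joint sits at a limit. The snippet below is only a minimal sketch: the function names, the tolerance, the 3-joint layout and the ±3.14 rad limits are all hypothetical and not taken from either repository.

```python
def saturated_joints(q, q_min, q_max, tol=1e-3):
    """Return one bool per joint: True if that joint is within tol of a limit."""
    return [
        (qi - lo) <= tol or (hi - qi) <= tol
        for qi, lo, hi in zip(q, q_min, q_max)
    ]


def folded_fraction(trajectory, q_min, q_max, tol=1e-3):
    """Fraction of time steps in which every joint is at a limit ("folded")."""
    if not trajectory:
        return 0.0
    folded = sum(all(saturated_joints(q, q_min, q_max, tol)) for q in trajectory)
    return folded / len(trajectory)


# Hypothetical 3-joint arm with symmetric limits (assumption, not the UR5e values).
Q_MIN = [-3.14, -3.14, -3.14]
Q_MAX = [3.14, 3.14, 3.14]

# Hypothetical logged joint positions, one list per time step.
episode = [
    [0.0, 0.5, -0.2],     # free configuration
    [3.14, -3.14, 3.14],  # all joints at a limit
    [3.14, -3.14, 3.14],  # still folded
    [3.14, 1.0, 3.14],    # middle joint has moved away from its limit
]

print(folded_fraction(episode, Q_MIN, Q_MAX))  # 0.5
```

Tracking this number over training would show whether the policy saturates early or only drifts to the limits later on.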