forcing joint position and setting joint position #6
Yes, that's correct: the joints are forced to a new position rather than having a position or torque command applied. This is a design choice inherited from Replab, on which the rl_reach environments are based, so it may be a good idea to ask them directly. I think this approach was adopted to speed up the simulation, since resetting joint positions is computationally cheaper than applying torques and computing the real physics and inertial forces. Of course, it makes the simulation less realistic, but I found that this wasn't a big issue as long as the motion amplitude between time steps is small enough. For example, when I transferred the trained model from Pybullet to the real robot, the robot was able to reproduce the policy learnt exclusively in Pybullet. One downside is that we have to wait for the physical robot to complete its motion at each time step, resulting in a jerky (not smooth) motion. I'm planning to look into using the setJointMotorControl2 function to control the joints by torque in the future.
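For illustration, here is a minimal sketch of the two approaches in Pybullet (not the rl_reach code; the KUKA model from pybullet_data only stands in for the WidowX URDF):

```python
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
robot_id = p.loadURDF("kuka_iiwa/model.urdf", useFixedBase=True)
joint_index = 2

# Forcing the joint: the state is overwritten instantly, no dynamics computed.
p.resetJointState(robot_id, joint_index, targetValue=0.5)

# Torque control: disable the default velocity motor, apply a torque and let
# stepSimulation() integrate the resulting motion (slower but more realistic).
p.setJointMotorControl2(robot_id, joint_index, p.VELOCITY_CONTROL, force=0)
p.setJointMotorControl2(robot_id, joint_index, p.TORQUE_CONTROL, force=2.0)
p.stepSimulation()
```

Resetting the state skips the dynamics entirely, which is why it is so much faster, but it also means contact forces and inertia never enter the training loop.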
For information, I have just pushed an update to the WidowX environments where the joints are position-controlled (as opposed to being forced to the new position), see here. You can activate this when creating a new environment. I thought it would be useful to compute the real physics simulation, for example to detect collisions. Thanks for your suggestion. The only downside is that training takes much longer to complete.
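A rough sketch of what position control looks like, assuming `robot_id` and `joint_index` from an already-loaded arm as in the snippet above (again, not the exact environment code):

```python
import pybullet as p

# Drive one joint towards a target: the motor tracks it over several
# simulation steps, so collisions and joint limits are handled by the engine.
p.setJointMotorControl2(robot_id, joint_index,
                        controlMode=p.POSITION_CONTROL,
                        targetPosition=0.5, force=100)
for _ in range(50):          # let the motor track the target
    p.stepSimulation()       # real dynamics are computed at every step
```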
I also did that myself. Glad that I could be of some help :) Yes, the training takes very long when doing position control. Unfortunately, Pybullet doesn't support multithreading at the moment...
Did you manage to train a model successfully with position-controlled joints? If so, could you please share how you implemented it? I cannot seem to get the model to learn when I switch from forced joints to position-controlled joints.
In general, I have not been able to train any models for the past two months. I have tried several configurations, but I almost always get similar results.
When moving with position control, the joint values returned by getJointState can differ slightly from the values passed to the position-control function, and this error can accumulate. One idea would be to feed back the desired position rather than the one reported after position control. I have also observed that, for example, calculateInverseKinematics is extremely inaccurate (even though I don't see it in your code). This could be an indication that even computeForwardKinematics=True leads to inaccuracies, although this is a very vague idea.
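As an illustration of the read-back discrepancy (a sketch only; the KUKA arm again stands in for the WidowX model):

```python
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
robot_id = p.loadURDF("kuka_iiwa/model.urdf", useFixedBase=True)
joint_index = 2

target = 0.5
p.setJointMotorControl2(robot_id, joint_index,
                        controlMode=p.POSITION_CONTROL, targetPosition=target)
for _ in range(10):
    p.stepSimulation()

measured = p.getJointState(robot_id, joint_index)[0]  # reported joint angle
tracking_error = measured - target                    # small but non-zero

# Idea from the comment above: build the observation from `target` (the
# commanded value) instead of `measured`, so the error cannot accumulate.
```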
One more thought: I believe that your threshold for sparse rewards is too small. Especially if you train without HER, the model might not see enough reward signal. I chose it to be 0.1 or 0.2, but I haven't been successful either. I also moved its definition into the init, which seemed a bit easier to handle.
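A minimal sketch of such a configurable sparse reward (the threshold value is illustrative, not the repository's default):

```python
import numpy as np

def sparse_reward(gripper_pos, goal_pos, threshold=0.1):
    # 0 when the gripper is within `threshold` of the goal, -1 otherwise.
    distance = np.linalg.norm(np.asarray(gripper_pos) - np.asarray(goal_pos))
    return 0.0 if distance < threshold else -1.0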
Another thought: when training with HER, the reward for sampled goals is calculated using the compute_reward function defined in the environment. Because I did not specify the axis along which to compute the norm, I got an array with the same value in every entry. I changed it so that the norm is computed along axis 1 (i.e. over the x, y, z coordinates of the gripper). I am wondering whether it would make more sense to just use the reward specified by reward_type...
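For example (illustrative shapes and values, not the repository's exact compute_reward):

```python
import numpy as np

# HER passes batches of goals, so the norm must be taken per transition
# (axis=1), not over the whole batch at once.
achieved_goal = np.random.rand(32, 3)   # batch of gripper (x, y, z) positions
desired_goal = np.random.rand(32, 3)

flattened = np.linalg.norm(achieved_goal - desired_goal)            # one scalar
per_transition = np.linalg.norm(achieved_goal - desired_goal, axis=1)  # shape (32,)
rewards = -per_transition
```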
In order for getJointState to return the correct joint position, you should allow the robot to complete its action by skipping a few frames. For example, see here.
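Something along these lines (a hedged sketch, not the exact code linked above):

```python
import pybullet as p

def apply_action(robot_id, joint_index, target_position, action_repeat=10):
    """Send a position command, then step the simulation several times so
    the arm finishes its motion before the joint state is read back."""
    p.setJointMotorControl2(robot_id, joint_index,
                            controlMode=p.POSITION_CONTROL,
                            targetPosition=target_position)
    for _ in range(action_repeat):
        p.stepSimulation()
    return p.getJointState(robot_id, joint_index)[0]
```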
This wasn't an issue for me: I was able to train a model with a threshold of 0.001 (though I didn't use a completely sparse reward). Feel free to modify it as you wish.
I'm not an expert on HER, but the idea is to use the achieved_goal in order to leverage unsuccessful episodes during training when you're using a sparse reward. The compute_reward function must use the achieved_goal array in some way, but you can change the function. Currently this function returns the distance between the achieved_goal and the goal, which you want to minimise.
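A toy illustration of the relabelling idea (numbers are made up, and this is not the library's internals):

```python
import numpy as np

achieved_goal = np.array([0.21, 0.03, 0.14])   # where the gripper ended up
original_goal = np.array([0.10, -0.05, 0.20])  # where it was asked to go

# Original transition: negative reward, little learning signal when sparse.
reward_original = -np.linalg.norm(achieved_goal - original_goal)

# HER relabelling: pretend the achieved position was the goal, so the same
# transition now carries a success signal (reward of 0).
reward_relabelled = -np.linalg.norm(achieved_goal - achieved_goal)
```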
I have a question regarding the simulation in Pybullet.
In widowx_env, Pybullet never calls stepSimulation(). Instead, the robot's joints are forced into the new position. Why did you choose this approach? This way, no real physics simulation happens through Pybullet...
Thanks in advance