The Recurrent Deterministic Policy Gradient (RDPG) agent now works on the Fetch Reach Gym environment. The network architecture for both the actor and the critic follows the paper: Sim-to-Real Transfer of Robotic Control with Dynamics Randomization
Code will be updated soon!
Learning curve with Hindsight Experience Replay and RDPG
Video of a successful implementation: https://user-images.githubusercontent.com/12818429/110194968-5a90e380-7df0-11eb-8797-551ab8df8717.mp4
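
For readers unfamiliar with Hindsight Experience Replay, the relabeling idea can be sketched as follows. This is a minimal illustration and not the code of this repository: the `Transition` tuple, the `sparse_reward` and `relabel_final` helpers, the "final" goal-selection strategy, and the 0.05 distance threshold are all assumptions made for the example. Relabeling a whole episode at once is shown because a recurrent agent is typically trained on full sequences.

```python
import math
from typing import List, NamedTuple, Sequence


class Transition(NamedTuple):
    """One step of a goal-conditioned episode (hypothetical container)."""
    observation: Sequence[float]
    action: Sequence[float]
    achieved_goal: Sequence[float]  # goal actually reached after the action
    desired_goal: Sequence[float]   # goal the agent was asked to reach
    reward: float


def sparse_reward(achieved: Sequence[float], desired: Sequence[float],
                  threshold: float = 0.05) -> float:
    """Assumed sparse reward: 0 within `threshold` of the goal, else -1."""
    return 0.0 if math.dist(achieved, desired) <= threshold else -1.0


def relabel_final(episode: List[Transition],
                  threshold: float = 0.05) -> List[Transition]:
    """Return a copy of the episode whose desired goal is replaced by the
    goal achieved at the last step ("final" strategy), with rewards
    recomputed for the substituted goal."""
    final_goal = episode[-1].achieved_goal
    return [
        t._replace(
            desired_goal=final_goal,
            reward=sparse_reward(t.achieved_goal, final_goal, threshold),
        )
        for t in episode
    ]


# Usage: a failed two-step episode becomes a successful one in hindsight.
episode = [
    Transition([0.0], [0.1], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0], -1.0),
    Transition([0.1], [0.1], [0.5, 0.5, 0.5], [1.0, 1.0, 1.0], -1.0),
]
relabeled = relabel_final(episode)
```

Both the original and the relabeled episode would then be stored in the replay buffer, so the agent sees at least some sequences that end in success even when the original goal was never reached.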