Recurrent Deterministic Policy Gradient (RDPG) now works on the Fetch Reach Gym environment. The actor and critic networks both follow the architecture described in the paper "Sim-to-Real Transfer of Robotic Control with Dynamics Randomization".
Code will be updated soon!
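Until the code is published, the following is only a rough sketch of the kind of actor the referenced paper describes: a recurrent (LSTM) branch that sees the observation and previous action, feeding a feedforward branch that also sees the goal. It is not this repository's implementation. The class name `RecurrentActor`, the hidden size of 32, the random initialization, and the input sizes (10-dimensional observation, 3-dimensional goal, 4-dimensional action) are all assumptions made for illustration, and only the forward pass is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    # Small random weights and a zero bias for one fully connected layer.
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RecurrentActor:
    """Hypothetical actor: LSTM branch plus feedforward branch (forward pass only)."""

    def __init__(self, obs_dim, goal_dim, act_dim, hidden=32):
        self.hidden = hidden
        # Recurrent branch: embed (observation, previous action), then one LSTM cell.
        self.w_emb, self.b_emb = dense(obs_dim + act_dim, hidden)
        self.w_lstm, self.b_lstm = dense(2 * hidden, 4 * hidden)
        # Feedforward branch: sees observation, goal, and the LSTM output.
        self.w_ff, self.b_ff = dense(obs_dim + goal_dim + hidden, hidden)
        self.w_out, self.b_out = dense(hidden, act_dim)

    def initial_state(self):
        # Zero hidden state and cell state at the start of an episode.
        return np.zeros(self.hidden), np.zeros(self.hidden)

    def step(self, obs, goal, prev_action, state):
        h, c = state
        emb = np.maximum(0.0, np.concatenate([obs, prev_action]) @ self.w_emb + self.b_emb)
        gates = np.concatenate([emb, h]) @ self.w_lstm + self.b_lstm
        i, f, o, g = np.split(gates, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        ff = np.maximum(0.0, np.concatenate([obs, goal, h]) @ self.w_ff + self.b_ff)
        # tanh keeps each action component in [-1, 1].
        action = np.tanh(ff @ self.w_out + self.b_out)
        return action, (h, c)

# Roll the actor forward for a few steps, carrying the LSTM state along.
actor = RecurrentActor(obs_dim=10, goal_dim=3, act_dim=4)
state = actor.initial_state()
action = np.zeros(4)
for _ in range(5):
    action, state = actor.step(np.ones(10), np.zeros(3), action, state)
```

The point of the sketch is that the recurrent state is carried from step to step within an episode, so RDPG has to store and replay whole episodes rather than isolated transitions. A critic could be built the same way, with the current action added to the feedforward input.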
Learning curve with Hindsight Experience Replay (HER) and RDPG
Video of a Successful Implementation
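For readers unfamiliar with Hindsight Experience Replay, here is a minimal sketch of goal relabeling with the "future" strategy. It is not the code used to produce the learning curve. The transition layout (a dict with `obs`, `action`, `achieved_goal`, `goal`, `reward`), the function names, the distance tolerance of 0.05, and `k=4` are all assumptions for illustration. `achieved_goal` is taken to mean the goal reached after the action was applied.

```python
import random

def sparse_reward(achieved_goal, goal, tol=0.05):
    # 0 when the achieved goal is within tol of the goal, otherwise -1.
    dist = sum((a - g) ** 2 for a, g in zip(achieved_goal, goal)) ** 0.5
    return 0.0 if dist <= tol else -1.0

def her_relabel(episode, k=4, rng=random):
    """Return the original transitions plus k relabeled copies of each one."""
    out = []
    for t, tr in enumerate(episode):
        out.append(dict(tr))
        for _ in range(k):
            # "future" strategy: pick a goal achieved at this step or later.
            future = rng.randint(t, len(episode) - 1)
            new_goal = episode[future]["achieved_goal"]
            relabeled = dict(tr)
            relabeled["goal"] = new_goal
            relabeled["reward"] = sparse_reward(tr["achieved_goal"], new_goal)
            out.append(relabeled)
    return out

# Tiny two-step episode that never reaches the original goal.
episode = [
    {"obs": [0.0], "action": [0.1], "achieved_goal": [0.1], "goal": [1.0], "reward": -1.0},
    {"obs": [0.1], "action": [0.1], "achieved_goal": [0.2], "goal": [1.0], "reward": -1.0},
]
transitions = her_relabel(episode, k=2, rng=random.Random(0))
```

Relabeling turns a failed episode into useful training signal, because some of the substituted goals were actually reached. With a recurrent agent, one would likely relabel a whole episode with a single substituted goal so that the sequence stays consistent; that variant is not shown here.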