clip in PPOLoss #2334
Hi.
Replies: 1 comment
Hello!

As you mention, the action sampling is defined by the actor and is independent of the algorithm.

Essentially, TorchRL has a ProbabilisticActor class that can be used to handle the probabilistic sampling. You can add it at the end of your model and specify the distribution you want (e.g. TanhNormal). As long as the outputs of your model (the keys of the output TensorDict) match the inputs expected by the distribution, the ProbabilisticActor will automatically sample the action and add it to the output TensorDict.

An example of how to define your ProbabilisticActor can be found in the sota-implementations of PPO for MuJoCo environments: https://github.com/pytorch/rl/blob/main/sota-implementations/ppo/utils_mujoco.py#L83

I am not sure I understood the 1/sigma part. I don't think there is an existing module that flips sigma to obtain 1/sigma, but if you need that for some reason you could create a very simple custom TensorDictModule that does it in the forward method and concatenate it at the end of your model, before the ProbabilisticActor.

As for the PPO clip parameter, it can be fixed in the PPO loss class.
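To make that concrete, here is a minimal sketch of such an actor. The observation/action sizes, layer widths, and key names are illustrative assumptions, not taken from the discussion; the structure (backbone + NormalParamExtractor + ProbabilisticActor with TanhNormal) mirrors what the linked MuJoCo example does.

```python
import torch
from tensordict.nn import NormalParamExtractor, TensorDictModule
from torchrl.modules import MLP, ProbabilisticActor, TanhNormal

obs_dim, action_dim = 8, 2  # hypothetical sizes, for illustration only

# The backbone outputs 2 * action_dim values; NormalParamExtractor splits them
# into the "loc" and "scale" parameters expected by the distribution.
policy_net = torch.nn.Sequential(
    MLP(in_features=obs_dim, out_features=2 * action_dim, num_cells=[64, 64]),
    NormalParamExtractor(),
)
policy_module = TensorDictModule(
    policy_net,
    in_keys=["observation"],
    out_keys=["loc", "scale"],  # must match the kwargs expected by TanhNormal
)

actor = ProbabilisticActor(
    module=policy_module,
    in_keys=["loc", "scale"],   # read from the TensorDict, passed to TanhNormal
    out_keys=["action"],
    distribution_class=TanhNormal,
    return_log_prob=True,       # PPO needs the log-probability of the sample
)
```

Running this actor on a TensorDict that contains an "observation" entry will add "loc", "scale", the sampled "action", and its log-probability to the output TensorDict.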
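For the 1/sigma point, a custom module along these lines could be chained in before the ProbabilisticActor. The InverseScale name and the choice to overwrite the "scale" key are assumptions for the sketch, not an existing TorchRL component; policy_module refers to the module from the previous sketch.

```python
import torch
from tensordict.nn import TensorDictModule, TensorDictSequential

class InverseScale(torch.nn.Module):
    """Hypothetical module that replaces sigma with 1 / sigma."""
    def forward(self, scale):
        return scale.reciprocal()

# Read the "scale" entry of the TensorDict and write back its reciprocal.
inverse_scale = TensorDictModule(InverseScale(), in_keys=["scale"], out_keys=["scale"])

# Chain it after the policy module; the combined module is then what you hand
# to ProbabilisticActor instead of policy_module alone.
combined_policy = TensorDictSequential(policy_module, inverse_scale)
```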
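And for the clipping coefficient itself, assuming the ClipPPOLoss variant is the loss being used, it is exposed as the clip_epsilon argument. The value_module below is an assumed value network that is not shown here, and 0.2 is simply the default.

```python
from torchrl.objectives import ClipPPOLoss

loss_module = ClipPPOLoss(
    actor_network=actor,          # the ProbabilisticActor defined above
    critic_network=value_module,  # assumed value network, not shown here
    clip_epsilon=0.2,             # the PPO clipping parameter
)
```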