Hi!
I was looking at your code and wondering whether you handle stale hidden states between rollouts. As far as I can see, the hidden state is carried from step to step within an episode and reset when `done` is encountered. However, from one rollout to the next, the final hidden state of the previous rollout is copied over as the initial hidden state of the current rollout, even though the actor-critic network parameters (including the GRU) have already been updated in between.
Is there any reason why you do not recompute the previous rollout's final hidden state using the new network weights?
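To make the question concrete, here is a minimal sketch of what I mean. It is not taken from the repository: it uses a scalar GRU-style cell in pure Python, and the weight names (`wz`, `uz`, `wr`, `ur`, `wn`, `un`), the helper `replay_hidden`, and the toy data are all hypothetical. The "update" is just a fixed shift of the weights standing in for a gradient step.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gru_step(h, x, w):
    """One step of a scalar GRU-style cell; `w` holds hypothetical weights."""
    z = sigmoid(w["wz"] * x + w["uz"] * h)          # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h)          # reset gate
    n = math.tanh(w["wn"] * x + w["un"] * (r * h))  # candidate state
    return (1.0 - z) * n + z * h


def replay_hidden(h0, obs, dones, w):
    """Replay a stored rollout and return the hidden state after its last step.

    dones[t] is True when the episode ended at step t, so the hidden state
    is reset to zero before the next step.
    """
    h = h0
    for x, done in zip(obs, dones):
        h = gru_step(h, x, w)
        if done:
            h = 0.0
    return h


# Weights used while collecting the rollout, and the weights after the update.
old_w = {"wz": 0.5, "uz": 0.4, "wr": 0.3, "ur": 0.2, "wn": 0.9, "un": 0.7}
new_w = {name: value + 0.1 for name, value in old_w.items()}

# A stored rollout of four steps with an episode boundary after step 1.
obs = [0.5, -1.0, 0.3, 0.8]
dones = [False, True, False, False]
h_start = 0.2  # hidden state the rollout started from

# What is currently carried into the next rollout (computed with old weights).
stale_h = replay_hidden(h_start, obs, dones, old_w)

# What I would expect instead: the same observations replayed with new weights.
fresh_h = replay_hidden(h_start, obs, dones, new_w)

print("stale:", stale_h)
print("recomputed:", fresh_h)
```

The two values differ, and that mismatch is what I am asking about. Note that the replay is only exact back to the last reset inside the rollout: if the rollout contains no `done`, `h_start` was itself produced by older weights, so the recomputed state would still be an approximation.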
Thank you in advance!