📚 Documentation
I am trying to understand how the RecurrentActorCriticPolicy works. Coming from an NLP background, I am used to having tensors of shape (batch_size, seq_len, feature_dim) as input to the LSTM (along with optional initial hidden states). From what I can see, however, the LSTM as implemented essentially only allows feeding sequences of length 1.
stable-baselines3-contrib/sb3_contrib/common/recurrent/policies.py
Line 198 in 25b4326
In fact, by zipping features_sequence (with shape [seq_len, n_envs, feature_dims]) and episode_starts (with shape [n_envs, -1]), in the case of a single environment, seq_len can only be 1. Is this intended, and am I reading it correctly? Is the reasoning that, since the hidden state keeps being propagated across calls, sequences of length 1 are still sufficient?
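To make the intuition I am asking about concrete, here is a minimal sketch in plain PyTorch (not sb3_contrib code, and the dimensions are made up for illustration): stepping an LSTM with sequence length 1 while carrying the (h, c) state between calls gives the same outputs as one call over the full sequence, as long as the state is never reset mid-sequence.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, n_envs, feature_dim, hidden_dim = 5, 1, 8, 16  # illustrative sizes
lstm = nn.LSTM(feature_dim, hidden_dim)  # default layout: (seq_len, batch, features)

features = torch.randn(seq_len, n_envs, feature_dim)

# One call over the whole sequence.
full_out, _ = lstm(features)

# Equivalent loop over per-step slices of shape (1, n_envs, feature_dim),
# propagating the (h, c) state between calls.
step_outs, state = [], None
for step in features:
    out, state = lstm(step.unsqueeze(0), state)
    step_outs.append(out)
loop_out = torch.cat(step_outs, dim=0)

print(torch.allclose(full_out, loop_out, atol=1e-6))  # True
```

If that equivalence is the intended design, then per-step calls with state propagation would cover longer sequences, except where episode_starts forces the state to be reset.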
Checklist
I have checked that there is no similar issue in the repo