Questions regarding BPTT (backpropagation through time) #110
Hello,
What is your motivation behind that?

> How many BPTT steps does the current implementation use?

As many as possible (until we find an invalid transition = a new episode starts), so it is at maximum `n_steps`. There is a discussion about BPTT here: #103 and in the PR: #53 (comment)

> Is it possible to set that number?

Not really.
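For illustration, here is a minimal sketch of what truncated BPTT with a fixed window looks like in plain PyTorch. This is not the sb3-contrib implementation; `obs_seq`, `dones`, `targets`, and `k` are placeholder names for this sketch only:

```python
import torch
import torch.nn as nn

# Minimal truncated-BPTT sketch (not sb3-contrib code):
# gradients flow back at most k steps; episode ends also cut the chain.
T, batch, obs_dim, k = 128, 1, 4, 32           # k = truncation window
rnn = nn.LSTM(input_size=obs_dim, hidden_size=16)
head = nn.Linear(16, 1)
obs_seq = torch.randn(T, batch, obs_dim)       # dummy observations
dones = torch.zeros(T, batch)                  # episode-boundary flags
targets = torch.randn(T, batch, 1)             # dummy regression targets

hidden = None
loss = 0.0
for t in range(T):
    out, hidden = rnn(obs_seq[t : t + 1], hidden)
    loss = loss + ((head(out) - targets[t]) ** 2).mean()
    if (t + 1) % k == 0:
        # Truncate BPTT: detach so gradients stop flowing past this point.
        hidden = tuple(h.detach() for h in hidden)
    if dones[t].item():
        # New episode: reset the state, which also truncates BPTT here
        # (the "invalid transition" case mentioned above).
        hidden = None

loss.backward()
```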
Hello,
A very specific one. The reason is that I am implementing a particular algorithm from a paper in which some parameters and the value head are sort of meta-learnt in an outer 'slow' loop, at a given recurrence rate, while others, including the policy head, are learnt in an inner 'faster' loop at another recurrence rate. I did manage to implement that behaviour using the more barebones PPO implementation from the torch-ac library, which allows setting the number of recurrence steps, and I just wanted to try the same behaviour with the stable-baselines3 recurrent PPO.
Thanks, I imagined that. So in that case the upper bound would be the default 128 for PPO (or whatever value I set under `n_steps`), wouldn't it? Cheers!
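For reference, a minimal sketch of that upper bound in practice, assuming a standard Gym environment is available (`CartPole-v1` is an arbitrary choice here):

```python
from sb3_contrib import RecurrentPPO

# n_steps is the rollout length per environment and hence the maximum
# BPTT length; episode ends within the rollout truncate it earlier.
model = RecurrentPPO("MlpLstmPolicy", "CartPole-v1", n_steps=128, verbose=1)
model.learn(total_timesteps=10_000)
```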
I see...
Yes.
Hi,
This is more of a question. I am implementing some specific experiments using Recurrent PPO, but at some point I would like to set the number of BPTT steps, I mean in a truncated BPTT fashion (let's say I want a recurrence of 32 steps, for example). My questions are:

- How many BPTT steps does the current implementation use?
- Is it possible to set that number?

I had a look in the code but haven't managed to figure out where this is performed.
Many thanks in advance!