Questions regarding BPTT (backpropagation through time) #110

Closed

ibagur opened this issue Oct 19, 2022 · 3 comments

Comments

@ibagur

ibagur commented Oct 19, 2022

Hi,

This is more of a question. I am implementing some specific experiments using Recurrent PPO, but at some point I would like to set the number of BPTT steps, i.e. in a truncated-BPTT fashion (say I want a recurrence of 32 steps, for example). My questions are:

  • In the current implementation, how many BPTT steps are performed?
  • Is it possible to change this as a hyper-parameter?

I had a look at the code but haven't managed to figure out where this happens.

Many thanks in advance!

@araffin
Member

araffin commented Oct 21, 2022

Hello,

the number of BPTT steps, I mean in a truncated BPTT fashion

What is your motivation behind that?

In the current implementation, how many BPTT steps are performed?

As many as possible (until we find an invalid transition, i.e. a new episode starts), so it is at most n_steps (in practice it is also bounded by the maximum number of steps per episode).

There is a discussion about BPTT here: #103 and in the PR: #53 (comment)
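
For illustration, here is a minimal sketch (not the sb3-contrib source; the variable names are made up) of what "cutting at episode starts" means: the recurrent state is zeroed wherever a new episode begins, so gradients cannot flow across episode boundaries and each backpropagated sequence is at most n_steps long.

```python
import torch
import torch.nn as nn

# Toy illustration, not the sb3-contrib implementation. The rollout is one
# sequence of length n_steps; the LSTM state is reset wherever an episode
# starts, so backpropagation is truncated at episode boundaries.
obs_dim, hidden_dim, n_steps = 4, 8, 16
lstm = nn.LSTM(obs_dim, hidden_dim)

observations = torch.randn(n_steps, 1, obs_dim)  # (seq_len, batch, features)
episode_starts = torch.zeros(n_steps, 1)         # 1.0 where a new episode begins
episode_starts[5] = 1.0                          # e.g. an env reset at step 5

h = torch.zeros(1, 1, hidden_dim)
c = torch.zeros(1, 1, hidden_dim)
outputs = []
for t in range(n_steps):
    # Zero the recurrent state at an episode start: this is the
    # "invalid transition" cut that truncates BPTT.
    mask = (1.0 - episode_starts[t]).view(1, 1, 1)
    h, c = h * mask, c * mask
    out, (h, c) = lstm(observations[t:t + 1], (h, c))
    outputs.append(out)

loss = torch.cat(outputs).pow(2).mean()  # stand-in for the actual PPO loss
loss.backward()  # gradients flow back at most to the previous episode start
```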

Is it possible to change this as a hyper-parameter?

Not really, although n_steps does act as an upper bound on it.
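
A minimal usage sketch (the environment and hyper-parameter values here are placeholders, not recommendations):

```python
from sb3_contrib import RecurrentPPO

# n_steps caps the rollout length, and therefore the length of any
# backpropagated sequence. Environment and values are placeholders.
model = RecurrentPPO("MlpLstmPolicy", "CartPole-v1", n_steps=128, verbose=1)
model.learn(total_timesteps=10_000)
```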

@ibagur
Author

ibagur commented Oct 21, 2022

Hello,

What is your motivation behind that?

A very specific one. The reason is that I am implementing a particular algorithm from a paper in which some parameters and the value head are meta-learnt in an outer 'slow' loop, at a given recurrence rate, while others, including the policy head, are learnt in an inner 'faster' loop at another recurrence rate. I managed to implement that behaviour using the more barebones PPO implementation from the torch-ac library, which allows setting the number of recurrence steps, and I just wanted to try the same behaviour with the stable-baselines3 recurrent PPO. (See the generic sketch below for what I mean by setting the recurrence.)
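
For illustration, a generic sketch of what "setting the recurrence" means in a truncated-BPTT loop (this is neither the torch-ac nor the sb3-contrib code; names and values are illustrative): the hidden state is detached every `recurrence` steps, so gradients flow back at most that far.

```python
import torch
import torch.nn as nn

# Generic truncated BPTT with a fixed recurrence (illustration only).
recurrence = 32
obs_dim, hidden_dim, seq_len = 4, 8, 128
rnn = nn.GRU(obs_dim, hidden_dim)
optimizer = torch.optim.Adam(rnn.parameters(), lr=1e-3)

observations = torch.randn(seq_len, 1, obs_dim)
h = torch.zeros(1, 1, hidden_dim)

for start in range(0, seq_len, recurrence):
    chunk = observations[start:start + recurrence]
    out, h = rnn(chunk, h)
    loss = out.pow(2).mean()  # stand-in for the actual RL loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    h = h.detach()  # cut the graph: gradients go back at most `recurrence` steps
```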

As many as possible (until we find an invalid transition, i.e. a new episode starts), so it is at most n_steps (in practice it is also bounded by the maximum number of steps per episode).

Thanks, I imagined that. So in that case the upper bound would then be the default 128 for PPO (or whatever value I set under n_steps), wouldn't it?

Cheers!

@araffin
Member

araffin commented Oct 22, 2022

A very specific one.

I see...

Thanks, I imagined that. So in that case the upper bound would then be the default 128 for PPO (or whatever value I set under n_steps), wouldn't it?

yes

araffin closed this as completed on Oct 22, 2022