Questions regarding BPTT (backpropagation through time) #110

Closed

ibagur opened this issue Oct 19, 2022 · 3 comments

Comments

@ibagur

ibagur commented Oct 19, 2022

Hi,

This is more of a question. I am implementing some specific experiments using Recurrent PPO, but at some point I would like to set the number of BPTT steps, i.e. in a truncated-BPTT fashion (say I want a recurrence of 32 steps, for example). My questions are:

  • In the current implementation, how many BPTT steps are performed?
  • Is it possible to change this as a hyper-parameter?

I had a look at the code but haven't managed to figure out where this happens.

Many thanks in advance!

@araffin
Member

araffin commented Oct 21, 2022

Hello,

the number of BPTT steps, I mean in a truncated BPTT fashion

What is your motivation behind that?

In the current implementation, how many BPTT steps are performed?

As many as possible (until we find an invalid transition, i.e. a new episode starts), so it is at most n_steps (in practice it is also bounded by the maximum number of steps per episode).

There is a discussion about BPTT here: #103 and in the PR: #53 (comment)
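
For illustration, here is a minimal sketch (not the sb3-contrib source; the variable names are made up) of what "cutting at episode starts" means: the recurrent state is zeroed wherever a new episode begins, so gradients cannot flow across episode boundaries and each backpropagated sequence is at most n_steps long.

```python
import torch
import torch.nn as nn

# Toy illustration, not the sb3-contrib implementation. The rollout is one
# sequence of length n_steps; the LSTM state is reset wherever an episode
# starts, so backpropagation is truncated at episode boundaries.
obs_dim, hidden_dim, n_steps = 4, 8, 16
lstm = nn.LSTM(obs_dim, hidden_dim)

observations = torch.randn(n_steps, 1, obs_dim)  # (seq_len, batch, features)
episode_starts = torch.zeros(n_steps, 1)         # 1.0 where a new episode begins
episode_starts[5] = 1.0                          # e.g. an env reset at step 5

h = torch.zeros(1, 1, hidden_dim)
c = torch.zeros(1, 1, hidden_dim)
outputs = []
for t in range(n_steps):
    # Zero the recurrent state at an episode start: this is the
    # "invalid transition" cut that truncates BPTT.
    mask = (1.0 - episode_starts[t]).view(1, 1, 1)
    h, c = h * mask, c * mask
    out, (h, c) = lstm(observations[t:t + 1], (h, c))
    outputs.append(out)

loss = torch.cat(outputs).pow(2).mean()  # stand-in for the actual PPO loss
loss.backward()  # gradients flow back at most to the previous episode start
```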

Is it possible to change this as a hyper-parameter?

Not really, although n_steps does act as an upper bound on it.
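
A minimal usage sketch (the environment and hyper-parameter values here are placeholders, not recommendations):

```python
from sb3_contrib import RecurrentPPO

# n_steps caps the rollout length, and therefore the length of any
# backpropagated sequence. Environment and values are placeholders.
model = RecurrentPPO("MlpLstmPolicy", "CartPole-v1", n_steps=128, verbose=1)
model.learn(total_timesteps=10_000)
```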

@ibagur
Author

ibagur commented Oct 21, 2022

Hello,

What is your motivation behind that?

A very specific one. The reason is that I am implementing a particular algorithm from a paper in which some parameters and the value head are meta-learnt in an outer 'slow' loop, at a given recurrence rate, while others, including the policy head, are learnt in an inner 'faster' loop at another recurrence rate. I managed to implement that behaviour using the more barebones PPO implementation from the torch-ac library, which allows setting the number of recurrence steps, and I just wanted to try the same behaviour with the stable-baselines3 recurrent PPO. (See the generic sketch below for what I mean by setting the recurrence.)
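
For illustration, a generic sketch of what "setting the recurrence" means in a truncated-BPTT loop (this is neither the torch-ac nor the sb3-contrib code; names and values are illustrative): the hidden state is detached every `recurrence` steps, so gradients flow back at most that far.

```python
import torch
import torch.nn as nn

# Generic truncated BPTT with a fixed recurrence (illustration only).
recurrence = 32
obs_dim, hidden_dim, seq_len = 4, 8, 128
rnn = nn.GRU(obs_dim, hidden_dim)
optimizer = torch.optim.Adam(rnn.parameters(), lr=1e-3)

observations = torch.randn(seq_len, 1, obs_dim)
h = torch.zeros(1, 1, hidden_dim)

for start in range(0, seq_len, recurrence):
    chunk = observations[start:start + recurrence]
    out, h = rnn(chunk, h)
    loss = out.pow(2).mean()  # stand-in for the actual RL loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    h = h.detach()  # cut the graph: gradients go back at most `recurrence` steps
```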

As many as possible (until we find an invalid transition, i.e. a new episode starts), so it is at most n_steps (in practice it is also bounded by the maximum number of steps per episode).

Thanks, I imagined that. So in that case the upper bound would then be the default 128 for PPO (or whatever value I set under n_steps), wouldn't it?

Cheers!

@araffin
Member

araffin commented Oct 22, 2022

A very specific one.

I see...

Thanks, I imagined that. So in that case the upper bound would then be the default 128 for PPO (or whatever value I set under n_steps), wouldn't it?

yes

araffin closed this as completed on Oct 22, 2022