[Proposal] RL Common / Baselines Common Package #503

araffin · 2019-10-09T19:42:59Z

With @hill-a , we came up to the same conclusion about things in the common folder: a lot is not specific to this library and could be separated in a stand-alone python package.

Doing so would:

allow people to develop/build things on that package without depending on a specific backend (i.e. this removes tensorflow dependency)
separate what is related to rl in general vs stable-baselines
ease the transition between tf 1 and tf 2 (Tensorflow 2.0 support? #366) as we could decide to have two separate versions of SB instead of maintaining a version with tf1 and tf2 support at the same time (which seems complicated especially because of the removal of tf.contrib)

As a personal example, I have now an internal (minimal) pytorch version of stable-baselines (I plan to release it at some point but it is used for my own research purposes for now) where I had to copy a lot of what was present in the common folder.

I would like to have your thoughts on that ;)
@erniejunior @AdamGleave @Miffyli

Suggestions for the name of the package are also welcomed, for now we have:

baselines-common (by @hill-a)
rl-common (my proposition)

the repo would be under the stable-baselines team organization: https://github.com/Stable-Baselines-Team

Miffyli · 2019-10-10T09:25:52Z

Which parts of common is this referring to, exactly? Files like distributions.py and policies.py, the other ones (not distributions/policies) or all files? Would the aim be to move most/all backend stuff to this new package, and then we can implement this package with different backends? The stable-baselines portion would then use these tools to implement the final RL algorithms?

I agree this would be a useful thing, especially now with the upcoming transitions to a new backend. It would be a lot of work and debugging to get everything right (and even then probably some clashing with interfaces), but it would help keeping current stable-baselines structure clean. At the same time we can tidy up other things, e.g.

Reorganize/rename some of the code under common which is somewhat sprinkled around
Get rid of the oddballs like layer definitions being in a2c.utils and common.distributions importing from there (but a2c then imports stuff from distributions via policies ???)

As for the name: I vote for "baselines-common". Term "Baselines" is already familiar in (D)RL scene. "rl-common" is nice too, but there is still the very active part of "non-deep reinforcement learning" which is quite different from the "deep reinforcement learning" where network-like models screw up everything :)

Edit: Cpt. Obvious here but the first thing to do is to gather solid baseline results of current stable-baselines/baselines we then should be able to replicate in the new backends or when major things change. I think results in the RLZoo would do the trick?

araffin · 2019-10-10T09:36:07Z

Which parts of common is this referring to, exactly?

Everything that is not related to tensorflow, I see at least (not exhaustive list):

vec envs
noise
maths utils
monitor
debug envs
replay buffer

Reorganize/rename some of the code under common which is somewhat sprinkled around
Get rid of the oddballs like layer definitions being in a2c.utils

Totally agree

When using a new backend, you will need to implement those specific parts:

base class (at least for getting/setting the parameters)
policies
distributions
buffers (rollout buffer and replay buffer), the current replay buffer implementation rely on numpy only but could be improved if using a specific backend

I think results in the RLZoo would do the trick?

That is the whole point of rl zoo ;)
Note: I have successfully replicated results with the pytorch backend, so I think I now got some experience of what can go wrong ;)

Miffyli · 2019-10-10T09:51:21Z

To check I understood this right: In summary, this would move backend independent tools used by one or more RL methods, used by environments or generally used for debugging to a new package.

This sounds good. The lack of assumptions on the backend makes throughout optimization impossible (e.g. using TF tensors to store replay data or so), but that is not what this package is for and I agree this would be very useful for quick debugging and setting up algorithms. 👍

I think repo for such package should be quick to setup by cloning stable-baselines setup with all the CI things and how tests/docs are done, etc?

PyTorch experiments: Yay! Now at least there is one of us who knows how to do them properly ^^.

araffin · 2019-10-10T09:55:03Z

I think repo for such package should be quick to setup by cloning stable-baselines setup with all the CI things and how tests/docs are done, etc?

It will require at least one day I think and then some iteration to refactor things nicely.
But yes, in essence, this is mostly copy-pasting.
The repo will be under stable-baselines-team organization.

Miffyli · 2019-10-10T09:56:32Z

Ah, I was talking about setting it up so we can start working on it ^^. Naturally the full refactoring / better documentation / etc will take longer, but at least would have a place to start with.

Would still like to hear comments from others, though (@erniejunior @AdamGleave)

AdamGleave · 2019-10-12T04:04:15Z

I'm weakly in favour of this. I think separating the common dependencies out would be slightly cleaner, although it might also slightly complicate setting up the development environment, which could deter some new contributors.

I suspect you'll be one of the few users of this change. So if it'll save you time to restructure this then go for it. But unless others chime up I would not do it for the good of the community. Not many people write RL frameworks from scratch, and those that do will often not want to use Stable Baselines "common" implementation. Even if we clean things up there will be lots of design decisions embedded in this. For example, the VecEnv interface works well for parallelizing at small scale on a single machine, but the synchronous API prevents it scaling to distributed RL workers.

araffin · 2019-10-13T17:53:38Z

I suspect you'll be one of the few users of this change.

That's true... Now, you make me ensure about what to do...

slightly complicate setting up the development environment

On that part, I would expect the "common" package to be relatively stable (in term of amount of change) compared to the main repo.

Not many people write RL frameworks from scratch

True, I was mostly thinking of people implementing only one algorithm. For instance, I've already seen the replay buffer implementation copied over.

Do you think there could be an alternative, like a git-submodule instead of a package?
Better option: I think I've seen that somewhere, but I'm not sure it's possible, we create a second package using options in the setup.py while keeping everything else the same (so everything stay in the same repo, we would need maybe to reorganize things though)

Miffyli · 2019-10-14T07:18:30Z

The suggestions sound even more obscure than having a separate package. If goal is to also offer people a set of miscellaneous and independent utilities for RL algorithms, I do not see how git-submodules or the proposed setup.py method helps.

I have personally looked for independent implementations of some algorithms like more sophisticated return/advantage computations (e.g. GAE, V-trace), since it is easy have bugs in them. In that sense a repository of independent and common tools would be nice, but I do not think this would be easy if even possible to do in independent fashion, as the definition of the function can be closely tied to the learning algorithm itself.

AdamGleave · 2019-10-14T21:02:28Z

I agree with @Miffyli that if we're going to split it then it's better to be in a separate package. Probably best to be in its own repo too, although you can have multiple Python packages in a single repository.

araffin added the enhancement New feature or request label Oct 9, 2019

araffin pinned this issue Oct 10, 2019

araffin mentioned this issue Oct 10, 2019

Tensorflow 2.0 support? #366

Closed

Miffyli mentioned this issue Nov 4, 2019

Refactor common #540

Merged

4 tasks

araffin unpinned this issue Nov 23, 2019

araffin closed this as completed in #540 Feb 28, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Proposal] RL Common / Baselines Common Package #503

[Proposal] RL Common / Baselines Common Package #503

araffin commented Oct 9, 2019

Miffyli commented Oct 10, 2019 •

edited

Loading

araffin commented Oct 10, 2019

Miffyli commented Oct 10, 2019

araffin commented Oct 10, 2019

Miffyli commented Oct 10, 2019 •

edited

Loading

AdamGleave commented Oct 12, 2019

araffin commented Oct 13, 2019

Miffyli commented Oct 14, 2019

AdamGleave commented Oct 14, 2019

[Proposal] RL Common / Baselines Common Package #503

[Proposal] RL Common / Baselines Common Package #503

Comments

araffin commented Oct 9, 2019

Miffyli commented Oct 10, 2019 • edited Loading

araffin commented Oct 10, 2019

Miffyli commented Oct 10, 2019

araffin commented Oct 10, 2019

Miffyli commented Oct 10, 2019 • edited Loading

AdamGleave commented Oct 12, 2019

araffin commented Oct 13, 2019

Miffyli commented Oct 14, 2019

AdamGleave commented Oct 14, 2019

Miffyli commented Oct 10, 2019 •

edited

Loading

Miffyli commented Oct 10, 2019 •

edited

Loading