
[Question] 'layers' in 'policy_kwargs' is not functional for CNN features in general, but it works in some implementations like DDPG, SAC #526

Closed
seheevic opened this issue Oct 28, 2019 · 6 comments · May be fixed by #587
Labels: documentation (Documentation should be updated), question (Further information is requested)

Comments

@seheevic

As you can see in the DQN policy code below:

with tf.variable_scope("model", reuse=reuse):
with tf.variable_scope("action_value"):
if feature_extraction == "cnn":
extracted_features = cnn_extractor(self.processed_obs, **kwargs)
action_out = extracted_features
else:
extracted_features = tf.layers.flatten(self.processed_obs)
action_out = extracted_features
for layer_size in layers:
action_out = tf_layers.fully_connected(action_out, num_outputs=layer_size, activation_fn=None)
if layer_norm:
action_out = tf_layers.layer_norm(action_out, center=True, scale=True)
action_out = act_fun(action_out)
action_scores = tf_layers.fully_connected(action_out, num_outputs=self.n_actions, activation_fn=None)

The 'layers' are not appended after extracted_features for CNN features.
I can understand this as forcing the user to build the full network inside a custom cnn_extractor (see the sketch below).
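For illustration, here is a minimal sketch of such a custom extractor (the function name and the extra layer sizes are ours, not from the library; it assumes the nature_cnn and linear helpers that stable-baselines exposes):

```python
import numpy as np
import tensorflow as tf
from stable_baselines.a2c.utils import linear
from stable_baselines.common.policies import nature_cnn

def cnn_plus_fc(scaled_images, **kwargs):
    # Hypothetical extractor: since the DQN policy skips `layers` when
    # feature_extraction == "cnn", the equivalent of layers=[64, 64]
    # has to be appended inside the extractor itself.
    activ = tf.nn.relu
    features = nature_cnn(scaled_images, **kwargs)
    hidden = activ(linear(features, 'extra_fc1', n_hidden=64, init_scale=np.sqrt(2)))
    return activ(linear(hidden, 'extra_fc2', n_hidden=64, init_scale=np.sqrt(2)))

# Usage sketch:
# model = DQN("CnnPolicy", env, policy_kwargs=dict(cnn_extractor=cnn_plus_fc))
```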
But in some other policy implementations, like DDPG and SAC, 'layers' are appended in both the CNN and MLP cases. You can see this in the SAC policy code below:
```python
def make_actor(self, obs=None, reuse=False, scope="pi"):
    if obs is None:
        obs = self.processed_obs
    with tf.variable_scope(scope, reuse=reuse):
        if self.feature_extraction == "cnn":
            pi_h = self.cnn_extractor(obs, **self.cnn_kwargs)
        else:
            pi_h = tf.layers.flatten(obs)
        pi_h = mlp(pi_h, self.layers, self.activ_fn, layer_norm=self.layer_norm)
```
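For context, the mlp helper called on the last line appends self.layers regardless of which branch produced pi_h. Roughly (a paraphrase of the SAC policy helper, not the exact source):

```python
def mlp(input_ph, layers, activ_fn=tf.nn.relu, layer_norm=False):
    # Apply one dense layer per entry in `layers`, with optional layer norm.
    output = input_ph
    for i, layer_size in enumerate(layers):
        output = tf.layers.dense(output, layer_size, name='fc' + str(i))
        if layer_norm:
            output = tf.contrib.layers.layer_norm(output, center=True, scale=True)
        output = activ_fn(output)
    return output
```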

Is this inconsistency in how the 'layers' argument is used intended?
Should we treat this argument carefully on a case-by-case basis?
Please shed some light on this.
Thanks.

@Miffyli (Collaborator) commented Oct 28, 2019

You are right: the SAC and DDPG codes run inputs through cnn_extractor and then through some additional layers, while the shared policy code here does not use additional layers after cnn_extractor. Indeed, this should either be very clearly documented or, preferably, fixed so that the behavior is the same all around. A PR would be very welcome :)

@araffin Was there any special reason for doing it this way for DDPG/SAC, in case there is a hidden "gotcha" somewhere?

Miffyli added the 'question' label Oct 28, 2019
seheevic added a commit to victech-dev/stable-baselines that referenced this issue Oct 28, 2019
We are not sure of the intention, so leave the code as the original.
Just be careful when using CNN feature extraction!
@araffin (Collaborator) commented Oct 28, 2019

> Was there any special reason for doing it this way for DDPG/SAC

I don't think so. I created the SAC/TD3 policies mimicking the DDPG behavior, but never really used that code path, as you normally don't use images with DDPG/SAC/TD3. So yes, I think we should document or fix that anyway.

> while the shared policy code here does not use additional layers

For the flexible mlp, it is already documented that it works only for vector observations (and not images).
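For reference, a minimal usage sketch of that distinction (the env name and net_arch values are illustrative only):

```python
from stable_baselines import PPO2

# Vector observations: net_arch shapes the flexible MLP after flattening.
model = PPO2("MlpPolicy", "CartPole-v1",
             policy_kwargs=dict(net_arch=[64, dict(pi=[32], vf=[32])]))

# Image observations: CnnPolicy routes through cnn_extractor, and the
# flexible net_arch MLP is not applied on top of the CNN features.
```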

@araffin araffin added the documentation Documentation should be updated label Oct 28, 2019
@seheevic (Author)

Thanks for the quick answers!
Actually, I couldn't find the statement 'net_arch works only for vector observations' in the docs, but the docs do leave a clue about the association with MLP extraction, like 'net_arch: ... (see mlp_extractor documentation for details)'. That's OK with me now, though.
Can I close this issue?

@Miffyli (Collaborator) commented Oct 30, 2019

You can leave it open for now, until the documentation is fixed. If you feel like it, you can open a PR that fixes the documentation and addresses this issue :)

@seheevic (Author) commented Nov 28, 2019

@araffin @Miffyli
Recently I have been training DQN with stable-baselines, and I figured out that this problem is not as simple as just fixing the documentation. I think it would be better to change the code to allow appending a flexible MLP after the output of the CNN extractor.
This is for consistency not only with the implementations of the other algorithms, but also with the advantage stream and value stream of the (enabled-by-default) DQN dueling network.
With the current implementation, the [64, 64] (by default) layers are injected only into the value stream V(s) of the dueling network, yet a more complex function like A(s, a) would seem to need the additional network as well. By enabling 'layers' after the CNN extractor, we gain the option of putting shared layers in the cnn_extractor and separate per-stream layers in the 'layers' parameter of the dueling network, roughly along the lines of the sketch below.
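A sketch of what this would mean for the action-value stream (our illustration of the idea, not the actual diff of the linked PR; same context as the DQN snippet quoted at the top of this issue):

```python
with tf.variable_scope("action_value"):
    if feature_extraction == "cnn":
        extracted_features = cnn_extractor(self.processed_obs, **kwargs)  # shared trunk
    else:
        extracted_features = tf.layers.flatten(self.processed_obs)
    action_out = extracted_features
    for layer_size in layers:  # now applied after CNN features as well
        action_out = tf_layers.fully_connected(action_out, num_outputs=layer_size, activation_fn=None)
        if layer_norm:
            action_out = tf_layers.layer_norm(action_out, center=True, scale=True)
        action_out = act_fun(action_out)
    action_scores = tf_layers.fully_connected(action_out, num_outputs=self.n_actions, activation_fn=None)
```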
I'll create a PR for this.

@araffin (Collaborator) commented Aug 14, 2021

Should be fixed in SB3: https://github.com/DLR-RM/stable-baselines3

araffin closed this as completed Aug 14, 2021