
[Question] 'layers' in 'policy_kwargs' is not functional for CNN features in general, but it works in some implementations like DDPG, SAC #526

Closed
seheevic opened this issue Oct 28, 2019 · 6 comments · May be fixed by #587
Labels: documentation (Documentation should be updated), question (Further information is requested)

Comments

@seheevic

As you can see in the DQN policy code below:

with tf.variable_scope("model", reuse=reuse):
with tf.variable_scope("action_value"):
if feature_extraction == "cnn":
extracted_features = cnn_extractor(self.processed_obs, **kwargs)
action_out = extracted_features
else:
extracted_features = tf.layers.flatten(self.processed_obs)
action_out = extracted_features
for layer_size in layers:
action_out = tf_layers.fully_connected(action_out, num_outputs=layer_size, activation_fn=None)
if layer_norm:
action_out = tf_layers.layer_norm(action_out, center=True, scale=True)
action_out = act_fun(action_out)
action_scores = tf_layers.fully_connected(action_out, num_outputs=self.n_actions, activation_fn=None)

The 'layers' are not appended after extracted_features for CNN features.
I can understand this as forcing the user to build the full network inside a custom cnn_extractor (see the sketch below).
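For illustration, here is a minimal sketch of such a custom extractor (the function name and the extra layer sizes are ours, not from the library; it assumes the nature_cnn and linear helpers that stable-baselines exposes):

```python
import numpy as np
import tensorflow as tf
from stable_baselines.a2c.utils import linear
from stable_baselines.common.policies import nature_cnn

def cnn_plus_fc(scaled_images, **kwargs):
    # Hypothetical extractor: since the DQN policy skips `layers` when
    # feature_extraction == "cnn", the equivalent of layers=[64, 64]
    # has to be appended inside the extractor itself.
    activ = tf.nn.relu
    features = nature_cnn(scaled_images, **kwargs)
    hidden = activ(linear(features, 'extra_fc1', n_hidden=64, init_scale=np.sqrt(2)))
    return activ(linear(hidden, 'extra_fc2', n_hidden=64, init_scale=np.sqrt(2)))

# Usage sketch:
# model = DQN("CnnPolicy", env, policy_kwargs=dict(cnn_extractor=cnn_plus_fc))
```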
But in some other policy implementations, like DDPG and SAC, 'layers' are appended in both the CNN and MLP cases. You can see this in the SAC policy code below:
```python
def make_actor(self, obs=None, reuse=False, scope="pi"):
    if obs is None:
        obs = self.processed_obs
    with tf.variable_scope(scope, reuse=reuse):
        if self.feature_extraction == "cnn":
            pi_h = self.cnn_extractor(obs, **self.cnn_kwargs)
        else:
            pi_h = tf.layers.flatten(obs)
        pi_h = mlp(pi_h, self.layers, self.activ_fn, layer_norm=self.layer_norm)
```
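For context, the mlp helper called on the last line appends self.layers regardless of which branch produced pi_h. Roughly (a paraphrase of the SAC policy helper, not the exact source):

```python
def mlp(input_ph, layers, activ_fn=tf.nn.relu, layer_norm=False):
    # Apply one dense layer per entry in `layers`, with optional layer norm.
    output = input_ph
    for i, layer_size in enumerate(layers):
        output = tf.layers.dense(output, layer_size, name='fc' + str(i))
        if layer_norm:
            output = tf.contrib.layers.layer_norm(output, center=True, scale=True)
        output = activ_fn(output)
    return output
```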

Is this inconsistency in how the 'layers' argument is used intended?
Should we treat this argument carefully on a case-by-case basis?
Please shed some light on this.
Thanks.

@Miffyli (Collaborator) commented Oct 28, 2019

You are right: the SAC and DDPG codes run inputs through cnn_extractor and then through some additional layers, while the shared policy code here does not use additional layers after cnn_extractor. Indeed, this should either be very clearly documented or, preferably, fixed so that the behavior is the same all around. A PR would be very welcome :)

@araffin Was there any special reason for doing it this way for DDPG/SAC, in case there is a hidden "gotcha" somewhere?

Miffyli added the 'question' label Oct 28, 2019
seheevic added a commit to victech-dev/stable-baselines that referenced this issue Oct 28, 2019
We are not sure of the intention, so leave the code as the original.
Just be careful when using CNN feature extraction!
@araffin (Collaborator) commented Oct 28, 2019

> Was there any special reason for doing it this way for DDPG/SAC

I don't think so. I created the SAC/TD3 policies mimicking the DDPG behavior, but never really used that code path, as you normally don't use images with DDPG/SAC/TD3. So yes, I think we should document or fix that anyway.

> while the shared policy code here does not use additional layers

For the flexible mlp, it is already documented that it works only for vector observations (and not images).
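For reference, a minimal usage sketch of that distinction (the env name and net_arch values are illustrative only):

```python
from stable_baselines import PPO2

# Vector observations: net_arch shapes the flexible MLP after flattening.
model = PPO2("MlpPolicy", "CartPole-v1",
             policy_kwargs=dict(net_arch=[64, dict(pi=[32], vf=[32])]))

# Image observations: CnnPolicy routes through cnn_extractor, and the
# flexible net_arch MLP is not applied on top of the CNN features.
```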

@araffin araffin added the documentation Documentation should be updated label Oct 28, 2019
@seheevic (Author)

Thanks for the quick answers!
Actually, I couldn't find the statement 'net_arch works only for vector observations' in the docs, but the docs do leave a clue about the association with MLP extraction, like 'net_arch: ... (see mlp_extractor documentation for details)'. That's OK with me now, though.
Can I close this issue?

@Miffyli (Collaborator) commented Oct 30, 2019

You can leave it open for now, until the documentation is fixed. If you feel like it, you can open a PR that fixes the documentation and addresses this issue :)

@seheevic (Author) commented Nov 28, 2019

@araffin @Miffyli
Recently I have been training DQN with stable-baselines, and I figured out that this problem is not as simple as just fixing the documentation. I think it would be better to change the code to allow appending a flexible MLP after the output of the CNN extractor.
This is for consistency not only with the implementations of the other algorithms, but also with the advantage stream and value stream of the (enabled-by-default) DQN dueling network.
With the current implementation, the [64, 64] (by default) layers are injected only into the value stream V(s) of the dueling network, yet a more complex function like A(s, a) would seem to need the additional network as well. By enabling 'layers' after the CNN extractor, we gain the option of putting shared layers in the cnn_extractor and separate per-stream layers in the 'layers' parameter of the dueling network, roughly along the lines of the sketch below.
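A sketch of what this would mean for the action-value stream (our illustration of the idea, not the actual diff of the linked PR; same context as the DQN snippet quoted at the top of this issue):

```python
with tf.variable_scope("action_value"):
    if feature_extraction == "cnn":
        extracted_features = cnn_extractor(self.processed_obs, **kwargs)  # shared trunk
    else:
        extracted_features = tf.layers.flatten(self.processed_obs)
    action_out = extracted_features
    for layer_size in layers:  # now applied after CNN features as well
        action_out = tf_layers.fully_connected(action_out, num_outputs=layer_size, activation_fn=None)
        if layer_norm:
            action_out = tf_layers.layer_norm(action_out, center=True, scale=True)
        action_out = act_fun(action_out)
    action_scores = tf_layers.fully_connected(action_out, num_outputs=self.n_actions, activation_fn=None)
```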
I'll create a PR for this.

@araffin (Collaborator) commented Aug 14, 2021

Should be fixed in SB3: https://github.com/DLR-RM/stable-baselines3

araffin closed this as completed Aug 14, 2021