Hi @liuzhuang13,
I'm not sure whether the bias in the BN layers can be masked out as well (v.bias:cmul(mask)), since what you minimize and prune is actually the weight, not the bias.
For a BN layer, y = γ·x̂ + β (where x̂ is the normalized input).
You prune the channels with small γ, but what about β? It may still be large or important.
In my case, after masking out β I got an enormous accuracy drop.
If I am misunderstanding the work, please tell me.
Thank you.
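Just to make it concrete, here is a minimal sketch of what I mean (the 4-channel layer and the 0/1 mask are made up for illustration):

```lua
require 'nn'

-- hypothetical example: a BN layer with 4 channels and a 0/1 channel mask
local bn   = nn.SpatialBatchNormalization(4)
local mask = torch.Tensor({1, 0, 1, 1})  -- say channel 2 has a small gamma and is pruned

-- what the pruning code does: zero out gamma for the pruned channels
bn.weight:cmul(mask)

-- my question: should beta be zeroed in the same way?
bn.bias:cmul(mask)
```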
In my experiments, masking out the bias did not seem to change the accuracy much. My explanation is that if γ is zero, the output of that channel is the same (always β) for every input, so the channel carries no information and the network tends to learn a small β for it. Even if β is large, that channel still outputs the same activation for all inputs, so I think it is not that important. If there is an accuracy drop in your experiment, I think fine-tuning can recover it.
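As a quick sanity check (just a sketch, with a made-up single-channel BN layer), you can verify that once γ is zero the channel outputs the constant β no matter what the input is:

```lua
require 'nn'

-- hypothetical single-channel BN layer with gamma forced to zero and a large beta
local bn = nn.SpatialBatchNormalization(1)
bn:evaluate()          -- use running statistics instead of batch statistics
bn.weight:zero()       -- gamma = 0, i.e. the channel is pruned
bn.bias:fill(5.0)      -- beta is large

-- two very different inputs give exactly the same constant output
local y1 = bn:forward(torch.randn(1, 1, 4, 4)):clone()
local y2 = bn:forward(torch.randn(1, 1, 4, 4) * 100):clone()
print(y1)                       -- every element equals 5, regardless of the input
print((y1 - y2):abs():max())    -- 0: the two outputs are identical
```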