Update weight initialization scheme in mlp.py #106

Open · wants to merge 1 commit into master

Conversation

rfeinman

The sparse initialization scheme is considered state of the art in random weight initialization for MLPs. In this scheme we hard-limit the number of non-zero incoming connection weights to each unit (we used 15 in our experiments) and set the biases to 0 (or 0.5 for tanh units). Doing this allows the units to be both highly differentiated and unsaturated, avoiding the problem in dense initializations where the connection weights must all be scaled very small in order to prevent saturation, leading to poor differentiation between units.

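A minimal NumPy sketch of the scheme described above (the function name, the RandomState-based signature, and the unit-Gaussian scale for the non-zero weights are placeholder choices, not code from this PR):

```python
import numpy

def sparse_init(rng, n_in, n_out, num_nonzero=15, tanh_units=True, scale=1.0):
    # Sparse initialization: each output unit receives exactly `num_nonzero`
    # non-zero incoming weights; all other weights are zero. Biases are 0,
    # or 0.5 for tanh units, as described above. `scale` is a placeholder choice.
    W = numpy.zeros((n_in, n_out))
    for j in range(n_out):
        nonzero = rng.permutation(n_in)[:num_nonzero]  # connections kept non-zero
        W[nonzero, j] = scale * rng.randn(num_nonzero)
    b = numpy.full(n_out, 0.5 if tanh_units else 0.0)
    return W, b

# example usage with the kind of RandomState the tutorial code passes around
rng = numpy.random.RandomState(1234)
W, b = sparse_init(rng, n_in=784, n_out=500)
```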
nouiz (Member) commented Jul 23, 2015

Thanks for the PR. There is a bug that makes our automated tests fail with errors like this:

======================================================================
ERROR: test.test_convolutional_mlp
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/pyenv/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/build/lisa-lab/DeepLearningTutorials/code/test.py", line 41, in test_convolutional_mlp
    convolutional_mlp.evaluate_lenet5(n_epochs=1, nkerns=[5, 5])
  File "/home/travis/build/lisa-lab/DeepLearningTutorials/code/convolutional_mlp.py", line 206, in evaluate_lenet5
    activation=T.tanh
  File "/home/travis/build/lisa-lab/DeepLearningTutorials/code/mlp.py", line 85, in __init__
    random.shuffle(indices)
NameError: global name 'random' is not defined
-------------------- >> begin captured stdout << ---------------------

Also, there are other parts that I think would need updating, since there is still a reference to the previous paper.

I didn't read the full paper; which section says that this initialization is better than the previous one with first-order optimization?

Before merging, this would need some testing and acceptance by other people on the machine learning side (I'm on the software side).

rfeinman (Author)

Sorry, I forgot to add "import random" at the top of the code file.
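For reference, a minimal sketch of that fix, with the missing import added (the helper below is illustrative, not the PR's actual mlp.py code):

```python
import random  # the missing import behind the NameError in the test log

def pick_nonzero_inputs(n_in, num_nonzero=15):
    # random.shuffle is the call that failed at mlp.py line 85 in the traceback
    indices = list(range(n_in))
    random.shuffle(indices)
    return indices[:num_nonzero]  # incoming connections that keep non-zero weights
```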

In regard to why it is better, please see section 3.1 of Hinton's paper. The idea is explained very well here.
http://www.cs.toronto.edu/~hinton/absps/momentum.pdf

"The intuitive justification is that the total amount of input to each unit will not depend on the size of the previous layer and hence they will not as easily saturate. Meanwhile, because the inputs to each unit are not all randomly weighted blends of the outputs of many 100s or 1000s of units in the previous layer, they will tend to be qualitatively more “diverse” in their response to inputs."

I referenced Martens 2010 because this is where the initialization scheme was first described.

lamblin (Member) commented Jul 27, 2015

Hi, sorry about the late reply.
I'm hesitant to replace the existing code, because the current initialization scheme (Glorot & Bengio 2010; we should definitely add the citation) is considered state of the art as well.
The article you cite only provides an "intuitive justification" for why sparse initialization would be good, not concrete evidence that it is actually better than Glorot & Bengio.
I would be interested if you have pointers to articles explicitly comparing both, though.
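For comparison, a minimal NumPy sketch of the Glorot & Bengio (2010) uniform initialization that the current tutorial code follows; the function name and signature are illustrative, and the 4x factor for sigmoid units is the adjustment suggested in that paper:

```python
import numpy

def glorot_uniform(rng, n_in, n_out, sigmoid=False):
    # Uniform in [-sqrt(6/(n_in+n_out)), +sqrt(6/(n_in+n_out))], derived for tanh units.
    bound = numpy.sqrt(6.0 / (n_in + n_out))
    W = rng.uniform(low=-bound, high=bound, size=(n_in, n_out))
    return 4.0 * W if sigmoid else W  # Glorot & Bengio suggest scaling by 4 for sigmoid units

rng = numpy.random.RandomState(1234)
W = glorot_uniform(rng, n_in=784, n_out=500)
```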
