Update weight initialization scheme in mlp.py #106

Open · wants to merge 1 commit into master

Conversation

rfeinman

The sparse initialization scheme is considered state of the art in random weight initialization for MLPs. In this scheme we hard-limit the number of non-zero incoming connection weights to each unit (we used 15 in our experiments) and set the biases to 0 (or 0.5 for tanh units). Doing this allows the units to be both highly differentiated and unsaturated, avoiding the problem in dense initializations where the connection weights must all be scaled very small in order to prevent saturation, leading to poor differentiation between units.

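A minimal NumPy sketch of the scheme described above (the function name, the RandomState-based signature, and the unit-Gaussian scale for the non-zero weights are placeholder choices, not code from this PR):

```python
import numpy

def sparse_init(rng, n_in, n_out, num_nonzero=15, tanh_units=True, scale=1.0):
    # Sparse initialization: each output unit receives exactly `num_nonzero`
    # non-zero incoming weights; all other weights are zero. Biases are 0,
    # or 0.5 for tanh units, as described above. `scale` is a placeholder choice.
    W = numpy.zeros((n_in, n_out))
    for j in range(n_out):
        nonzero = rng.permutation(n_in)[:num_nonzero]  # connections kept non-zero
        W[nonzero, j] = scale * rng.randn(num_nonzero)
    b = numpy.full(n_out, 0.5 if tanh_units else 0.0)
    return W, b

# example usage with the kind of RandomState the tutorial code passes around
rng = numpy.random.RandomState(1234)
W, b = sparse_init(rng, n_in=784, n_out=500)
```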
nouiz (Member) commented Jul 23, 2015

Thanks for the PR. There is a bug that makes our automated tests fail with errors like this:

======================================================================
ERROR: test.test_convolutional_mlp
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/miniconda/envs/pyenv/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "/home/travis/build/lisa-lab/DeepLearningTutorials/code/test.py", line 41, in test_convolutional_mlp
    convolutional_mlp.evaluate_lenet5(n_epochs=1, nkerns=[5, 5])
  File "/home/travis/build/lisa-lab/DeepLearningTutorials/code/convolutional_mlp.py", line 206, in evaluate_lenet5
    activation=T.tanh
  File "/home/travis/build/lisa-lab/DeepLearningTutorials/code/mlp.py", line 85, in __init__
    random.shuffle(indices)
NameError: global name 'random' is not defined
-------------------- >> begin captured stdout << ---------------------

Also, there are other parts that I think would need updating, since there is still a reference to the previous paper.

I didn't read the full paper; which section says that this initialization is better than the previous one with first-order optimization?

Before merging, this would need some testing and acceptance by other people on the machine learning side (I'm on the software side).

rfeinman (Author)

Sorry, I forgot to add "import random" at the top of the code file.
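For reference, a minimal sketch of that fix, with the missing import added (the helper below is illustrative, not the PR's actual mlp.py code):

```python
import random  # the missing import behind the NameError in the test log

def pick_nonzero_inputs(n_in, num_nonzero=15):
    # random.shuffle is the call that failed at mlp.py line 85 in the traceback
    indices = list(range(n_in))
    random.shuffle(indices)
    return indices[:num_nonzero]  # incoming connections that keep non-zero weights
```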

In regard to why it is better, please see section 3.1 of Hinton's paper. The idea is explained very well here.
http://www.cs.toronto.edu/~hinton/absps/momentum.pdf

"The intuitive justification is that the total amount of input to each unit will not depend on the size of the previous layer and hence they will not as easily saturate. Meanwhile, because the inputs to each unit are not all randomly weighted blends of the outputs of many 100s or 1000s of units in the previous layer, they will tend to be qualitatively more “diverse” in their response to inputs."

I referenced Martens 2010 because this is where the initialization scheme was first described.

lamblin (Member) commented Jul 27, 2015

Hi, sorry about the late reply.
I'm hesitant to replace the existing code, because the current initialization scheme (Glorot & Bengio 2010; we should definitely add the citation) is considered state of the art as well.
The article you cite only provides an "intuitive justification" for why sparse initialization would be good, not concrete evidence that it is actually better than Glorot & Bengio.
I would be interested if you have pointers to articles explicitly comparing both, though.
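For comparison, a minimal NumPy sketch of the Glorot & Bengio (2010) uniform initialization that the current tutorial code follows; the function name and signature are illustrative, and the 4x factor for sigmoid units is the adjustment suggested in that paper:

```python
import numpy

def glorot_uniform(rng, n_in, n_out, sigmoid=False):
    # Uniform in [-sqrt(6/(n_in+n_out)), +sqrt(6/(n_in+n_out))], derived for tanh units.
    bound = numpy.sqrt(6.0 / (n_in + n_out))
    W = rng.uniform(low=-bound, high=bound, size=(n_in, n_out))
    return 4.0 * W if sigmoid else W  # Glorot & Bengio suggest scaling by 4 for sigmoid units

rng = numpy.random.RandomState(1234)
W = glorot_uniform(rng, n_in=784, n_out=500)
```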
