Updater
This page introduces the updater settings for cxxnet.
- General SGD Updater
- Constant Learning Rate Scheduling
- Exp Decay Learning Rate Scheduling
- Poly Decay Learning Rate Scheduling

General SGD Updater
====
By default, cxxnet uses the SGDUpdater.
The eta, wd and momentum parameters can be set separately for wmat (the weight matrix) or bias by prefixing the target, for example:
wmat:eta = 0.1
bias:eta = 0.2
If no target is specified, the setting takes effect globally.
- Basic configuration:
updater = sgd
eta = 0.01
momentum = 0.9
wd = 0.0005
- eta is the learning rate; the default is 0.01
- momentum is the momentum coefficient; the default is 0.9
- wd is the weight decay; the default is 0.005
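
To make the roles of these parameters concrete, here is a minimal Python sketch of one momentum SGD step with weight decay. It is only an illustration of a common formulation; the exact update formula cxxnet uses internally is not shown on this page.

```python
# A minimal sketch (not cxxnet's actual implementation) of one momentum SGD
# step with weight decay, showing the roles of eta, momentum and wd.
def sgd_step(w, grad, v, eta=0.01, momentum=0.9, wd=0.0005):
    """Return updated weights and momentum buffer after one step."""
    # wd adds an L2 penalty gradient; eta scales the step; momentum smooths it
    v = [momentum * vi - eta * (gi + wd * wi) for vi, gi, wi in zip(v, grad, w)]
    w = [wi + vi for wi, vi in zip(w, v)]
    return w, v

w, v = [0.0, 0.0], [0.0, 0.0]
grad = [1.0, -2.0]
w, v = sgd_step(w, grad, v)
print(w)  # [-0.01, 0.02]: the first step is just -eta * grad
```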
- Global updater settings can be overridden in the layer configuration, for example:
# Global setting
eta = 0.01
momentum = 0.9
# Layer setting
netconfig=start
layer[0->1] = fullc:fc1
nhidden = 100
eta = 0.02
momentum = 0.5
layer[1->2] = sigmoid:se1
layer[2->3] = fullc:fc2
nhidden = 10
layer[3->3] = softmax
netconfig=end
In the layer fc1, the learning rate will be 0.02 and the momentum will be 0.5, but layer fc2 will follow the global setting, with a learning rate of 0.01 and momentum of 0.9.

Constant Learning Rate Scheduling
====
There are some advanced features in the SGDUpdater, such as learning rate scheduling. Three scheduling methods are provided: constant, expdecay and polydecay.
- Example of constant scheduling, in which the learning rate stays the same throughout training:
updater = sgd
eta = 0.01
momentum = 0.9
wd = 0.0005
lr:schedule = constant

Exp Decay Learning Rate Scheduling
====
Exponential learning rate decay adjusts the learning rate according to this formula:
new_learning_rate = base_learning_rate * pow(gamma, epoch / step)
- Example of expdecay scheduling, in which the learning rate drops exponentially:
updater = sgd
eta = 0.01
momentum = 0.9
wd = 0.0005
lr:schedule = expdecay
lr:start_epoch = 3000
lr:minimum_lr = 0.001
lr:gamma = 0.5
lr:step = 1000
- lr:start_epoch starts learning rate scheduling after this epoch, default is 0
- lr:minimum_lr is the minimum learning rate, default is 0.0001
- lr:gamma is the learning rate decay parameter, default is 0.5
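
To make the parameters above concrete, here is a minimal Python sketch of how the expdecay schedule in the example would evolve. Whether cxxnet subtracts lr:start_epoch before applying the formula is an assumption here; only the formula itself is taken from this page.

```python
# Minimal sketch (not cxxnet's actual code) of the expdecay schedule, assuming
# counting starts at lr:start_epoch and the result is clamped at lr:minimum_lr.
def expdecay_lr(base_lr, epoch, start_epoch=3000, step=1000,
                gamma=0.5, minimum_lr=0.001):
    """Return the scheduled learning rate for a given epoch."""
    if epoch < start_epoch:
        return base_lr                       # scheduling has not started yet
    rounds = (epoch - start_epoch) // step   # completed decay steps
    lr = base_lr * gamma ** rounds           # cf. base_lr * pow(gamma, epoch / step)
    return max(lr, minimum_lr)               # never drop below lr:minimum_lr

# With the example configuration above (eta = 0.01):
print(expdecay_lr(0.01, 2000))   # 0.01   (before lr:start_epoch)
print(expdecay_lr(0.01, 5000))   # 0.0025 (after two decay steps)
```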

Poly Decay Learning Rate Scheduling
====
Polynomial learning rate decay adjusts the learning rate according to this formula:
new_learning_rate = base_learning_rate * pow(1.0 + (epoch / step) * gamma, -alpha)
- Example of polydecay scheduling, in which the learning rate drops polynomially:
updater = sgd
eta = 0.01
momentum = 0.9
wd = 0.0005
lr:schedule = polydecay
lr:start_epoch = 3000
lr:minimum_lr = 0.001
lr:alpha = 0.5
lr:gamma = 0.1
lr:step = 1000
- lr:start_epoch starts learning rate scheduling after this epoch, default is 0
- lr:minimum_lr is the minimum learning rate, default is 0.00001
- lr:gamma is a learning rate decay parameter, default is 0.5
- lr:alpha is a learning rate decay parameter, default is 0.5
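
As with expdecay, here is a minimal Python sketch of the polydecay schedule under the same assumptions (scheduling starts at lr:start_epoch and the result is clamped at lr:minimum_lr); only the formula above is taken from this page.

```python
# Minimal sketch (not cxxnet's actual code) of the polydecay schedule.
def polydecay_lr(base_lr, epoch, start_epoch=3000, step=1000,
                 gamma=0.1, alpha=0.5, minimum_lr=0.001):
    """Return the scheduled learning rate for a given epoch."""
    if epoch < start_epoch:
        return base_lr                               # scheduling not started yet
    t = (epoch - start_epoch) / step
    lr = base_lr * (1.0 + t * gamma) ** (-alpha)     # cf. pow(1 + (epoch/step)*gamma, -alpha)
    return max(lr, minimum_lr)                       # clamp at lr:minimum_lr

# With the example configuration above (eta = 0.01):
print(polydecay_lr(0.01, 13000))  # ~0.00707 = 0.01 / sqrt(1 + 10 * 0.1)
```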