
Updater


Introduction

This page introduces the updater settings for cxxnet.


Updater

By default, cxxnet uses the SGDUpdater. The eta, wd and momentum parameters can be set separately for wmat (the weights) and bias by configuring, for example:

wmat:eta = 0.1
bias:eta = 0.2

If no target is specified, the setting takes effect globally.

  • Basic configuration:
updater = sgd
eta = 0.01
momentum = 0.9
wd = 0.0005
  • eta is the learning rate, default is 0.01

  • momentum is the momentum coefficient, default is 0.9

  • wd is the weight decay, default is 0.005

  • Global updater settings can be overridden in the layer configuration, e.g.

# Global setting
eta = 0.01
momentum = 0.9
# Layer setting
netconfig=start
layer[0->1] = fullc:fc1
  nhidden = 100
  eta = 0.02
  momentum = 0.5
layer[1->2] = sigmoid:se1
layer[2->3] = fullc:fc2
  nhidden = 10
layer[3->3] = softmax
netconfig=end

In layer fc1, the learning rate will be 0.02 and the momentum will be 0.5, while layer fc2 follows the global setting, with a learning rate of 0.01 and a momentum of 0.9. The SGD update rule that eta, momentum and wd control is sketched below.
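
To make the roles of eta, momentum and wd concrete, here is a minimal sketch of a standard SGD update with momentum and L2 weight decay. This is only an illustration of the update rule these parameters control, not cxxnet's actual updater code; the function and variable names are ours.

import numpy as np

def sgd_momentum_step(weight, grad, velocity, eta=0.01, momentum=0.9, wd=0.0005):
    # Weight decay: add wd * weight to the gradient (L2 regularization).
    grad = grad + wd * weight
    # Momentum: keep an exponentially weighted running sum of past updates.
    velocity = momentum * velocity - eta * grad
    # Apply the update to the weights.
    weight = weight + velocity
    return weight, velocity

# One toy step:
w = np.zeros(3)
v = np.zeros(3)
w, v = sgd_momentum_step(w, np.array([0.1, -0.2, 0.3]), v)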


Learning Rate Scheduling

The SGDUpdater has some advanced features, such as learning rate scheduling. We provide three learning rate scheduling methods: constant, expdecay and polydecay.

Constant Scheduling

  • Example of constant scheduling: the learning rate stays the same throughout training
updater = sgd
eta = 0.01
momentum = 0.9
wd = 0.0005
lr:schedule = constant

Exp Decay

Exponential learning rate decay adjusts the learning rate according to the formula: new_learning_rate = base_learning_rate * pow(gamma, epoch / step). A sketch of this schedule follows the parameter list below.

  • Example of expdecay scheduling: the learning rate decays exponentially
updater = sgd
eta = 0.01
momentum = 0.9
wd = 0.0005
lr:schedule = expdecay
lr:start_epoch = 3000
lr:minimum_lr = 0.001
lr:gamma = 0.5
lr:step = 1000
  • lr:start_epoch starts the learning rate scheduling after this epoch, default is 0
  • lr:minimum_lr is the minimum learning rate, default is 0.0001
  • lr:gamma is the learning rate decay parameter, default is 0.5
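
Assuming that scheduling only kicks in after lr:start_epoch and that the scheduled rate is clamped at lr:minimum_lr, the expdecay formula above can be sketched as follows (an illustration using the values from the example configuration as defaults, not cxxnet's implementation):

def expdecay_lr(epoch, base_lr=0.01, gamma=0.5, step=1000,
                start_epoch=3000, minimum_lr=0.001):
    # Before lr:start_epoch the base learning rate (eta) is used unchanged.
    if epoch < start_epoch:
        return base_lr
    # new_learning_rate = base_learning_rate * pow(gamma, epoch / step)
    lr = base_lr * gamma ** (epoch / step)
    # Never decay below lr:minimum_lr.
    return max(lr, minimum_lr)

# For example, expdecay_lr(4000) = 0.01 * 0.5**4 = 0.000625, clamped to 0.001.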

Poly Decay

Polynomial learning rate decay adjusts the learning rate according to the formula: new_learning_rate = base_learning_rate * pow(1.0 + (epoch/step) * gamma, -alpha). A sketch of this schedule follows the parameter list below.

  • Example of polydecay scheduling: the learning rate decays polynomially
updater = sgd
eta = 0.01
momentum = 0.9
wd = 0.0005
lr:schedule = polydecay
lr:start_epoch = 3000
lr:minimum_lr = 0.001
lr:alpha = 0.5
lr:gamma = 0.1
lr:step = 1000
  • lr:start_epoch starts the learning rate scheduling after this epoch, default is 0
  • lr:minimum_lr is the minimum learning rate, default is 0.00001
  • lr:gamma is a learning rate decay parameter, default is 0.5
  • lr:alpha is a learning rate decay parameter, default is 0.5
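
Under the same assumptions about lr:start_epoch and lr:minimum_lr as in the expdecay sketch, the polydecay formula can be sketched like this (again an illustration using the example values, not cxxnet's code):

def polydecay_lr(epoch, base_lr=0.01, gamma=0.1, alpha=0.5, step=1000,
                 start_epoch=3000, minimum_lr=0.001):
    # Before lr:start_epoch the base learning rate (eta) is used unchanged.
    if epoch < start_epoch:
        return base_lr
    # new_learning_rate = base_learning_rate * pow(1.0 + (epoch/step) * gamma, -alpha)
    lr = base_lr * (1.0 + (epoch / step) * gamma) ** (-alpha)
    # Never decay below lr:minimum_lr.
    return max(lr, minimum_lr)

# For example, polydecay_lr(10000) = 0.01 * (1 + 10 * 0.1)**-0.5, roughly 0.00707.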