
Hard bound on learning rate/computed derivative results in low-precision failures #6

Open
seldridge opened this issue Feb 21, 2016 · 0 comments


The kludge of forcing the derivative and the computed learning rate to be no smaller than the smallest value representable in the current fixed-point precision may result in instability at low fixed-point precisions.

I'm more aware of this problem as it relates to the computed learning rate, i.e.,

computed learning rate = learning rate / (# items in a batch)

As the number of items in a batch increases, the computed learning rate becomes ever smaller. At low fixed-point precisions this quickly underflows: with 7 fractional bits the smallest nonzero value is 2^-7 = 0.0078125, so a reasonable learning rate of 0.5 can only be used with a batch size of at most 64 (0.5 / 64 = 2^-7 exactly). Allowing the number of batch items to increase substantially beyond this causes problems. For example, with a 7-bit fractional representation and 2048 batch items, the smallest learning rate that we can effectively represent is 2048 * 2^-7 = 16. This is nearly guaranteed to cause instability.
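To make the failure mode concrete, here is a minimal sketch (hypothetical C, not code from fann-xfiles) of the hard-bound behavior described above: the computed learning rate is truncated to a 7-bit fractional fixed-point value and clamped to the smallest nonzero representation, so large batch sizes silently inflate the effective learning rate.

```c
#include <stdio.h>
#include <stdint.h>

/* Truncate (learning rate / batch size) to fixed point with `frac_bits`
 * fractional bits, then apply the kludge: never let the result fall below
 * the smallest nonzero representation, 2^-frac_bits. */
static int32_t computed_lr_fixed(double learning_rate, int batch_size,
                                 unsigned frac_bits) {
    int32_t fixed = (int32_t)((learning_rate / batch_size) *
                              (double)(1 << frac_bits));
    return fixed > 0 ? fixed : 1;  /* hard bound at 2^-frac_bits */
}

int main(void) {
    const unsigned frac_bits = 7;      /* 7-bit fractional fixed-point precision */
    const double learning_rate = 0.5;  /* user-specified learning rate */

    for (int batch = 64; batch <= 2048; batch *= 2) {
        int32_t fixed = computed_lr_fixed(learning_rate, batch, frac_bits);
        /* The learning rate the network actually trains with. */
        double effective = (double)fixed / (1 << frac_bits) * batch;
        printf("batch=%4d  computed (fixed)=%d  effective learning rate=%g\n",
               batch, fixed, effective);
    }
    /* batch=64 yields exactly 2^-7 (effective rate 0.5); every larger batch
     * clamps to the same 2^-7, so batch=2048 behaves like a rate of 16. */
    return 0;
}
```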

There are a couple of ways to get around this:

Use a larger internal precision to deal with learning rate computations (see the sketch after this list)
Limit the batch size to prevent artificially increasing the learning rate
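
For the first workaround, a minimal sketch (again hypothetical, not fann-xfiles code): performing the division at a wider internal fractional precision keeps the computed learning rate representable instead of clamping it at the network's 7-bit minimum.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    const unsigned net_frac_bits = 7;        /* network's fixed-point precision */
    const unsigned internal_frac_bits = 16;  /* wider precision for the division */
    const double learning_rate = 0.5;
    const int batch = 2048;

    /* 0.5 / 2048 underflows at 7 fractional bits but is representable at 16. */
    int32_t lr_net      = (int32_t)((learning_rate / batch) * (1 << net_frac_bits));
    int32_t lr_internal = (int32_t)((learning_rate / batch) * (1 << internal_frac_bits));

    printf("7-bit fractional : %d (underflows to zero)\n", lr_net);
    printf("16-bit fractional: %d (= %g)\n",
           lr_internal, (double)lr_internal / (1 << internal_frac_bits));
    return 0;
}
```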

fann-xfiles will currently fail if this behavior is detected, but that is not a legitimate solution.
