
Hard bound on learning rate/computed derivative results in low-precision failures #6

Open
seldridge opened this issue Feb 21, 2016 · 0 comments


The kludge of forcing the derivative and the computed learning rate to be no smaller than the smallest value representable in the current fixed-point precision may result in instability at low fixed-point precisions.

I'm more aware of this problem as it relates to the computed learning rate, i.e.,

computed learning rate = learning rate / (# items in a batch)

As the number of items in a batch increases, the computed learning rate becomes ever smaller. At low fixed-point precisions this quickly underflows: with 7 fractional bits the smallest nonzero value is 2^-7 = 0.0078125, so a reasonable learning rate of 0.5 can only be used with a batch size of at most 64 (0.5 / 64 = 2^-7 exactly). Allowing the number of batch items to increase substantially beyond this causes problems. For example, with a 7-bit fractional representation and 2048 batch items, the smallest learning rate that we can effectively represent is 2048 * 2^-7 = 16. This is nearly guaranteed to cause instability.
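To make the failure mode concrete, here is a minimal sketch (hypothetical C, not code from fann-xfiles) of the hard-bound behavior described above: the computed learning rate is truncated to a 7-bit fractional fixed-point value and clamped to the smallest nonzero representation, so large batch sizes silently inflate the effective learning rate.

```c
#include <stdio.h>
#include <stdint.h>

/* Truncate (learning rate / batch size) to fixed point with `frac_bits`
 * fractional bits, then apply the kludge: never let the result fall below
 * the smallest nonzero representation, 2^-frac_bits. */
static int32_t computed_lr_fixed(double learning_rate, int batch_size,
                                 unsigned frac_bits) {
    int32_t fixed = (int32_t)((learning_rate / batch_size) *
                              (double)(1 << frac_bits));
    return fixed > 0 ? fixed : 1;  /* hard bound at 2^-frac_bits */
}

int main(void) {
    const unsigned frac_bits = 7;      /* 7-bit fractional fixed-point precision */
    const double learning_rate = 0.5;  /* user-specified learning rate */

    for (int batch = 64; batch <= 2048; batch *= 2) {
        int32_t fixed = computed_lr_fixed(learning_rate, batch, frac_bits);
        /* The learning rate the network actually trains with. */
        double effective = (double)fixed / (1 << frac_bits) * batch;
        printf("batch=%4d  computed (fixed)=%d  effective learning rate=%g\n",
               batch, fixed, effective);
    }
    /* batch=64 yields exactly 2^-7 (effective rate 0.5); every larger batch
     * clamps to the same 2^-7, so batch=2048 behaves like a rate of 16. */
    return 0;
}
```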

There are a couple of ways to get around this:

Use a larger internal precision to deal with learning rate computations (see the sketch after this list)
Limit the batch size to prevent artificially increasing the learning rate
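
For the first workaround, a minimal sketch (again hypothetical, not fann-xfiles code): performing the division at a wider internal fractional precision keeps the computed learning rate representable instead of clamping it at the network's 7-bit minimum.

```c
#include <stdio.h>
#include <stdint.h>

int main(void) {
    const unsigned net_frac_bits = 7;        /* network's fixed-point precision */
    const unsigned internal_frac_bits = 16;  /* wider precision for the division */
    const double learning_rate = 0.5;
    const int batch = 2048;

    /* 0.5 / 2048 underflows at 7 fractional bits but is representable at 16. */
    int32_t lr_net      = (int32_t)((learning_rate / batch) * (1 << net_frac_bits));
    int32_t lr_internal = (int32_t)((learning_rate / batch) * (1 << internal_frac_bits));

    printf("7-bit fractional : %d (underflows to zero)\n", lr_net);
    printf("16-bit fractional: %d (= %g)\n",
           lr_internal, (double)lr_internal / (1 << internal_frac_bits));
    return 0;
}
```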

fann-xfiles will currently fail if this behavior is detected, but that is not a legitimate solution.
