
Releases: karlhigley/torch-optim-sparse

Fixes an issue applying momentum to narrow embedding layers in SGD/SGDW

03 Oct 15:32

With embedding layers where one of the dimensions has size 1 (e.g. a single embedding used to represent biases), squeezing the momentum values was removing the size-1 dimension entirely. This release adds a reshape so that the indices and values of the constructed sparse momentum tensor match each other.
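
As a rough illustration of the problem (not the repository's actual code), the sketch below shows how `squeeze()` drops the size-1 embedding dimension, while a reshape keeps one row of values per index so the sparse momentum tensor can be constructed consistently:

```python
import torch

# A narrow embedding table, e.g. a per-item bias with embedding dim 1.
weight = torch.zeros(10, 1)

# A sparse gradient touches two rows; values carry one number per row.
indices = torch.tensor([[2, 5]])   # shape (1, nnz)
values = torch.randn(2, 1)         # shape (nnz, embedding_dim)

# squeeze() removes *every* size-1 dimension, so the embedding dimension is
# lost and the values no longer line up with the indices and tensor size:
bad = values.squeeze()             # shape (2,), the size-1 dim is gone

# Reshaping instead keeps one row of values per index, matching the
# (nnz, embedding_dim) layout that sparse_coo_tensor expects:
good = values.reshape(indices.size(1), -1)     # shape (2, 1)
momentum = torch.sparse_coo_tensor(indices, good, weight.shape)
```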

Fixes weight decay in SGDW, adds learning rate conversion utility

03 Oct 15:31

The utility function in this release allows you to specify the effective learning rate you want and then calculates the actual learning rate to pass to the optimizer, taking into account momentum, adaptive learning rates, and batch size. It's useful for comparing optimizers in a single hyper-parameter sweep.
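
As a rough sketch of the idea (the function name, signature, and scaling rules below are assumptions for illustration, not the release's actual API): with heavy-ball momentum the long-run step size is roughly lr / (1 - momentum), so the raw learning rate can be scaled down by (1 - momentum), and linear batch-size scaling can be applied relative to a reference batch size.

```python
def raw_learning_rate(effective_lr, momentum=0.0, batch_size=32, base_batch_size=32):
    """Hypothetical sketch: convert a desired effective learning rate into the
    raw value passed to the optimizer. Not the utility shipped in this release."""
    # With momentum m, repeated updates accumulate to roughly lr / (1 - m),
    # so scale the raw lr down by (1 - m) to hit the requested effective rate.
    lr = effective_lr * (1.0 - momentum)
    # Linear scaling relative to a reference batch size (an assumption here).
    lr *= batch_size / base_batch_size
    return lr

# Example: an effective rate of 0.1 with momentum 0.9 needs a raw lr of 0.01.
print(raw_learning_rate(0.1, momentum=0.9))
```

Adaptive learning rates (e.g. Adam-style preconditioning) would need an additional correction term, which this sketch omits.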

Fixes application of gradients in SGD/SGDW

22 Sep 23:48
Pre-release

It helps if you use the gradients. 🤦

The PyTorch SGD implementation is a little confusing, in that it uses the d_p variable to represent three different quantities depending on the combination of options, and some of those quantities aren't the gradient at all. In seeking to clarify the variable names, I inadvertently dropped an important behavior: applying the gradient when the momentum is zero.
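
For context, here is a minimal sketch of the corrected control flow (not the repository's actual implementation; `sgd_step` and its arguments are made up for illustration):

```python
import torch

def sgd_step(params, lr, momentum=0.0, momentum_buffers=None):
    """Hypothetical sketch of the fixed update; not the repo's exact code."""
    momentum_buffers = momentum_buffers if momentum_buffers is not None else {}
    for p in params:
        if p.grad is None:
            continue
        grad = p.grad
        if momentum != 0:
            # Accumulate the gradient into the momentum buffer and step with it.
            buf = momentum_buffers.setdefault(p, torch.zeros_like(p))
            buf.mul_(momentum).add_(grad)
            update = buf
        else:
            # The branch that was accidentally dropped: with zero momentum,
            # the raw gradient is the update.
            update = grad
        p.data.add_(update, alpha=-lr)
```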

Initial release

22 Sep 19:48
Pre-release

This is the first version of the four "sparser" optimizers, based on code from PyTorch 1.6.0.