
Releases: karlhigley/torch-optim-sparse

Fixes an issue applying momentum to narrow embedding layers in SGD/SGDW

03 Oct 15:32

With embedding layers where one of the dimensions has size 1 (e.g. a single embedding used to represent biases), squeezing the momentum values was removing the size-1 dimension entirely. This release adds a reshape so that the indices and values of the constructed sparse momentum tensor match each other.
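
As a rough illustration of the problem (not the repository's actual code), the sketch below shows how `squeeze()` drops the size-1 embedding dimension, while a reshape keeps one row of values per index so the sparse momentum tensor can be constructed consistently:

```python
import torch

# A narrow embedding table, e.g. a per-item bias with embedding dim 1.
weight = torch.zeros(10, 1)

# A sparse gradient touches two rows; values carry one number per row.
indices = torch.tensor([[2, 5]])   # shape (1, nnz)
values = torch.randn(2, 1)         # shape (nnz, embedding_dim)

# squeeze() removes *every* size-1 dimension, so the embedding dimension is
# lost and the values no longer line up with the indices and tensor size:
bad = values.squeeze()             # shape (2,), the size-1 dim is gone

# Reshaping instead keeps one row of values per index, matching the
# (nnz, embedding_dim) layout that sparse_coo_tensor expects:
good = values.reshape(indices.size(1), -1)     # shape (2, 1)
momentum = torch.sparse_coo_tensor(indices, good, weight.shape)
```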

Fixes weight decay in SGDW, adds learning rate conversion utility

03 Oct 15:31

The utility function in this release allows you to specify the effective learning rate you want and then calculates the actual learning rate to pass to the optimizer, taking into account momentum, adaptive learning rates, and batch size. It's useful for comparing optimizers in a single hyper-parameter sweep.
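
As a rough sketch of the idea (the function name, signature, and scaling rules below are assumptions for illustration, not the release's actual API): with heavy-ball momentum the long-run step size is roughly lr / (1 - momentum), so the raw learning rate can be scaled down by (1 - momentum), and linear batch-size scaling can be applied relative to a reference batch size.

```python
def raw_learning_rate(effective_lr, momentum=0.0, batch_size=32, base_batch_size=32):
    """Hypothetical sketch: convert a desired effective learning rate into the
    raw value passed to the optimizer. Not the utility shipped in this release."""
    # With momentum m, repeated updates accumulate to roughly lr / (1 - m),
    # so scale the raw lr down by (1 - m) to hit the requested effective rate.
    lr = effective_lr * (1.0 - momentum)
    # Linear scaling relative to a reference batch size (an assumption here).
    lr *= batch_size / base_batch_size
    return lr

# Example: an effective rate of 0.1 with momentum 0.9 needs a raw lr of 0.01.
print(raw_learning_rate(0.1, momentum=0.9))
```

Adaptive learning rates (e.g. Adam-style preconditioning) would need an additional correction term, which this sketch omits.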

Fixes application of gradients in SGD/SGDW

22 Sep 23:48
Pre-release

It helps if you use the gradients. 🤦

The PyTorch SGD implementation is a little confusing, in that it uses the d_p variable to represent three different quantities depending on the combination of options, and some of those quantities aren't the gradient at all. In seeking to clarify the variable names, I inadvertently dropped an important behavior: applying the gradient when the momentum is zero.
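
For context, here is a minimal sketch of the corrected control flow (not the repository's actual implementation; `sgd_step` and its arguments are made up for illustration):

```python
import torch

def sgd_step(params, lr, momentum=0.0, momentum_buffers=None):
    """Hypothetical sketch of the fixed update; not the repo's exact code."""
    momentum_buffers = momentum_buffers if momentum_buffers is not None else {}
    for p in params:
        if p.grad is None:
            continue
        grad = p.grad
        if momentum != 0:
            # Accumulate the gradient into the momentum buffer and step with it.
            buf = momentum_buffers.setdefault(p, torch.zeros_like(p))
            buf.mul_(momentum).add_(grad)
            update = buf
        else:
            # The branch that was accidentally dropped: with zero momentum,
            # the raw gradient is the update.
            update = grad
        p.data.add_(update, alpha=-lr)
```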

Initial release

22 Sep 19:48
Pre-release

This is the first version of the four "sparser" optimizers, based on code from PyTorch 1.6.0.