Fixes gradient application in SGD/SGDW
Pre-release
It helps if you use the gradients. 🤦
The PyTorch SGD implementation is a little confusing, in that it uses the gradient variable `d_p` to represent three different quantities depending on the combination of options, and some of those values aren't the gradient. In seeking to clarify the variable names, I inadvertently dropped an important behavior: applying the gradient when the momentum is zero.
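A minimal sketch of the bug class (not the actual library source; the function name and structure here are illustrative): with a nonzero momentum the update comes from the momentum buffer, but with zero momentum the update must fall back to the raw gradient, and dropping that branch leaves the parameters unchanged.

```python
import torch

def sgd_step(params, lr, momentum=0.0, momentum_buffers=None):
    """Illustrative SGD step, not the PyTorch/SGDW implementation."""
    if momentum_buffers is None:
        momentum_buffers = {}
    for i, p in enumerate(params):
        if p.grad is None:
            continue
        grad = p.grad
        if momentum != 0:
            buf = momentum_buffers.get(i)
            if buf is None:
                # Initialize the momentum buffer from the first gradient.
                buf = torch.clone(grad).detach()
                momentum_buffers[i] = buf
            else:
                buf.mul_(momentum).add_(grad)
            update = buf
        else:
            # The behavior that was accidentally dropped: with zero
            # momentum, the update is simply the gradient itself.
            update = grad
        p.data.add_(update, alpha=-lr)
    return momentum_buffers
```

With the zero-momentum branch missing, a step with `momentum=0.0` would be a no-op; with it restored, `p` moves by `-lr * grad` as expected.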