
Fixes application of gradients in SGD/SGDW

Pre-release
@karlhigley karlhigley released this 22 Sep 23:48
· 3 commits to mainstem since this release

It helps if you use the gradients. 🤦

The PyTorch SGD implementation is a little confusing: it uses a single variable, d_p, to represent three different quantities depending on the combination of options, and some of those values aren't the gradient. In seeking to clarify the variable names, I inadvertently dropped an important behavior: applying the gradient when the momentum is zero. This release restores that behavior.
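
For illustration, here is a rough scalar sketch of the update logic described above. It is not this library's code or PyTorch's; it only mirrors the general shape of the PyTorch-style SGD update as I understand it. The names `sgd_step`, `decayed_grad`, and `update` are made up for this example. The point to notice is that `update` defaults to the gradient, so the parameter still moves when the momentum is zero.

```python
def sgd_step(param, grad, buf=None, lr=0.1, momentum=0.0,
             dampening=0.0, weight_decay=0.0, nesterov=False):
    """One scalar SGD step; returns (new_param, new_momentum_buffer).

    Hypothetical helper for illustration only. It separates the three
    quantities that a single d_p variable would otherwise stand for.
    """
    # Quantity 1: the gradient, plus an L2 weight-decay term if enabled.
    decayed_grad = grad + weight_decay * param

    # Default: with zero momentum, the update IS the gradient.
    # Assigning `update` only inside the momentum branch would skip this case.
    update = decayed_grad

    if momentum != 0.0:
        if buf is None:
            # First step: seed the buffer with the current gradient.
            buf = decayed_grad
        else:
            buf = momentum * buf + (1.0 - dampening) * decayed_grad

        if nesterov:
            # Quantity 3: gradient plus a look-ahead along the buffer.
            update = decayed_grad + momentum * buf
        else:
            # Quantity 2: the momentum buffer itself.
            update = buf

    return param - lr * update, buf


# Zero momentum: the parameter still moves by lr * grad.
p, b = sgd_step(1.0, 0.5, lr=0.1)

# With momentum: two consecutive steps carry the buffer forward.
p1, b1 = sgd_step(1.0, 0.5, lr=0.1, momentum=0.9)
p2, b2 = sgd_step(p1, 0.5, buf=b1, lr=0.1, momentum=0.9)
```

Giving each quantity its own name makes the zero-momentum path explicit instead of relying on a variable that happens to still hold the gradient.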