Optimization for ConvNd if dropout=0. #2371
Conversation
Thanks for adding this DoRA optimization to conv layers. Generally, this looks good, but I have one comment about the bias term.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for the update. Let's call
Should be fixed.
Thanks for the update. As you can see, the tests are failing. I checked what's going on, and the issue is that the bias is still flat at this point, hence when it is subtracted, PyTorch broadcasts the `base_result`.

By reshaping the bias, this should be addressed. IIUC, the bias shape should be `(1, -1, ...)`, where `...` are 1s, the number of which depends on the type of conv layer. I suggested a fix, but please LMK if you think this is incorrect.
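To illustrate the broadcasting issue described above, here is a minimal sketch (variable names like `base_result` and the helper `reshape_bias` are illustrative, not PEFT's actual implementation):

```python
import torch

# A Conv2d output has shape (N, C, H, W); a flat bias has shape (C,).
base_result = torch.randn(2, 4, 8, 8)  # hypothetical Conv2d output
bias = torch.randn(4)                  # flat bias, shape (C,)

# Subtracting the flat bias directly would broadcast it along the last
# (width) axis instead of the channel axis. Reshaping it to (1, C, 1, 1)
# aligns it with the channel dimension:
fixed = base_result - bias.view(1, -1, 1, 1)
assert fixed.shape == base_result.shape

# More generally, for a ConvNd layer the number of trailing 1s equals the
# number of spatial dimensions (1 for Conv1d, 2 for Conv2d, 3 for Conv3d):
def reshape_bias(bias: torch.Tensor, num_spatial_dims: int) -> torch.Tensor:
    return bias.view(1, -1, *([1] * num_spatial_dims))

print(reshape_bias(bias, 3).shape)  # torch.Size([1, 4, 1, 1, 1])
```

This matches the `(1, -1, ...)` shape suggested above: the `-1` picks up the channel dimension, and the trailing 1s let broadcasting apply the bias per channel.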
Co-authored-by: Benjamin Bossan <[email protected]>
Thanks for bringing this DoRA optimization to conv layers.
As discussed in #2153