Support Half/BFloat16 in native_group_norm (needs accuracy fix) #7846
base: main
Conversation
Stack from ghstack (oldest at bottom):
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7846
Note: Links to docs will display an error until the docs builds have been completed.
❌ 3 New Failures as of commit d3ecd5b with merge base 03cba2c.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 938abf9d9ceafbd9d492d099722056ca1b124d44
ghstack-comment-id: 2608331848
Pull Request resolved: #7846
```cpp
EXPECT_TENSOR_CLOSE_WITH_TOL(
    out0,
    out0_expected,
    2e-1,
```
That's 20%. This doesn't make me feel comfortable.
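For scale, here is a minimal sketch of what rtol = 2e-1 permits, assuming the C++ macro applies the same elementwise rule as torch.testing.assert_close (|actual - expected| <= atol + rtol * |expected|); the values are made up for illustration:

```python
import torch

# With rtol = 2e-1, an element can deviate from the reference by almost 20%
# and the comparison still passes.
expected = torch.tensor([1.0])
actual = torch.tensor([1.19])  # 19% off
torch.testing.assert_close(actual, expected, rtol=2e-1, atol=1e-8)  # no error
```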
group_norm is one of the ops that automatic mixed precision will autocast to float32: https://intel.github.io/intel-extension-for-pytorch/xpu/1.10.200+gpu/tutorials/features/amp.html
I think the norm ops are just particularly prone to roundoff error, but I'm certainly not a numerical analysis person.
(unresolving for posterity)
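A quick sketch of that autocast behavior (assuming a CUDA device and a recent PyTorch; the autocast op lists vary by backend):

```python
import torch
import torch.nn.functional as F

# group_norm is on autocast's float32 list, so even with half inputs the op
# runs in, and returns, float32 under autocast.
x = torch.randn(8, 4, 32, 32, device="cuda", dtype=torch.float16)
with torch.autocast(device_type="cuda", dtype=torch.float16):
    y = F.group_norm(x, num_groups=2)
print(y.dtype)  # torch.float32
```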
Dug in a little further. Here's the PR that originally made PyTorch group_norm support Half: https://github.com/pytorch/pytorch/pull/100234/files#diff-7927db349f568afca2de9b94d74ea5c3b8cb468cb6a433d0cc1e61e65c515a36
It looks like that test uses atol=rtol=5e-3. I think it's reasonable to argue that if we can't get our tolerances broadly similar to those, then we have a correctness issue and thus don't actually support Half. I'll see what I can do; this one might have to wait for code sharing.
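For reference, a minimal sketch of that style of check, with made-up shapes, comparing a Half run of group_norm against a float32 reference at atol=rtol=5e-3:

```python
import torch
import torch.nn.functional as F

# Compare group_norm computed in Half against a float32 reference at the
# tolerances the upstream pytorch/pytorch test appears to use.
x = torch.randn(2, 6, 16, 16)
ref = F.group_norm(x, num_groups=3)
out = F.group_norm(x.half(), num_groups=3).float()
torch.testing.assert_close(out, ref, rtol=5e-3, atol=5e-3)
```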
The test doesn't pass with atol=rtol=5e-3. Holding off on group_norm until we have code sharing.
ghstack-source-id: 1deac3151e791e9b04da6b08e800eaac1d17111a
ghstack-comment-id: 2608331848
Pull Request resolved: #7846