
Support Half/BFloat16 in native_group_norm (needs accuracy fix) #7846

Open

swolchok wants to merge 2 commits into base: main

Conversation

@swolchok (Contributor) commented Jan 22, 2025

No description provided.

[ghstack-poisoned]
pytorch-bot bot commented Jan 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/7846

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures

As of commit d3ecd5b with merge base 03cba2c:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jan 22, 2025
swolchok added a commit that referenced this pull request Jan 22, 2025
ghstack-source-id: 938abf9d9ceafbd9d492d099722056ca1b124d44
ghstack-comment-id: 2608331848
Pull Request resolved: #7846
@swolchok swolchok added the release notes: ops & kernels Changes to the opset and any new / changed kernel implementations label Jan 22, 2025
@swolchok swolchok requested review from dbort and manuelcandales and removed request for dbort January 22, 2025 21:44
Review comment on:

    EXPECT_TENSOR_CLOSE_WITH_TOL(
        out0,
        out0_expected,
        2e-1,

Contributor:

That's 20%. This doesn't make me feel comfortable.

@swolchok (Author) commented Jan 23, 2025:

group_norm is one of the ops that automatic mixed precision will autocast to float32: https://intel.github.io/intel-extension-for-pytorch/xpu/1.10.200+gpu/tutorials/features/amp.html

I think the norm ops are just particularly prone to roundoff error, but I'm certainly not a numerical analysis person.

(unresolving for posterity)
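To make the roundoff point above concrete, here is an illustrative NumPy sketch (not the ExecuTorch kernel, and the shapes and group count are arbitrary assumptions) that runs a naive group norm end to end in float16 and compares it against a float64 reference. Forcing the mean/variance reductions to accumulate in the input dtype exposes the kind of error the norm ops are prone to:

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    # Naive group norm over an (N, C, H, W) array; reductions and
    # arithmetic are forced into x's own dtype to expose roundoff.
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, -1)
    mean = g.mean(axis=-1, keepdims=True, dtype=x.dtype)
    var = g.var(axis=-1, keepdims=True, dtype=x.dtype)
    out = (g - mean) / np.sqrt(var + x.dtype.type(eps))
    return out.reshape(n, c, h, w)

rng = np.random.default_rng(0)
x = rng.normal(size=(1, 8, 16, 16))
ref = group_norm(x, num_groups=4)                      # float64 reference
half = group_norm(x.astype(np.float16), num_groups=4)  # float16 end to end
max_err = float(np.max(np.abs(half.astype(np.float64) - ref)))
print(max_err)
```

With float16's ~3 decimal digits of precision, the elementwise rounding alone puts the worst-case error well above float32 levels, which is consistent with AMP choosing to autocast group_norm to float32.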

@swolchok (Author) commented:

Dug in a little further. Here's the PR that originally made PyTorch group_norm support Half: https://github.com/pytorch/pytorch/pull/100234/files#diff-7927db349f568afca2de9b94d74ea5c3b8cb468cb6a433d0cc1e61e65c515a36

It looks like that test uses atol=rtol=5e-3. I think it's reasonable to argue that if we can't get the tolerances to be broadly similar, then we have a correctness issue and thus don't actually support Half. I'll see what I can do; this one might have to wait for code sharing.
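For context on why 2e-1 versus 5e-3 matters: mixed-tolerance checks of the torch.allclose form pass when |actual − expected| ≤ atol + rtol·|expected|. Assuming EXPECT_TENSOR_CLOSE_WITH_TOL follows that same criterion (an assumption, not verified against the ExecuTorch source), a 2e-1 tolerance admits roughly 20% error that 5e-3 would reject. A minimal sketch:

```python
def close_with_tol(actual, expected, rtol, atol):
    # Mixed-tolerance check in the style of torch.allclose:
    # pass iff |a - e| <= atol + rtol * |e| for every element.
    return all(abs(a - e) <= atol + rtol * abs(e)
               for a, e in zip(actual, expected))

expected = [1.0, -2.0, 0.5]
actual = [e * 1.15 for e in expected]  # a uniform 15% error

print(close_with_tol(actual, expected, rtol=2e-1, atol=2e-1))  # True
print(close_with_tol(actual, expected, rtol=5e-3, atol=5e-3))  # False
```

A 15% error on every element sails through the looser tolerance, which is the reviewer's concern above.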

@swolchok (Author) commented:

The test doesn't pass with atol=rtol=5e-3. Holding off on group_norm until we have code sharing.

@manuelcandales manuelcandales self-requested a review January 23, 2025 17:19
[ghstack-poisoned]
swolchok added a commit that referenced this pull request Jan 23, 2025
ghstack-source-id: 1deac3151e791e9b04da6b08e800eaac1d17111a
ghstack-comment-id: 2608331848
Pull Request resolved: #7846
@swolchok changed the title from "Support Half/BFloat16 in native_group_norm" to "Support Half/BFloat16 in native_group_norm (needs accuracy fix)" Jan 23, 2025