Revert "Pin PT version: Fix FPX Inductor error" #843
This reverts commit 287458c.
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/843
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 47acb46 with merge base 1b317f9. This comment was automatically generated by Dr. CI and updates every 15 minutes.
So the FPX test is no longer failing, but I see some new failures with bitnet that should probably be skipped before we make the release, @andrewor14. Also, save/load is now failing on nightlies, @jerryzh168.
@msaroufim, do you have a link to the failing tests? I can try to help root-cause them (or at least figure out whether they're subclass-related).
Thank you! <3 They're in test/integration/test_integration.py (test_int8_weight_only_quant_subclass_api), and they only fail for CPU.
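One way to act on "skip before the release" without losing coverage on other devices is to skip only the failing device variant. The sketch below is a minimal, hypothetical illustration using the standard library's unittest.skipIf: the test class, the per-device method split, KNOWN_FAILING_DEVICES and the _check_int8_weight_only_quant placeholder are all assumptions for illustration, not torchao's actual test code.

```python
import unittest

# Hypothetical: devices on which the test is currently known to fail
# (CPU, per the discussion above).
KNOWN_FAILING_DEVICES = {"cpu"}


def _check_int8_weight_only_quant(device: str) -> bool:
    # Hypothetical placeholder for the real quantize/compile/compare logic.
    return device in {"cpu", "cuda"}


class Int8WeightOnlyQuantSubclassAPITest(unittest.TestCase):
    # Skip only the CPU variant so the CUDA variant keeps running.
    @unittest.skipIf("cpu" in KNOWN_FAILING_DEVICES,
                     "known CPU failure, skipped until root-caused")
    def test_int8_weight_only_quant_subclass_api_cpu(self):
        self.assertTrue(_check_int8_weight_only_quant("cpu"))

    def test_int8_weight_only_quant_subclass_api_cuda(self):
        self.assertTrue(_check_int8_weight_only_quant("cuda"))
```

Run it with `python -m unittest` or pytest. In a real suite the CUDA variant would additionally need its own skip when no GPU is available; the placeholder check here does not model that.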
A couple things:
(1) I confirmed that …
(2) I tweaked the test to compile with a few different backends: …
(3) I checked out @eellison's beautiful inductor bisecting tool from here and ran it. It bisected down to the lowering for …
(4) I also confirmed that this only repros with CPU inputs.
Idk, @eellison, do you know of any suspicious commits from the CPU-inductor side in the last week that could affect numerics, especially in relation to casting ops?
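The backend sweep in (2) works because each backend adds one more layer of transformation, so the first backend whose output diverges from eager points at the layer that changed the numerics. The sketch below shows that comparison logic in plain Python under stated assumptions: the stages are stand-in lambdas (the "inductor" one simulates a rounding bug), and first_divergent_stage is a hypothetical helper, not a PyTorch API.

```python
import math
from typing import Callable, Optional, Sequence, Tuple

Stage = Tuple[str, Callable[[Sequence[float]], Sequence[float]]]


def first_divergent_stage(
    reference: Callable[[Sequence[float]], Sequence[float]],
    stages: Sequence[Stage],
    inputs: Sequence[float],
    rel_tol: float = 1e-5,
    abs_tol: float = 1e-6,
) -> Optional[str]:
    """Return the name of the first stage whose output diverges from `reference`.

    Stages are assumed to be ordered from least to most transformation, so
    the first divergent one points at the layer that introduced the change.
    Returns None if every stage matches.
    """
    expected = reference(inputs)
    for name, fn in stages:
        actual = fn(inputs)
        if len(actual) != len(expected) or not all(
            math.isclose(a, e, rel_tol=rel_tol, abs_tol=abs_tol)
            for a, e in zip(actual, expected)
        ):
            return name
    return None


def reference(xs):
    return [x * 0.5 for x in xs]


# Stand-ins for compiled versions of the same function.
stages = [
    ("eager", reference),
    ("aot_eager", lambda xs: [x / 2 for x in xs]),
    ("inductor", lambda xs: [float(round(x * 0.5)) for x in xs]),  # simulated bug
]

print(first_divergent_stage(reference, stages, [1.0, 2.5, 3.0]))  # → inductor
```

With PyTorch, each stage would instead be `torch.compile(fn, backend=name)` for backend strings such as "eager", "aot_eager" and "inductor", and the comparison would use `torch.allclose` or `torch.testing.assert_close` on tensors.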
@bdhirsh, I was on PTO last week, so I'm not sure. Sounds like @leslie-fang-intel has an idea. It might be easier to bisect.
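Bisecting a week of nightly commits takes only about log2(N) test runs, provided the first commit is known good, the last is known bad, and the breakage persists once introduced. The sketch below shows that binary search in plain Python; the commit IDs are hypothetical, and is_bad stands in for "build at this commit and run the failing test". `git bisect run` automates the same search over a real history.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def first_bad(commits: Sequence[T], is_bad: Callable[[T], bool]) -> T:
    """Binary-search an ordered commit range for the first bad commit.

    Assumes commits[0] is known good, commits[-1] is known bad, and the
    range is monotone (once a commit is bad, every later one is bad too).
    """
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


commits = [f"c{i}" for i in range(8)]  # hypothetical commit IDs, oldest first
print(first_bad(commits, lambda c: int(c[1:]) >= 5))  # → c5
```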
@leslie-fang-intel, not sure, but it seems unlikely, mainly because the issue you linked is a hard error raised while the subclass runs at compile time, whereas this is a runtime (bad numerics) error.
Thanks @bdhirsh. It sounds like a different error, then. Maybe after we resolve the first one, we will hit the numerics error you saw. Do you have any idea about the hard error in #890, and why it didn't happen in your test environment?
----------------- Update ---------------
Comment out these lines to enable the test: …
It seems we saw 2 different errors when running 2 different unit tests.
Hi @bdhirsh, I have created PRs to fix these 2 failures. Please help review them.
@jerryzh168, I vaguely remember it being a problem to re-enable … Either way, we should fix the freezing interaction (@IvanKobzarev is taking a look).
Reverts #790
The problems are mostly CPU-specific; in particular, this feels like a subclass-on-CPU problem, but I'm not sure. cc @bdhirsh