FP8 OCP to FP8 FNUZ on hardware with only FP8 FNUZ support #3684
Conversation
Codecov Report
Attention: Patch coverage is

@@ Coverage Diff @@
## develop #3684 +/- ##
===========================================
+ Coverage   92.16%   92.18%   +0.01%
===========================================
  Files         515      518       +3
  Lines       21978    22047      +69
===========================================
+ Hits        20256    20324      +68
- Misses       1722     1723       +1
it needs a string input
Fixed the bug in the pointwise compilation. __builtin_nan requires a string input that affects the most significant bits.
This build is not recommended to merge 🔴
🔴 bert_large_uncased_fp16: FAILED: MIGraphX is not within tolerance - check verbose output
* intrinsically. Conversion uses the same bit representation and adjusts scaling factors at the
* dequantization. Using the same bit representation from fp8e4m3fn to fp8e4m3fnuz halves the
* floating point representation. This pass should run before simplify_qdq so that the scales and
* zero points calculated by simplify_qdq have the correct adjusted scaling factors
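A hypothetical sketch of why reusing the bit pattern halves the value, and why the dequantization scale must be adjusted to compensate. The `decode_e4m3` helper below is illustrative only (it handles normal values, not zeros, subnormals, or NaN); fp8e4m3fn uses exponent bias 7 while fp8e4m3fnuz uses bias 8, so the same bits read as fnuz come out at half the OCP value.

```python
def decode_e4m3(bits: int, bias: int) -> float:
    """Decode a normal e4m3 value from its 8-bit pattern.

    Illustrative helper (assumption, not MIGraphX code): ignores
    zeros, subnormals, and NaN encodings for simplicity.
    """
    sign = -1.0 if (bits >> 7) & 1 else 1.0
    exp = (bits >> 3) & 0xF       # 4 exponent bits
    mant = bits & 0x7             # 3 mantissa bits
    return sign * (1 + mant / 8.0) * 2.0 ** (exp - bias)

bits = 0b00111000                  # 1.0 in fp8e4m3fn (bias 7)
ocp_value = decode_e4m3(bits, bias=7)   # 1.0
fnuz_value = decode_e4m3(bits, bias=8)  # 0.5: same bits, half the value

scale = 0.1                        # original dequantize scale (made up)
adjusted_scale = scale * 2.0       # doubled to compensate for the halving

# Dequantizing the reinterpreted bits with the adjusted scale
# recovers the same real value as the original OCP dequantization.
assert ocp_value * scale == fnuz_value * adjusted_scale
```

This is the intuition behind adjusting the scaling factors at dequantization rather than rewriting the quantized data itself.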
Appreciate this comment.
Makes sense. Add some additional tests for some of the coverage warnings and that's about it.
Disregard, you're already like 97% covered here. Looks good
The pass runs before simplify_qdq so that the adjusted scales and zero points are propagated to after the quantized operator.
test/fp8_ocp_to_nanoo.cpp checks the pass works with simplify_qdq and does the expected operations.
test/ref/fp8_ocp_to_nanoo.cpp checks the pass produces the same result before and after.
Also fixes a pointwise compilation bug where __builtin_nan was used incorrectly.