[DEV][FP8] Improve e4m3 decoding #43

Merged 1 commit on May 21, 2024


LeiWang1999
Contributor

This pull request refines the type conversions used for e4m3 decoding and adjusts the precision in the testing function. The changes aim to improve the efficiency and accuracy of the code.

Here are the key changes:

Type conversion refinement:

  • python/bitblas/quantization/quantization.py: In the function _tir_u8_to_f8_e4m3_to_f16, the shift operations now use uint16 instead of int16, and s_f16 and e_f16 are now computed with bitwise operations.
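
For readers unfamiliar with the trick, here is a minimal plain-Python sketch of how an e4m3 byte can be decoded into float16 bits with shifts and masks only. It is an illustration and not the code in this PR: the helper names e4m3_to_f16_bits and e4m3_to_f16 are hypothetical, an exponent bias of 7 is assumed for e4m3, and only normal values are handled (zero, subnormals, and NaN would need separate treatment).

```python
import struct

E4M3_BIAS = 7   # assumed exponent bias of the 8-bit e4m3 format
F16_BIAS = 15   # exponent bias of IEEE 754 half precision


def e4m3_to_f16_bits(byte: int) -> int:
    """Map one e4m3 byte to a 16-bit float16 pattern (normal values only)."""
    # Sign: bit 7 of the byte moves to bit 15 of the half-precision word.
    s_f16 = (byte & 0x80) << 8
    # Exponent (4 bits) and mantissa (3 bits) shift left by 7 so that the
    # exponent lands in bits 10..13 and the mantissa in bits 7..9.
    e_f16 = (byte & 0x7F) << 7
    # Re-bias the exponent: add (15 - 7) in the float16 exponent field.
    e_f16 += (F16_BIAS - E4M3_BIAS) << 10
    return (s_f16 | e_f16) & 0xFFFF


def e4m3_to_f16(byte: int) -> float:
    """Reinterpret the 16-bit pattern as a half-precision float."""
    return struct.unpack("<e", struct.pack("<H", e4m3_to_f16_bits(byte)))[0]


if __name__ == "__main__":
    for b in (0x38, 0xC0, 0x3C, 0x7E):
        print(f"0x{b:02X} -> {e4m3_to_f16(b)}")
```

Keeping every intermediate value unsigned matters for the sign step: shifting the sign bit into bit 15 of a signed 16-bit integer would set its sign bit, which is one plausible reason to prefer uint16 over int16 for these shifts.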

Precision adjustment:

@LeiWang1999 LeiWang1999 merged commit c570a76 into microsoft:main May 21, 2024
3 checks passed