8 bit floating point #1910
base: master
Conversation
RVIA is working on new standard extensions for FP8 (specifically the OCP MX formats) with which this PR will conflict. In addition to the standard extensions being more parsimonious (conversions only, at least for now), there is the complication that there are multiple formats: OCP MX includes both E4M3 and E5M2, for example, and neither is the same as your FP8 format! If this custom extension is important for your work, I recommend maintaining it on your own fork (which I know can be a pain). (As a separate matter, I appreciate your simple approach of doing the computation in higher precision and then rounding.)
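For reference, the two OCP MX formats mentioned above split the 8 bits differently. The constants below are purely illustrative (the names are not identifiers from this repository) and just summarize the field widths and exponent biases given in the OCP Microscaling Formats spec.

```c
/* Illustrative field widths for the two OCP MX FP8 formats; not code from this repo. */
typedef struct { int signBits, expBits, fracBits, bias; } fp8_layout_t;

static const fp8_layout_t ocp_e4m3 = { 1, 4, 3, 7 };   /* no infinities; single NaN encoding */
static const fp8_layout_t ocp_e5m2 = { 1, 5, 2, 15 };  /* IEEE-754-style infinities and NaNs */
```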
Implements several helper macros as well. Convert to f16, then multiply with round-to-odd, then convert back to f8 using the specified rounding mode.
…_emulation module. Will now be able to reuse this logic for other functions.
All use new f8_emulation_1_operand function.
…ting point format. Will eventually be hooked up to an architectural register to control the toggle.
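These commits boil down to one widen/compute/narrow pattern. As a rough sketch only, a shared one-operand helper along the lines of the f8_emulation_1_operand function mentioned above might look like the following; float8_t, f8_to_f16, and f16_to_f8 are placeholder names rather than the identifiers actually added in this PR, while f16_sqrt, softfloat_roundingMode, and softfloat_round_odd come from Berkeley SoftFloat (round-to-odd is part of SoftFloat-3e's rounding-mode enum).

```c
#include "softfloat.h"

/* Sketch of a shared one-operand f8 emulation helper: widen to f16,
 * compute with round-to-odd, then narrow under the requested mode.
 * float8_t, f8_to_f16 and f16_to_f8 are placeholders, not this PR's code. */
static float8_t f8_emulate_1_operand(float8_t a,
                                     float16_t (*op16)(float16_t),
                                     uint_fast8_t roundingMode)
{
    float16_t a16 = f8_to_f16(a);          /* widening f8 -> f16 is exact */

    softfloat_roundingMode = softfloat_round_odd;
    float16_t r16 = op16(a16);             /* compute in f16, round to odd */

    softfloat_roundingMode = roundingMode;
    return f16_to_f8(r16);                 /* final rounding in the caller's mode */
}

/* Example: an f8 square root built from SoftFloat's f16_sqrt. */
static float8_t f8_sqrt_emulated(float8_t a, uint_fast8_t roundingMode)
{
    return f8_emulate_1_operand(a, f16_sqrt, roundingMode);
}
```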
4699f5b to 2573d49
Thanks for the feedback. I just pushed a version with just the softfloat changes; hopefully someone can make use of this when the extensions are standardized. This also includes the OCP formats (courtesy of my colleague Omar); I forgot to grab those commits for the original PR.
Hi all,
I did some work on 8-bit floating point a little while back (all three IEEE formats). This implementation uses a conversion method (convert to f16, operate, convert back to f8) that relies on round-to-odd, as discussed in the following paper, to preserve correctness: https://ieeexplore.ieee.org/abstract/document/4358278
I imagine the changes I made to the vector unit in the MR are probably unwanted, since they don't correspond to any extension, but I'd be happy to filter out just the softfloat additions I made if there is interest from the community.
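In case it is useful for evaluating the approach, here is a rough sketch of that conversion scheme for a binary operation such as multiply. float8_t, f8_to_f16, and f16_to_f8 are placeholder names rather than the identifiers in this PR; f16_mul, softfloat_roundingMode, and softfloat_round_odd are from Berkeley SoftFloat (round-to-odd is part of SoftFloat-3e's rounding-mode enum).

```c
#include "softfloat.h"

/* Sketch of the two-step scheme: widen to f16, multiply with round-to-odd,
 * then narrow to f8 under the rounding mode the caller requested.
 * float8_t, f8_to_f16 and f16_to_f8 are placeholders, not this PR's code. */
static float8_t f8_mul_via_f16(float8_t a, float8_t b, uint_fast8_t roundingMode)
{
    float16_t a16 = f8_to_f16(a);   /* exact: every f8 value fits in f16 */
    float16_t b16 = f8_to_f16(b);

    softfloat_roundingMode = softfloat_round_odd;
    float16_t p16 = f16_mul(a16, b16);   /* round-to-odd preserves sticky information */

    softfloat_roundingMode = roundingMode;
    return f16_to_f8(p16);          /* the second rounding is now safe */
}
```

The point of round-to-odd for the intermediate result is that it keeps the first rounding (to f16) from hiding information the second rounding (to f8) needs, which is the double-rounding issue the cited paper addresses.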