8 bit floating point #1910
base: master
Conversation
RVIA is working on new standard extensions for FP8 (specifically the OCP MX formats) with which this PR will conflict. In addition to the standard extensions being more parsimonious (conversions only, at least for now), there is the complication that there are multiple formats: OCP MX includes both E4M3 and E5M2, for example, and neither is the same as your FP8 format! If this custom extension is important for your work, I recommend maintaining it on your own fork (which I know can be a pain). (As a separate matter, I appreciate your simple approach of doing the computation in higher precision and then rounding.)
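For reference, the two OCP MX formats mentioned above split the 8 bits differently. The constants below are purely illustrative (the names are not identifiers from this repository) and just summarize the field widths and exponent biases given in the OCP Microscaling Formats spec.

```c
/* Illustrative field widths for the two OCP MX FP8 formats; not code from this repo. */
typedef struct { int signBits, expBits, fracBits, bias; } fp8_layout_t;

static const fp8_layout_t ocp_e4m3 = { 1, 4, 3, 7 };   /* no infinities; single NaN encoding */
static const fp8_layout_t ocp_e5m2 = { 1, 5, 2, 15 };  /* IEEE-754-style infinities and NaNs */
```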
Implements several helper macros as well. Convert to f16, then multiply with round-to-odd, then convert back to f8 using the specified rounding mode.
…_emulation module. Will now be able to reuse this logic for other functions.
All use new f8_emulation_1_operand function.
…ting point format. Will eventually be hooked up to an architectural register to control the toggle.
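These commits boil down to one widen/compute/narrow pattern. As a rough sketch only, a shared one-operand helper along the lines of the f8_emulation_1_operand function mentioned above might look like the following; float8_t, f8_to_f16, and f16_to_f8 are placeholder names rather than the identifiers actually added in this PR, while f16_sqrt, softfloat_roundingMode, and softfloat_round_odd come from Berkeley SoftFloat (round-to-odd is part of SoftFloat-3e's rounding-mode enum).

```c
#include "softfloat.h"

/* Sketch of a shared one-operand f8 emulation helper: widen to f16,
 * compute with round-to-odd, then narrow under the requested mode.
 * float8_t, f8_to_f16 and f16_to_f8 are placeholders, not this PR's code. */
static float8_t f8_emulate_1_operand(float8_t a,
                                     float16_t (*op16)(float16_t),
                                     uint_fast8_t roundingMode)
{
    float16_t a16 = f8_to_f16(a);          /* widening f8 -> f16 is exact */

    softfloat_roundingMode = softfloat_round_odd;
    float16_t r16 = op16(a16);             /* compute in f16, round to odd */

    softfloat_roundingMode = roundingMode;
    return f16_to_f8(r16);                 /* final rounding in the caller's mode */
}

/* Example: an f8 square root built from SoftFloat's f16_sqrt. */
static float8_t f8_sqrt_emulated(float8_t a, uint_fast8_t roundingMode)
{
    return f8_emulate_1_operand(a, f16_sqrt, roundingMode);
}
```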
4699f5b to 2573d49
Thanks for the feedback. I just pushed a version with just the softfloat changes; hopefully someone can make use of this when the extensions are standardized. This also includes the OCP formats (courtesy of my colleague Omar); I forgot to grab those commits for the original PR.
Hi all,
I did some work on 8-bit floating point a little while back (all three IEEE formats). This implementation uses a conversion method (convert to f16, operate, convert back to f8) that relies on round-to-odd, as discussed in the following paper, to preserve correctness: https://ieeexplore.ieee.org/abstract/document/4358278
I imagine the changes I made to the vector unit in the MR are probably unwanted, since they don't correspond to any extension, but I'd be happy to filter out just the softfloat additions I made if there is interest from the community.
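In case it is useful for evaluating the approach, here is a rough sketch of that conversion scheme for a binary operation such as multiply. float8_t, f8_to_f16, and f16_to_f8 are placeholder names rather than the identifiers in this PR; f16_mul, softfloat_roundingMode, and softfloat_round_odd are from Berkeley SoftFloat (round-to-odd is part of SoftFloat-3e's rounding-mode enum).

```c
#include "softfloat.h"

/* Sketch of the two-step scheme: widen to f16, multiply with round-to-odd,
 * then narrow to f8 under the rounding mode the caller requested.
 * float8_t, f8_to_f16 and f16_to_f8 are placeholders, not this PR's code. */
static float8_t f8_mul_via_f16(float8_t a, float8_t b, uint_fast8_t roundingMode)
{
    float16_t a16 = f8_to_f16(a);   /* exact: every f8 value fits in f16 */
    float16_t b16 = f8_to_f16(b);

    softfloat_roundingMode = softfloat_round_odd;
    float16_t p16 = f16_mul(a16, b16);   /* round-to-odd preserves sticky information */

    softfloat_roundingMode = roundingMode;
    return f16_to_f8(p16);          /* the second rounding is now safe */
}
```

The point of round-to-odd for the intermediate result is that it keeps the first rounding (to f16) from hiding information the second rounding (to f8) needs, which is the double-rounding issue the cited paper addresses.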