[wip] Python-only float8 data type + bare bones UEX #23
Summary:
This is a lightweight example of how a `Float8Tensor` could be built out of core, and how it could hook up with a scaling UEX based on module swapping. The `Float8Tensor` here is the important part. The `Float8Linear` part is just an example to demonstrate e2e; I'd expect framework owners to create their own UEX for now. `Float8Tensor` should ideally hook into the existing TransformerEngine cleanly and simplify things.

Note: Ready for initial review, but things might change after we add distributed support.

Note: this is WIP and does not represent PyTorch's opinion on how we will integrate float8. At this point, this is a prototype to get some light feedback.
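To make the scaling idea concrete, here is a minimal pure-Python sketch of the dynamic-range handling a scaled float8 tensor needs: pick a scale from the observed absolute max (amax) so values fill the float8 range, then clamp. All names (`compute_scale`, `quantize`, `E4M3_MAX`) are illustrative, not from this PR, and rounding to actual float8 values is omitted; the real `Float8Tensor` would do this on torch tensors.

```python
# Illustrative constants/functions only; not the PR's actual API.
E4M3_MAX = 448.0  # largest finite value in the float8 e4m3 format


def compute_scale(amax: float) -> float:
    """Scale that maps the observed absolute max onto the float8 range."""
    return E4M3_MAX / amax if amax > 0 else 1.0


def quantize(values, scale):
    """Scale, then clamp to the representable float8 range.

    A real implementation would also round to the nearest representable
    float8 value; here we only model the dynamic-range handling.
    """
    return [max(-E4M3_MAX, min(E4M3_MAX, v * scale)) for v in values]


def dequantize(values, scale):
    """Undo the scaling to recover approximately the original values."""
    return [v / scale for v in values]


data = [0.5, -2.0, 3.0]
amax = max(abs(v) for v in data)
scale = compute_scale(amax)
roundtrip = dequantize(quantize(data, scale), scale)
```

Values within range round-trip exactly in this simplified model; out-of-range values get clamped, which is where the amax-based scale choice matters.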
TODOs that need to be implemented before review of this PR / sending to NVIDIA:
What is out of scope for this POC
a. hooking up to real float8 ops (saved for later, just needs someone to do it)
b. real UEX (saved for later and will need a lot of design discussion)
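As a sketch of what the module-swapping UEX above means, the pattern is to walk the module tree and replace each `Linear` with a `Float8Linear` built from it via a `from_float` constructor. The stand-in `Module`/`Linear` classes below are placeholders so the example runs without torch; the real version would recurse over `torch.nn.Module` children.

```python
# Stand-in module tree; a real UEX would operate on torch.nn.Module.
class Module:
    def __init__(self):
        self._children = {}

    def add(self, name, child):
        self._children[name] = child
        return child

    def named_children(self):
        return self._children.items()

    def set_child(self, name, child):
        self._children[name] = child


class Linear(Module):
    pass


class Float8Linear(Linear):
    @classmethod
    def from_float(cls, mod):
        """Build a float8 wrapper from an existing Linear (illustrative)."""
        new = cls()
        new._children = mod._children
        return new


def swap_linear_with_float8(module):
    """Recursively replace every plain Linear child with a Float8Linear."""
    for name, child in list(module.named_children()):
        if type(child) is Linear:  # exact type: don't re-swap Float8Linear
            module.set_child(name, Float8Linear.from_float(child))
        else:
            swap_linear_with_float8(child)
    return module
```

This keeps `Float8Tensor` independent of the UEX: framework owners can write their own swap (or use parametrizations, hooks, etc.) while reusing the tensor subclass.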
Test plan: