This repository has been archived by the owner on Aug 7, 2024. It is now read-only.

[wip] make all 3 gemms in float8 linear configurable #258

Closed
wants to merge 1 commit

Conversation

@vkuzo (Contributor) commented May 10, 2024


Summary:

This PR ensures each of the 3 gemms in fw+bw of the float8 linear
can have its own configuration.

The user interface is clean: the float8 module has separate configs for
each of the gemms. Configuring this can be exposed in module utilities in
a future PR.
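
As a sketch of what such an interface could look like (the class and attribute names below are hypothetical illustrations, not the API added by this PR):

```
# hypothetical module holding one config per gemm, as described above
class Float8LinearSketch:
    def __init__(self, config_Y=None, config_gradX=None, config_gradW=None):
        self.config_Y = config_Y          # config for Y     = X @ W_t
        self.config_gradX = config_gradX  # config for gradX = gradY @ W
        self.config_gradW = config_gradW  # config for gradW = X_t @ gradY
```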

The implementation is a bit ugly. Given the 3 gemms in fw+bw

```
Y = X @ W_t
gradX = gradY @ W
gradW = X_t @ gradY
```

and the fact that a `torch.mm` doesn't have access to any global state
other than its arguments, we need to ensure that the matmul arguments
contain the right state to map to the right config.  We adopt a simple
set of rules to do this:
1. if only one of the arguments to mm has a config, use the defined config
2. if both arguments of mm have a config, use the second argument's
   config
3. in the float8 modules, do the following:
   3a. set X's config for arg0 to config_Y
   3b. set W's config for arg0 and arg1 to None
   3c. set gradY's config for arg0 to config_gradX, and for arg1 to config_gradW

If 3 is done correctly, following 1 and 2 will lead to the right config
being used for each gemm.  It's ugly, but it works.
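
As a rough illustration of how rules 1, 2, and 3 compose, here is a minimal runnable sketch; `GemmConfig`, `TaggedTensor`, `choose_config`, and the `config_arg0`/`config_arg1` slots are hypothetical stand-ins for the mechanism described above, not the code in this PR:

```
from dataclasses import dataclass
from typing import Optional

@dataclass
class GemmConfig:          # hypothetical per-gemm config stand-in
    name: str

@dataclass
class TaggedTensor:
    # hypothetical float8-tensor stand-in: a per-position config that is
    # consulted when this tensor is the first (arg0) or second (arg1) mm argument
    config_arg0: Optional[GemmConfig] = None
    config_arg1: Optional[GemmConfig] = None

def choose_config(a: TaggedTensor, b: TaggedTensor) -> Optional[GemmConfig]:
    a_cfg, b_cfg = a.config_arg0, b.config_arg1
    if a_cfg is not None and b_cfg is not None:
        return b_cfg                                  # rule 2: second argument's config wins
    return a_cfg if a_cfg is not None else b_cfg      # rule 1: use the defined config

# rule 3: how the module would tag X, W and gradY (3a/3b/3c above)
config_Y, config_gradX, config_gradW = GemmConfig("Y"), GemmConfig("gradX"), GemmConfig("gradW")
x      = TaggedTensor(config_arg0=config_Y)                                 # 3a
w      = TaggedTensor(config_arg0=None, config_arg1=None)                   # 3b
grad_y = TaggedTensor(config_arg0=config_gradX, config_arg1=config_gradW)   # 3c

# (x stands in for X_t below; only the config tags matter in this sketch)
assert choose_config(x, w).name == "Y"           # Y     = X @ W_t
assert choose_config(grad_y, w).name == "gradX"  # gradX = gradY @ W
assert choose_config(x, grad_y).name == "gradW"  # gradW = X_t @ gradY (rule 2)
```

With this tagging, each of the three gemms resolves to its own config even though the mm itself only sees its two arguments.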

Test Plan:

for now, this works on a single GPU:
```
pytest -s test/test_base.py
pytest -s test/test_compile.py
```

it's just a question of engineering time to also make the distributed tests pass

Reviewers:

Subscribers:

Tasks:

Tags:
@facebook-github-bot added the CLA Signed label May 10, 2024
@vkuzo (Contributor, Author) commented Jul 19, 2024

closing in favor of #315

@vkuzo closed this Jul 19, 2024