This repository has been archived by the owner on Aug 7, 2024. It is now read-only.

[QST] Dynamic Scaling #274

Closed
jeromeku opened this issue May 31, 2024 · 3 comments

@jeromeku

@vkuzo

Great work on fp8 thus far.

Regarding the performance of float8: why is dynamic scaling faster than delayed scaling per this chart?

I thought the downside of the simpler stateless dynamic approach was that it was more computationally costly.

What other dynamic scaling approaches have been tried other than per-tensor?
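
For context, a minimal sketch of the two per-tensor approaches being compared; the function names and constants are illustrative, not the float8_experimental API:

```python
import torch

# Max representable magnitude of float8 e4m3 (torch.float8_e4m3fn).
E4M3_MAX = 448.0

def dynamic_scale(x: torch.Tensor) -> torch.Tensor:
    # Dynamic (stateless): recompute amax over the current tensor on every call.
    # This extra reduction pass over x is the cost the question refers to.
    amax = x.abs().max()
    return E4M3_MAX / torch.clamp(amax, min=1e-12)

def delayed_scale(amax_history: torch.Tensor) -> torch.Tensor:
    # Delayed (stateful): reuse amax values recorded in earlier iterations,
    # avoiding a reduction over the current tensor on the critical path.
    return E4M3_MAX / torch.clamp(amax_history.max(), min=1e-12)
```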

@vkuzo
Contributor

vkuzo commented May 31, 2024

hi @jeromeku , we are planning for the next half now and I updated #187 with some additional details. The tl;dr is that we haven't focused on delayed scaling in the past months because of accuracy issues reported by our customers. There are known gaps in inductor codegen for delayed scaling today which we haven't gotten to yet, so we aren't running the optimal triton code for this case. I don't have a writeup in an OSS format at the moment, but I'm happy to make one if useful.

However, I'd like to resurrect the excitement for delayed scaling given some of the recent data we've collected, which shows the accuracy issues might be localized to gradient scaling. My hope is that if we make delayed scaling configurable per activation vs weight vs grad, we can keep grads dynamically scaled (slower but more accurate) and use delayed scaling for activations and weights. If this works out accuracy-wise, I plan to fix / get people to fix the performance issues with the inductor code.
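
A rough sketch of what a per-role configuration like that could look like; the enum and field names below are hypothetical, not a committed API:

```python
from dataclasses import dataclass
from enum import Enum

class ScalingType(Enum):
    DYNAMIC = "dynamic"
    DELAYED = "delayed"

@dataclass
class Float8ScalingConfig:
    # Hypothetical per-role config: keep grads dynamically scaled for accuracy,
    # use delayed scaling for activations and weights for speed.
    activation: ScalingType = ScalingType.DELAYED
    weight: ScalingType = ScalingType.DELAYED
    grad: ScalingType = ScalingType.DYNAMIC
```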

@vkuzo
Contributor

vkuzo commented May 31, 2024

> What other dynamic scaling approaches have been tried other than per-tensor?

pytorch/pytorch#125204 just landed, which adds eager mode support for rowwise scaling; inductor work is coming up to enable autotuning.

We are also thinking about how to enable blockwise gemms, but that is super early. Long term, we'd like every scaling type to be supported here with an eager mode reference plus inductor support for autotuning and prologue/epilogue fusion.
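
For illustration, a minimal sketch of per-tensor vs rowwise scale computation (not the API added in that PR):

```python
import torch

E4M3_MAX = 448.0  # max representable magnitude of float8 e4m3

def per_tensor_scale(x: torch.Tensor) -> torch.Tensor:
    # One scale for the whole tensor.
    return E4M3_MAX / torch.clamp(x.abs().max(), min=1e-12)

def rowwise_scales(x: torch.Tensor) -> torch.Tensor:
    # One scale per row: finer granularity, so an outlier in one row
    # no longer compresses the dynamic range of every other row.
    amax_per_row = x.abs().amax(dim=-1, keepdim=True)
    return E4M3_MAX / torch.clamp(amax_per_row, min=1e-12)
```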

@vkuzo
Contributor

vkuzo commented Jul 30, 2024

Closing since this was a question rather than a feature request. We are actively working on both speeding up delayed per-tensor scaling and adding rowwise scaling. Our code has moved to https://github.com/pytorch/ao/tree/main/torchao/float8, please feel free to open an issue there if relevant!

@vkuzo vkuzo closed this as completed Jul 30, 2024