This repository was archived by the owner on Aug 7, 2024. It is now read-only.
[QST] Dynamic Scaling #274
Closed
Description
Great work on fp8 thus far.
Regarding performance of float8, why is the performance of dynamic scaling better than delayed scaling per this chart?
I thought the downside of the simpler, stateless dynamic approach was that it was more computationally costly.
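To make the distinction concrete, here is a minimal sketch of how I understand the two approaches. It is plain Python over flat lists, not the library's actual implementation; `dynamic_scale`, `DelayedScaler`, and `history_len` are hypothetical names chosen for illustration, and the first-step fallback in the delayed case is an assumption.

```python
from collections import deque

E4M3_MAX = 448.0  # largest finite value representable in float8 e4m3fn


def amax(values):
    """Largest absolute value in a flat list of floats."""
    return max(abs(v) for v in values)


def dynamic_scale(values, eps=1e-12):
    """Stateless: reduce over the current tensor, then derive the scale.

    The cast to float8 has to wait for this reduction to finish.
    """
    return E4M3_MAX / max(amax(values), eps)


class DelayedScaler:
    """Stateful: the scale comes from amax values recorded on earlier steps."""

    def __init__(self, history_len=16):
        # history_len is an illustrative default, not a library setting.
        self.history = deque(maxlen=history_len)

    def scale(self, values, eps=1e-12):
        current = amax(values)
        # Assumption: fall back to the current amax when no history exists yet.
        ref = max(self.history) if self.history else current
        # The current amax is only recorded for later steps, so after the
        # first step the returned scale does not depend on it.
        self.history.append(current)
        return E4M3_MAX / max(ref, eps)
```

In this sketch the dynamic path needs a full reduction before every cast, while the delayed path reuses a stale scale and carries extra state (the amax history) between steps, which is why I expected dynamic to be the slower of the two.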
What dynamic scaling approaches have been tried other than per-tensor?