[WIP] FP8 support. #1484
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Would it be worth discussing the Transformer Engine implementation, since they say it's compatible with Ada Lovelace and higher? The implementation seems to be very easy, as it's already done in Accelerate.
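For reference, a minimal sketch of how Transformer Engine is typically used directly (not part of this PR; the exact recipe arguments can differ between TE versions):

```python
# Illustrative only: Transformer Engine's fp8_autocast around a te.Linear.
# Assumes an FP8-capable GPU (H100 / Ada) and the transformer_engine package.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common.recipe import DelayedScaling, Format

fp8_recipe = DelayedScaling(fp8_format=Format.HYBRID)  # E4M3 forward, E5M2 backward

layer = te.Linear(4096, 4096, bias=True, params_dtype=torch.bfloat16).cuda()
x = torch.randn(16, 4096, device="cuda", dtype=torch.bfloat16)

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)
```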
Well, it works wherever FP8 works; H100s are the most common cluster GPUs, I think.
The implementation is silently skipped when dimensions are not 16-aligned, which is not acceptable in TGI. There are two issues with the current PR implementation. That's what makes it slow, and TransformerEngine is just as slow: https://github.com/NVIDIA/TransformerEngine/blob/main/transformer_engine/pytorch/module/linear.py#L91C13-L92 So AFAIK there's no win in using Transformer Engine over this implementation. And since LLMs mostly operate at seqlen < 16 during decode (the first dimension is pretty much the batch size), we also most likely need a customized GEMV to make things faster (this limitation on size is very surprising to me).
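To make the alignment point concrete, here is a small sketch of why decode-time shapes force padding. The helpers (`quantize_fp8`, `fp8_linear`) are hypothetical names for illustration and simulate the GEMM via dequantization; real FP8 GEMM kernels (cuBLASLt / Transformer Engine) impose the multiple-of-16 constraint that the comment refers to. Requires a recent PyTorch with `torch.float8_e4m3fn`.

```python
import torch

def quantize_fp8(x: torch.Tensor):
    """Per-tensor FP8 (E4M3) quantization: scale into the representable range."""
    scale = x.abs().max().clamp(min=1e-12) / 448.0  # 448 is the E4M3 max value
    return (x / scale).to(torch.float8_e4m3fn), scale

def fp8_linear(x: torch.Tensor, w_fp8: torch.Tensor, w_scale: torch.Tensor) -> torch.Tensor:
    """Weight-only FP8 linear, simulated by dequantizing the weight.

    A real FP8 GEMM would require the row count to be a multiple of 16,
    so a decode batch of e.g. 3 tokens has to be padded up to 16 first.
    """
    m = x.shape[0]
    pad = (-m) % 16                               # rows needed to reach the next multiple of 16
    if pad:
        x = torch.nn.functional.pad(x, (0, 0, 0, pad))
    out = x @ (w_fp8.to(x.dtype) * w_scale).t()   # dequantize + regular matmul (stand-in for FP8 GEMM)
    return out[:m]                                # drop the padded rows again

w = torch.randn(4096, 4096, dtype=torch.float16)
w_fp8, w_scale = quantize_fp8(w)
x = torch.randn(3, 4096, dtype=torch.float16)     # decode-time batch of 3 (< 16)
y = fp8_linear(x, w_fp8, w_scale)
```

Padding to 16 wastes most of the GEMM at small decode batches, which is why a dedicated GEMV path is being suggested above.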
The pytorch-labs implementation will have inference support later: pytorch-labs/float8_experimental#187 (comment). I am also checking the TensorRT-LLM implementation of FP8.
What does this PR do?
Fixes # (issue)
Before submitting
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.