[CK_TILE] Implement fp8 quant tests/examples for layernorm and rmsnorm #1814

Open
wants to merge 1 commit into
base: develop
from


ruanjm
Contributor

@ruanjm ruanjm commented Jan 15, 2025

  • Added the compile option --offload-compress because the code object is otherwise too large. I'm not sure whether this flag affects performance.
  • The Y scale base for fp8 is set to 240, unlike int8, which uses 127. This base value, kMaxY, is hard-coded in the host code because a host-side static_cast cannot convert fp8 to float correctly.
  • Improved the output of check_err() for fp8.
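
To illustrate the second point, here is a minimal host-side sketch of how a per-row Y scale could be derived from a hard-coded base. It is not the code in this PR: the helper names (`compute_y_scale`, `scale_row`) and the constants `kMaxYInt8` / `kMaxYFp8` are hypothetical, and the formula `y_scale = max(|y|) / kMaxY` is an assumption about the usual dynamic-quantization scheme. The sketch stays in float and skips the actual fp8 conversion.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical constants mirroring the values named in the PR description.
constexpr float kMaxYInt8 = 127.0f; // scale base for int8
constexpr float kMaxYFp8  = 240.0f; // scale base for fp8, hard-coded on the host

// Assumed scheme: one scale per row, chosen so that the largest
// magnitude in the row maps onto max_y.
float compute_y_scale(const std::vector<float>& row, float max_y)
{
    assert(max_y > 0.0f);
    float amax = 0.0f;
    for(float v : row)
        amax = std::max(amax, std::fabs(v));
    // An all-zero row would give a zero scale; fall back to 1 to avoid dividing by zero.
    return amax > 0.0f ? amax / max_y : 1.0f;
}

// Divide by the scale and clamp to [-max_y, max_y]. The result is still float;
// the final cast to the quantized type is left out on purpose.
std::vector<float> scale_row(const std::vector<float>& row, float y_scale, float max_y)
{
    std::vector<float> out;
    out.reserve(row.size());
    for(float v : row)
        out.push_back(std::clamp(v / y_scale, -max_y, max_y));
    return out;
}
```

With a row such as `{-3.0f, 0.5f, 1.5f, 6.0f}`, the fp8 scale comes out as 6 / 240 and the int8 scale as 6 / 127, so the same input yields different scales depending on the output type. Keeping the base as a plain float constant sidesteps any host-side fp8-to-float conversion.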

@ruanjm ruanjm force-pushed the amd/dev/jruan/norm_fp8 branch from 054a2ee to d7d27a6 on January 15, 2025 04:32
@ruanjm ruanjm changed the title from "Implement fp8 quant tests/examples for layernorm and rmsnorm" to "[CK_TILE] Implement fp8 quant tests/examples for layernorm and rmsnorm" on Jan 16, 2025