
Performance degrades after converting checkpoint to HF #716

Open
ahmadshapiro opened this issue Aug 28, 2024 · 0 comments
Labels
type/question An issue that's a question

Comments

@ahmadshapiro

❓ The question

I recently extended the pretraining of OLMo on a different language for an additional 30K steps. After converting the checkpoint to Hugging Face format, I observed that the model was generating gibberish.

After an extensive debugging session, I identified the root cause: the issue arises when splitting the fused attn_proj linear layer into three separate q, k, and v layers. Specifically, the problem stems from non-deterministic matrix-multiplication kernels in cuBLAS. The operation hidden_states @ attn_proj followed by a torch split yields results that differ from the individual operations hidden_states @ q, hidden_states @ k, and hidden_states @ v, starting around the 4th decimal place.
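A minimal sketch of the comparison described above, using NumPy as a stand-in (the reported divergence is specific to cuBLAS on GPU, so the exact magnitude of the difference here is illustrative; the weight and variable names are hypothetical, not the actual OLMo layer names):

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_tokens = 64, 8

# Fused QKV projection weight, as stored in the original checkpoint.
w_fused = rng.standard_normal((d_model, 3 * d_model)).astype(np.float32)
hidden = rng.standard_normal((n_tokens, d_model)).astype(np.float32)

# Path 1: fused matmul, then split the output into q, k, v.
fused_out = hidden @ w_fused
q1, k1, v1 = np.split(fused_out, 3, axis=-1)

# Path 2: split the weight first (as the HF conversion does), then three matmuls.
w_q, w_k, w_v = np.split(w_fused, 3, axis=-1)
q2, k2, v2 = hidden @ w_q, hidden @ w_k, hidden @ w_v

# On GPU with cuBLAS the two paths can pick different kernels / reduction
# orders and disagree in the low decimal places; any difference should
# still fall within normal float32 tolerance.
max_diff = max(np.abs(q1 - q2).max(),
               np.abs(k1 - k2).max(),
               np.abs(v1 - v2).max())
print(f"max |fused - split| = {max_diff:.2e}")
```

Checking the converted model's q, k, and v outputs against the fused output of the original checkpoint in this way isolates whether the conversion itself introduced the discrepancy.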
