❓ The question
I've recently extended the pretraining of OLMO on data in a different language for an additional 30K steps. After converting the checkpoint to Hugging Face format, I observed that the model was generating gibberish.
After an extensive debugging session, I identified the root cause: the issue arises when splitting the `attn_proj` linear layer into three separate `q`, `k`, and `v` layers. Specifically, the problem stems from floating-point non-determinism in cuBLAS matrix multiplication: the fused and split projections are GEMMs of different shapes, so cuBLAS may select different kernels with different accumulation orders. The operation `hidden_states @ attn_proj` followed by a torch split yields results that differ from the individual operations `hidden_states @ q`, `hidden_states @ k`, and `hidden_states @ v` starting around the 4th decimal digit.
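For reference, here is a minimal sketch of the comparison described above. The dimensions and names (`hidden_size`, `attn_proj`) are hypothetical stand-ins for illustration, not OLMO's actual configuration:

```python
import torch

torch.manual_seed(0)

# Hypothetical dimensions for illustration only.
hidden_size = 4096
batch, seq = 1, 8

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

hidden_states = torch.randn(batch, seq, hidden_size, device=device, dtype=dtype)

# Fused projection: one weight matrix producing q, k, v concatenated.
attn_proj = torch.nn.Linear(hidden_size, 3 * hidden_size, bias=False,
                            device=device, dtype=dtype)

# Split the fused weight into three separate projections, as the
# checkpoint conversion does. Linear weights are (out_features, in_features),
# so we split along dim 0.
q_w, k_w, v_w = attn_proj.weight.split(hidden_size, dim=0)

# Path 1: fused matmul, then split the output.
fused = attn_proj(hidden_states)
q_fused, k_fused, v_fused = fused.split(hidden_size, dim=-1)

# Path 2: three separate matmuls against the split weights.
q_split = hidden_states @ q_w.T
k_split = hidden_states @ k_w.T
v_split = hidden_states @ v_w.T

# On GPU the two paths are GEMMs of different shapes, so cuBLAS may pick
# different kernels with different accumulation orders; the outputs then
# agree only approximately (exactly or near-exactly on CPU float32).
print((q_fused - q_split).abs().max())
print((k_fused - k_split).abs().max())
print((v_fused - v_split).abs().max())
```

In half precision the maximum difference between the two paths can land around the magnitude described above, which is enough to change sampled tokens even though each individual matmul is numerically "correct".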