
Llama 3.1 8B fp16 prefill/decode generates NaN logits in sharktank #19506

Closed
archana-ramalingam opened this issue Dec 17, 2024 · 4 comments
Labels: bug 🐞 Something isn't working

@archana-ramalingam (Contributor)

What happened?

While running iree-run-module, the Llama 3.1 8B fp16 model generates NaN logits for prefill/decode when the vmfb is compiled after this commit:
6ff00a8

Steps to reproduce your issue

  1. cd iree
  2. git checkout 6ff00a8a008d06b604d4ca4e0ae6e601ae810b4f
  3. git submodule update --init
  4. cmake -G Ninja -B ../iree-build/ -S . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DIREE_ENABLE_RUNTIME_TRACING=ON -DIREE_ENABLE_ASSERTIONS=ON -DIREE_ENABLE_SPLIT_DWARF=ON -DIREE_ENABLE_THIN_ARCHIVES=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DIREE_ENABLE_LLD=ON -DIREE_BUILD_PYTHON_BINDINGS=ON -DPython3_EXECUTABLE="$(which python)" -DIREE_HAL_DRIVER_HIP=ON -DIREE_HIP_TEST_TARGET_CHIP=gfx942 -DIREE_TARGET_BACKEND_ROCM=ON && cmake --build ../iree-build/
  5. (Optional) Skip the input and irpa downloads in steps 6–8 by logging in to the MI300x-3 system; artifacts path: /data/llama3.1/8b/
  6. Download prefill inputs: https://gist.github.com/archana-ramalingam/508072e2408e36a1b388f8e68e902f84
  7. Download mlir: https://gist.github.com/archana-ramalingam/9cc4b72b82ca77e92d34fd0c11a65860
  8. Download irpa file from Azure sharkblobs: https://github.com/nod-ai/llm-dev/blob/main/llama_benchmarking.md#1-get-the-unsharded-irpa-files
  9. cd iree
  10. Compile: ../iree-build/tools/iree-compile llama8b_f16_decomposed.mlir --iree-hip-target=gfx942 --iree-hal-target-backends=rocm -o=llama8b_f16_decomposed.vmfb
  11. Run:
    ../iree-build/tools/iree-run-module --hip_use_streams=true --device_allocator=caching --module=llama8b_f16_decomposed.vmfb --parameters=model=8b_f16.irpa --device=hip://0 --function=prefill_bs4 --input=@~/tmp/prefill_args_bs4_128_stride_32/tokens.npy --input=@~/tmp/prefill_args_bs4_128_stride_32/seq_lens.npy --input=@~/tmp/prefill_args_bs4_128_stride_32/seq_block_ids.npy --input=@~/tmp/prefill_args_bs4_128_stride_32/cs_f16.npy
  12. Output: https://gist.github.com/archana-ramalingam/8528e7ada4de1970f3fc5a7f97927c3b (a quick NaN check on the dumped logits is sketched below)
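To confirm the failure programmatically, the prefill logits can be dumped to a .npy file (e.g. by appending `--output=@logits.npy` to the iree-run-module invocation in step 11, if the build supports it) and scanned for NaNs. A minimal sketch, assuming the output was saved to the hypothetical file `logits.npy`:

```python
import numpy as np

# Load the logits dumped by iree-run-module (hypothetical path).
logits = np.load("logits.npy")

nan_mask = np.isnan(logits)
print(f"shape: {logits.shape}, dtype: {logits.dtype}")
print(f"NaN count: {nan_mask.sum()} / {logits.size}")
if nan_mask.any():
    # Locate the first NaN entry to see where the corruption begins.
    first = tuple(np.argwhere(nan_mask)[0])
    print(f"first NaN at index {first}")
```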

What component(s) does this issue relate to?

Compiler

Version information

commit SHA: 6ff00a8a008d06b604d4ca4e0ae6e601ae810b4f

Additional context

No response

@archana-ramalingam archana-ramalingam added the bug 🐞 Something isn't working label Dec 17, 2024
MaheshRavishankar pushed a commit that referenced this issue Dec 18, 2024
This reverts commit 6ff00a8.
The above commit causes the Llama 3.1 8B fp16 model to generate NaN logits
for prefill/decode.
Issue: #19506

Signed-off-by: archana-ramalingam <[email protected]>
@MaheshRavishankar (Contributor)

Related #19511

@pashu123 (Contributor)

Thanks, @archana-ramalingam, for filing the issue in such a detailed manner.

@ScottTodd (Member)

Fixed by the revert in #19508?

@archana-ramalingam (Contributor, Author)

archana-ramalingam commented Jan 11, 2025

Yes, the revert in #19508 fixed it. The issue was left open because @pashu123 wanted to investigate why the reverted #19335 patch generated NaN logits in the first place.
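A natural starting point for that investigation is to compare logits from vmfbs built just before and just after #19335 landed. A minimal sketch, assuming both runs were dumped via `--output=@...` to the hypothetical files `logits_good.npy` and `logits_bad.npy`:

```python
import numpy as np

# Hypothetical dumps from the pre-patch (good) and post-patch (bad) builds.
good = np.load("logits_good.npy")
bad = np.load("logits_bad.npy")

assert good.shape == bad.shape, "logit shapes differ between builds"

# Count positions that are NaN only in the bad run, then measure the
# largest deviation over entries that are finite in both runs.
nan_only_in_bad = np.isnan(bad) & ~np.isnan(good)
print(f"NaNs introduced by the patch: {nan_only_in_bad.sum()}")

finite = ~(np.isnan(good) | np.isnan(bad))
if finite.any():
    max_abs_diff = np.abs(good[finite] - bad[finite]).max()
    print(f"max |diff| over finite entries: {max_abs_diff}")
```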
