
Llama 3.1 8B fp16 prefill/decode generates NaN logits in sharktank #19506

Closed
archana-ramalingam opened this issue Dec 17, 2024 · 4 comments
Labels: bug 🐞 Something isn't working

@archana-ramalingam (Contributor)

What happened?

While running iree-run-module, the Llama 3.1 8B fp16 model generates NaN logits for prefill/decode when the vmfb is compiled after this commit:
6ff00a8

Steps to reproduce your issue

  1. cd iree
  2. git checkout 6ff00a8a008d06b604d4ca4e0ae6e601ae810b4f
  3. git submodule update --init
  4. cmake -G Ninja -B ../iree-build/ -S . -DCMAKE_BUILD_TYPE=RelWithDebInfo -DIREE_ENABLE_RUNTIME_TRACING=ON -DIREE_ENABLE_ASSERTIONS=ON -DIREE_ENABLE_SPLIT_DWARF=ON -DIREE_ENABLE_THIN_ARCHIVES=ON -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DIREE_ENABLE_LLD=ON -DIREE_BUILD_PYTHON_BINDINGS=ON -DPython3_EXECUTABLE="$(which python)" -DIREE_HAL_DRIVER_HIP=ON -DIREE_HIP_TEST_TARGET_CHIP=gfx942 -DIREE_TARGET_BACKEND_ROCM=ON && cmake --build ../iree-build/
  5. (Optional) Skip the input and irpa downloads in steps 6–8 by logging in to the MI300x-3 system; artifacts path: /data/llama3.1/8b/
  6. Download prefill inputs: https://gist.github.com/archana-ramalingam/508072e2408e36a1b388f8e68e902f84
  7. Download mlir: https://gist.github.com/archana-ramalingam/9cc4b72b82ca77e92d34fd0c11a65860
  8. Download irpa file from Azure sharkblobs: https://github.com/nod-ai/llm-dev/blob/main/llama_benchmarking.md#1-get-the-unsharded-irpa-files
  9. cd iree
  10. Compile: ../iree-build/tools/iree-compile llama8b_f16_decomposed.mlir --iree-hip-target=gfx942 --iree-hal-target-backends=rocm -o=llama8b_f16_decomposed.vmfb
  11. Run:
    ../iree-build/tools/iree-run-module --hip_use_streams=true --device_allocator=caching --module=llama8b_f16_decomposed.vmfb --parameters=model=8b_f16.irpa --device=hip://0 --function=prefill_bs4 --input=@~/tmp/prefill_args_bs4_128_stride_32/tokens.npy --input=@~/tmp/prefill_args_bs4_128_stride_32/seq_lens.npy --input=@~/tmp/prefill_args_bs4_128_stride_32/seq_block_ids.npy --input=@~/tmp/prefill_args_bs4_128_stride_32/cs_f16.npy
  12. Output: https://gist.github.com/archana-ramalingam/8528e7ada4de1970f3fc5a7f97927c3b (a quick NaN check on the dumped logits is sketched below)
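To confirm the failure programmatically, the prefill logits can be dumped to a .npy file (e.g. by appending `--output=@logits.npy` to the iree-run-module invocation in step 11, if the build supports it) and scanned for NaNs. A minimal sketch, assuming the output was saved to the hypothetical file `logits.npy`:

```python
import numpy as np

# Load the logits dumped by iree-run-module (hypothetical path).
logits = np.load("logits.npy")

nan_mask = np.isnan(logits)
print(f"shape: {logits.shape}, dtype: {logits.dtype}")
print(f"NaN count: {nan_mask.sum()} / {logits.size}")
if nan_mask.any():
    # Locate the first NaN entry to see where the corruption begins.
    first = tuple(np.argwhere(nan_mask)[0])
    print(f"first NaN at index {first}")
```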

What component(s) does this issue relate to?

Compiler

Version information

commit SHA: 6ff00a8a008d06b604d4ca4e0ae6e601ae810b4f

Additional context

No response

@archana-ramalingam archana-ramalingam added the bug 🐞 Something isn't working label Dec 17, 2024
MaheshRavishankar pushed a commit that referenced this issue Dec 18, 2024
This reverts commit 6ff00a8.
The above commit causes the Llama 3.1 8B fp16 model to generate NaN logits
for prefill/decode.
Issue: #19506

Signed-off-by: archana-ramalingam <[email protected]>
@MaheshRavishankar (Contributor)

Related #19511

@pashu123 (Contributor)

Thanks, @archana-ramalingam, for filing the issue in such a detailed manner.

@ScottTodd (Member)

Fixed by the revert in #19508?

@archana-ramalingam (Contributor, Author)

archana-ramalingam commented Jan 11, 2025

Yes, the revert in #19508 fixed it. The issue was left open because @pashu123 wanted to investigate why the reverted #19335 patch generated NaN logits in the first place.
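A natural starting point for that investigation is to compare logits from vmfbs built just before and just after #19335 landed. A minimal sketch, assuming both runs were dumped via `--output=@...` to the hypothetical files `logits_good.npy` and `logits_bad.npy`:

```python
import numpy as np

# Hypothetical dumps from the pre-patch (good) and post-patch (bad) builds.
good = np.load("logits_good.npy")
bad = np.load("logits_bad.npy")

assert good.shape == bad.shape, "logit shapes differ between builds"

# Count positions that are NaN only in the bad run, then measure the
# largest deviation over entries that are finite in both runs.
nan_only_in_bad = np.isnan(bad) & ~np.isnan(good)
print(f"NaNs introduced by the patch: {nan_only_in_bad.sum()}")

finite = ~(np.isnan(good) | np.isnan(bad))
if finite.any():
    max_abs_diff = np.abs(good[finite] - bad[finite]).max()
    print(f"max |diff| over finite entries: {max_abs_diff}")
```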
