[Bug] SGLang v0.4.0 with AMD MI300X #2530

BruceXcluding · 2024-12-20T00:51:24Z

Checklist

1. I have searched related issues but cannot get the expected help.
2. The bug has not been fixed in the latest version.
3. Please note that if the bug-related issue you submitted lacks corresponding environment info and a minimal reproducible demo, it will be challenging for us to reproduce and resolve the issue, reducing the likelihood of receiving feedback.
4. If the issue you raised is not a bug but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose. Otherwise, it will be closed.
5. Please use English, otherwise it will be closed.

Describe the bug

SGLang v0.4.x AMD MI300X Workload Debug

Bug report:

The vllm version pinned for HIP needs updating.
rocm_vllm v0.6.5 depends on outlines==0.1.11.
Triton compiler error in decode attention.

A working setup for AMD MI300X is shared below.

Reproduction

docker image: rocm/vllm-dev:20241218
pip uninstall vllm
git clone https://github.com/ROCm/vllm.git rocm_vllm && cd rocm_vllm
python setup.py develop && cd ..
git clone https://github.com/sgl-project/sglang.git && cd sglang
vim python/pyproject.toml +21 "orjson", "outlines>=0.1.7", "outlines-core>=0.1.17"
vim python/pyproject.toml +30 vllm==0.6.5.dev411+gd08b78b5.rocm634
vim python/sglang/srt/constrained/outlines_backend.py +23 from outlines_core.fsm.json_schema import build_regex_from_schema
vim python/sglang/srt/layers/attention/triton_ops/decode_attention.py +405 BLOCK=32 -> BLOCK=16
pip install -e "python[all_hip]"
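
The manual vim edits above could also be scripted. The following is only a minimal sketch: `replace_on_line` and `patch_file` are hypothetical helpers, not part of sglang, and the exact spelling of the text on line 405 (for example `BLOCK=32` versus `BLOCK = 32`) is an assumption that should be checked against the checked-out commit. The helper refuses to write anything if the expected text is not on the given line, so a shifted line number fails loudly instead of patching the wrong place.

```python
from pathlib import Path


def replace_on_line(text: str, line_no: int, old: str, new: str) -> str:
    """Replace the first occurrence of `old` with `new` on the 1-indexed line `line_no` only."""
    lines = text.splitlines(keepends=True)
    if not 1 <= line_no <= len(lines):
        raise ValueError(f"line {line_no} is out of range (file has {len(lines)} lines)")
    if old not in lines[line_no - 1]:
        # The file has probably changed since the line number was recorded.
        raise ValueError(f"{old!r} not found on line {line_no}")
    lines[line_no - 1] = lines[line_no - 1].replace(old, new, 1)
    return "".join(lines)


def patch_file(path: str, line_no: int, old: str, new: str) -> None:
    """Apply replace_on_line to a file in place (hypothetical helper)."""
    p = Path(path)
    p.write_text(replace_on_line(p.read_text(), line_no, old, new))


# Example call, run from the sglang checkout; the spelling "BLOCK=32" is assumed:
# patch_file(
#     "python/sglang/srt/layers/attention/triton_ops/decode_attention.py",
#     405, "BLOCK=32", "BLOCK=16",
# )
```

Line numbers such as +405 are tied to commit d95a5f5, so the check on the expected text matters more than the number itself.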

one-batch:

python -m sglang.bench_one_batch --batch-size 32 --input 128 --output 32 --model /data/deepseekv2-lite/ --dp 1 --tp 1 --trust-remote-code

server-client:

python3 -m sglang.launch_server --model-path /data/deepseekv2-lite/ --disable-radix-cache --trust-remote-code --tp 2 --enable-dp-attention --mem-fraction-static 0.78

python3 -m sglang.bench_serving --backend sglang --dataset-name random --random-input 1 --random-output 32 --random-range-ratio 1 --num-prompts 1000

Environment

rocm/vllm-dev:20241218
sglang/main commit d95a5f5
ROCm/vllm/main commit d08b78b50c94239beca3701d286c6d6202b44bd9

ZJLi2013 · 2024-12-20T03:59:30Z

Nice fix to support deepseek-v2 inference.

Since the base image rocm/vllm-dev:20241218 already has rocm/vllm installed at version 0.6.5.dev407+gd9fed263.rocm634, there is no need to rebuild from source. Changing one line in pyproject.toml makes things even simpler:

srt_hip = ["sglang[runtime_common]", "torch", "vllm==0.6.5.dev407+gd9fed263.rocm634"]
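
To find the exact version string to pin inside a given image, one option is to read it from the installed package metadata. This is a rough sketch, assuming vllm is installed in the active environment; `installed_version` and `format_pin` are hypothetical helper names, and only the standard-library `importlib.metadata` is used.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def format_pin(package: str, ver: str) -> str:
    """Build an exact requirement string such as 'vllm==<version>'."""
    return f"{package}=={ver}"


ver = installed_version("vllm")
if ver is None:
    print("vllm is not installed in this environment")
else:
    # Paste this value into the srt_hip entry of python/pyproject.toml.
    print(format_pin("vllm", ver))
```

If the printed pin differs from the one in pyproject.toml, `pip install -e "python[all_hip]"` will try to resolve a different vllm build, so the two should agree.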
