
[BUG] I can't run DeepSeek V3 using SGlang #596

Closed
vabatista opened this issue Feb 6, 2025 · 9 comments

@vabatista

Describe the bug

When I run this code:

import openai
client = openai.Client(
    base_url="http://host:4000", api_key="EMPTY")
#gaiab10n10

# Chat completion
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "List 3 countries and their capitals."},
    ],
    temperature=0,
    stream=False,
)
print(response)

I get 404 - Not Found. The API call is hitting the server:

[screenshot of the server log showing the request arriving]
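A likely culprit: SGLang's OpenAI-compatible routes are served under the /v1 prefix, so a client pointed at the bare port requests http://host:4000/chat/completions, which the server does not route. A minimal sketch of the adjusted client (same request as above, only the base_url changes):

import openai

# The /v1 suffix matters: the openai client appends /chat/completions
# to base_url, and SGLang only routes that path under /v1.
client = openai.Client(base_url="http://host:4000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "List 3 countries and their capitals."},
    ],
    temperature=0,
    stream=False,
)
print(response.choices[0].message.content)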

To Reproduce

I run DeepSeek V3 on SGLang using this recipe (Docker version): https://github.com/sgl-project/sglang/blob/main/docs/backend/server_arguments.md

I'm using 4 cluster nodes with 4 NVIDIA A100 GPUs each. Here is the command:

singularity exec \
                --env http_proxy=$HTTP_PROXY --env https_proxy=$HTTPS_PROXY --env no_proxy=$NO_PROXY \
                --nv \
                --bind $MOUNT $IMAGE_NAME bash -c \
                "export OUTLINES_CACHE_DIR=/ocache/${SLURM_JOB_ID}_1 && \
                export HF_HOME=/hf_cache && \
                python3 -m sglang.launch_server \
                --model-path deepseek-ai/DeepSeek-V3 \
                --tp 16 \
                --dist-init-addr MASTER_IP:5000 \
                --nnodes 4 \
                --node-rank 0 \
                --trust-remote-code \
                --host 0.0.0.0 \
                --disable-cuda-graph \
                --port 4000"

On the other 3 hosts I change only the --node-rank parameter.

Expected behavior
Get the response using the API

Additional context
One strange behavior: the server came up on the 3rd node, not on the master.

@LiuJiaqi0505

I ran into the same problem, but I'm launching with vLLM. When I access 0.0.0.0:8000, I get the following error:

[screenshot of the error]

@LiuJiaqi0505

In the browser it looks like this:

[screenshot of the browser response]

@syxadd

syxadd commented Feb 7, 2025

Did you run the fp8 model on the 4x4 A100 cluster? Maybe your model name should be changed to "default".
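One way to verify the served model name (rather than guessing between "deepseek-chat" and "default") is to ask the server itself. A short sketch using the same openai client, assuming the /v1 routes noted above are reachable:

import openai

client = openai.Client(base_url="http://host:4000/v1", api_key="EMPTY")

# /v1/models lists what the server registered; each entry's `id`
# is the string to pass as `model=` in chat.completions.create().
for model in client.models.list():
    print(model.id)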

@vabatista
Author

I also tried "default" as the model name. Same issue.

One thing I noticed is that a few minutes later, all nodes shut down with a quantization error. Even with this model: unsloth/DeepSeek-V3-bf16, it also crashes later, saying my architecture doesn't support fp8. This message comes from the Triton component.

@syxadd

syxadd commented Feb 7, 2025

@vabatista I met the same issue on A100 machines: the fp8e4nv data type is not supported on CUDA arch < 89. I don't know whether that's because the A100 is simply not yet supported, or because the model is not effectively converted to bf16. But I see in your log that the model inference server seems to start successfully, so maybe you did something special?
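For reference, the A100 reports compute capability 8.0, while Triton's fp8e4nv kernels require 8.9 (Ada) or 9.0 (Hopper), which matches the error above. A quick sketch to confirm what a node reports, assuming PyTorch is available on it:

import torch

# A100 reports (8, 0); Triton's fp8e4nv path needs >= (8, 9),
# e.g. L40/RTX 4090 (8.9) or H100 (9.0), hence the crash here.
major, minor = torch.cuda.get_device_capability()
print(f"compute capability: {major}.{minor}")
print("fp8e4nv supported:", (major, minor) >= (8, 9))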

@vabatista
Author

@syxadd What do you mean by "so you may do some special things"?
I used the command above to run the nodes in my cluster.

@syxadd

syxadd commented Feb 7, 2025

@vabatista Did you use the official model repo files, not the bf16 format? I failed to run the model on A100 as well. My guess is that the A100 is not yet supported for running the official model.

@vabatista
Author

My first try was the official model. Then I also tried unsloth/DeepSeek-V3-bf16.
I guess --disable-cuda-graph allowed both to bring up the web server, but both crashed a few minutes later.

@mowentian
Contributor

Please go to the SGLang community for more help, thanks.
