
[BUG] I can't run DeepSeek V3 using SGlang #596

Closed
vabatista opened this issue Feb 6, 2025 · 9 comments

@vabatista

Describe the bug

When I run this code:

import openai
client = openai.Client(
    base_url="http://host:4000", api_key="EMPTY")
#gaiab10n10

# Chat completion
response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "List 3 countries and their capitals."},
    ],
    temperature=0,
    stream=False,
)
print(response)

I get 404 - Not Found. The API call is hitting the server:

[screenshot of the server log showing the request arriving]
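A likely culprit: SGLang's OpenAI-compatible routes are served under the /v1 prefix, so a client pointed at the bare port requests http://host:4000/chat/completions, which the server does not route. A minimal sketch of the adjusted client (same request as above, only the base_url changes):

import openai

# The /v1 suffix matters: the openai client appends /chat/completions
# to base_url, and SGLang only routes that path under /v1.
client = openai.Client(base_url="http://host:4000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[
        {"role": "system", "content": "You are a helpful AI assistant"},
        {"role": "user", "content": "List 3 countries and their capitals."},
    ],
    temperature=0,
    stream=False,
)
print(response.choices[0].message.content)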

To Reproduce

I run DeepSeek V3 on SGLang using this recipe (Docker version): https://github.com/sgl-project/sglang/blob/main/docs/backend/server_arguments.md

I'm using 4 cluster nodes with 4 NVIDIA A100 GPUs each. Here is the command:

singularity exec \
                --env http_proxy=$HTTP_PROXY --env https_proxy=$HTTPS_PROXY --env no_proxy=$NO_PROXY \
                --nv \
                --bind $MOUNT $IMAGE_NAME bash -c \
                "export OUTLINES_CACHE_DIR=/ocache/${SLURM_JOB_ID}_1 && \
                export HF_HOME=/hf_cache && \
                python3 -m sglang.launch_server \
                --model-path deepseek-ai/DeepSeek-V3 \
                --tp 16 \
                --dist-init-addr MASTER_IP:5000 \
                --nnodes 4 \
                --node-rank 0 \
                --trust-remote-code \
                --host 0.0.0.0 \
                --disable-cuda-graph \
                --port 4000"

On the other 3 hosts I change only the --node-rank parameter.

Expected behavior
Get the response using the API

Additional context
One strange behavior: the server came up on the 3rd node, not on the master.

@LiuJiaqi0505

I ran into the same problem, but I'm launching with vLLM. When I access 0.0.0.0:8000, I get the following error:

[screenshot of the error]

@LiuJiaqi0505

In the browser it looks like this:

[screenshot of the browser response]

@syxadd

syxadd commented Feb 7, 2025

Did you run the fp8 model on the 4x4 A100 cluster? Maybe your model name should be changed to "default".
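One way to verify the served model name (rather than guessing between "deepseek-chat" and "default") is to ask the server itself. A short sketch using the same openai client, assuming the /v1 routes noted above are reachable:

import openai

client = openai.Client(base_url="http://host:4000/v1", api_key="EMPTY")

# /v1/models lists what the server registered; each entry's `id`
# is the string to pass as `model=` in chat.completions.create().
for model in client.models.list():
    print(model.id)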

@vabatista
Author

I also tried "default" as the model name. Same issue.

One thing I noticed is that a few minutes later, all nodes shut down with a quantization error. Even with this model: unsloth/DeepSeek-V3-bf16, it also crashes later, saying my architecture doesn't support fp8. This message comes from the Triton component.

@syxadd

syxadd commented Feb 7, 2025

@vabatista I met the same issue on A100 machines: the fp8e4nv data type is not supported on CUDA arch < 89. I don't know whether that's because the A100 is simply not yet supported, or because the model is not effectively converted to bf16. But I see in your log that the model inference server seems to start successfully, so maybe you did something special?
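For reference, the A100 reports compute capability 8.0, while Triton's fp8e4nv kernels require 8.9 (Ada) or 9.0 (Hopper), which matches the error above. A quick sketch to confirm what a node reports, assuming PyTorch is available on it:

import torch

# A100 reports (8, 0); Triton's fp8e4nv path needs >= (8, 9),
# e.g. L40/RTX 4090 (8.9) or H100 (9.0), hence the crash here.
major, minor = torch.cuda.get_device_capability()
print(f"compute capability: {major}.{minor}")
print("fp8e4nv supported:", (major, minor) >= (8, 9))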

@vabatista
Author

@syxadd What do you mean by "so you may do some special things"?
I used the command above to run the nodes in my cluster.

@syxadd

syxadd commented Feb 7, 2025

@vabatista Did you use the official model repo files, not the bf16 format? I failed to run the model on A100 as well. My guess is that the A100 is not yet supported for running the official model.

@vabatista
Author

My first try was the official model. Then I also tried unsloth/DeepSeek-V3-bf16.
I guess --disable-cuda-graph allowed both to bring up the web server, but both crashed a few minutes later.

@mowentian
Contributor

Please go to the SGLang community for more help, thanks.
