
[Bug] generation benchmark failed on llama2-chat-7b-w4 #505

Closed
2 tasks done
del-zhenwu opened this issue Sep 27, 2023 · 1 comment
Assignees

Comments

@del-zhenwu
Contributor

del-zhenwu commented Sep 27, 2023

Checklist

  • 1. I have searched the related issues but could not find the expected help.
  • 2. The bug has not been fixed in the latest version.

Describe the bug

session 1 stats: 
[[0.0, 2048, 0.0], [0.0, 2048, 0.0], [0.0, 2048, 0.0], [0.0, 2048, 0.0], [0.0, 2048, 0.0], [0.0, 2048, 0.0], [0.0, 2048, 0.0], [0.0, 2048, 0.0], [0.0, 2048, 0.0], [0.0, 2048, 0.0]]
--------------------------------------------------
profile_generation.py:143: RuntimeWarning: divide by zero encountered in scalar divide
  throughput = np.sum(stats[:, 1], axis=0) / np.sum(stats[:, 2],
--------------------------------------------------
concurrency: 1, input_tokens: 0, output_tokens: 2048
elapsed_time: 0.12s
first_token latency(min, max, ave): 0.00s, 0.00s, 0.00s
token latency(min, max, ave): 0.00s, 0.00s, 0.00s
throughput: inf token/s

At higher concurrency, the benchmark additionally reports a negative throughput:

concurrency: 8, input_tokens: 0, output_tokens: 2048
elapsed_time: 0.18s
first_token latency(min, max, ave): 0.00s, 0.01s, 0.01s
token latency(min, max, ave): 0.00s, 0.01s, 0.01s
throughput: -3200.00 token/s
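The `inf` and negative throughput values follow directly from the all-zero latency columns in the stats above: `profile_generation.py` divides the total token count by the total latency, and when the measured latency is 0.0 the division yields `inf` (or garbage after a signed underflow). The real fix landed in #507, but as a minimal sketch of the guard that would surface the failure cleanly, assuming a stats layout of `(first_token_latency, token_count, total_latency)` per row as printed above (the function name is illustrative, not lmdeploy's actual API):

```python
import numpy as np

def throughput_tokens_per_s(stats: np.ndarray) -> float:
    """Aggregate throughput over all sessions, guarding zero latency.

    Each row of `stats` is assumed to be
    (first_token_latency, token_count, total_latency).
    """
    total_tokens = np.sum(stats[:, 1])
    total_latency = np.sum(stats[:, 2])
    if total_latency <= 0.0:
        # All-zero latencies mean the benchmark collected no valid
        # timings (the underlying bug); report 0 rather than inf.
        return 0.0
    return float(total_tokens / total_latency)
```

With the failing stats from this report (ten rows of `[0.0, 2048, 0.0]`) this returns `0.0` instead of raising the `RuntimeWarning` and printing `inf token/s`, making the broken deployment obvious in the summary.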

Reproduction

python3 -m lmdeploy.serve.turbomind.deploy --model-name llama2 --model-path ./llama2-chat-7b-w4 --model-format awq --group-size 128

Error traceback

No response

@lvhan028
Collaborator

#507 has resolved this issue.
