Checklist

Describe the bug

Benchmarked the quantized w4 model with profile_generation.py, and the following errors were returned:

session 1 stats: [[0.0, 2048, 0.0], [0.0, 2048, 0.0], [0.0, 2048, 0.0], [0.0, 2048, 0.0], [0.0, 2048, 0.0], [0.0, 2048, 0.0], [0.0, 2048, 0.0], [0.0, 2048, 0.0], [0.0, 2048, 0.0], [0.0, 2048, 0.0]]
--------------------------------------------------
profile_generation.py:143: RuntimeWarning: divide by zero encountered in scalar divide
  throughput = np.sum(stats[:, 1], axis=0) / np.sum(stats[:, 2],
--------------------------------------------------
concurrency: 1, input_tokens: 0, output_tokens: 2048
elapsed_time: 0.12s
first_token latency(min, max, ave): 0.00s, 0.00s, 0.00s
token latency(min, max, ave): 0.00s, 0.00s, 0.00s
throughput: inf token/s

concurrency: 8, input_tokens: 0, output_tokens: 2048
elapsed_time: 0.18s
first_token latency(min, max, ave): 0.00s, 0.01s, 0.01s
token latency(min, max, ave): 0.00s, 0.01s, 0.01s
throughput: -3200.00 token/s
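For context on the arithmetic: the warning points at profile_generation.py:143, where throughput is computed as np.sum(stats[:, 1]) / np.sum(stats[:, 2]). Judging from the log, column 1 holds output token counts and column 2 holds measured token latencies, so a run whose timing fields all stay at 0.0 ends up dividing 20480 tokens by zero seconds. Below is a minimal sketch of that failure mode; the column layout is inferred from the log, and the guard is only an illustration, not the fix that actually landed in #507.

import numpy as np

# Ten sessions, each shaped like the rows in the log above:
# [first_token_latency, output_tokens, token_latency] (layout inferred).
stats = np.array([[0.0, 2048, 0.0]] * 10)

total_tokens = np.sum(stats[:, 1], axis=0)  # 20480.0 tokens
total_time = np.sum(stats[:, 2], axis=0)    # 0.0 seconds

# Reproduces "RuntimeWarning: divide by zero encountered in scalar divide"
# and the "throughput: inf token/s" readout.
throughput = total_tokens / total_time

# A hypothetical guard: treat a non-positive time sum as a broken
# measurement rather than reporting inf or negative token/s.
throughput = total_tokens / total_time if total_time > 0 else float("nan")
print(f"throughput: {throughput} token/s")

The -3200.00 token/s at concurrency 8 is presumably the same division applied to a negative time sum, which again suggests the per-token timing was never populated correctly for the w4 model.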
Reproduction

The w4 model was converted with:

python3 -m lmdeploy.serve.turbomind.deploy --model-name llama2 --model-path ./llama2-chat-7b-w4 --model-format awq --group-size 128

Error traceback

No response
lvhan028 commented: #507 has resolved this issue.