Thank you for releasing a great project.

I measured genai-perf by running the rtzr/ko-gemma-2-9b-it model (a fine-tune of gemma-2-9b-it) with both the Triton Server vllm backend and the Triton Server tensorrt_llm backend.
However, the Output sequence length metric differs between the two runs, which in turn makes the Output token throughput (per sec) differ.
Since output-tokens-mean was set to 100 in the arguments, vllm reports an output sequence length of 100, but tensorrt_llm appears to report 100 plus the input sequence length.
I ran genai-perf inside the nvcr.io/nvidia/tritonserver:24.07-py3-sdk Docker image.
Please let me know if there is anything that needs to be corrected or something I did wrong.
I'll attach the script and the results.
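For reference, here is a minimal sketch of the kind of comparison I'm running. The model names, input length, and concurrency below are placeholders rather than the values from my attached script, and the flag names are taken from the genai-perf documentation, so they may need adjusting for the version shipped in the 24.07 SDK image:

```python
# Sketch of the two genai-perf runs being compared.
# Model names and numeric values are placeholders, not the real script.
import subprocess

MODELS = {
    "vllm": "ko_gemma_2_9b_it_vllm",           # hypothetical Triton model name (vLLM backend)
    "tensorrtllm": "ko_gemma_2_9b_it_trtllm",  # hypothetical Triton model name (TensorRT-LLM backend)
}

for backend, model in MODELS.items():
    cmd = [
        "genai-perf",
        "-m", model,
        "--service-kind", "triton",
        "--backend", backend,
        "--output-tokens-mean", "100",            # the value only the vLLM run appears to honor
        "--synthetic-input-tokens-mean", "200",   # placeholder input length
        "--concurrency", "1",
    ]
    print("Running:", " ".join(cmd))
    subprocess.run(cmd, check=True)
```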
There was a limitation in TensorRT-LLM that prevented GenAI-Perf from setting this value automatically. That limitation might have been lifted recently. We have it in our queue to investigate whether GenAI-Perf can now take care of this for you.
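In the meantime, one way to sanity-check the numbers is to test whether the tensorrt_llm run is simply counting the prompt tokens in its output sequence length. This is only a sketch under that assumption, and every number below is a placeholder rather than a measurement:

```python
# Sketch: check whether the tensorrt_llm "Output sequence length" looks like
# input length + requested output tokens, and derive a comparable value.
# All numbers are placeholders, not real measurements.

requested_output_tokens = 100          # --output-tokens-mean
input_sequence_length_mean = 200       # reported Input sequence length (placeholder)
trtllm_output_sequence_length = 300    # reported Output sequence length (placeholder)

# If the backend counts the prompt tokens in the output, the reported value
# should be roughly input length + requested output tokens.
looks_like_input_included = abs(
    trtllm_output_sequence_length
    - (input_sequence_length_mean + requested_output_tokens)
) <= 5  # small tolerance for tokenization differences

if looks_like_input_included:
    # Output sequence length comparable to the vLLM run, input tokens removed.
    adjusted_osl = trtllm_output_sequence_length - input_sequence_length_mean
    print("Adjusted tensorrt_llm output sequence length:", adjusted_osl)
else:
    print("Reported value does not match the input-included hypothesis.")
```

If the hypothesis holds, the reported output token throughput is inflated by the same ratio, so scaling it by adjusted_osl / reported_osl should give a number comparable to the vLLM run.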