You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
convert model: python3 qwen/convert_checkpoint.py --model_dir ./Qwen2.5-32B-Instruct-GPTQ-Int4/ --dtype float16 --use_weight_only --weight_only_precision int4_gptq --output_dir ./trt_engines/Int4/
compile model: trtllm-build --checkpoint_dir ./trt_engines/Int4 --gemm-plugin auto --output_dir ./trt_engines/compiled-model/
run server-1: trtllm-serve ./trt_engines/Int4/ --host 0.0.0.0 --port 8000
run server-2: trtllm-serve ./trt_engines/compiled-model/ --host 0.0.0.0 --port 8000
when i use v1/chat/completions with 9k words prompt to test server-1 and server-2, they need about 12 seconds to return all the answers,so,What is the purpose of compiling a model?Or require certain configurations?
Expected behavior
compile model can improve performance or others
actual behavior
none
additional notes
none
The text was updated successfully, but these errors were encountered:
System Info
CPU: x86_64
GPU: NVIDIA L40
CUDA: 12.2
OS: ubuntu 22.04
TensorRT-LLM: 0.15.0
Who can help?
@kaiyux
Information
Tasks
examples
folder (such as GLUE/SQuAD, ...)Reproduction
convert model: python3 qwen/convert_checkpoint.py --model_dir ./Qwen2.5-32B-Instruct-GPTQ-Int4/ --dtype float16 --use_weight_only --weight_only_precision int4_gptq --output_dir ./trt_engines/Int4/
compile model: trtllm-build --checkpoint_dir ./trt_engines/Int4 --gemm-plugin auto --output_dir ./trt_engines/compiled-model/
run server-1: trtllm-serve ./trt_engines/Int4/ --host 0.0.0.0 --port 8000
run server-2: trtllm-serve ./trt_engines/compiled-model/ --host 0.0.0.0 --port 8000
when i use v1/chat/completions with 9k words prompt to test server-1 and server-2, they need about 12 seconds to return all the answers,so,What is the purpose of compiling a model?Or require certain configurations?
Expected behavior
compile model can improve performance or others
actual behavior
none
additional notes
none
The text was updated successfully, but these errors were encountered: