[Performance] What is the purpose of compiling a model？ #2617

Flynn-Zh · 2024-12-24T10:03:37Z

System Info

CPU: x86_64
GPU: NVIDIA L40
CUDA: 12.2
OS: ubuntu 22.04
TensorRT-LLM: 0.15.0

Who can help?

@kaiyux

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

convert model: python3 qwen/convert_checkpoint.py --model_dir ./Qwen2.5-32B-Instruct-GPTQ-Int4/ --dtype float16 --use_weight_only --weight_only_precision int4_gptq --output_dir ./trt_engines/Int4/
compile model: trtllm-build --checkpoint_dir ./trt_engines/Int4 --gemm-plugin auto --output_dir ./trt_engines/compiled-model/
run server-1: trtllm-serve ./trt_engines/Int4/ --host 0.0.0.0 --port 8000
run server-2: trtllm-serve ./trt_engines/compiled-model/ --host 0.0.0.0 --port 8000
when i use v1/chat/completions with 9k words prompt to test server-1 and server-2, they need about 12 seconds to return all the answers，so，What is the purpose of compiling a model？Or require certain configurations？

Expected behavior

compile model can improve performance or others

actual behavior

none

additional notes

none

The text was updated successfully, but these errors were encountered:

nv-guomingz · 2024-12-24T15:26:21Z

hi @Flynn-Zh please refer to the doc https://nvidia.github.io/TensorRT-LLM/architecture/overview.html#tensorrt-llm-architecture for details.

Flynn-Zh added the bug Something isn't working label Dec 24, 2024

nv-guomingz added triaged Issue has been triaged by maintainers and removed bug Something isn't working labels Dec 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Performance] What is the purpose of compiling a model？ #2617

[Performance] What is the purpose of compiling a model？ #2617

Flynn-Zh commented Dec 24, 2024

nv-guomingz commented Dec 24, 2024

[Performance] What is the purpose of compiling a model？ #2617

[Performance] What is the purpose of compiling a model？ #2617

Comments

Flynn-Zh commented Dec 24, 2024

System Info

Who can help?

Information

Tasks

Reproduction

Expected behavior

actual behavior

additional notes

nv-guomingz commented Dec 24, 2024