hi @idantene
Could you please confirm that the above command is correct?
I got this error message: build.py: error: argument --kv_cache_type: invalid KVCacheType value: 'disable'
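For context, this kind of rejection can be reproduced with a minimal argparse sketch. This is an assumption on my part about how build.py might validate the flag (an enum whose accepted value includes "disabled"; the CONTINUOUS/PAGED members are illustrative, not taken from this thread). It shows why the spelling 'disable' is rejected while 'disabled', as used in the reproduction command below, parses:

```python
import argparse
from enum import Enum


# Hypothetical mirror of a KVCacheType enum; only "disabled" is confirmed
# by this thread, the other members are illustrative.
class KVCacheType(Enum):
    CONTINUOUS = "continuous"
    PAGED = "paged"
    DISABLED = "disabled"


def parse_kv_cache_type(value: str) -> KVCacheType:
    """Convert a CLI string to KVCacheType, rejecting unknown spellings."""
    try:
        return KVCacheType(value.lower())
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"invalid KVCacheType value: {value!r}")


parser = argparse.ArgumentParser(prog="build.py")
parser.add_argument("--kv_cache_type", type=parse_kv_cache_type)

# "disabled" parses cleanly; "disable" raises the error quoted above.
args = parser.parse_args(["--kv_cache_type", "disabled"])
print(args.kv_cache_type)
```

The takeaway is only that the flag wants the past-participle spelling "disabled", not "disable".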
System Info
tensorrt_llm==0.17.0.dev2024121700
tensorrt==10.7.0
nvidia-modelopt==0.19.0
Who can help?
@kaiyux @byshiue
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Trying to build Llama 3.1 8B as per the docs with:
trtllm-build --checkpoint_dir ./dummy_llama_converted_ckpt --output_dir ./dummy_llama_engine --max_batch_size 1 --max_input_len 1024 --max_seq_len 2048 --kv_cache_type disabled --gpt_attention_plugin disable --context_fmha disable --remove_input_padding disable --log_level verbose --gemm_plugin auto
Expected behavior
The model should compile into a TensorRT engine file without the GPTAttentionPlugin.

actual behavior

Getting out-of-memory (Killed) on a machine with over 1 TB of RAM.

additional notes
This looks like a memory leak somewhere during the build, but we were not able to identify the cause.
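Since the kill happens on the host, one way to put a number on it is to poll the builder's high-water RSS (VmHWM) from /proc while it runs. A minimal Linux-only sketch, using a toy child process in place of the trtllm-build invocation:

```python
import subprocess
import sys
import time


def peak_rss_mb(cmd):
    """Run cmd and return (exit_code, peak resident memory in MB).

    Polls the kernel's high-water mark (VmHWM) in /proc/<pid>/status,
    so peaks between samples are not missed. Linux-only.
    """
    proc = subprocess.Popen(cmd)
    peak_kb = 0
    while proc.poll() is None:
        try:
            with open(f"/proc/{proc.pid}/status") as f:
                for line in f:
                    if line.startswith("VmHWM:"):
                        # Line looks like: "VmHWM:     51234 kB"
                        peak_kb = max(peak_kb, int(line.split()[1]))
        except FileNotFoundError:
            break  # process exited between poll() and open()
        time.sleep(0.05)
    return proc.wait(), peak_kb / 1024


# Toy child that allocates ~50 MB; substitute the trtllm-build command here.
rc, peak = peak_rss_mb(
    [sys.executable, "-c",
     "x = bytearray(50_000_000); import time; time.sleep(0.6)"])
print(f"exit={rc} peak={peak:.0f} MB")
```

For the real case, replace the toy command with the trtllm-build invocation from the reproduction above; a peak that climbs steadily until the kill would support the leak hypothesis, while an immediate huge allocation would point elsewhere.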