
(Memory leak) trtllm-build gets OOM without GPTAttentionPlugin #2690

Open
2 of 4 tasks
idantene opened this issue Jan 14, 2025 · 3 comments
Labels
bug Something isn't working

Comments


idantene commented Jan 14, 2025

System Info

  • CPU architecture: x86_64
  • CPU/Host memory size: 1TB
  • GPU name: NVIDIA A100-SXM4-40GB (up to 8 of these available)
  • Libraries
    • tensorrt_llm==0.17.0.dev2024121700
    • tensorrt==10.7.0
    • nvidia-modelopt==0.19.0
    • CUDA: 12.6.3
  • OS: Ubuntu 22.04.5 LTS

Who can help?

@kaiyux @byshiue

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Trying to build Llama 3.1 8B as per the docs with:

trtllm-build --checkpoint_dir ./dummy_llama_converted_ckpt --output_dir ./dummy_llama_engine --max_batch_size 1 --max_input_len 1024 --max_seq_len 2048 --kv_cache_type disabled --gpt_attention_plugin disable --context_fmha disable --remove_input_padding disable --log_level verbose --gemm_plugin auto

Expected behavior

The model should compile into a TensorRT engine file without the GPTAttentionPlugin.

Actual behavior

The build process is killed (out of memory) on a machine with over 1 TB of RAM.

Additional notes

This looks like a memory leak somewhere in the build path; we were not able to identify the cause.
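Since the report points at unbounded host-memory growth during the build, one way to confirm it is to watch the build process's peak resident set size over time. A minimal Linux-only sketch (the helper name is ours; `trtllm-build` is the real CLI, everything else here is illustrative, not part of TensorRT-LLM):

```python
# Hypothetical helper: read a Linux process's peak resident set size
# (VmHWM) from /proc to check whether host memory grows without bound.
def peak_rss_kib(pid: int) -> int:
    """Return VmHWM (peak resident set size) in KiB for the given pid."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmHWM:"):
                return int(line.split()[1])  # /proc reports the value in kB
    raise RuntimeError(f"VmHWM not found for pid {pid}")

# Illustrative usage: launch the build and sample its peak RSS periodically.
# import subprocess, time
# proc = subprocess.Popen(["trtllm-build", "--checkpoint_dir", "..."])
# while proc.poll() is None:
#     print("peak RSS KiB:", peak_rss_kib(proc.pid))
#     time.sleep(60)
```

If the sampled peak keeps climbing roughly linearly for the duration of the build, that supports the leak hypothesis; a plateau would instead suggest a single large allocation.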

@idantene idantene added the bug Something isn't working label Jan 14, 2025
nv-guomingz (Collaborator) commented:

Hi @idantene,
Could you please confirm that the above command is correct? I get this error message:
build.py: error: argument --kv_cache_type: invalid KVCacheType value: 'disable'

idantene (Author) commented:

Sorry @nv-guomingz, that was an obvious typo; it should be disabled (I've edited the post above).

idantene (Author) commented:

Any updates @nv-guomingz?
