hi @idantene
Could you please confirm that the above command is correct?
I got this error message: build.py: error: argument --kv_cache_type: invalid KVCacheType value: 'disable'
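For context, this kind of rejection can be reproduced with a minimal argparse sketch. This is an assumption on my part about how build.py might validate the flag (an enum whose accepted value includes "disabled"; the CONTINUOUS/PAGED members are illustrative, not taken from this thread). It shows why the spelling 'disable' is rejected while 'disabled', as used in the reproduction command below, parses:

```python
import argparse
from enum import Enum


# Hypothetical mirror of a KVCacheType enum; only "disabled" is confirmed
# by this thread, the other members are illustrative.
class KVCacheType(Enum):
    CONTINUOUS = "continuous"
    PAGED = "paged"
    DISABLED = "disabled"


def parse_kv_cache_type(value: str) -> KVCacheType:
    """Convert a CLI string to KVCacheType, rejecting unknown spellings."""
    try:
        return KVCacheType(value.lower())
    except ValueError:
        raise argparse.ArgumentTypeError(
            f"invalid KVCacheType value: {value!r}")


parser = argparse.ArgumentParser(prog="build.py")
parser.add_argument("--kv_cache_type", type=parse_kv_cache_type)

# "disabled" parses cleanly; "disable" raises the error quoted above.
args = parser.parse_args(["--kv_cache_type", "disabled"])
print(args.kv_cache_type)
```

The takeaway is only that the flag wants the past-participle spelling "disabled", not "disable".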
System Info
tensorrt_llm==0.17.0.dev2024121700
tensorrt==10.7.0
nvidia-modelopt==0.19.0
Who can help?
@kaiyux @byshiue
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Trying to build Llama 3.1 8B as per the docs with:
trtllm-build --checkpoint_dir ./dummy_llama_converted_ckpt --output_dir ./dummy_llama_engine --max_batch_size 1 --max_input_len 1024 --max_seq_len 2048 --kv_cache_type disabled --gpt_attention_plugin disable --context_fmha disable --remove_input_padding disable --log_level verbose --gemm_plugin auto
Expected behavior
The model should compile into a TensorRT engine file without the GPTAttentionPlugin.

actual behavior

Getting out-of-memory (Killed) on a machine with over 1 TB of RAM.

additional notes
This looks like a memory leak somewhere during the build, but we were not able to identify the cause.
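Since the kill happens on the host, one way to put a number on it is to poll the builder's high-water RSS (VmHWM) from /proc while it runs. A minimal Linux-only sketch, using a toy child process in place of the trtllm-build invocation:

```python
import subprocess
import sys
import time


def peak_rss_mb(cmd):
    """Run cmd and return (exit_code, peak resident memory in MB).

    Polls the kernel's high-water mark (VmHWM) in /proc/<pid>/status,
    so peaks between samples are not missed. Linux-only.
    """
    proc = subprocess.Popen(cmd)
    peak_kb = 0
    while proc.poll() is None:
        try:
            with open(f"/proc/{proc.pid}/status") as f:
                for line in f:
                    if line.startswith("VmHWM:"):
                        # Line looks like: "VmHWM:     51234 kB"
                        peak_kb = max(peak_kb, int(line.split()[1]))
        except FileNotFoundError:
            break  # process exited between poll() and open()
        time.sleep(0.05)
    return proc.wait(), peak_kb / 1024


# Toy child that allocates ~50 MB; substitute the trtllm-build command here.
rc, peak = peak_rss_mb(
    [sys.executable, "-c",
     "x = bytearray(50_000_000); import time; time.sleep(0.6)"])
print(f"exit={rc} peak={peak:.0f} MB")
```

For the real case, replace the toy command with the trtllm-build invocation from the reproduction above; a peak that climbs steadily until the kill would support the leak hypothesis, while an immediate huge allocation would point elsewhere.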