Issues: NVIDIA/TensorRT-LLM
#783 [Issue Template] Short one-line summary of the issue #270 (opened Jan 1, 2024 by juney-nvidia)
#2628 Error with LoRA Weights Data Type in Quantized TensorRT-LLM Model Execution (opened Dec 25, 2024 by Alireza3242)
#2627 [Performance] Why do compiled TensorRT-LLM models perform worse than torch.compile models? (opened Dec 25, 2024 by FPTMMC)
#2626 [bug] forwardAsync assertion failed: Unable to get batch slot for reqId (opened Dec 25, 2024 by akhoroshev)
#2624 Does trtllm-serve support a tool parser and guided decoding? Any plans? (opened Dec 25, 2024 by dwq370)
#2622 Error during TensorRT-LLM build: invalid shape and type mismatch in elementwise addition (opened Dec 24, 2024 by cocovoc)
#2620 Support for T4 [triaged] (opened Dec 24, 2024 by krishnanpooja)
#2619 SIGABRT while trying to build a trtllm engine for the BioMistral model on T4 [triaged] (opened Dec 24, 2024 by krishnanpooja)
#2617 [Performance] What is the purpose of compiling a model? [triaged] (opened Dec 24, 2024 by Flynn-Zh)
#2615 gather_generation_logits doesn't seem to work correctly for SequenceClassification models [Generic Runtime, Investigating, triaged] (opened Dec 24, 2024 by TriLoo)
#2613 Performance of streaming requests is worse than non-streaming [bug, Investigating, Performance, triaged] (opened Dec 24, 2024 by activezhao)
#2609 Adding custom sampling config [triaged] (opened Dec 23, 2024 by buddhapuneeth)
#2606 Gemma 2 LoRA support [Investigating, Lora/P-tuning, triaged] (opened Dec 21, 2024 by Aquasar11)
#2605 [Feature Request] Better support for w4a8 quantization [Investigating, Low Precision, triaged] (opened Dec 20, 2024 by ShuaiShao93)
#2604 SmoothQuant doesn't work with LoRA [bug, Investigating, Lora/P-tuning, triaged] (opened Dec 20, 2024 by ShuaiShao93)
#2603 LoRA doesn't work with --use_fp8_rowwise [bug] (opened Dec 20, 2024 by ShuaiShao93)
#2602 --use_fp8 doesn't work with Llama 3.1 8B [bug] (opened Dec 20, 2024 by ShuaiShao93)
#2599 No module named 'tensorrt_llm.bindings' [bug] (opened Dec 20, 2024 by WGS-note)
#2598 [Performance] TTFT of Qwen2.5 0.5B model [bug] (opened Dec 20, 2024 by ReginaZh)
#2594 trtllm-serve: failure to launch the OpenAI API on multiple nodes with 8 GPUs each (opened Dec 19, 2024 by sivabreddy)