-
Hello @tanmayv25 @kaiyux, sorry for tagging you folks directly. After posting the discussion here I realized that the tensorrtllm_backend repository asks to create an issue in case of any questions or doubts.
-
Hi @bhavin192, when you said that you "built the engine with the same Triton image, it just worked without any issues" -- does this mean that you ran
-
I'm writing a blog post which first builds the engine using the nvidia/cuda:12.4.0-devel-ubuntu22.04 image and then runs it with Triton.
Building the engine
I'm installing the required dependencies and then tensorrt_llm. This pulls in TensorRT version 10.0.1.
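The exact install commands didn't come through above; as a rough sketch based on the public TensorRT-LLM pip install instructions (the package list and index URL are my assumptions, not the post's actual commands), the build step inside the nvidia/cuda:12.4.0-devel-ubuntu22.04 container looks roughly like:

```shell
# Prerequisites for the tensorrt_llm wheel (Python toolchain + OpenMPI)
apt-get update && apt-get install -y python3 python3-pip openmpi-bin libopenmpi-dev

# tensorrt_llm is published on NVIDIA's PyPI index; installing it pulls in
# a pinned TensorRT wheel as a dependency (10.0.1 at the time of the post)
pip3 install tensorrt_llm --extra-index-url https://pypi.nvidia.com

# Confirm which TensorRT version the wheel brought in
python3 -c "import tensorrt; print(tensorrt.__version__)"
```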
Running it with Triton
Now when I try to run this engine with the Triton image nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3, it fails with an error at engine load time. But if we check the TensorRT and tensorrt_llm versions installed in this image, they match what I had during the engine build process.
The more confusing part is that the release notes and support matrix list a completely different TensorRT version for the 24.06 tritonserver image: https://docs.nvidia.com/deeplearning/triton-inference-server/release-notes/rel-24-06.html and https://docs.nvidia.com/deeplearning/frameworks/support-matrix/index.html say TensorRT 10.1.0.27, but the image actually contains 10.0.1.6.
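As an aside on why an exact match matters (my own illustration, not from the thread): unless an engine is built with TensorRT's version-compatibility option, it is only guaranteed to deserialize under the same TensorRT version that serialized it. A toy check, using a hypothetical helper `same_trt_version` that assumes only the major.minor.patch components need to agree:

```python
def parse_version(v: str) -> tuple:
    """Split a dotted version string like '10.0.1.6' into an int tuple."""
    return tuple(int(p) for p in v.split("."))

def same_trt_version(build: str, runtime: str) -> bool:
    # Hypothetical rule of thumb: compare major.minor.patch and ignore the
    # fourth (build) component, since a plain engine is only guaranteed to
    # load under the TensorRT version that serialized it.
    return parse_version(build)[:3] == parse_version(runtime)[:3]

# Engine built with the pip wheel's TensorRT vs. the 24.06 image's wheel
print(same_trt_version("10.0.1", "10.0.1.6"))   # True
# vs. the version the support matrix claims for 24.06
print(same_trt_version("10.0.1", "10.1.0.27"))  # False
```

This is why the observed 10.0.1.6 in the image is the number that matters for loading the engine, not the 10.1.0.27 listed in the support matrix.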
Building the engine within Triton
When I built the engine with the same Triton image, it just worked without any issues. I'm aware that this is the recommended way to do it, i.e. use the same image that is going to be used to run Triton. I'm going to add that note in my blog post, but I still want to understand what's happening.
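For the blog-post note, the build-inside-Triton step can be sketched like this (the mount paths and trtllm-build arguments are placeholders, not the post's actual command):

```shell
# Build the engine inside the same image that will later serve it, so the
# TensorRT used by trtllm-build matches the runtime exactly
docker run --rm --gpus all -v "$PWD:/workspace" \
  nvcr.io/nvidia/tritonserver:24.06-trtllm-python-py3 \
  trtllm-build --checkpoint_dir /workspace/ckpt \
               --output_dir /workspace/engines
```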
Questions
So the questions I'm trying to find answers to (sorry, the list turned out to be a bit bigger than I thought):
1. The tritonserver:yy.mm-trtllm-python-py3 images are special images with the tensorrt_llm backend in them, so the versions of TensorRT and TensorRT-LLM in them can differ from what is written in the support matrix -- is that correct?
2. Does the Serialized Engine Version depend on anything other than the TensorRT and TensorRT-LLM versions?
Related issues I have read already: