Replies: 1 comment
This would be really interesting - initially, like you suggest, it can be a separate engine, but it would be nice to also have part of it in the core. The downside I see is that it could push the image sizes even further. LocalAI already supports remote gRPC backends, which might come in handy exactly for this. In the long run I'm thinking of having engines that can be installed/uninstalled dynamically, which might tie into this. Compiling to TRT could even be done automatically when the model loads for the first time, which should reduce friction in usage (e.g. we would need to introduce a new "compile" action, which would be quite specific in this case).
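As a rough illustration of the compile-on-first-load idea, here is a minimal sketch of how a backend could cache a built engine per model/GPU pair and only trigger the expensive build when no cached engine exists. The cache layout and the `build_fn` hook are hypothetical, not existing LocalAI APIs.

```python
import hashlib
import os
from typing import Callable

# Hypothetical cache layout: one serialized engine per (model, GPU) pair,
# keyed by a hash so engines built for different GPUs never collide.
def cached_engine_path(model_path: str, gpu_name: str,
                       cache_dir: str = "/tmp/trt-cache") -> str:
    key = hashlib.sha256(f"{model_path}:{gpu_name}".encode()).hexdigest()[:16]
    return os.path.join(cache_dir, f"{key}.engine")

def load_or_compile(model_path: str, gpu_name: str,
                    build_fn: Callable[[str, str], None]) -> str:
    """Return the path to a compiled engine, building it only on first load."""
    engine_path = cached_engine_path(model_path, gpu_name)
    if not os.path.exists(engine_path):
        os.makedirs(os.path.dirname(engine_path), exist_ok=True)
        build_fn(model_path, engine_path)  # the slow, one-time TRT build step
    return engine_path
```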
Wanted to get some community feedback on interest in a TensorRT backend.
TensorRT should be quite a bit faster than GGML/GGUF for folks with NVIDIA hardware. However, it comes with the tradeoff that a model needs to be specifically compiled for the machine's GPU. This step isn't terribly difficult, especially for raw PyTorch and/or TRT models.
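For a sense of what that per-machine compile step involves, here is a minimal sketch using the standard `tensorrt` Python builder API with an ONNX export. TensorRT-LLM has its own build flow for LLMs, so treat this as illustrative only; the file paths are placeholders.

```python
import tensorrt as trt

# Compile an ONNX-exported model into a serialized TensorRT engine.
# The resulting engine is specific to the GPU (and TensorRT version)
# it was built on, which is the per-machine compile step mentioned above.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("model.onnx", "rb") as f:          # placeholder path
    if not parser.parse(f.read()):
        raise RuntimeError("failed to parse ONNX model")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)        # optional: enable FP16 kernels

serialized_engine = builder.build_serialized_network(network, config)
with open("model.engine", "wb") as f:        # placeholder path
    f.write(serialized_engine)
```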
I am not thinking, at first, that this should be built into LocalAI; rather, it could be a standalone engine for now, probably utilizing TensorRT and TensorRT-LLM. This project also shows how simple it could be to set up the gRPC for TensorRT.
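To give an idea of how small such a standalone engine could be, here is a bare-bones sketch of a Python gRPC backend. It assumes stubs generated from LocalAI's backend.proto with grpcio-tools; the message and method names shown are placeholders, not verified against the current proto.

```python
from concurrent import futures
import grpc

# Assumed to be generated from LocalAI's backend.proto (placeholder names).
import backend_pb2
import backend_pb2_grpc

class TensorRTBackend(backend_pb2_grpc.BackendServicer):
    def LoadModel(self, request, context):
        # Load (or compile-then-load) the TRT engine for the requested model here.
        return backend_pb2.Result(success=True)

    def Predict(self, request, context):
        # Run inference with the TensorRT / TensorRT-LLM runtime here.
        return backend_pb2.Reply(message=b"")

def serve(address: str = "127.0.0.1:50051") -> None:
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    backend_pb2_grpc.add_BackendServicer_to_server(TensorRTBackend(), server)
    server.add_insecure_port(address)
    server.start()
    server.wait_for_termination()

if __name__ == "__main__":
    serve()
```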
It would be nice, though, for LocalAI to provide some UX/model-download hooks; that way you could check a "compile to TRT" option when downloading a model, which would make it automatic for most users.
Someone's benchmark of llama.cpp vs TensorRT-LLM: https://hackmd.io/@janhq/benchmarking-tensorrt-llm (spoiler: they found a 50-60% speedup over llama.cpp with Mistral 7B v0.2 GGUF Q4_K_M).

I'm interested in putting some work into this. Mostly curious how interested the maintainers of LocalAI are in this (longer-term), as well as how interested the community would be.