feat: TensorRT-LLM Inflight batching #29

tikikun · 2024-03-21T07:06:13Z

Relevant docs can be found here.

Inflight batching is the most beneficial feature for LLM inference on CUDA systems right now: it can enable very high throughput.

github-actions · 2024-07-25T02:21:50Z

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.

tikikun added the P1: important Important feature / fix label Mar 21, 2024

tikikun self-assigned this Mar 21, 2024

github-actions bot added the stale label Jul 25, 2024

imtuyethan removed the P1: important Important feature / fix label Aug 29, 2024

imtuyethan unassigned tikikun Aug 29, 2024

dan-homebrew changed the title ~~feat: Enable inflight batching in nitro-tensorrt-llm~~ feat: TensorRT-LLM Inflight batching Sep 8, 2024

dan-homebrew mentioned this issue Sep 15, 2024

epic: Cortex TensorRT-LLM support janhq/cortex.cpp#1152

Open

7 tasks
