add build resources in trt-llm config #1423
Draft
🚀 Background
number_builder_gpus: int = 2*deployment_gpus
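The doubling heuristic above can be sketched as a small helper. This is a minimal illustration of the idea, not the actual trt-llm config code; the class and field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class BuildResources:
    """Hypothetical sketch of the build-resource heuristic in this PR:
    give the build stage twice the deployment replica's resources."""
    deployment_gpus: int
    deployment_cpu_memory_gb: float

    @property
    def builder_gpus(self) -> int:
        # Build stage gets twice the deployment GPU count.
        return 2 * self.deployment_gpus

    @property
    def builder_cpu_memory_gb(self) -> float:
        # Double CPU memory as well, since quantization (e.g. fp16 -> fp8)
        # can need far more host RAM than serving does.
        return 2 * self.deployment_cpu_memory_gb


res = BuildResources(deployment_gpus=1, deployment_cpu_memory_gb=117.0)
print(res.builder_gpus)           # 2
print(res.builder_cpu_memory_gb)  # 234.0
```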
For some models we also need more CPU memory for the build stage, especially in cases where CPU memory * 2 < VRAM.
For example, quantizing a 70B model from fp16 to fp8:
-> H100:1 currently has 234 GB of CPU memory and works.
-> L4:1 and A10G:1 fail with out-of-memory issues during the build.
This PR is just a draft; there may be a much better way to implement this, e.g. by finding the actual replica and reading its resource limits from it, instead of "blindly" doubling.
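The alternative suggested above could look roughly like the following. Everything here is hypothetical: the `replica_spec` layout and field names are assumptions, not a real API, and the safety factor is a placeholder.

```python
def build_resources_from_replica(replica_spec: dict, safety_factor: float = 2.0) -> dict:
    """Hypothetical sketch: derive the build stage's resources from the
    deployment replica's actual resource requests, rather than hard-coding
    a multiplier on a static config value."""
    # Read what the deployment replica actually requests (assumed layout).
    gpus = replica_spec["resources"]["gpu_count"]
    cpu_memory_gb = replica_spec["resources"]["cpu_memory_gb"]
    # Still apply headroom for the quantization peak, but anchored to the
    # replica's real requests instead of a blind config-level doubling.
    return {
        "builder_gpus": int(gpus * safety_factor),
        "builder_cpu_memory_gb": cpu_memory_gb * safety_factor,
    }


spec = {"resources": {"gpu_count": 1, "cpu_memory_gb": 117.0}}
print(build_resources_from_replica(spec))
```

A refinement would be to replace the fixed `safety_factor` with a measured peak from previous builds of the same model size.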
💻 How
🔬 Testing
🚢 Release requirements