add build resources in trt-llm config #1423
Draft
🚀 Background
number_builder_gpus: int = 2*deployment_gpus
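The doubling heuristic above can be sketched as a small helper. This is a minimal illustration of the idea, not the actual trt-llm config code; the class and field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class BuildResources:
    """Hypothetical sketch of the build-resource heuristic in this PR:
    give the build stage twice the deployment replica's resources."""
    deployment_gpus: int
    deployment_cpu_memory_gb: float

    @property
    def builder_gpus(self) -> int:
        # Build stage gets twice the deployment GPU count.
        return 2 * self.deployment_gpus

    @property
    def builder_cpu_memory_gb(self) -> float:
        # Double CPU memory as well, since quantization (e.g. fp16 -> fp8)
        # can need far more host RAM than serving does.
        return 2 * self.deployment_cpu_memory_gb


res = BuildResources(deployment_gpus=1, deployment_cpu_memory_gb=117.0)
print(res.builder_gpus)           # 2
print(res.builder_cpu_memory_gb)  # 234.0
```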
For some models we also need more CPU memory for the build stage, especially in cases where CPU memory * 2 < VRAM.
For example, quantizing a 70B model from fp16 to fp8:
-> H100:1 currently has 234 GB of CPU memory and works.
-> L4:1 and A10G:1 fail with out-of-memory issues during the build.
This PR is just a draft; there may be a much better way to implement this, e.g. by finding the actual replica and reading its resource limits from it, instead of "blindly" doubling.
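The alternative suggested above could look roughly like the following. Everything here is hypothetical: the `replica_spec` layout and field names are assumptions, not a real API, and the safety factor is a placeholder.

```python
def build_resources_from_replica(replica_spec: dict, safety_factor: float = 2.0) -> dict:
    """Hypothetical sketch: derive the build stage's resources from the
    deployment replica's actual resource requests, rather than hard-coding
    a multiplier on a static config value."""
    # Read what the deployment replica actually requests (assumed layout).
    gpus = replica_spec["resources"]["gpu_count"]
    cpu_memory_gb = replica_spec["resources"]["cpu_memory_gb"]
    # Still apply headroom for the quantization peak, but anchored to the
    # replica's real requests instead of a blind config-level doubling.
    return {
        "builder_gpus": int(gpus * safety_factor),
        "builder_cpu_memory_gb": cpu_memory_gb * safety_factor,
    }


spec = {"resources": {"gpu_count": 1, "cpu_memory_gb": 117.0}}
print(build_resources_from_replica(spec))
```

A refinement would be to replace the fixed `safety_factor` with a measured peak from previous builds of the same model size.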
💻 How
🔬 Testing
🚢 Release requirements