
add build resources in trt-llm config #1423

Draft · wants to merge 5 commits into base: main
Conversation


@michaelfeil michaelfeil commented Mar 4, 2025

🚀 Background

  • Current engine-builder jobs need more GPU memory, CPU memory, and CPUs in the build job than in the deployment.
  • Generally speaking, the deployment typically needs only half of the GPUs required at build time: `number_builder_gpus: int = 2 * deployment_gpus`.

For some models, we also need more CPU memory for the build stage, especially in cases where CPU memory * 2 < VRAM.

Example:
Quantizing a 70B model from fp16 to fp8 needs around:

  • 140GB VRAM
  • 140GB CPU memory for loading the model (potentially + 70GB CPU memory for the export)
    -> H100:1 currently has 234GB, so it works

L4:1 and A10G:1 fail here because they do not have enough memory.
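The rough arithmetic behind the 70B numbers above can be sketched as follows (a back-of-the-envelope helper, not code from this PR; the function name is made up for illustration):

```python
def checkpoint_size_gb(params_billions: float, bytes_per_param: int) -> float:
    """Approximate weight size in GB: parameter count times bytes per parameter."""
    return params_billions * bytes_per_param

# fp16 uses 2 bytes per parameter, fp8 uses 1 byte.
fp16_gb = checkpoint_size_gb(70, 2)  # 140 GB to hold the fp16 weights
fp8_gb = checkpoint_size_gb(70, 1)   # 70 GB for the exported fp8 checkpoint
print(fp16_gb, fp8_gb)
```

This is why the build stage can transiently need fp16 + fp8 (~210GB) of CPU memory even though the deployed engine is much smaller.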

This PR is just a draft; there may be a much better way to implement this, e.g. by finding the actual replica and reading its resource limits from it, instead of "blindly" doubling.
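The doubling heuristic could be expressed as a small config helper like the following sketch. All names here (`BuildResources`, the field names) are hypothetical and chosen for illustration; the actual config keys in this repo may differ:

```python
from dataclasses import dataclass


@dataclass
class BuildResources:
    """Hypothetical sketch of deriving build-job resources from deployment resources."""

    deployment_gpus: int
    deployment_cpu_memory_gb: int

    @property
    def number_builder_gpus(self) -> int:
        # Heuristic from this PR: the build stage typically needs
        # roughly twice the GPUs of the deployment.
        return 2 * self.deployment_gpus

    @property
    def builder_cpu_memory_gb(self) -> int:
        # Quantization export can need extra CPU memory on top of
        # the weights loaded for conversion.
        return 2 * self.deployment_cpu_memory_gb


res = BuildResources(deployment_gpus=1, deployment_cpu_memory_gb=140)
print(res.number_builder_gpus, res.builder_cpu_memory_gb)
```

A more precise approach, as noted above, would query the deployment replica for its actual resource limits rather than applying a fixed multiplier.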

💻 How

🔬 Testing

🚢 Release requirements

  • Pre-release: Specify here
  • Post-release: Specify here
  • External communication: Specify here

@michaelfeil michaelfeil marked this pull request as draft March 7, 2025 04:27