feat: Parameterize TensorRT allocation strategy #109
@@ -1,4 +1,4 @@
-// Copyright 2022-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright 2022-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
 //
 // Redistribution and use in source and binary forms, with or without
 // modification, are permitted provided that the following conditions
@@ -1693,19 +1693,6 @@ ModelInstanceState::InitIOIndexMap()
 TRITONSERVER_Error*
 ModelInstanceState::InitOptimizationProfiles()
 {
-  // TRT sets the optimization profile index to be 0 implicitly with
-  // the first context creation. As currently triton supports one
-  // context per engine, in order to set the specified profile_index,
-  // another context is created and the previous context is destroyed.
-  std::shared_ptr<nvinfer1::IExecutionContext> default_trt_context(
-      engine_->createExecutionContext());
-  if (default_trt_context == nullptr) {
-    return TRITONSERVER_ErrorNew(
-        TRITONSERVER_ERROR_INTERNAL,
-        (std::string("unable to create TensorRT context: ") +
-         model_state_->GetTensorRTLogger().LastErrorMsg())
-            .c_str());
-  }
   std::vector<std::pair<std::string, int>> profile_name_index;
   // No optimization profile is set for this TensorRT plan
   if (ProfileNames().empty()) {
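For context on the parameter being plumbed through in the next hunk: TensorRT 10 exposes an nvinfer1::ExecutionContextAllocationStrategy enum that createExecutionContext() accepts, which is what ModelState::AllocationStrategy() is expected to return. Below is a minimal sketch of mapping a configuration string to that enum; the string values and the helper name are illustrative assumptions, not code from this PR.

#include <NvInfer.h>
#include <string>

// Maps a configuration string to TensorRT's allocation strategy enum.
// kSTATIC: activation memory is allocated for the worst case at context
//          creation (the TensorRT default).
// kON_PROFILE_CHANGE: memory is allocated when an optimization profile
//          is selected.
// kUSER_MANAGED: the caller supplies memory via setDeviceMemory().
nvinfer1::ExecutionContextAllocationStrategy
ParseAllocationStrategy(const std::string& value)
{
  if (value == "ON_PROFILE_CHANGE") {
    return nvinfer1::ExecutionContextAllocationStrategy::kON_PROFILE_CHANGE;
  }
  if (value == "USER_MANAGED") {
    return nvinfer1::ExecutionContextAllocationStrategy::kUSER_MANAGED;
  }
  return nvinfer1::ExecutionContextAllocationStrategy::kSTATIC;
}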
@@ -1736,17 +1723,19 @@ ModelInstanceState::InitOptimizationProfiles()
               .c_str());
       continue;
     }
-    if (profile_index == 0) {
-      res.first->second.context_ = std::move(default_trt_context);
-    } else {
-      res.first->second.context_.reset(engine_->createExecutionContext());
-      if (res.first->second.context_ == nullptr) {
-        return TRITONSERVER_ErrorNew(
-            TRITONSERVER_ERROR_INTERNAL,
-            (std::string("unable to create TensorRT context: ") +
-             model_state_->GetTensorRTLogger().LastErrorMsg())
-                .c_str());
-      }
+    // Create a new execution context for the profile
+    res.first->second.context_.reset(
+        engine_->createExecutionContext(model_state_->AllocationStrategy()));
Review thread on the createExecutionContext() call:

Reviewer: I'm concerned the tests in triton-inference-server/server#8150 aren't comprehensive enough if they didn't catch the allocation strategy not being passed in. Is there a simple test that can be added to confirm the correct allocation strategy is being used?

Author: I tried various plan models from … If we use a custom model built for A100, we need to either add a new script in gen_qa_model_repository or only allow the test to run on an A100 GPU.

Author: Actually, I loaded all plan models from … Anmol Gupta has confirmed the change works. See #108 (comment). Can we merge the existing test triton-inference-server/server#8150 to main?

Reviewer: Sure, we can go with the manual confirmation from Anmol for now, as they were the requester and we don't want to delay getting this in.

Author: Thanks. I have asked Anmol to try to generate a sample TRT model using our …

Reviewer: In case Anmol doesn't have a simple one, something like …
+
+    if (res.first->second.context_ == nullptr) {
+      return TRITONSERVER_ErrorNew(
+          TRITONSERVER_ERROR_INTERNAL,
+          (std::string("unable to create TensorRT context: ") +
+           model_state_->GetTensorRTLogger().LastErrorMsg())
+              .c_str());
+    }
+
+    if (profile_index != 0) {
       if (!res.first->second.context_->setOptimizationProfileAsync(
               profile_index, stream_)) {
         return TRITONSERVER_ErrorNew(