feat: Parameterize TensorRT allocation strategy #109
Conversation
```cpp
res.first->second.context_.reset(
    engine_->createExecutionContext(model_state_->AllocationStrategy()));
```
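For context, TensorRT 10.x accepts the allocation strategy as an optional argument to `ICudaEngine::createExecutionContext()`. A minimal standalone sketch of the three strategies (illustrative only, not the backend's code; the engine pointer is assumed to be already deserialized):

```cpp
#include <NvInferRuntime.h>

// Sketch: 'engine' is assumed to be a deserialized nvinfer1::ICudaEngine*
// obtained elsewhere (e.g. via IRuntime::deserializeCudaEngine).
void CreateContexts(nvinfer1::ICudaEngine* engine)
{
  using Strategy = nvinfer1::ExecutionContextAllocationStrategy;

  // kSTATIC (the default): device memory for all profiles is allocated
  // up front when the context is created.
  nvinfer1::IExecutionContext* static_ctx =
      engine->createExecutionContext(Strategy::kSTATIC);

  // kON_PROFILE_CHANGE: memory is (re)allocated only when the active
  // optimization profile changes, trading peak memory for reallocation cost.
  nvinfer1::IExecutionContext* lazy_ctx =
      engine->createExecutionContext(Strategy::kON_PROFILE_CHANGE);

  // kUSER_MANAGED: the caller supplies activation memory explicitly
  // (e.g. via setDeviceMemory()) before enqueueing work.
  nvinfer1::IExecutionContext* user_ctx =
      engine->createExecutionContext(Strategy::kUSER_MANAGED);

  delete user_ctx;
  delete lazy_ctx;
  delete static_ctx;
}
```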
I'm concerned the tests in triton-inference-server/server#8150 aren't comprehensive enough if they didn't catch the allocation strategy not being passed in. Is there a simple test that can be added to confirm the correct allocation strategy is being used?
I tried various plan models from the /data/inferenceserver model repositories. The problem is that our models' allocation sizes are too small to show the difference (the [MemUsageChange] line):
```
I0417 21:48:34.404149 20014 tensorrt.cc:297] "TRITONBACKEND_ModelInstanceInitialize: plan_float32_float32_float32-4-32_0_0 (GPU device 0)"
I0417 21:48:34.407117 20014 logging.cc:46] "Loaded engine size: 0 MiB"
I0417 21:48:34.410343 20014 logging.cc:46] "[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)"
I0417 21:48:34.410353 20014 logging.cc:46] "Switching optimization profile from: 0 to 6. Please ensure there are no enqueued operations pending in this context prior to switching profiles"
```
If we use a custom model built for the A100, we would need to either add a new script to gen_qa_model_repository or only allow the test to run on A100 GPUs.
Actually, I loaded all plan models from /data/inferenceserver and none shows a non-zero allocation size in the [MemUsageChange] line (even for the large models). I discussed this with Anmol Gupta, and it might be specific to how their test works. I guess it's not easy to prove the new allocation strategy is passed in engine_->createExecutionContext(model_state_->AllocationStrategy()).
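One way I could imagine checking this (a sketch under assumptions, not a proposed test): compare free device memory before and after context creation with cudaMemGetInfo(). For a model with a large activation footprint, kON_PROFILE_CHANGE should consume noticeably less at creation time than kSTATIC; as noted above, our plan models are too small for the delta to show.

```cpp
#include <NvInferRuntime.h>
#include <cuda_runtime_api.h>
#include <cstdio>

// Returns the current free device memory in bytes.
static size_t FreeDeviceMemory()
{
  size_t free_bytes = 0, total_bytes = 0;
  cudaMemGetInfo(&free_bytes, &total_bytes);
  return free_bytes;
}

// Measures roughly how much device memory context creation consumes under a
// given strategy. 'engine' is assumed to be a deserialized engine; the
// measurement is only meaningful if nothing else allocates concurrently.
void ReportContextAllocation(
    nvinfer1::ICudaEngine* engine,
    nvinfer1::ExecutionContextAllocationStrategy strategy)
{
  const size_t before = FreeDeviceMemory();
  nvinfer1::IExecutionContext* ctx = engine->createExecutionContext(strategy);
  const size_t after = FreeDeviceMemory();
  const size_t used = (before > after) ? (before - after) : 0;
  std::printf("context creation consumed ~%zu MiB\n", used >> 20);
  delete ctx;
}
```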
Anmol Gupta has confirmed the change works. See #108 (comment). Can we merge the existing test triton-inference-server/server#8150 to main?
Sure, we can go with the manual confirmation from Anmol for now, as they were the requester and we don't want to delay getting this in.
Thanks. I have asked Anmol to try generating a sample TRT model using our gen_qa_model_repository script and provided him with instructions. He will try it later and let me know whether it works.
In case Anmol doesn't have a simple one, something like trtexec --onnx=model.onnx --saveEngine=model.plan on the DenseNet ONNX model we use for the quickstart might work.
What does the PR do?
Passes the configured allocation strategy to engine_->createExecutionContext() in ModelInstanceState::InitOptimizationProfiles().
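If the strategy is surfaced through the model configuration, usage might look like the following (the parameter key and value shown are assumptions for illustration, not necessarily the names this PR uses):

```
# config.pbtxt (sketch; parameter name is assumed)
parameters: {
  key: "execution_context_allocation_strategy"
  value: { string_value: "ON_PROFILE_CHANGE" }
}
```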
Checklist:
<commit_type>: <Title>
Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
Where should the reviewer start?
Test plan:
- All TensorRT-related tests
- L0_model_config--base

CI Pipeline ID: 27050513
Caveats:
Background:
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)