feat: Parameterize TensorRT allocation strategy #109
Conversation
```cpp
res.first->second.context_.reset(
    engine_->createExecutionContext(model_state_->AllocationStrategy()));
```
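For context, TensorRT 10.x accepts the allocation strategy as an optional argument to `ICudaEngine::createExecutionContext()`. A minimal standalone sketch of the three strategies (illustrative only, not the backend's code; the engine pointer is assumed to be already deserialized):

```cpp
#include <NvInferRuntime.h>

// Sketch: 'engine' is assumed to be a deserialized nvinfer1::ICudaEngine*
// obtained elsewhere (e.g. via IRuntime::deserializeCudaEngine).
void CreateContexts(nvinfer1::ICudaEngine* engine)
{
  using Strategy = nvinfer1::ExecutionContextAllocationStrategy;

  // kSTATIC (the default): device memory for all profiles is allocated
  // up front when the context is created.
  nvinfer1::IExecutionContext* static_ctx =
      engine->createExecutionContext(Strategy::kSTATIC);

  // kON_PROFILE_CHANGE: memory is (re)allocated only when the active
  // optimization profile changes, trading peak memory for reallocation cost.
  nvinfer1::IExecutionContext* lazy_ctx =
      engine->createExecutionContext(Strategy::kON_PROFILE_CHANGE);

  // kUSER_MANAGED: the caller supplies activation memory explicitly
  // (e.g. via setDeviceMemory()) before enqueueing work.
  nvinfer1::IExecutionContext* user_ctx =
      engine->createExecutionContext(Strategy::kUSER_MANAGED);

  delete user_ctx;
  delete lazy_ctx;
  delete static_ctx;
}
```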
I'm concerned the tests in triton-inference-server/server#8150 aren't comprehensive enough if they didn't catch the allocation strategy not being passed in. Is there a simple test that can be added to confirm the correct allocation strategy is being used?
I tried various plan models from the /data/inferenceserver model repositories. The problem is that our models' allocation sizes are too small to show the difference (the [MemUsageChange] line):
```
I0417 21:48:34.404149 20014 tensorrt.cc:297] "TRITONBACKEND_ModelInstanceInitialize: plan_float32_float32_float32-4-32_0_0 (GPU device 0)"
I0417 21:48:34.407117 20014 logging.cc:46] "Loaded engine size: 0 MiB"
I0417 21:48:34.410343 20014 logging.cc:46] "[MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 0 (MiB)"
I0417 21:48:34.410353 20014 logging.cc:46] "Switching optimization profile from: 0 to 6. Please ensure there are no enqueued operations pending in this context prior to switching profiles"
```
If we use a custom model built for the A100, we would need to either add a new script to gen_qa_model_repository or only allow the test to run on A100 GPUs.
Actually, I loaded all plan models from /data/inferenceserver and none shows a non-zero allocation size in the [MemUsageChange] line (even for the large models). I discussed this with Anmol Gupta, and it might be specific to how their test works. I guess it's not easy to prove the new allocation strategy is passed in engine_->createExecutionContext(model_state_->AllocationStrategy()).
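One way I could imagine checking this (a sketch under assumptions, not a proposed test): compare free device memory before and after context creation with cudaMemGetInfo(). For a model with a large activation footprint, kON_PROFILE_CHANGE should consume noticeably less at creation time than kSTATIC; as noted above, our plan models are too small for the delta to show.

```cpp
#include <NvInferRuntime.h>
#include <cuda_runtime_api.h>
#include <cstdio>

// Returns the current free device memory in bytes.
static size_t FreeDeviceMemory()
{
  size_t free_bytes = 0, total_bytes = 0;
  cudaMemGetInfo(&free_bytes, &total_bytes);
  return free_bytes;
}

// Measures roughly how much device memory context creation consumes under a
// given strategy. 'engine' is assumed to be a deserialized engine; the
// measurement is only meaningful if nothing else allocates concurrently.
void ReportContextAllocation(
    nvinfer1::ICudaEngine* engine,
    nvinfer1::ExecutionContextAllocationStrategy strategy)
{
  const size_t before = FreeDeviceMemory();
  nvinfer1::IExecutionContext* ctx = engine->createExecutionContext(strategy);
  const size_t after = FreeDeviceMemory();
  const size_t used = (before > after) ? (before - after) : 0;
  std::printf("context creation consumed ~%zu MiB\n", used >> 20);
  delete ctx;
}
```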
Anmol Gupta has confirmed the change works. See #108 (comment). Can we merge the existing test triton-inference-server/server#8150 to main?
Sure, we can go with the manual confirmation from Anmol for now, as they were the requester and we don't want to delay getting this in.
Thanks. I have asked Anmol to try generating a sample TRT model using our gen_qa_model_repository script and provided him with instructions. He will try it later and let me know whether it works.
In case Anmol doesn't have a simple one, something like trtexec --onnx=model.onnx --saveEngine=model.plan on the DenseNet ONNX model we use for the quickstart might work.
What does the PR do?
Passes the configured allocation strategy to engine_->createExecutionContext() in ModelInstanceState::InitOptimizationProfiles().
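If the strategy is surfaced through the model configuration, usage might look like the following (the parameter key and value shown are assumptions for illustration, not necessarily the names this PR uses):

```
# config.pbtxt (sketch; parameter name is assumed)
parameters: {
  key: "execution_context_allocation_strategy"
  value: { string_value: "ON_PROFILE_CHANGE" }
}
```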
Checklist:
<commit_type>: <Title>
Commit Type:
Check the conventional commit type box here and add the label to the GitHub PR.
Related PRs:
Where should the reviewer start?
Test plan:
- All TensorRT-related tests
- L0_model_config--base

CI Pipeline ID: 27050513
Caveats:
Background:
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)