feat: Ensemble async callback execution (rework) #438

yinggeh · 2025-05-14T21:52:50Z

What does the PR do?

Reduces end-to-end latency in ensemble models by executing callbacks asynchronously at the end of each ensemble step. Models that require responses to be returned in the same order as their requests are excluded; their callbacks still execute synchronously.

Improvement: maximum throughput of the sample ensemble model increased from 39k infer/sec to 50k infer/sec.
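To illustrate the dispatch rule described above, here is a minimal Python sketch. It is not the actual change (Triton core is C++), and every name in it (`EnsembleStepDispatcher`, `on_step_complete`, `drain`) is hypothetical. The `preserve_responses_order` flag is named after the logic mentioned in this PR. The sketch only shows the idea: callbacks for order-sensitive models run inline, while all others are handed to a worker pool so the ensemble step does not block on them.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class EnsembleStepDispatcher:
    """Hypothetical illustration only; not a class from Triton core."""

    def __init__(self, max_workers=4):
        # Worker pool that runs callbacks off the caller's thread.
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self._pending = []

    def on_step_complete(self, callback, response, preserve_responses_order):
        if preserve_responses_order:
            # Run inline so responses are handled in arrival order.
            callback(response)
            return None
        # Otherwise defer the callback; the caller returns immediately.
        future = self._pool.submit(callback, response)
        self._pending.append(future)
        return future

    def drain(self):
        # Block until every deferred callback has finished.
        wait(self._pending)
        self._pending.clear()

    def shutdown(self):
        self._pool.shutdown(wait=True)


# Usage: the synchronous path keeps request order.
handled = []
dispatcher = EnsembleStepDispatcher()
for i in range(3):
    dispatcher.on_step_complete(handled.append, i, preserve_responses_order=True)
dispatcher.shutdown()
# handled == [0, 1, 2]
```

On the asynchronous path, completion order is not guaranteed, which is why models that depend on response ordering are kept on the inline path in this sketch.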

Checklist

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

feat

Related PRs:

triton-inference-server/common#133
Previous PR: #429

Where should the reviewer start?

Reviewers should start from the second commit and pay particular attention to the preserve_responses_order logic.

Test plan:

L0_simple_ensemble
L0_sequence_batcher
L0_backend_python

CI Pipeline ID:
28454142

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

closes GitHub issue: #7650

yinggeh added 2 commits May 14, 2025 05:37

Revert "Revert "feat: Ensemble asynchronous callback executions (#429)" (#436)" (c01d4cc)

This reverts commit 109c69f.

Execute callbacks synchronously for models required to preserve responses order. (439826a)

yinggeh self-assigned this May 14, 2025

yinggeh added the PR: feat A new feature label May 14, 2025

yinggeh requested review from tanmayv25, GuanLuo and ziqif-nv May 14, 2025 21:54
