Why can I still perform inference while the Triton server is not ready? #5441
-
Description

Triton Information
Are you using the Triton container or did you build it yourself?

To Reproduce

Expected behavior

Actual behavior
Replies: 2 comments
-
In polling mode, model reloads should not result in loss of availability. You can read more here. The documentation also discusses how all new requests will be routed to the new model, assuming loading succeeds.

As for the is_server_ready() flag, that is unexpected: the model shouldn't become "not ready" during a reload. However, polling can be non-atomic in loading/unloading, which is why we recommend EXPLICIT mode for running in production. That gives you greater control over model behavior.

I'd recommend these steps:

- Run the index API to see which models are listed as ready or not ready.
- Look at the verbose logs for additional context.

This will help make sure that what's being marked as not ready is definitely the model being reloaded.
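As a rough sketch of the index check above: Triton's repository index endpoint returns a list of models with their current state, and you can filter that list for anything not READY. The sample entries below are illustrative, not real server output, and the helper name is my own.

```python
# Sketch: interpret a model repository index response to find models
# that are not READY. The `index` list mirrors the JSON shape returned
# by Triton's index API (POST v2/repository/index); with a live server
# you would get it from
# tritonclient.http.InferenceServerClient.get_model_repository_index().

def not_ready_models(index):
    """Return names of models whose state is anything other than READY."""
    return [m["name"] for m in index if m.get("state") != "READY"]

# Illustrative index response while one model is being reloaded:
sample_index = [
    {"name": "resnet50", "version": "1", "state": "READY"},
    {"name": "bert", "version": "2", "state": "UNAVAILABLE",
     "reason": "unloaded"},
]

print(not_ready_models(sample_index))  # -> ['bert']
```

If the name printed here matches the model being reloaded, the "not ready" flag is at least pointing at the expected model; if an unrelated model shows up, the verbose logs should say why.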
CC: @GuanLuo
-
Thank you for your response, I will try your suggestion!