Why can I still perform inference while the Triton server is not ready? #5441
-
Description

Triton Information
Are you using the Triton container or did you build it yourself?

To Reproduce

Expected behavior

Actual behavior
Replies: 2 comments
-
In polling mode, model reloads should not result in loss of availability. You can read more here. The documentation also discusses how all new requests will be routed to the new model, assuming loading succeeds.

As for the is_server_ready() flag, that is unexpected: the model shouldn't become "not ready" during a reload. However, polling can be non-atomic in loading/unloading, which is why we recommend EXPLICIT mode for running in production. That gives you greater control over model behavior.

I'd recommend these steps:

- Run the index API to see which models are listed as ready or not ready.
- Look at the verbose logs for additional context.

This will help make sure that what's being marked as not ready is definitely the model being reloaded.
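As a rough sketch of the index check above: Triton's repository index endpoint returns a list of models with their current state, and you can filter that list for anything not READY. The sample entries below are illustrative, not real server output, and the helper name is my own.

```python
# Sketch: interpret a model repository index response to find models
# that are not READY. The `index` list mirrors the JSON shape returned
# by Triton's index API (POST v2/repository/index); with a live server
# you would get it from
# tritonclient.http.InferenceServerClient.get_model_repository_index().

def not_ready_models(index):
    """Return names of models whose state is anything other than READY."""
    return [m["name"] for m in index if m.get("state") != "READY"]

# Illustrative index response while one model is being reloaded:
sample_index = [
    {"name": "resnet50", "version": "1", "state": "READY"},
    {"name": "bert", "version": "2", "state": "UNAVAILABLE",
     "reason": "unloaded"},
]

print(not_ready_models(sample_index))  # -> ['bert']
```

If the name printed here matches the model being reloaded, the "not ready" flag is at least pointing at the expected model; if an unrelated model shows up, the verbose logs should say why.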
CC: @GuanLuo
-
Thank you for your response, I will try your suggestion!