Hi, my question is about using the Triton client within a FastAPI server to send requests downstream to Triton. Is it recommended to create a single instance of the Triton client and re-use it for each request? Or should we create a new instance of the Triton client for each new request?
I'm asking because, for streaming requests, the gRPC InferenceServerClient supports only one stream at a time. Please let me know, and thanks in advance!
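To make the two options concrete, here is a rough sketch of what I mean. It is only an illustration under assumptions: FakeStreamingClient is a hypothetical stand-in that mimics the one-stream-at-a-time limit, not the real tritonclient API, and the URL and handler names are placeholders.

```python
import threading


class FakeStreamingClient:
    """Hypothetical stand-in for a client that allows one active stream."""

    def __init__(self, url):
        self.url = url
        self._stream_active = False
        self._lock = threading.Lock()

    def start_stream(self):
        # Refuse a second stream while one is already open on this client.
        with self._lock:
            if self._stream_active:
                raise RuntimeError("only one active stream per client")
            self._stream_active = True

    def stop_stream(self):
        with self._lock:
            self._stream_active = False


# Option 1: a single client created once and shared by every request.
shared_client = FakeStreamingClient("localhost:8001")  # placeholder URL


def handle_with_shared_client():
    # Raises if another in-flight request already holds the stream.
    shared_client.start_stream()
    try:
        return "ok"
    finally:
        shared_client.stop_stream()


# Option 2: a fresh client per request, so each request owns its stream.
def handle_with_new_client(url="localhost:8001"):
    client = FakeStreamingClient(url)
    client.start_stream()
    try:
        return "ok"
    finally:
        client.stop_stream()


def overlapping_streams_conflict(client):
    """Return True if opening a second stream on the same client fails."""
    client.start_stream()
    try:
        client.start_stream()
    except RuntimeError:
        return True
    finally:
        client.stop_stream()
    return False
```

With the shared client, two overlapping streaming requests would collide on the single stream; with a client per request they would not, at the cost of creating a client each time.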