-
I have autoscaling of the Triton server working on minikube, but I also want zero scaling on top of the autoscaling.

I want to zero-scale my tritonserver. Is that possible? By zero scaling I mean that the Triton server sits in a standby state and occupies no memory when there are no inference requests, like KServe.

If it is possible, please tell me how.
Thank you~
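For context: a plain HorizontalPodAutoscaler cannot take a Deployment below one replica, which is why KServe gets its standby behavior from Knative's scale-to-zero; KEDA is another common way to achieve the same thing. As a rough sketch of the mechanics only, the following uses the official Kubernetes Python client to scale a Deployment down to zero replicas and back. The Deployment name `tritonserver`, the namespace, and the trigger logic are all assumptions, and a real setup also needs a request-aware proxy in front (what Knative/KServe provides) to hold the first request while the pod comes back up.

```python
# Sketch only: manual scale-to-zero for an assumed Deployment "tritonserver".
# A production setup would use KEDA or Knative/KServe instead, since something
# must buffer incoming requests while the pod restarts.
from kubernetes import client, config


def scale_triton(replicas: int, name: str = "tritonserver",
                 namespace: str = "default") -> None:
    """Patch the Deployment's replica count (0 = standby, frees all pod memory)."""
    config.load_kube_config()  # use config.load_incluster_config() inside a pod
    apps = client.AppsV1Api()
    apps.patch_namespaced_deployment_scale(
        name=name,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )


scale_triton(0)  # no requests expected: scale to zero
scale_triton(1)  # a request is coming: bring the server back
```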
Replies: 1 comment
-
I don't believe that's an option. Have you tried unloading all models when you don't need them? That should get you to the smallest memory footprint. You'll then need to load the model again when the next request comes in.
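A minimal sketch of that suggestion, assuming the server was started with `--model-control-mode=explicit` (loading and unloading over the API only works in that mode) and a hypothetical model name `my_model`:

```python
# Minimal sketch using the official tritonclient package.
# Assumes tritonserver runs with --model-control-mode=explicit;
# "my_model" is a hypothetical model name.
import tritonclient.http as httpclient

triton = httpclient.InferenceServerClient(url="localhost:8000")

# Free the model's memory while the server process stays up.
triton.unload_model("my_model")

# Reload it before the next inference request.
triton.load_model("my_model")
print(triton.is_model_ready("my_model"))  # True once loading finishes
```

Note that this only releases the model's memory; the tritonserver process itself keeps its baseline footprint, which is why true scale-to-zero needs an external autoscaler such as KServe in front.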