-
I have autoscaling of the Triton server working on minikube, but I also want zero scaling on top of the autoscaling.

I want to zero-scale my tritonserver. Is that possible? By zero scaling I mean that the Triton server sits in a standby state and occupies no memory when there are no inference requests, like KServe.

If it is possible, please tell me how.
Thank you~
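For context: a plain HorizontalPodAutoscaler cannot take a Deployment below one replica, which is why KServe gets its standby behavior from Knative's scale-to-zero; KEDA is another common way to achieve the same thing. As a rough sketch of the mechanics only, the following uses the official Kubernetes Python client to scale a Deployment down to zero replicas and back. The Deployment name `tritonserver`, the namespace, and the trigger logic are all assumptions, and a real setup also needs a request-aware proxy in front (what Knative/KServe provides) to hold the first request while the pod comes back up.

```python
# Sketch only: manual scale-to-zero for an assumed Deployment "tritonserver".
# A production setup would use KEDA or Knative/KServe instead, since something
# must buffer incoming requests while the pod restarts.
from kubernetes import client, config


def scale_triton(replicas: int, name: str = "tritonserver",
                 namespace: str = "default") -> None:
    """Patch the Deployment's replica count (0 = standby, frees all pod memory)."""
    config.load_kube_config()  # use config.load_incluster_config() inside a pod
    apps = client.AppsV1Api()
    apps.patch_namespaced_deployment_scale(
        name=name,
        namespace=namespace,
        body={"spec": {"replicas": replicas}},
    )


scale_triton(0)  # no requests expected: scale to zero
scale_triton(1)  # a request is coming: bring the server back
```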
Replies: 1 comment
-
I don't believe that's an option. Have you tried unloading all models when you don't need them? That should get you to the smallest memory footprint. You'll then need to load the model again when the next request comes in.
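A minimal sketch of that suggestion, assuming the server was started with `--model-control-mode=explicit` (loading and unloading over the API only works in that mode) and a hypothetical model name `my_model`:

```python
# Minimal sketch using the official tritonclient package.
# Assumes tritonserver runs with --model-control-mode=explicit;
# "my_model" is a hypothetical model name.
import tritonclient.http as httpclient

triton = httpclient.InferenceServerClient(url="localhost:8000")

# Free the model's memory while the server process stays up.
triton.unload_model("my_model")

# Reload it before the next inference request.
triton.load_model("my_model")
print(triton.is_model_ready("my_model"))  # True once loading finishes
```

Note that this only releases the model's memory; the tritonserver process itself keeps its baseline footprint, which is why true scale-to-zero needs an external autoscaler such as KServe in front.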