codificat changed the title from "What is the resource requirement of the deployed model? Explain the resources defined for the model pod. What is the throughput of the model? How can we increase the throughput? Given a combination of hardware, model type, and optimization techniques, what can be the maximum expected and observed throughput?" to "Optimizing LLMs for max performance when serving on ODH" on May 16, 2023
@codificat
Hi, I'm the maintainer of LiteLLM. It lets you maximize throughput by load balancing across multiple LLM endpoints.
I thought it might be useful for you, and I'd love feedback if it isn't.
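For anyone unfamiliar with the idea, here is a minimal sketch of round-robin load balancing across several endpoints. It is not LiteLLM's API: the `RoundRobinBalancer` class, the `fake_call` stand-in, and the example URLs are all hypothetical and only illustrate how requests can be spread evenly over replicas.

```python
from itertools import cycle
from typing import Callable, List


class RoundRobinBalancer:
    """Hypothetical helper: hands out endpoints in a fixed rotating order."""

    def __init__(self, endpoints: List[str]) -> None:
        if not endpoints:
            raise ValueError("at least one endpoint is required")
        self._endpoints = cycle(endpoints)

    def next_endpoint(self) -> str:
        # Each call returns the next endpoint, wrapping around at the end.
        return next(self._endpoints)

    def send(self, prompt: str, call: Callable[[str, str], str]) -> str:
        # `call` is whatever function actually talks to the model server.
        return call(self.next_endpoint(), prompt)


def fake_call(endpoint: str, prompt: str) -> str:
    # Stand-in for a real HTTP request (assumption: no network in this sketch).
    return f"{endpoint} <- {prompt}"


balancer = RoundRobinBalancer(
    ["http://llm-a.example/v1", "http://llm-b.example/v1"]  # placeholder URLs
)
picked = [balancer.next_endpoint() for _ in range(4)]
```

With two replicas, consecutive requests alternate between them, so aggregate throughput can approach the sum of what each replica sustains, provided the replicas rather than the client are the bottleneck.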
What is the resource requirement of the deployed model? Explain the resources defined for the model pod.
What is the throughput of the model? How can we increase the throughput?
Given a combination of hardware, model type, and optimization techniques, what can be the maximum expected and observed throughput?