
Optimizing LLMs for max performance when serving on ODH #48

Open
Tracked by #37
codificat opened this issue May 16, 2023 · 1 comment
Comments

codificat (Member) commented May 16, 2023

What are the resource requirements of the deployed model? Explain the resources defined for the model pod.

What is the throughput of the model? How can we increase it?

Given a combination of hardware, model type, and optimization techniques, what are the maximum expected and observed throughputs?
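
As a concrete illustration for the first question: the model's resource requirements surface as the requests and limits on the serving pod's containers (CPU, memory, and GPU). Below is a minimal sketch for reading them with the Kubernetes Python client; the pod name and namespace are hypothetical placeholders, not values from an actual ODH deployment.

# Sketch: inspect the resources defined for a deployed model pod.
# The pod name and namespace are illustrative placeholders only.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside the cluster
v1 = client.CoreV1Api()

pod = v1.read_namespaced_pod(name="llm-predictor-0", namespace="opendatahub")
for container in pod.spec.containers:
    res = container.resources  # V1ResourceRequirements: per-container requests and limits
    print(container.name, "requests:", res.requests, "limits:", res.limits)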

codificat changed the title from the original list of questions to "Optimizing LLMs for max performance when serving on ODH" on May 16, 2023
ishaan-jaff commented

@codificat
Hi, I'm the maintainer of LiteLLM. We let you maximize throughput by load balancing across multiple LLM endpoints.
I thought it might be useful for you; I'd love feedback if not.

Here's the quick start for using the LiteLLM proxy as a load balancer (works with 100+ LLMs).
Docs: https://docs.litellm.ai/docs/simple_proxy#model-alias

Step 1: Create a config.yaml

model_list:
  - model_name: openhermes
    litellm_params:
      model: openhermes
      temperature: 0.6
      max_tokens: 400
      custom_llm_provider: "openai"
      api_base: http://192.168.1.23:8000/v1
  - model_name: openhermes
    litellm_params:
      model: openhermes
      custom_llm_provider: "openai"
      api_base: http://192.168.1.23:8001/v1
  - model_name: openhermes
    litellm_params:
      model: openhermes
      custom_llm_provider: "openai"
      frequency_penalty: 0.6
      api_base: http://192.168.1.23:8010/v1

Step 2: Start the litellm proxy:

litellm --config /path/to/config.yaml

Step 3: Make a request to the LiteLLM proxy:

curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
      "model": "openhermes",
      "messages": [
        {
          "role": "user",
          "content": "what llm are you"
        }
      ]
    }'
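
Since the proxy exposes an OpenAI-compatible API, the same request can also be made from Python with the openai SDK. A minimal sketch, assuming the proxy from Step 2 is running locally on port 8000; the api_key value is just a placeholder.

# Sketch: call the LiteLLM proxy through its OpenAI-compatible chat completions endpoint.
from openai import OpenAI

# The SDK requires an api_key value; substitute a real key if the proxy enforces auth.
client = OpenAI(base_url="http://0.0.0.0:8000", api_key="placeholder")

response = client.chat.completions.create(
    model="openhermes",
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)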
