
Optimizing LLMs for max performance when serving on ODH #48

Open
Tracked by #37
codificat opened this issue May 16, 2023 · 1 comment
Comments

codificat (Member) commented May 16, 2023

What are the resource requirements of the deployed model? Explain the resources defined for the model pod.

What is the throughput of the model? How can we increase it?

Given a combination of hardware, model type, and optimization techniques, what are the maximum expected and observed throughputs?
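
As a concrete illustration for the first question: the model's resource requirements surface as the requests and limits on the serving pod's containers (CPU, memory, and GPU). Below is a minimal sketch for reading them with the Kubernetes Python client; the pod name and namespace are hypothetical placeholders, not values from an actual ODH deployment.

# Sketch: inspect the resources defined for a deployed model pod.
# The pod name and namespace are illustrative placeholders only.
from kubernetes import client, config

config.load_kube_config()  # or config.load_incluster_config() when running inside the cluster
v1 = client.CoreV1Api()

pod = v1.read_namespaced_pod(name="llm-predictor-0", namespace="opendatahub")
for container in pod.spec.containers:
    res = container.resources  # V1ResourceRequirements: per-container requests and limits
    print(container.name, "requests:", res.requests, "limits:", res.limits)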

codificat changed the title from the original list of questions to "Optimizing LLMs for max performance when serving on ODH" on May 16, 2023
ishaan-jaff commented

@codificat
Hi, I'm the maintainer of LiteLLM. We let you maximize throughput by load balancing across multiple LLM endpoints.
I thought it might be useful for you; I'd love feedback if not.

Here's the quick start for using the LiteLLM proxy as a load balancer (works with 100+ LLMs).
Docs: https://docs.litellm.ai/docs/simple_proxy#model-alias

Step 1: Create a config.yaml

model_list:
  - model_name: openhermes
    litellm_params:
      model: openhermes
      temperature: 0.6
      max_tokens: 400
      custom_llm_provider: "openai"
      api_base: http://192.168.1.23:8000/v1
  - model_name: openhermes
    litellm_params:
      model: openhermes
      custom_llm_provider: "openai"
      api_base: http://192.168.1.23:8001/v1
  - model_name: openhermes
    litellm_params:
      model: openhermes
      custom_llm_provider: "openai"
      frequency_penalty: 0.6
      api_base: http://192.168.1.23:8010/v1

Step 2: Start the litellm proxy:

litellm --config /path/to/config.yaml

Step 3: Make a request to the LiteLLM proxy:

curl --location 'http://0.0.0.0:8000/chat/completions' \
--header 'Content-Type: application/json' \
--data '{
      "model": "openhermes",
      "messages": [
        {
          "role": "user",
          "content": "what llm are you"
        }
      ]
    }'
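
Since the proxy exposes an OpenAI-compatible API, the same request can also be made from Python with the openai SDK. A minimal sketch, assuming the proxy from Step 2 is running locally on port 8000; the api_key value is just a placeholder.

# Sketch: call the LiteLLM proxy through its OpenAI-compatible chat completions endpoint.
from openai import OpenAI

# The SDK requires an api_key value; substitute a real key if the proxy enforces auth.
client = OpenAI(base_url="http://0.0.0.0:8000", api_key="placeholder")

response = client.chat.completions.create(
    model="openhermes",
    messages=[{"role": "user", "content": "what llm are you"}],
)
print(response.choices[0].message.content)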
