[LLM] Support LLM routing through notdiamond #4184
Comments
OpenHands started fixing the issue! You can monitor the progress here.
@xingyaoww - also see #4109 where litellm's Router is being incorporated, along with a config structure that could maybe be used here
OpenHands started fixing the issue! You can monitor the progress here.
An attempt was made to automatically fix this issue, but it was unsuccessful. A branch named 'openhands-fix-issue-4184' has been created with the attempted changes. You can view the branch here. Manual intervention may be required.
Quick point of discussion: do we want to implement this within OpenHands? Or should we host a server with the router, like we host our proxy server for All Hands AI? Personally I think the latter might be better. Doing this on the client side means that users have to acquire several different API keys and somehow configure them. This seems like a pain UI-wise, especially given that currently our configuration behavior is hard to understand: #3220
Good point - but another thing is that it might be tricky to calculate costs for the router (especially with all the prompt caching and such) :(. Another potential idea is to do this with the LiteLLM router 🤔
Yeah, maybe NotDiamond could be implemented as a custom routing strategy within the LiteLLM proxy?
Yeah, that seems like a better approach (if we can get the cost propagation to work correctly). Closing this for now, then.
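To make the idea concrete, a routing strategy in this style boils down to a select-then-dispatch step in front of the usual completion call. The sketch below is pure Python with hypothetical names (`MODEL_SCORES`, `classify_query`, `select_model`); it is not the LiteLLM or NotDiamond API, just the shape of the pattern being discussed - a real router would score the query with a learned model rather than keyword heuristics.

```python
# Hypothetical select-then-dispatch sketch: pick a model for a query,
# then hand the chosen model name to the normal completion path
# (e.g. litellm.completion(model=selected, ...)).

MODEL_SCORES = {
    "gpt-4o": {"code": 0.90, "chat": 0.90},
    "claude-3-5-sonnet": {"code": 0.95, "chat": 0.80},
    "gpt-4o-mini": {"code": 0.60, "chat": 0.70},
}

def classify_query(prompt: str) -> str:
    """Crude stand-in for a learned query classifier."""
    code_markers = ("def ", "class ", "import ", "traceback", "```")
    return "code" if any(m in prompt.lower() for m in code_markers) else "chat"

def select_model(prompt: str) -> str:
    """Pick the candidate with the best score for the query's category."""
    category = classify_query(prompt)
    return max(MODEL_SCORES, key=lambda m: MODEL_SCORES[m][category])

print(select_model("import os\ndef main(): ..."))   # claude-3-5-sonnet
print(select_model("What's the weather like?"))     # gpt-4o
```

The cost-propagation concern above lives entirely in the dispatch half: whatever does the final call still needs to know which model actually ran so per-model pricing applies.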
Hi @xingyaoww @neubig, just caught this issue. While our LLMConfigs accept prices, they only help tune cost tradeoffs. You won't have to provide that parameter for public models - we track prices for every model we support. Beyond this, we're also happy to help you set up a routing integration with Not Diamond's API. Just let me know if that interests you. As for LiteLLM, we've actually been discussing an integration with them since July! While waiting on their feedback, we've also implemented a simple integration in our Python client which might help you.
Thanks @acompa, I do think we'd be interested in at least running an evaluation where we use NotDiamond as a backend and see if the results are better/cheaper than what we get now. If your API offers OpenAI-compatible endpoints it should be pretty easy (we haven't looked super-carefully yet).
We do accept OpenAI-style requests with
Cool, thanks! I'll re-open this as I think that whichever way we implement it, it'd be interesting to see if model routing helps.
Excellent. As you begin your evaluation, note that we offer two approaches to AI model routing:
- Our out-of-the-box router has been trained on generalist, cross-domain data (including coding and non-coding tasks) to provide a strong "multi-model", multidisciplinary experience.
- Since OpenHands focuses on development applications, you might benefit from specialized routing trained on the distribution of your proprietary data. We offer custom routing to serve these domain-targeted use cases as a higher-performance option beyond out-of-the-box routing.
We're happy to answer questions or support you in whichever of these approaches you evaluate.
@neubig we could also look into the https://github.com/Not-Diamond/RoRF/ repo (pair-wise routing) to start with?
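For reference, RoRF-style pairwise routing reduces an N-model choice to a chain of two-model decisions. The sketch below captures that structure under a hypothetical `score` heuristic; the real RoRF repo trains a random-forest classifier on prompt embeddings for each model pair, which this keyword/length heuristic merely stands in for.

```python
# Pairwise routing sketch: each decision compares exactly two models,
# and a fold over the candidate list yields the final choice.
from functools import reduce

def score(model: str, prompt: str) -> float:
    """Hypothetical scorer; RoRF would run a trained pairwise classifier."""
    base = {"strong-model": 0.6, "mid-model": 0.5, "cheap-model": 0.4}[model]
    if "code" in prompt and model == "strong-model":
        base += 0.3  # prefer the strong model for code-heavy prompts
    if len(prompt) < 20 and model == "cheap-model":
        base += 0.3  # prefer the cheap model for short, simple prompts
    return base

def pairwise_route(model_a: str, model_b: str, prompt: str) -> str:
    """One pairwise decision: keep whichever model scores higher."""
    return model_a if score(model_a, prompt) >= score(model_b, prompt) else model_b

def route(models: list[str], prompt: str) -> str:
    """Chain pairwise decisions across the candidate list."""
    return reduce(lambda a, b: pairwise_route(a, b, prompt), models)

models = ["cheap-model", "mid-model", "strong-model"]
print(route(models, "fix this code bug"))  # strong-model
print(route(models, "hi"))                 # cheap-model
```

One appeal of the pairwise formulation is incrementality: adding a new candidate model only requires training routers against the existing pool, not retraining a single N-way classifier.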
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days. |
I think the NotDiamond folks are working on this still. |
This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days. |
This issue was closed because it has been stalled for over 30 days with no activity. |
I think this is still in progress? |
Yes, it is! |
What problem or use case are you trying to solve?
Not Diamond intelligently identifies which LLM is best-suited to respond to any given query. We want to implement a mechanism in OpenHands to support this type of "LLM selector".
Describe the UX of the solution you'd like
Ideally, users should be able to define an "LLMRouter" as a special type of LLM with some special configs (e.g., multiple keys for different providers). The user can just put in keys and select that router, and OpenHands will automatically use it going forward.
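One hypothetical shape for such a config, loosely following the `[llm]` section style of OpenHands' existing `config.toml` (all section and key names below are illustrative, not an implemented schema):

```toml
# Illustrative only: an "LLMRouter" configured like a special LLM.
[llm.router]
model = "notdiamond-router"

# Candidate models the router may dispatch to, each with its own key.
[llm.router.candidates.gpt-4o]
api_key = "sk-..."

[llm.router.candidates.claude-3-5-sonnet]
api_key = "sk-ant-..."
```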
Do you have thoughts on the technical implementation?
Modify https://github.com/All-Hands-AI/OpenHands/blob/main/openhands/llm/llm.py, as well as config related files under https://github.com/All-Hands-AI/OpenHands/tree/main/openhands/core/config.
You should probably use `model_select` (from the notdiamond API) rather than `create`, to be compatible with existing LiteLLM calls.
Describe alternatives you've considered
Additional context
Here's the documentation from NotDiamond
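Following the suggestion to use `model_select` rather than `create`, the flow would look roughly like the sketch below. The `client.chat.completions.model_select` call shape is an assumption based on notdiamond's public Python client (check their docs for the exact signature), and `completion_fn` is a hypothetical stand-in for the existing LiteLLM completion path in `openhands/llm/llm.py`.

```python
# Sketch: ask notdiamond's router which model to use, then dispatch
# through the existing LiteLLM completion path so cost tracking keeps
# working. The client interface here is an assumption, not verified.

def routed_completion(client, completion_fn, messages, candidates):
    """Select a model via the router, then call it via the normal path."""
    _session_id, provider = client.chat.completions.model_select(
        messages=messages,
        model=candidates,  # e.g. ["openai/gpt-4o", "anthropic/claude-3-5-sonnet"]
    )
    # `provider` identifies the chosen model; passing it to the normal
    # completion call keeps per-model cost accounting in one place.
    return completion_fn(model=str(provider), messages=messages)
```

Keeping the actual completion on the LiteLLM side (rather than letting the router proxy the request) is what addresses the cost-propagation concern raised earlier in this thread.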