
feat: Configure fallback LLMs for rate limit handling #5494

Closed
wants to merge 2 commits

Conversation

@malhotra5 (Contributor) commented Dec 9, 2024

This PR adds support for configuring fallback LLMs that are automatically used when rate limits are hit.

Changes

  • Add fallback_llms field to LLMConfig to support a list of fallback LLM configurations
  • Implement automatic switching to fallback LLMs when rate limits are hit
  • Add automatic reset to primary LLM when rate limit expires
  • Add unit tests to verify the functionality

Usage Example

[llm]
model = "claude-3-5-sonnet-20241022"
api_key = "..."

[[llm.fallback_llms]]
model = "gpt-4"
api_key = "..."

[[llm.fallback_llms]]
model = "llama2-70b"
api_key = "..."

The system will automatically switch between these LLMs when rate limits are hit, and return to the primary LLM when the rate limit expires.
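
For illustration, a minimal sketch of how the example config above could map onto an LLMConfig carrying the new fallback_llms field. Only the fallback_llms field name comes from this PR; the dataclass shape and loading code are assumptions, not the project's actual implementation.

from dataclasses import dataclass, field
import tomllib  # Python 3.11+

@dataclass
class LLMConfig:
    # Hypothetical, simplified stand-in for the real LLMConfig.
    model: str
    api_key: str = ''
    fallback_llms: list['LLMConfig'] = field(default_factory=list)

with open('config.toml', 'rb') as f:
    raw = tomllib.load(f)['llm']

primary = LLMConfig(
    model=raw['model'],
    api_key=raw['api_key'],
    fallback_llms=[LLMConfig(**fb) for fb in raw.get('fallback_llms', [])],
)
print([fb.model for fb in primary.fallback_llms])  # ['gpt-4', 'llama2-70b']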

Fixes #1263


To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:da0d740-nikolaik   --name openhands-app-da0d740   docker.all-hands.dev/all-hands-ai/openhands:da0d740

@malhotra5 requested a review from xingyaoww December 9, 2024 22:04
@malhotra5 removed the lint-fix label Dec 9, 2024
-        resp: ModelResponse = self._completion_unwrapped(*args, **kwargs)
+        try:
+            resp: ModelResponse = self._completion_unwrapped(*args, **kwargs)
+        except RateLimitError as e:
Collaborator
Just a small thing here: there already is a try/except, and it seems we're adding another? Or does the GitHub interface fool me 😅

Contributor Author

Haha yeah there is already a try/except; I was experimenting with OH and wanted to push it beyond its comfort zone to see if it implements weird things 😆

if self._fallback_llms:
    # Extract wait time from error message
    wait_time = None
    if 'Please try again in' in str(e):
Collaborator
On what LLMs does this work? It really seems like something that litellm would support; it already does a lot of string matching across 100+ LLMs. For example, it does this when it maps various provider-specific exception strings to a single unified exception that it returns to us (like ContextWindowExceeded across many providers).

Last I checked, they do have fallback LLMs. It doesn't (or didn't? maybe they do now?) do this bit, though: it doesn't read the wait time out of the response (which may also be in part outside the exception, in the usage field or something, depending on the LLM provider).
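
To make that concrete, here is a minimal sketch that leans on litellm's unified RateLimitError instead of doing provider-specific string matching in OpenHands. The complete_with_fallbacks helper and the retry-hint regex are illustrative assumptions, not litellm's (or this PR's) actual API.

import re
import litellm
from litellm.exceptions import RateLimitError

def complete_with_fallbacks(messages, models):
    # Illustrative helper: try each model in order, moving on whenever the
    # unified RateLimitError is raised, whatever the underlying provider.
    for model in models:
        try:
            return litellm.completion(model=model, messages=messages)
        except RateLimitError as e:
            # Assumption: some providers embed a retry hint such as
            # "Please try again in 20s" in the error message.
            match = re.search(r'Please try again in ([\d.]+)', str(e))
            wait_hint = float(match.group(1)) if match else None
            print(f'{model} rate limited (retry hint: {wait_hint}); trying next model')
    raise RuntimeError('All configured models are currently rate limited')

# e.g. complete_with_fallbacks([{'role': 'user', 'content': 'hi'}],
#                              ['claude-3-5-sonnet-20241022', 'gpt-4', 'llama2-70b'])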

Contributor Author

Ah yes they do!

@enyst (Collaborator) commented Dec 10, 2024

Related issues:

@malhotra5 (Contributor Author)

Closing this PR as the implementation is subpar.

@malhotra5 closed this Dec 10, 2024