
feat: Configure fallback LLMs for rate limit handling #5494

Closed
wants to merge 2 commits

Conversation

@malhotra5 (Contributor) commented Dec 9, 2024

This PR adds support for configuring fallback LLMs that are automatically used when rate limits are hit.

Changes

  • Add fallback_llms field to LLMConfig to support a list of fallback LLM configurations
  • Implement automatic switching to fallback LLMs when rate limits are hit
  • Add automatic reset to primary LLM when rate limit expires
  • Add unit tests to verify the functionality

Usage Example

[llm]
model = "claude-3-5-sonnet-20241022"
api_key = "..."

[[llm.fallback_llms]]
model = "gpt-4"
api_key = "..."

[[llm.fallback_llms]]
model = "llama2-70b"
api_key = "..."

The system will automatically switch between these LLMs when rate limits are hit, and return to the primary LLM when the rate limit expires.
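
For illustration, a minimal sketch of how the example config above could map onto an LLMConfig carrying the new fallback_llms field. Only the fallback_llms field name comes from this PR; the dataclass shape and loading code are assumptions, not the project's actual implementation.

from dataclasses import dataclass, field
import tomllib  # Python 3.11+

@dataclass
class LLMConfig:
    # Hypothetical, simplified stand-in for the real LLMConfig.
    model: str
    api_key: str = ''
    fallback_llms: list['LLMConfig'] = field(default_factory=list)

with open('config.toml', 'rb') as f:
    raw = tomllib.load(f)['llm']

primary = LLMConfig(
    model=raw['model'],
    api_key=raw['api_key'],
    fallback_llms=[LLMConfig(**fb) for fb in raw.get('fallback_llms', [])],
)
print([fb.model for fb in primary.fallback_llms])  # ['gpt-4', 'llama2-70b']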

Fixes #1263


To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:da0d740-nikolaik   --name openhands-app-da0d740   docker.all-hands.dev/all-hands-ai/openhands:da0d740

@malhotra5 requested a review from xingyaoww December 9, 2024 22:04
@malhotra5 removed the lint-fix label Dec 9, 2024
-        resp: ModelResponse = self._completion_unwrapped(*args, **kwargs)
+        try:
+            resp: ModelResponse = self._completion_unwrapped(*args, **kwargs)
+        except RateLimitError as e:
Collaborator
Just a small thing here: there already is a try/except, and it seems we're adding another? Or does the GitHub interface fool me 😅

Contributor Author

Haha yeah there is already a try/except; I was experimenting with OH and wanted to push it beyond its comfort zone to see if it implements weird things 😆

if self._fallback_llms:
    # Extract wait time from error message
    wait_time = None
    if 'Please try again in' in str(e):
Collaborator
On what LLMs does this work? It really seems like something that litellm would support; it already does a lot of string matching across 100+ LLMs. For example, it does this when it maps various provider-specific exception strings to a single unified exception that it returns to us (like ContextWindowExceeded across many providers).

Last I checked, they do have fallback LLMs. It doesn't (or didn't? maybe they do now?) do this bit, though: it doesn't read the wait time out of the response (which may also be in part outside the exception, in the usage field or something, depending on the LLM provider).
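
To make that concrete, here is a minimal sketch that leans on litellm's unified RateLimitError instead of doing provider-specific string matching in OpenHands. The complete_with_fallbacks helper and the retry-hint regex are illustrative assumptions, not litellm's (or this PR's) actual API.

import re
import litellm
from litellm.exceptions import RateLimitError

def complete_with_fallbacks(messages, models):
    # Illustrative helper: try each model in order, moving on whenever the
    # unified RateLimitError is raised, whatever the underlying provider.
    for model in models:
        try:
            return litellm.completion(model=model, messages=messages)
        except RateLimitError as e:
            # Assumption: some providers embed a retry hint such as
            # "Please try again in 20s" in the error message.
            match = re.search(r'Please try again in ([\d.]+)', str(e))
            wait_hint = float(match.group(1)) if match else None
            print(f'{model} rate limited (retry hint: {wait_hint}); trying next model')
    raise RuntimeError('All configured models are currently rate limited')

# e.g. complete_with_fallbacks([{'role': 'user', 'content': 'hi'}],
#                              ['claude-3-5-sonnet-20241022', 'gpt-4', 'llama2-70b'])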

Contributor Author

Ah yes they do!

@enyst (Collaborator) commented Dec 10, 2024

Related issues:

@malhotra5 (Contributor Author)

Closing this PR as the implementation is subpar.

@malhotra5 closed this Dec 10, 2024