feat: Configure fallback LLMs for rate limit handling #5494
Conversation
- Add `fallback_llms` field to `LLMConfig`
- Implement automatic switching to fallback LLMs on rate limits
- Add automatic reset when the rate limit expires
- Add unit tests for fallback functionality

Fixes #1263
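The switching behaviour summarized above can be sketched as follows. This is an illustrative sketch only, not the PR's actual code: the class name `FallbackLLM`, the `reset_after` parameter, and the callable-LLM interface are all assumptions made for the example.

```python
import time


class RateLimitError(Exception):
    """Stand-in for litellm's RateLimitError."""


class FallbackLLM:
    """Illustrative sketch: on a rate limit, switch to the next fallback LLM;
    once the rate-limit window has passed, return to the primary LLM."""

    def __init__(self, primary, fallbacks, reset_after=60.0):
        # LLMs are modelled as callables taking a prompt and returning text.
        self._llms = [primary] + list(fallbacks)
        self._active = 0
        self._reset_after = reset_after
        self._rate_limited_at = None

    def completion(self, prompt):
        # Return to the primary once the rate-limit window has expired.
        if (self._rate_limited_at is not None
                and time.monotonic() - self._rate_limited_at >= self._reset_after):
            self._active = 0
            self._rate_limited_at = None
        try:
            return self._llms[self._active](prompt)
        except RateLimitError:
            if self._active + 1 >= len(self._llms):
                raise  # no fallbacks left, re-raise to the caller
            self._rate_limited_at = time.monotonic()
            self._active += 1
            return self._llms[self._active](prompt)


def limited(prompt):
    raise RateLimitError('Rate limit reached.')


def backup(prompt):
    return 'backup:' + prompt


llm = FallbackLLM(limited, [backup])
print(llm.completion('hi'))  # backup:hi
```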
```diff
-resp: ModelResponse = self._completion_unwrapped(*args, **kwargs)
+try:
+    resp: ModelResponse = self._completion_unwrapped(*args, **kwargs)
+except RateLimitError as e:
```
Just a small thing here: there already is a try/except, and it seems we're adding another? Or is the GitHub interface fooling me 😅
Haha yeah, there is already a try/except; I was experimenting with OH and wanted to push it beyond its comfort zone to see if it implements weird things 😆
```diff
+if self._fallback_llms:
+    # Extract wait time from error message
+    wait_time = None
+    if 'Please try again in' in str(e):
```
On what LLMs does this work? It really seems like something litellm would support; it already does a lot of string matching across 100+ LLMs, for example when it maps various provider-specific exception strings to a single unified exception that it returns to us (like ContextWindowExceeded across many providers).
Last I checked, they do have fallback LLMs. It doesn't (or didn't? maybe they do now?) do this bit: it doesn't read the response, which may also be partly outside the exception, e.g. in the usage field, depending on the LLM provider.
Ah yes they do!
Related issues:
Closing this PR as the implementation is subpar.
This PR adds support for configuring fallback LLMs that are automatically used when rate limits are hit.
Changes

- Add a `fallback_llms` field to `LLMConfig` to support a list of fallback LLM configurations

Usage Example
The system will automatically switch between these LLMs when rate limits are hit, and return to the primary LLM when the rate limit expires.
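The usage example itself did not survive the page scrape. A hedged sketch of what such a configuration might look like, assuming OpenHands' TOML `[llm]` config convention; only the `fallback_llms` field name comes from this PR, the model names and structure are illustrative:

```toml
# Hypothetical config.toml sketch; fallback_llms is the field this PR adds,
# everything else here is an illustrative assumption.
[llm]
model = "gpt-4o"
api_key = "..."
fallback_llms = [
    { model = "claude-3-5-sonnet-20241022", api_key = "..." },
    { model = "gemini-1.5-pro", api_key = "..." },
]
```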
Fixes #1263
To run this PR locally, use the following command: