
[Resolver] Unhandled RateLimitError when calling litellm.completion in issue_definitions.py #5030

Closed
neubig opened this issue Nov 15, 2024 · 6 comments
Labels
bug (Something isn't working), resolver (Related to OpenHands Resolver), Stale (Inactive for 30 days)

Comments


neubig commented Nov 15, 2024

Unhandled RateLimitError when calling litellm.completion in issue_definitions.py

Description

When running resolve_issue.py, the script throws multiple errors due to an unhandled RateLimitError from the Anthropic API. This occurs during the call to litellm.completion in the guess_success method of issue_definitions.py. The error indicates that the number of tokens has exceeded the per-minute rate limit imposed by the Anthropic API.

Context

  • Anthropic API Tier: The issue occurs on Tier 1 of the Anthropic API, which has a rate limit of 50 requests per minute.
  • Documentation Reference: See the Anthropic API rate limits documentation (https://docs.anthropic.com/en/api/rate-limits) for more details.

Steps to Reproduce

  1. Use the Anthropic API with a Tier 1 account (50 requests per minute limit).
  2. Run the OpenHands-resolver GitHub Actions workflow from the examples directory to resolve an issue.
  3. Observe that the script throws multiple 429 RateLimitError errors from the LLM.

Expected Behavior

The app should handle the RateLimitError gracefully by:

  • Catching the exception and implementing a retry mechanism with exponential backoff or appropriate delay.
  • Providing a clear and user-friendly error message.
  • Adjusting the rate of API requests to comply with the Anthropic API rate limits.

Actual Behavior

The app crashes and outputs the following error stack trace:

Error Logs

09:27:25 - openhands:INFO: resolve_issue.py:446 - Finished.
ERROR:root:  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/openhands_resolver/resolve_issue.py", line 609, in <module>
    main()
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/openhands_resolver/resolve_issue.py", line 589, in main
    asyncio.run(
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/asyncio/base_events.py", line 687, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/openhands_resolver/resolve_issue.py", line 429, in resolve_issue
    output = await process_issue(
             ^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/openhands_resolver/resolve_issue.py", line 255, in process_issue
    success, comment_success, success_explanation = issue_handler.guess_success(issue, state.history, llm_config)
                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/openhands_resolver/issue_definitions.py", line 178, in guess_success
    response = litellm.completion(
               ^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/litellm/utils.py", line 960, in wrapper
    raise e
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/litellm/utils.py", line 849, in wrapper
    result = original_function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/litellm/main.py", line 3034, in completion
    raise exception_type(
          ^^^^^^^^^^^^^^^
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 2125, in exception_type
    raise e
  File "/opt/hostedtoolcache/Python/3.12.7/x64/lib/python3.12/site-packages/litellm/litellm_core_utils/exception_mapping_utils.py", line 490, in exception_type
    raise RateLimitError(

ERROR:root:<class 'litellm.exceptions.RateLimitError'>: litellm.RateLimitError: AnthropicException - {"type":"error","error":{"type":"rate_limit_error","message":"Number of tokens has exceeded your per-minute rate limit (https://docs.anthropic.com/en/api/rate-limits); see the response headers for current usage. Please reduce the prompt length or the maximum tokens requested, or try again later. You may also contact sales at https://www.anthropic.com/contact-sales to discuss your options for a rate limit increase."}}

Possible Solutions

We can handle this issue by implementing one or more of the following solutions:

a) Set up an environment variable for Maximum Requests Per Minute

  • Description: Introduce an environment variable that specifies the maximum number of requests per minute allowed for the LLM provider.
  • Implementation:
    • Add a new environment variable, e.g., LLM_MAX_REQUESTS_PER_MINUTE.
    • Modify the app to read this variable and throttle the requests accordingly.
    • Use a rate limiter to ensure the number of requests does not exceed this value (see the sketch below).
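
A minimal sketch of solution (a), assuming a new LLM_MAX_REQUESTS_PER_MINUTE variable (proposed here, not an existing resolver setting):

    import os
    import threading
    import time
    from collections import deque

    # LLM_MAX_REQUESTS_PER_MINUTE is a proposed variable, not currently read anywhere.
    MAX_RPM = int(os.environ.get("LLM_MAX_REQUESTS_PER_MINUTE", "50"))

    class RequestRateLimiter:
        """Sliding-window limiter: blocks until a request slot frees up."""

        def __init__(self, max_per_minute: int):
            self.max_per_minute = max_per_minute
            self.timestamps = deque()  # monotonic times of recent requests
            self.lock = threading.Lock()

        def acquire(self) -> None:
            with self.lock:
                now = time.monotonic()
                # Drop request timestamps older than the 60-second window.
                while self.timestamps and now - self.timestamps[0] > 60:
                    self.timestamps.popleft()
                if len(self.timestamps) >= self.max_per_minute:
                    # Sleep until the oldest request falls out of the window.
                    time.sleep(max(60 - (now - self.timestamps[0]), 0))
                self.timestamps.append(time.monotonic())

    limiter = RequestRateLimiter(MAX_RPM)
    # Call limiter.acquire() immediately before each litellm.completion(...) call.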

b) Configure an Environment Variable for Anthropic API Tier

  • Description: Set up an environment variable that represents the current Anthropic API tier (since different tiers have different rate limits).
  • Implementation:
    • Add a new environment variable, e.g., ANTHROPIC_API_TIER.
    • Map the tier to its corresponding rate limit within the app.
    • Adjust the request rate based on the tier's rate limit (see the sketch below).
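
A sketch of solution (b); ANTHROPIC_API_TIER is a hypothetical variable, and only the Tier 1 limit (50 requests/minute) comes from this report, so the other tiers are placeholders to be filled in from Anthropic's documentation:

    import os

    # Hypothetical mapping; only the Tier 1 value is taken from this issue.
    TIER_TO_REQUESTS_PER_MINUTE = {
        "1": 50,    # Tier 1: 50 requests/minute (reported above)
        "2": None,  # fill in from https://docs.anthropic.com/en/api/rate-limits
        "3": None,
        "4": None,
    }

    def max_requests_per_minute(default: int = 50) -> int:
        """Resolve the per-minute request budget from the configured tier."""
        tier = os.environ.get("ANTHROPIC_API_TIER", "1")
        limit = TIER_TO_REQUESTS_PER_MINUTE.get(tier)
        return limit if limit is not None else default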

c) Auto-detect Rate Limit Exceeded and Implement Retry Logic

  • Description: Modify the app to detect when a RateLimitError occurs and handle it gracefully.
  • Implementation:
    • Catch the RateLimitError exception in the guess_success method.
    • Implement a sleep() function to wait before retrying the request.
    • Optionally use exponential backoff to increase the wait time after each retry.
    • Limit the number of retries to prevent infinite loops (see the sketch below).
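
A rough sketch of solution (c), wrapping the litellm.completion call; the retry count and wait times here are illustrative, not the resolver's actual defaults:

    import time

    import litellm
    from litellm.exceptions import RateLimitError

    def completion_with_retry(max_retries: int = 5, base_wait: float = 15.0, **kwargs):
        """Call litellm.completion, backing off exponentially on RateLimitError."""
        for attempt in range(max_retries):
            try:
                return litellm.completion(**kwargs)
            except RateLimitError:
                if attempt == max_retries - 1:
                    raise  # out of retries; surface the error
                # Exponential backoff: 15s, 30s, 60s, ... between attempts.
                time.sleep(base_wait * (2 ** attempt))

In guess_success, the existing litellm.completion(...) call would then be routed through completion_with_retry(...) with the same keyword arguments.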

Additional Context

  • Anthropic API Rate Limits: Refer to the Anthropic API rate limits documentation for more details.
  • Best Practices: Implementing these solutions will help the script comply with the API's terms of service and improve its robustness.

Please let me know if any additional information is required to resolve this issue.


Moved from All-Hands-AI/openhands-resolver#348

mamoodi added the bug and resolver labels Nov 15, 2024

malhotra5 commented Nov 21, 2024

I could take a shot at this! I'm thinking of implementing Solution C, but with the same schema as OpenHands proper. It includes:

LLM_NUM_RETRIES (Default of 8)
LLM_RETRY_MIN_WAIT (Default of 15 seconds)
LLM_RETRY_MAX_WAIT (Default of 120 seconds)
LLM_RETRY_MULTIPLIER (Default of 2)

This would help with consistency in retry methods.
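
As a rough illustration only (using tenacity as a stand-in, not the actual OpenHands retry decorator), those defaults would translate to a backoff policy along these lines:

    import litellm
    from litellm.exceptions import RateLimitError
    from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

    # Stand-in sketch; the values mirror the proposed defaults above.
    @retry(
        retry=retry_if_exception_type(RateLimitError),
        stop=stop_after_attempt(8),                            # roughly LLM_NUM_RETRIES
        wait=wait_exponential(multiplier=2, min=15, max=120),  # LLM_RETRY_MULTIPLIER / MIN_WAIT / MAX_WAIT
        reraise=True,
    )
    def completion_with_backoff(**kwargs):
        return litellm.completion(**kwargs)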


enyst commented Nov 21, 2024

Please see also: #5087


enyst commented Nov 29, 2024

Just curious here, sorry if I'm missing something obvious: why are we using litellm.completion()? Could we use our llm.completion() instead? It was intended to be compatible with litellm.completion(), and it has a retry mechanism.

        response = litellm.completion(
            model=llm_config.model,
            messages=[{'role': 'user', 'content': prompt}],
            api_key=llm_config.api_key,
            base_url=llm_config.base_url,
        )

e.g.

        @self.retry_decorator(
            num_retries=self.config.num_retries,
            retry_exceptions=LLM_RETRY_EXCEPTIONS,
            retry_min_wait=self.config.retry_min_wait,
            retry_max_wait=self.config.retry_max_wait,
            retry_multiplier=self.config.retry_multiplier,
        )
        def wrapper(*args, **kwargs):
            """Wrapper for the litellm completion function. Logs the input and output of the completion function."""
...
    @property
    def completion(self):
        """Decorator for the litellm completion function.

        Check the complete documentation at https://litellm.vercel.app/docs/completion
        """
        return self._completion
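
For concreteness, the swap in guess_success might look roughly like this; the import path and constructor for the LLM class are assumptions here, not verified against the current codebase:

    # Assumed import path/constructor for the OpenHands LLM wrapper.
    from openhands.llm.llm import LLM

    llm = LLM(config=llm_config)
    # llm.completion is litellm-compatible and already wrapped with the retry decorator,
    # so model, api_key, and base_url come from llm_config rather than being passed explicitly.
    response = llm.completion(
        messages=[{'role': 'user', 'content': prompt}],
    )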

@malhotra5

Ah yeah I noticed this too @enyst! I've implemented your suggestions in #5187; it was approved but I didn't have push access to merge it at the time 😅

I'll try to get it in soon


github-actions bot commented Jan 2, 2025

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions bot added the Stale (Inactive for 30 days) label Jan 2, 2025

enyst commented Jan 2, 2025

I think this has been addressed by reusing the llm.completion method, which obeys the user configuration (like min_retry).

The rest of the problem here is tracked in other issues, for example implementing an automated routing mechanism or other features that would improve the behavior with rate limits. (example)

I'll close this, but please feel free to reopen if you see fit.

enyst closed this as completed Jan 2, 2025