
[Bug]: (eval) Instance results with llm proxy OpenAIException errors got merged into output.jsonl #4166

Closed
ryanhoangt opened this issue Oct 2, 2024 · 8 comments
Labels
bug (Something isn't working), evaluation (Related to running evaluations with OpenHands), severity:medium (Affecting multiple users), Stale (Inactive for 30 days)

Comments

@ryanhoangt
Contributor

ryanhoangt commented Oct 2, 2024

Is there an existing issue for the same bug?

Describe the bug

When running the eval via All Hands AI's LLM proxy, the server sometimes crashes with a 502 response. The eval result is still collected into the output.jsonl file, with the error field being:

"error": "There was an unexpected error while running the agent: litellm.APIError: APIError: OpenAIException - <html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>502 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>",

We then have to manually filter out instances with that error and rerun them. Maybe we should have some logic to automatically retry in this scenario.
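For illustration, a retry wrapper around the completion call could look roughly like this (a minimal sketch using tenacity, not the actual OpenHands code; the wait/stop values are arbitrary):

```python
# Sketch (not the OpenHands implementation): retry transient proxy failures
# such as the 502 above before giving up on the instance.
import litellm
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    retry=retry_if_exception_type(litellm.APIError),  # 502s from the proxy surface as APIError
    wait=wait_exponential(multiplier=1, min=15, max=120),
    stop=stop_after_attempt(5),
)
def completion_with_retry(**kwargs):
    return litellm.completion(**kwargs)
```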

Current OpenHands version

0.9.7

Installation and Configuration

ALLHANDS_API_KEY="<all hands ai remote runtime key>" RUNTIME=remote SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev" EVAL_DOCKER_IMAGE_PREFIX="us-central1-docker.pkg.dev/evaluation-092424/swe-bench-images" ./evaluation/swe_bench/scripts/run_infer.sh llm.eval_sonnet_3_5 HEAD CoActPlannerAgent 100 40 1 "princeton-nlp/SWE-bench_Lite" test

Model and Agent

  • Model: openai/claude-3-5-sonnet@20240620
  • Agent: CoActPlannerAgent

Operating System

Linux

Reproduction Steps

No response

Logs, Errors, Screenshots, and Additional Context

No response

@ryanhoangt added the bug label on Oct 2, 2024
@enyst
Collaborator

enyst commented Oct 2, 2024

@ryanhoangt Can you please post a traceback from the logs if you have one, by any chance, or the .jsonl? I made a quick fix in the linked PR, but I'd like to look into it some more.

@ryanhoangt
Contributor Author

ryanhoangt commented Oct 2, 2024

Unfortunately, there's no traceback in the trajectory in the jsonl file. There's only one last entry in the history field besides the error field above. I can try capturing the traceback (if there is any) directly from the log next time.

{
    "id": 84,
    "timestamp": "2024-10-02T10:06:45.050451",
    "source": "agent",
    "message": "There was an unexpected error while running the agent",
    "observation": "error",
    "content": "There was an unexpected error while running the agent",
    "extras": {}
}

I'm also quite confused about whether it is litellm.APIError or OpenAIException. From the docs, it seems to me that OpenAIException is a provider-specific exception and litellm.APIError is a wrapper across all providers.

@mamoodi added the evaluation and severity:medium labels on Oct 2, 2024
@enyst
Collaborator

enyst commented Oct 5, 2024

The linked PR added retries in our LLM class, but I think a better fix would retry the eval, or make sure the instance isn't written to the jsonl so that it will be attempted again.
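As a stopgap, a small post-processing script could drop the errored instances from output.jsonl before rerunning (a sketch assuming the record layout shown above; the filenames are arbitrary):

```python
# Sketch: filter proxy-failure instances out of output.jsonl so a subsequent
# eval run re-attempts them instead of treating them as done.
import json

def drop_failed_instances(in_path="output.jsonl", out_path="output.filtered.jsonl"):
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            record = json.loads(line)
            error = record.get("error") or ""
            if "OpenAIException" in error and "502" in error:
                continue  # skip so the instance gets rerun
            dst.write(line)
```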

@ryanhoangt
Contributor Author

Thanks for the fix! Btw, can you explain why retrying the whole eval is better? I'm not sure about the architectural side, but imo it may not be necessary to run again from the first step (especially when we're at the very end of the trajectory).

@enyst
Collaborator

enyst commented Oct 7, 2024

Oh, they're not exclusive. The request is retried now, and we can configure the retry settings to make more attempts (in config.toml for the respective llm.eval group). You may want to do that and give it as much time as you see fit; that will retry from the current state.
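For example, something like this in config.toml (key names and values are assumed for illustration; check the LLM config options for your OpenHands version):

```toml
# Assumed retry-related keys for the eval LLM group; values are illustrative.
[llm.eval_sonnet_3_5]
num_retries = 8
retry_min_wait = 15
retry_max_wait = 120
```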

But there will be a limit, so my thinking here is simply that if the proxy continues to be unavailable at that point, the reasonable thing is to give up on the instance and just not save it in the jsonl, so we can rerun it. 🤔

Contributor

github-actions bot commented Nov 7, 2024

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions bot added the Stale label on Nov 7, 2024
@enyst
Collaborator

enyst commented Nov 7, 2024

I think I saw another error merged into the jsonl, but only when it was 1 task and 1 worker. We usually use multiprocessing lately, which might be why we don't see it. Maybe.

On the other hand, we have meanwhile made more fixes and added some retries when inference ends abnormally, before it gets to the output file, so maybe it was fixed.

@ryanhoangt
Contributor Author

Yeah, on my side I can see the retries happening after your fix. Recently, with the new LLM proxy, I don't even receive 502 errors anymore. Maybe this issue can be closed.

@ryanhoangt closed this as not planned on Nov 7, 2024