
[Bug]: (eval) Instance results with llm proxy OpenAIException errors got merged into output.jsonl #4166

Closed
ryanhoangt opened this issue Oct 2, 2024 · 8 comments
Labels
bug (Something isn't working), evaluation (Related to running evaluations with OpenHands), severity:medium (Affecting multiple users), Stale (Inactive for 30 days)

Comments

@ryanhoangt
Contributor

ryanhoangt commented Oct 2, 2024

Is there an existing issue for the same bug?

Describe the bug

When running the eval via All Hands AI's LLM proxy, the server sometimes crashes with a 502 response. The eval result is still collected into the output.jsonl file, with the error field being:

"error": "There was an unexpected error while running the agent: litellm.APIError: APIError: OpenAIException - <html><head>\n<meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<title>502 Server Error</title>\n</head>\n<body text=#000000 bgcolor=#ffffff>\n<h1>Error: Server Error</h1>\n<h2>The server encountered a temporary error and could not complete your request.<p>Please try again in 30 seconds.</h2>\n<h2></h2>\n</body></html>",

We then have to manually filter out instances with that error and rerun them. Maybe we should have some logic to automatically retry in this scenario.
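For illustration, a retry wrapper around the completion call could look roughly like this (a minimal sketch using tenacity, not the actual OpenHands code; the wait/stop values are arbitrary):

```python
# Sketch (not the OpenHands implementation): retry transient proxy failures
# such as the 502 above before giving up on the instance.
import litellm
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential

@retry(
    retry=retry_if_exception_type(litellm.APIError),  # 502s from the proxy surface as APIError
    wait=wait_exponential(multiplier=1, min=15, max=120),
    stop=stop_after_attempt(5),
)
def completion_with_retry(**kwargs):
    return litellm.completion(**kwargs)
```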

Current OpenHands version

0.9.7

Installation and Configuration

ALLHANDS_API_KEY="<all hands ai remote runtime key>" RUNTIME=remote SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev" EVAL_DOCKER_IMAGE_PREFIX="us-central1-docker.pkg.dev/evaluation-092424/swe-bench-images" ./evaluation/swe_bench/scripts/run_infer.sh llm.eval_sonnet_3_5 HEAD CoActPlannerAgent 100 40 1 "princeton-nlp/SWE-bench_Lite" test

Model and Agent

  • Model: openai/claude-3-5-sonnet@20240620
  • Agent: CoActPlannerAgent

Operating System

Linux

Reproduction Steps

No response

Logs, Errors, Screenshots, and Additional Context

No response

@ryanhoangt added the bug label on Oct 2, 2024
@enyst
Collaborator

enyst commented Oct 2, 2024

@ryanhoangt Can you please post a traceback from the logs if you have one, by any chance, or the .jsonl? I made a quick fix in the linked PR, but I'd like to look into it some more.

@ryanhoangt
Contributor Author

ryanhoangt commented Oct 2, 2024

Unfortunately, there's no traceback in the trajectory in the jsonl file. There's only one last entry in the history field besides the error field above. I can try capturing the traceback (if there is any) directly from the log next time.

{
    "id": 84,
    "timestamp": "2024-10-02T10:06:45.050451",
    "source": "agent",
    "message": "There was an unexpected error while running the agent",
    "observation": "error",
    "content": "There was an unexpected error while running the agent",
    "extras": {}
}

I'm also quite confused about whether it is litellm.APIError or OpenAIException. From the docs, it seems to me that OpenAIException is a provider-specific exception and litellm.APIError is a wrapper across all providers.

@mamoodi added the evaluation and severity:medium labels on Oct 2, 2024
@enyst
Collaborator

enyst commented Oct 5, 2024

The linked PR added retries in our LLM class, but I think a better fix would retry the eval, or make sure the instance isn't written to the jsonl so that it will be attempted again.
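As a stopgap, a small post-processing script could drop the errored instances from output.jsonl before rerunning (a sketch assuming the record layout shown above; the filenames are arbitrary):

```python
# Sketch: filter proxy-failure instances out of output.jsonl so a subsequent
# eval run re-attempts them instead of treating them as done.
import json

def drop_failed_instances(in_path="output.jsonl", out_path="output.filtered.jsonl"):
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            record = json.loads(line)
            error = record.get("error") or ""
            if "OpenAIException" in error and "502" in error:
                continue  # skip so the instance gets rerun
            dst.write(line)
```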

@ryanhoangt
Contributor Author

Thanks for the fix! Btw, can you explain why retrying the whole eval is better? I'm not sure about the architectural side, but imo it may not be necessary to run again from the first step (especially when we're at the very end of the trajectory).

@enyst
Collaborator

enyst commented Oct 7, 2024

Oh, they're not exclusive. The request is retried now, and we can configure the retry settings to make more attempts (in config.toml for the respective llm.eval group). You may want to do that and give it as much time as you see fit; that will retry from the current state.
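For example, something like this in config.toml (key names and values are assumed for illustration; check the LLM config options for your OpenHands version):

```toml
# Assumed retry-related keys for the eval LLM group; values are illustrative.
[llm.eval_sonnet_3_5]
num_retries = 8
retry_min_wait = 15
retry_max_wait = 120
```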

But there will be a limit, so my thinking here is simply that if the proxy continues to be unavailable at that point, the reasonable thing is to give up on the instance and just not save it in the jsonl, so we can rerun it. 🤔

Contributor

github-actions bot commented Nov 7, 2024

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

@github-actions bot added the Stale label on Nov 7, 2024
@enyst
Collaborator

enyst commented Nov 7, 2024

I think I saw another error merged into the jsonl, but only when it was 1 task and 1 worker. We usually use multiprocessing lately, which might be why we don't see it. Maybe.

On the other hand, we have meanwhile made more fixes and added some retries when inference ends abnormally, before it gets to the output file, so maybe it was fixed.

@ryanhoangt
Contributor Author

Yeah, on my side I can see the retries happening after your fix. Recently, with the new LLM proxy, I don't even receive 502 errors anymore. Maybe this issue can be closed.

@ryanhoangt closed this as not planned on Nov 7, 2024