feat(eval): misc SWE-Bench improvement - use different resources for different instances #6313

xingyaoww · 2025-01-16T17:44:54Z

End-user friendly description of the problem this fixes or functionality that this introduces

Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

Improve the update-output script so it can handle larger output files
Reduce script timeouts to under 600 seconds to fit recent runtime API constraints
Move runtime creation in eval_infer into a try-except-finally block so runtime resources are cleaned up properly on failure
Add a resource mapping for evaluation instances that require more resources to run (to avoid some OOM restarts)
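
For readers skimming the summary, here is a minimal sketch of how the last two items could fit together: look up a per-instance resource factor, create the runtime inside `try`, and always release it in `finally`. Everything named here (`FakeRuntime`, `resource_factor_for`, `evaluate_instance`, the default factor of 1) is a hypothetical stand-in for illustration, not the code in this PR.

```python
from typing import Callable, Optional

# Assumed default for instances that have no entry in the mapping.
DEFAULT_RESOURCE_FACTOR = 1


class FakeRuntime:
    """Stand-in for a real runtime; it only records whether close() was called."""

    def __init__(self, resource_factor: int) -> None:
        self.resource_factor = resource_factor
        self.closed = False

    def close(self) -> None:
        self.closed = True


def resource_factor_for(instance_id: str, mapping: dict[str, int]) -> int:
    """Return the resource multiplier for an instance, or the default if unmapped."""
    return mapping.get(instance_id, DEFAULT_RESOURCE_FACTOR)


def evaluate_instance(
    instance_id: str,
    mapping: dict[str, int],
    run: Callable[[FakeRuntime], int],
) -> int:
    """Create a runtime sized for the instance, run the work, and always clean up."""
    runtime: Optional[FakeRuntime] = None
    try:
        # Creation happens inside the try block, so a failure at any later
        # step still reaches the cleanup in finally.
        runtime = FakeRuntime(resource_factor_for(instance_id, mapping))
        return run(runtime)
    finally:
        # runtime is None only if creation itself failed.
        if runtime is not None:
            runtime.close()
```

The `runtime is not None` guard matters because the constructor may itself raise, in which case there is nothing to close.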

Link of any specific issues this addresses

To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:9c912b6-nikolaik   --name openhands-app-9c912b6   docker.all-hands.dev/all-hands-ai/openhands:9c912b6

evaluation/benchmarks/swe_bench/scripts/run_infer.sh

evaluation/benchmarks/swe_bench/eval_infer.py

evaluation/benchmarks/swe_bench/resource/mapping.py

xingyaoww · 2025-01-16T18:19:39Z

Thanks Calvin!

OpenHands can you help me address the comments?

openhands-agent · 2025-01-16T18:20:02Z

OpenHands started fixing the pr! You can monitor the progress here.

openhands-agent · 2025-01-16T18:23:03Z

Here's a concise overview of the changes and their status:

✅ Successfully Resolved:

File handling improvement:
- Implemented proper context manager (with statement) for file operations
- Fixed potential file handle leak issues
- Ensures proper resource cleanup
Type hint modernization:
- Removed deprecated typing.Dict
- Replaced with built-in dict type hints
- Updated all relevant type annotations
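
As a rough illustration of the two fixes listed above, the sketch below reads a JSON mapping with a `with` block and annotates it with the built-in `dict` type (Python 3.9+) instead of `typing.Dict`. The function name `load_resource_mapping` and the assumption that the file maps instance IDs to integer factors are hypothetical, chosen only for this example.

```python
import json
from pathlib import Path


def load_resource_mapping(path: Path) -> dict[str, int]:
    """Read an instance-id -> resource-factor mapping from a JSON file."""
    # The with statement closes the file handle even if json.load raises.
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    # Normalize keys and values so callers can rely on the annotated types.
    return {str(key): int(value) for key, value in raw.items()}
```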

🔍 Current Status:

All identified issues appear to have been successfully addressed
Changes maintain existing functionality while improving code quality
No remaining issues are evident from the provided information

The changes effectively modernized the code and improved resource management without altering the core functionality.

evaluation/benchmarks/swe_bench/resource/princeton-nlp__SWE-bench_Verified-test.json

csmith49

Looks good to me, looking forward to not having so many restarts!

feat(eval): misc SWE-Bench improvement - use different resources for different instances

33b79a7

xingyaoww commented Jan 16, 2025

evaluation/benchmarks/swe_bench/scripts/run_infer.sh (outdated, resolved)

xingyaoww marked this pull request as ready for review January 16, 2025 17:45

Merge commit '0bed17758f05aee421bb0148ab0d6df59bd574fe' into xw/resource-fix

20ea911

xingyaoww force-pushed the xw/resource-fix branch from 90d027b to 20ea911 Compare January 16, 2025 17:47

remove extra

d84c8dc

xingyaoww requested a review from csmith49 January 16, 2025 17:49

xingyaoww assigned csmith49 Jan 16, 2025

csmith49 reviewed Jan 16, 2025

evaluation/benchmarks/swe_bench/eval_infer.py (outdated, resolved)

csmith49 reviewed Jan 16, 2025

evaluation/benchmarks/swe_bench/resource/mapping.py (outdated, resolved)

xingyaoww added the fix-me Attempt to fix this issue with OpenHands label Jan 16, 2025

Fix pr #6313: feat(eval): misc SWE-Bench improvement - use different resources for different instances

9c912b6

csmith49 reviewed Jan 16, 2025

evaluation/benchmarks/swe_bench/resource/princeton-nlp__SWE-bench_Verified-test.json (resolved)

csmith49 approved these changes Jan 16, 2025

xingyaoww merged commit 72af7bb into main Jan 16, 2025
24 checks passed

xingyaoww deleted the xw/resource-fix branch January 16, 2025 18:48

csmith49 pushed a commit to csmith49/OpenHands that referenced this pull request Jan 19, 2025

feat(eval): misc SWE-Bench improvement - use different resources for different instances (All-Hands-AI#6313) Co-authored-by: openhands <[email protected]>

2e573f6
