Docs: Clarify config.toml usage in evaluation harness #6828

xingyaoww · 2025-02-19T13:07:20Z

This PR updates the documentation to clarify how config.toml is used in the evaluation harness. Fixes #6813.

Changes made:

- Updated the documentation in evaluation/README.md to clarify that only the LLM section in config.toml is used for evaluation.
- Explained that other configurations, such as save_trajectory_path, are set in the get_config function of the respective run_infer.py file for each benchmark.

These changes ensure that users understand how configuration settings are applied during the evaluation process.

Update for issue #6813:
Thank you for reporting this issue. We have clarified the documentation on how config.toml is used in the evaluation harness. Here is a summary of the situation:

- The evaluation harness for swe_bench doesn't read the save_trajectory_path configuration from config.toml.
- We've updated the documentation to clarify that only the LLM section in config.toml is used for evaluation.
- Other evaluation-specific settings, such as save_trajectory_path, are set in the get_config function of the respective run_infer.py file for each benchmark.

This PR implements these documentation changes to prevent confusion in the future.

Thank you for your patience and for helping us improve OpenHands!

To run this PR locally, use the following command:

docker run -it --rm   -p 3000:3000   -v /var/run/docker.sock:/var/run/docker.sock   --add-host host.docker.internal:host-gateway   -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:e616cbb-nikolaik   --name openhands-app-e616cbb   docker.all-hands.dev/all-hands-ai/openhands:e616cbb

openhands-agent added 3 commits February 19, 2025 13:01

Fix save_trajectory_path setting in SWE-Bench evaluation harness

5e33e35

Update pull request description with issue status

ee8802a

Revert changes to evaluation/benchmarks/swe_bench/run_infer.py

e616cbb

xingyaoww changed the title ~~Fix save_trajectory_path setting in SWE-Bench evaluation harness~~ Docs: Clarify config.toml usage in evaluation harness Feb 19, 2025

xingyaoww marked this pull request as ready for review February 19, 2025 13:12

xingyaoww requested a review from li-boxuan February 19, 2025 13:12

li-boxuan approved these changes Feb 20, 2025

li-boxuan merged commit e52aee1 into main Feb 21, 2025
14 checks passed

li-boxuan deleted the fix-save-trajectory-path branch February 21, 2025 06:16
