Docs: Clarify config.toml usage in evaluation harness #6828

Merged Feb 21, 2025 (3 commits)

@xingyaoww (Collaborator) commented Feb 19, 2025

This PR updates the documentation to clarify how config.toml is used in the evaluation harness, and fixes #6813.

Changes made:

  1. Updated the documentation in evaluation/README.md to clarify that only the LLM section in config.toml will be used for evaluation.
  2. Explained that other configurations like save_trajectory_path are set in the get_config function of the respective run_infer.py file for each benchmark.

These changes ensure that users understand how configuration settings are applied during the evaluation process.

Update for issue #6813:
Thank you for reporting this issue. We have clarified the documentation regarding the use of config.toml in the evaluation harness. Here's a summary of the situation:

  1. The evaluation harness for swe_bench doesn't read the save_trajectory_path configuration from config.toml.
  2. We've updated the documentation to clarify that only the LLM section in config.toml will be used for evaluation.
  3. For other configurations specific to evaluation, such as save_trajectory_path, these are set in the get_config function of the respective run_infer.py file for each benchmark.

This PR implements these documentation changes to prevent confusion in the future.

Thank you for your patience and for helping us improve OpenHands!


To run this PR locally, use the following command:

docker run -it --rm \
  -p 3000:3000 \
  -v /var/run/docker.sock:/var/run/docker.sock \
  --add-host host.docker.internal:host-gateway \
  -e SANDBOX_RUNTIME_CONTAINER_IMAGE=docker.all-hands.dev/all-hands-ai/runtime:e616cbb-nikolaik \
  --name openhands-app-e616cbb \
  docker.all-hands.dev/all-hands-ai/openhands:e616cbb

@xingyaoww changed the title from "Fix save_trajectory_path setting in SWE-Bench evaluation harness" to "Docs: Clarify config.toml usage in evaluation harness" on Feb 19, 2025
@xingyaoww marked this pull request as ready for review on February 19, 2025 13:12
@xingyaoww requested a review from li-boxuan on February 19, 2025 13:12
@li-boxuan merged commit e52aee1 into main on Feb 21, 2025
14 checks passed
@li-boxuan deleted the fix-save-trajectory-path branch on February 21, 2025 06:16
Successfully merging this pull request may close these issues.

[Bug]: AppConfig in config.toml not taking effect in evaluation harness