Docs: Clarify config.toml usage in evaluation harness (#6828)
Co-authored-by: openhands <[email protected]>
xingyaoww and openhands-agent authored Feb 21, 2025
1 parent c27b191 commit e52aee1
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions evaluation/README.md
@@ -20,6 +20,8 @@ To evaluate an agent, you can provide the agent's name to the `run_infer.py` pro
### Evaluating Different LLMs

OpenHands in development mode uses `config.toml` to keep track of most configuration.
**IMPORTANT: For evaluation, only the LLM section in `config.toml` will be used. Other configurations, such as `save_trajectory_path`, are not applied during evaluation.**

Here's an example configuration file you can use to define and use multiple LLMs:

```toml
@@ -40,6 +42,8 @@ api_key = "XXX"
temperature = 0.0
```

Other evaluation-specific configurations, such as `save_trajectory_path`, are typically set in the `get_config` function of each benchmark's `run_infer.py` file.
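
As a rough illustration of this split, the sketch below shows a `get_config`-style function that reads only an LLM group from the parsed file contents and sets `save_trajectory_path` explicitly. It is not the actual OpenHands code: `SketchConfig`, the `eval_example` group name, the `core` table, the `trajectory_dir` parameter, and this `get_config` signature are all hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class SketchConfig:
    """Hypothetical stand-in for the harness config object (not the real OpenHands class)."""

    llm: Dict[str, Any]
    save_trajectory_path: Optional[str] = None


# Stand-in for a parsed config.toml. The group name "eval_example" and the
# "core" table are made up for this sketch.
PARSED_TOML: Dict[str, Any] = {
    "core": {"save_trajectory_path": "./ignored_by_eval"},
    "llm": {
        "eval_example": {
            "model": "example-model",
            "api_key": "XXX",
            "temperature": 0.0,
        }
    },
}


def get_config(
    parsed_toml: Dict[str, Any],
    llm_group: str,
    trajectory_dir: Optional[str] = None,
) -> SketchConfig:
    # Only the requested LLM group is read from the file contents.
    llm_settings = dict(parsed_toml["llm"][llm_group])
    # Everything else is set explicitly here rather than taken from config.toml.
    return SketchConfig(llm=llm_settings, save_trajectory_path=trajectory_dir)


config = get_config(PARSED_TOML, "eval_example", trajectory_dir="./eval_trajectories")
print(config.llm["model"])
print(config.save_trajectory_path)
```

In this sketch the `save_trajectory_path` value in the file never reaches the resulting config; only the value passed to `get_config` does. Check the `run_infer.py` of the benchmark you are running for the real function and its parameters.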

## Supported Benchmarks

The OpenHands evaluation harness supports a wide variety of benchmarks across [software engineering](#software-engineering), [web browsing](#web-browsing), [miscellaneous assistance](#misc-assistance), and [real-world](#real-world) tasks.
