# DiscoveryBench with OpenHands

[DiscoveryBench](https://github.com/allenai/discoverybench/) [(Paper)](https://arxiv.org/abs/2407.01725v1) contains 264 tasks collected across 6 diverse domains, such as biology, economics, and sociology. It incorporates discovery workflows from published papers to approximate the real-world challenges faced by researchers.

<p align="center">
  <a href="https://github.com/allenai/discoverybench">
    <img src="https://raw.githubusercontent.com/allenai/discoverybench/refs/heads/main/assets/discoverybench-openhands-teaser.png" width="100%" alt="DiscoveryBench Background" />
  </a>
</p>

## Setup Environment and LLM Configuration

1. Follow the instructions [here](https://github.com/openlocus/OpenHands/blob/discoverybench-openhands-integration/evaluation/README.md#setup) to set up the OpenHands development environment and configure your LLMs locally.

2. Execute the bash script to start the DiscoveryBench evaluation:

```
./evaluation/discoverybench/scripts/run_infer.sh [YOUR MODEL CONFIG]
```

Replace `[YOUR MODEL CONFIG]` with the name of a model configuration you have set up in `config.toml`; an illustrative entry is sketched below.
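If you have not defined a model configuration yet, an entry in `config.toml` might look roughly like the following. This is a minimal sketch assuming the standard OpenHands LLM config layout; the group name `eval_gpt4o` and all values are illustrative, so adapt them to your provider and the options described in the setup guide:

```
# Illustrative LLM config group; the name "eval_gpt4o" and all values are examples only.
[llm.eval_gpt4o]
model = "gpt-4o"
api_key = "<your-api-key>"
temperature = 0.0
```
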
## Run Inference on DiscoveryBench Instances

When the `run_infer.sh` script is started, it automatically pulls the latest DiscoveryBench instances and sets up the agent environment. The OpenHands agent is invoked to process each task within this environment, producing a hypothesis, which is then evaluated against the "gold" hypothesis provided by DiscoveryBench. The evaluation result, along with the agent chat history, is logged to `output.jsonl` under `evaluation_outputs`.
```
./evaluation/discoverybench/scripts/run_infer.sh [MODEL_CONFIG] [GIT_COMMIT] [AGENT] [EVAL_LIMIT] [NUM_WORKERS]
```

- `MODEL_CONFIG`: Name of the model configuration you want to evaluate with, as set up in `config.toml`.
- `GIT_COMMIT`: Git commit hash or release tag of OpenHands to use, e.g., `HEAD` or a specific tag like `0.6.2`.
- `AGENT`: Agent to use; currently only `CoderActAgent` is supported.
- `EVAL_LIMIT`: Number of samples to evaluate.
- `NUM_WORKERS`: Number of workers used to parallelize the evaluation process.
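Putting the arguments together, an invocation might look like the following. The config name `llm.eval_gpt4o` refers to the illustrative entry sketched earlier, and the remaining values are examples only; adjust them to your setup:

```
./evaluation/discoverybench/scripts/run_infer.sh llm.eval_gpt4o HEAD CoderActAgent 10 1
```

Each line of the resulting `output.jsonl` is a standalone JSON record, so one quick way to inspect the first result is `head -n 1 <path-to-output>/output.jsonl | python -m json.tool`.
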
## DiscoveryBench Evaluation Utils

- **`eval_w_subhypo_gen.py`**: Implements the DiscoveryBench logic for evaluating agent-generated hypotheses.
- **`lm_utils.py`**: Provides utility functions necessary for the evaluation process.
- **`openai_helpers.py`**: Includes helper functions for OpenAI-related tasks.
- **`openai_semantic_gen_prompts.py`**: Contains prompts used for semantic generation.
- **`response_parser.py`**: Handles the parsing of agent-generated hypotheses.