Commit

Update
yan-gao-GY committed Sep 6, 2024
1 parent f9d2fdf commit e03011d
Showing 2 changed files with 6 additions and 332 deletions.
6 changes: 6 additions & 0 deletions benchmarks/flowertune-llm/evaluation/code/README.md
@@ -18,6 +18,9 @@ pip install -r requirements.txt

# Log in HuggingFace account
huggingface-cli login

# Download main.py script
git clone https://github.com/bigcode-project/bigcode-evaluation-harness.git && cd bigcode-evaluation-harness && git checkout 0f3e95f0806e78a4f432056cdb1be93604a51d69 && mv main.py ../ && cd .. && rm -rf bigcode-evaluation-harness
```

After that, install `Node.js` and `g++` to evaluate JavaScript and C++:
@@ -41,14 +44,17 @@ sudo apt-get install g++
```bash
# --peft_model: e.g., ./peft_1
# --max_length_generation: change to 2048 when running mbpp
# --tasks: choose from [mbpp, humaneval, multiple-js, multiple-cpp]
# --metric_output_path: change the dataset name to match the chosen task
python main.py \
--model=mistralai/Mistral-7B-v0.3 \
--peft_model=/path/to/fine-tuned-peft-model-dir/ \
--max_length_generation=1024 \
--batch_size=4 \
--allow_code_execution \
--save_generations \
--save_references \
--tasks=humaneval \
--metric_output_path=./evaluation_results_humaneval.json
```
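
To run all four tasks in turn, the command above can be assembled programmatically. The following is a minimal sketch and not part of the repository: `build_command` is a hypothetical helper that uses only the flags shown above, and it switches `--max_length_generation` to 2048 for `mbpp` as the command's comment advises.

```python
import shlex

TASKS = ["mbpp", "humaneval", "multiple-js", "multiple-cpp"]


def build_command(task, peft_dir, base_model="mistralai/Mistral-7B-v0.3"):
    """Return the argument list for one evaluation run (hypothetical helper)."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task}")
    # The README asks for a longer generation budget on mbpp.
    max_length = 2048 if task == "mbpp" else 1024
    return [
        "python", "main.py",
        f"--model={base_model}",
        f"--peft_model={peft_dir}",
        f"--max_length_generation={max_length}",
        "--batch_size=4",
        "--allow_code_execution",
        "--save_generations",
        "--save_references",
        f"--tasks={task}",
        f"--metric_output_path=./evaluation_results_{task}.json",
    ]


if __name__ == "__main__":
    for task in TASKS:
        # Print each command; pass the list to subprocess.run to execute it.
        print(shlex.join(build_command(task, "./peft_1")))
```

Returning a list rather than a single string keeps each flag as one argument, so paths containing spaces need no extra quoting when the list is handed to `subprocess.run`.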

The model answers and pass@1 scores will be saved to `generations_{dataset_name}.json` and `evaluation_results_{dataset_name}.json`, respectively.
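
Once a run finishes, the score can be read back from the results file. This is a minimal sketch under an assumption: that the JSON maps each task name to a dictionary of metrics, for example `{"humaneval": {"pass@1": ...}}`. That layout is not stated in this README, so check it against an actual output file; `read_pass_at_1` is a hypothetical helper.

```python
import json
from pathlib import Path


def read_pass_at_1(results_path, task):
    """Return the pass@1 score for `task`, or None if it is not present."""
    results = json.loads(Path(results_path).read_text())
    # Assumed layout: {"<task>": {"pass@1": <float>}, ...}
    metrics = results.get(task)
    if not isinstance(metrics, dict):
        return None
    return metrics.get("pass@1")


if __name__ == "__main__":
    path = Path("./evaluation_results_humaneval.json")
    if path.exists():
        print("humaneval pass@1:", read_pass_at_1(path, "humaneval"))
```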

> [!NOTE]
332 changes: 0 additions & 332 deletions benchmarks/flowertune-llm/evaluation/code/main.py

This file was deleted.