This repo was created internally to run the 🌟logickor🌟 evaluation for self-evaluation. Our code should behave in the same manner as logickor v2.
Originally, our code provided `zero-shot` evaluation only. (New update: 09.07) We have now added `1-shot` and `cot-1-shot` modes.
- Gukbap-Mistral-7B🍚 (6.06)
- Gukbap-Qwen2-7B🍚 (6.70)
- Gukbap-Gemma2-9B🍚 (8.77)
There are many issues when evaluating Gemma2 in vllm, so please follow the installation steps below.
- Install the `vllm 0.5.1` version:
```bash
pip install vllm==0.5.1
```
- Add the `FLASHINFER` backend in your script file:
```bash
export VLLM_ATTENTION_BACKEND=FLASHINFER
```
- Then, download the `flashinfer` package through this link.
- If you encounter errors, try solution2.
Please check the script file.
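Putting these steps together, a minimal setup sketch might look like this. The flashinfer wheel index below is an assumption based on the flashinfer project's install instructions; choose the index matching your CUDA and torch versions.

```bash
# Pin vllm to 0.5.1 (required for Gemma2 evaluation)
pip install vllm==0.5.1

# Assumption: flashinfer ships through its own wheel index (here CUDA 12.1 / torch 2.3)
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.3/

# Select the FLASHINFER attention backend for vllm
export VLLM_ATTENTION_BACKEND=FLASHINFER
```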
# Example
```bash
export VLLM_ATTENTION_BACKEND=FLASHINFER
python mtbench.py \
    --is_multi_turn 1 \
    --eval_model gpt-4-1106-preview \
    --repo_name HumanF-MarkrAI \
    --base_model Gukbap-Gemma2-9B \
    --max_token 4096
```
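Since the judge model `gpt-4-1106-preview` is an OpenAI model, the evaluation script presumably needs your OpenAI API key. How the key is supplied is an assumption here; the standard environment variable is the usual convention:

```bash
# Assumption: the eval script reads the standard OpenAI environment variable
export OPENAI_API_KEY=<your-key>
```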
If you want to test other models (Mistral, Qwen, ...), you need to remove `export VLLM_ATTENTION_BACKEND=FLASHINFER`, as in the sketch below. If you test the Gemma2 models, you need to set `max_token` < 8192; currently, vllm cannot handle 8192 tokens with Gemma2.
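For example, a run for a non-Gemma2 model might look like the following sketch (the flags mirror the Gemma2 example above; `Gukbap-Mistral-7B` is one of the models listed earlier):

```bash
# No FLASHINFER export for non-Gemma2 models (e.g., Mistral, Qwen)
python mtbench.py \
    --is_multi_turn 1 \
    --eval_model gpt-4-1106-preview \
    --repo_name HumanF-MarkrAI \
    --base_model Gukbap-Mistral-7B \
    --max_token 4096
```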
For `1-shot` and `cot-1-shot` evaluation, please check the script file.
```bash
export VLLM_ATTENTION_BACKEND=FLASHINFER
python 1_shot_mtbench.py \
    --is_multi_turn 1 \
    --eval_model gpt-4-1106-preview \
    --repo_name HumanF-MarkrAI \
    --base_model Gukbap-Gemma2-9B \
    --max_token 4096 \
    --prompt cot-1-shot # select [cot-1-shot or 1-shot]
```
Note that Gemma2 does not support the `system` prompt (its chat template defines only user and model roles).
```bibtex
@article{HumanF-MarkrAI,
  title={Gukbap-Series-LLM},
  author={MarkrAI},
  year={2024},
  url={https://huggingface.co/HumanF-MarkrAI}
}
```