This repo was created internally to run the 🌟logickor🌟 evaluation for self-evaluation. Our code should behave in the same manner as logickor v2.
Originally, our code provided `zero-shot` evaluation only. (New update: 09.07) We have now added `1-shot` and `cot-1-shot` modes.
- Gukbap-Mistral-7B🍚 (6.06)
- Gukbap-Qwen2-7B🍚 (6.70)
- Gukbap-Gemma2-9B🍚 (8.77)
There are many issues when evaluating Gemma2 in vllm, so please follow the installation steps below.
- Install the `vllm 0.5.1` version:
```bash
pip install vllm==0.5.1
```
- Add the `FLASHINFER` backend in your script file:
```bash
export VLLM_ATTENTION_BACKEND=FLASHINFER
```
- Then, download the `flashinfer` package through this link.
- If you encounter errors, try solution2.
Please check the script file.
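Putting these steps together, a minimal setup sketch might look like this. The flashinfer wheel index below is an assumption based on the flashinfer project's install instructions; choose the index matching your CUDA and torch versions.

```bash
# Pin vllm to 0.5.1 (required for Gemma2 evaluation)
pip install vllm==0.5.1

# Assumption: flashinfer ships through its own wheel index (here CUDA 12.1 / torch 2.3)
pip install flashinfer -i https://flashinfer.ai/whl/cu121/torch2.3/

# Select the FLASHINFER attention backend for vllm
export VLLM_ATTENTION_BACKEND=FLASHINFER
```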
# Example
```bash
export VLLM_ATTENTION_BACKEND=FLASHINFER
python mtbench.py \
    --is_multi_turn 1 \
    --eval_model gpt-4-1106-preview \
    --repo_name HumanF-MarkrAI \
    --base_model Gukbap-Gemma2-9B \
    --max_token 4096
```
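Since the judge model `gpt-4-1106-preview` is an OpenAI model, the evaluation script presumably needs your OpenAI API key. How the key is supplied is an assumption here; the standard environment variable is the usual convention:

```bash
# Assumption: the eval script reads the standard OpenAI environment variable
export OPENAI_API_KEY=<your-key>
```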
If you want to test other models (Mistral, Qwen, ...), you need to remove `export VLLM_ATTENTION_BACKEND=FLASHINFER`, as in the sketch below. If you test the Gemma2 models, you need to set `max_token` < 8192; currently, vllm cannot handle 8192 tokens with Gemma2.
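For example, a run for a non-Gemma2 model might look like the following sketch (the flags mirror the Gemma2 example above; `Gukbap-Mistral-7B` is one of the models listed earlier):

```bash
# No FLASHINFER export for non-Gemma2 models (e.g., Mistral, Qwen)
python mtbench.py \
    --is_multi_turn 1 \
    --eval_model gpt-4-1106-preview \
    --repo_name HumanF-MarkrAI \
    --base_model Gukbap-Mistral-7B \
    --max_token 4096
```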
For `1-shot` and `cot-1-shot` evaluation, please check the script file.
```bash
export VLLM_ATTENTION_BACKEND=FLASHINFER
python 1_shot_mtbench.py \
    --is_multi_turn 1 \
    --eval_model gpt-4-1106-preview \
    --repo_name HumanF-MarkrAI \
    --base_model Gukbap-Gemma2-9B \
    --max_token 4096 \
    --prompt cot-1-shot # select [cot-1-shot or 1-shot]
```
Note that Gemma2 does not support the `system` prompt (its chat template defines only user and model roles).
```bibtex
@article{HumanF-MarkrAI,
  title={Gukbap-Series-LLM},
  author={MarkrAI},
  year={2024},
  url={https://huggingface.co/HumanF-MarkrAI}
}
```