

Add prompt version 0.2.1 for JCommonsenseQA #104

Merged 4 commits into jp-stable on Oct 24, 2023

Conversation


@mkshing mkshing commented Oct 12, 2023

Background

In principle, "base" models (trained purely as language models, without a specific prompt format) should be evaluated with prompt version 0.2. However, it was reported that 0.3 outperformed 0.2, which was unexpected.

So we compared 0.2 and 0.3 on several models (thank you @mrorii!) and found that 0.3 increased scores for all base models on JCommonsenseQA and JNLI.

Summary

JCommonsenseQA is a question-answering task with 5 answer choices per question. In 0.2, the prompt looks like the following. (reference link)

質問と回答の選択肢を入力として受け取り、選択肢から回答を選択してください。なお、回答は選択肢の番号(例:0)でするものとします。

質問:街のことは?
選択肢:0.タウン,1.劇場,2.ホーム,3.ハウス,4.ニューヨークシティ
回答:

(In English: "Take a question and answer choices as input, and select an answer from the choices. The answer should be given as the number of the choice (e.g. 0). / Question: What relates to a city? / Choices: 0. town, 1. theater, 2. home, 3. house, 4. New York City / Answer:")

The prompt instructs the model to answer with the choice's index, but the scored targets are actually the choice texts themselves. So I assume this gap between instruction and scoring confused the models. (code)
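To make the gap concrete, here is a minimal sketch (not the harness's actual code) of how a 0.2-style prompt is built and how multiple-choice scoring typically works; the function name `build_prompt` and the scoring comment are illustrative assumptions.

```python
# Sketch of the 0.2-style JCommonsenseQA prompt and the instruction/target
# mismatch. `build_prompt` is a hypothetical helper, not harness code.

PROMPT_02 = (
    "質問と回答の選択肢を入力として受け取り、選択肢から回答を選択してください。"
    "なお、回答は選択肢の番号(例:0)でするものとします。\n\n"
    "質問:{question}\n"
    "選択肢:{choices}\n"
    "回答:"
)

def build_prompt(question: str, choices: list[str]) -> str:
    # Number the choices "0.…,1.…" as in the 0.2 prompt.
    numbered = ",".join(f"{i}.{c}" for i, c in enumerate(choices))
    return PROMPT_02.format(question=question, choices=numbered)

choices = ["タウン", "劇場", "ホーム", "ハウス", "ニューヨークシティ"]
prompt = build_prompt("街のことは?", choices)

# The instruction says "answer with the number (e.g. 0)", yet multiple-choice
# evaluation scores the log-likelihood of each *choice text* as the
# continuation of the prompt:
candidates = [prompt + c for c in choices]  # "…回答:タウン", "…回答:劇場", …
# A model that obeys the instruction and prefers "回答:0" gets no credit,
# because the bare index is never among the scored continuations.
```

This is the mismatch 0.2.1 is meant to remove: the instruction and the scored targets should agree on whether the answer is an index or a text.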

Solution

  • Introduced a new prompt version 0.2.1 for base models, which consistently outperformed 0.2 (and outperformed 0.3 in some cases).

| Model | # of shots | prompt version | acc |
|---|---|---|---|
| elyza/ELYZA-japanese-Llama-2-7b | 0 | 0.2 | 31.64 |
| elyza/ELYZA-japanese-Llama-2-7b | 0 | 0.3 | 38.96 |
| elyza/ELYZA-japanese-Llama-2-7b | 0 | 0.2.1 (NEW!) | 45.49 |
| matsuo-lab/weblab-10b | 0 | 0.2 | 23.32 |
| matsuo-lab/weblab-10b | 0 | 0.3 | 42.27 |
| matsuo-lab/weblab-10b | 0 | 0.2.1 (NEW!) | 25.47 |

\* all evals were performed with hf-causal-experimental and jcommonsenseqa-1.1-{prompt version}
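For reference, a reproduction command might look like the sketch below. This assumes the repo keeps the upstream EleutherAI lm-evaluation-harness CLI (`main.py` with `--model`/`--tasks`/`--num_fewshot` flags); check this repo's README for the exact invocation.

```shell
# Hypothetical invocation, assuming the upstream harness CLI is unchanged.
python main.py \
    --model hf-causal-experimental \
    --model_args pretrained=elyza/ELYZA-japanese-Llama-2-7b \
    --tasks jcommonsenseqa-1.1-0.2.1 \
    --num_fewshot 0
```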

@mkshing mkshing self-assigned this Oct 12, 2023
@mkshing mkshing changed the title Add prompt version 0.21 for JCommonsenseQA Add prompt version 0.2.1 for JCommonsenseQA Oct 12, 2023
@mrorii mrorii left a comment

Sorry for the late review 🙇, LGTM 👍

Collaborator

@polm-stability polm-stability left a comment


One small change requested.

Also, this is in draft - are you actually still working on it, or is it ready?

Review comment on lm_eval/tasks/ja/jcommonsenseqa.py (outdated, resolved)
@mkshing mkshing marked this pull request as ready for review October 24, 2023 04:40
@mkshing mkshing requested a review from jon-tow as a code owner October 24, 2023 04:40
@mkshing mkshing removed the request for review from jon-tow October 24, 2023 04:45

mkshing commented Oct 24, 2023

Although 0.2.1 did not outperform 0.3 on every model, we confirmed that 0.2.1 is clearly better than 0.2 for base models, so I will merge this PR.

@mkshing mkshing merged commit 6af55ef into jp-stable Oct 24, 2023
1 check passed
@mkshing mkshing deleted the mkshing/jcomm-0-21 branch October 24, 2023 05:11