Differences in MATH-Lvl-5 on Llama3.1 between LM Eval Harness #15

vithursant · 2024-10-10T20:42:17Z

What is the difference between MATH Lvl 5 in the ZeroEval evaluation and in the Eleuther eval harness (Open LLM Leaderboard v2)?

For Llama 3.1 70B, ZeroEval shows 43.3 and the HF leaderboard shows 28%.

yuchenlin · 2024-10-16T19:33:36Z

Hi vithursant,

Thanks for the question! We use a structured prompt template that encourages the model to output its reasoning steps first and then its final answer. I'm not sure whether Eleuther does the same.
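To make the idea concrete, here is a minimal sketch of what a "reasoning first, then final answer" setup can look like. It is an illustration only: the template wording, the JSON keys `reasoning` and `answer`, and the helper names `build_prompt`, `extract_answer`, and `is_correct` are all hypothetical and are not taken from ZeroEval or the Eleuther harness.

```python
import json
import re

# Hypothetical template, written for illustration only.
# It is NOT the actual ZeroEval prompt.
PROMPT_TEMPLATE = (
    "Solve the following math problem.\n\n"
    "Problem: {problem}\n\n"
    "Respond with a JSON object with two keys:\n"
    '  "reasoning": your step-by-step reasoning\n'
    '  "answer": the final answer only\n'
)


def build_prompt(problem: str) -> str:
    """Fill the (hypothetical) structured template with one problem."""
    return PROMPT_TEMPLATE.format(problem=problem)


def extract_answer(response: str):
    """Return the "answer" field of the JSON object in the response, or None."""
    # Take the span from the first "{" to the last "}" in the response.
    match = re.search(r"\{.*\}", response, flags=re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    answer = parsed.get("answer")
    return None if answer is None else str(answer).strip()


def is_correct(response: str, gold: str) -> bool:
    """Exact string match on the extracted answer (assumed scoring rule)."""
    predicted = extract_answer(response)
    return predicted is not None and predicted == gold.strip()
```

In a setup like this, the score depends on both the prompt (whether the model is asked to reason before answering) and the parser (what counts as the final answer), so two harnesses can report different numbers for the same model and dataset. Real math scoring typically also normalizes LaTeX before comparing, which this sketch skips.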
