What is the difference between MATH Lvl 5 in the ZeroEval evaluation and the Eleuther eval harness (Open LLM Leaderboard v2)? For 3.1-70B, ZeroEval shows 43.3 while the HF leaderboard shows 28%.

Thanks for the question! We use a structured prompt template to encourage the model to first write out its reasoning steps and then output the final answer. I'm not sure whether Eleuther's harness does the same.
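For reference, here is a minimal sketch of that kind of reasoning-then-answer template (the names and exact wording are hypothetical; the actual ZeroEval template may differ):

```python
# Minimal sketch of a structured prompt that asks for reasoning first,
# then a final answer in a machine-parseable field.
# Hypothetical wording -- the real ZeroEval template may differ.
TEMPLATE = """Solve the following problem.

Problem: {question}

First write out your step-by-step reasoning, then give the final answer.
Respond in JSON with two keys:
{{"reasoning": "<your step-by-step reasoning>", "answer": "<final answer only>"}}
"""

def build_prompt(question: str) -> str:
    # Fill the template with the question; double braces above keep the
    # JSON example literal when .format() is applied.
    return TEMPLATE.format(question=question)

if __name__ == "__main__":
    print(build_prompt("What is 7 * 8?"))
```

Because the final answer is extracted from a dedicated field rather than from free-form text, scoring can differ from a harness that matches answers directly in the generation, which may account for part of the gap.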