I found that you provide many mmlu test methods.
Taking mmlu_stem as an example, the available tasks include mmlu_stem_test, mmlu_stem, mmlu_stem_var, mmlu_stem_mc_5shot, mmlu_humanities_mc_5shot, and mmlu_humanities_mc_5shot_test.
Which one is recommended?
For initial testing, start with the 5-shot methods (e.g., mmlu_stem_mc_5shot or mmlu_humanities_mc_5shot_test). These are useful for evaluating how well the model generalizes from a few examples. For less capable models, however, do not rely on the multiple-choice (MC) tasks right away, since such models may perform poorly on them. First gauge the model's baseline performance on simpler tasks, then move on to more demanding evaluations such as MC.