When evaluating a model on the MMLU dataset, why does mmlu_gen take several hours longer than mmlu_ppl? #401

amulil · 2023-09-15T07:29:33Z

As in the title.

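Also, what exactly is the difference between these two dataset configs, and when should each be used? Is mmlu_gen recommended for chat models and mmlu_ppl for non-chat models?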

JaggerQ · 2023-10-10T03:23:38Z

May I ask whether the results you reproduced with mmlu_ppl match the ones on the opencompass leaderboard?


tonysy · 2023-11-14T13:14:00Z

In generation mode the model tends to produce between 10 and 100 words per question, which can noticeably slow down inference.
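A rough way to see the gap is to count model forward passes per question. The sketch below is a toy cost model, not OpenCompass code. It assumes that PPL-style evaluation scores each of the four MMLU answer options with one forward pass and that generation needs one sequential forward pass per generated token; the function names are made up for illustration, and "tokens" stands in for the "words" mentioned above.

```python
# Toy cost model (an illustration, not OpenCompass code).
# Assumption: PPL-style evaluation runs one forward pass per answer option,
# while generation runs one forward pass per generated token.

def ppl_forward_passes(num_choices: int = 4) -> int:
    """Forward passes per question when every answer option is scored once."""
    return num_choices


def gen_forward_passes(generated_tokens: int) -> int:
    """Forward passes per question when decoding token by token.

    The first pass reads the prompt and yields the first token; each
    further token needs one more sequential pass.
    """
    return max(1, generated_tokens)


for n in (10, 100):
    ratio = gen_forward_passes(n) / ppl_forward_passes()
    print(f"{n} generated tokens: {ratio:.1f}x the passes of PPL scoring")
```

Under these assumptions the generation setup needs roughly 2.5x to 25x as many passes per question. Individual passes do not cost the same, and the option-scoring passes can be batched while decoding steps cannot be parallelized within one answer, so this only indicates the direction of the difference, not the actual wall-clock ratio.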
