Following the steps described, the model generates long outputs instead of a single answer.
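One possible workaround, until the generation itself is constrained, is to post-process each output and keep only the chosen option letter. The sketch below is hypothetical: `extract_choice` is an illustrative helper, not part of deepeval. It assumes the four MMLU options are labeled A to D and that the model either starts its reply with the letter or states "answer is X".

```python
import re
from typing import Optional

# Hypothetical helper, not part of deepeval.
# Matches a reply that starts with a bare option letter, e.g. "B", "B.", "(C)".
_LEADING = re.compile(r"^\s*\(?([ABCD])(?:[.):]|\s*$)")
# Matches an explicit statement such as "The answer is D" or "Answer: C".
_STATED = re.compile(r"[Aa]nswer\s*(?:is|:)\s*\(?([ABCD])\b")


def extract_choice(generation: str) -> Optional[str]:
    """Return the option letter found in a free-form generation, or None."""
    match = _LEADING.match(generation)
    if match:
        return match.group(1)
    match = _STATED.search(generation)
    if match:
        return match.group(1)
    return None  # no recognizable answer; score it as incorrect
```

Requiring punctuation or end-of-string after a leading letter keeps replies such as "A good question..." from being misread as option A.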
Batching also doesn't work, because MMLU prompts have different lengths.
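Batching prompts of different lengths generally means padding them to a common length, and decoder-only models are usually padded on the left so that generation continues directly from each prompt's last real token. The plain-Python sketch below illustrates the idea only. `left_pad_batch` and `pad_id` are hypothetical names, not deepeval APIs, and in practice the tokenizer would supply the pad token id.

```python
from typing import List, Tuple


def left_pad_batch(
    token_ids: List[List[int]], pad_id: int
) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad variable-length prompts to the longest prompt's length.

    Returns (padded_ids, attention_mask), where the mask is 1 for real
    tokens and 0 for padding, so the model can ignore the pad positions.
    """
    if not token_ids:
        return [], []
    max_len = max(len(ids) for ids in token_ids)
    padded, mask = [], []
    for ids in token_ids:
        n_pad = max_len - len(ids)
        padded.append([pad_id] * n_pad + list(ids))
        mask.append([0] * n_pad + [1] * len(ids))
    return padded, mask
```

For example, `left_pad_batch([[5, 6, 7], [8]], pad_id=0)` returns `([[5, 6, 7], [0, 0, 8]], [[1, 1, 1], [0, 0, 1]])`.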
The introduction gives the import as:
from deepeval.benchmarks.task import MMLUTask
but the correct path is:
from deepeval.benchmarks.mmlu.task import MMLUTask