Benchmarks introduction does setup wrong #1284

AMindToThink · 2025-01-18T08:26:52Z

The introduction says:
from deepeval.benchmarks.task import MMLUTask

but the correct import is:

from deepeval.benchmarks.mmlu.task import MMLUTask

AMindToThink · 2025-01-18T19:36:04Z

https://docs.confident-ai.com/docs/benchmarks-mmlu

Following the steps described there, the model generates long outputs instead of single-letter answers.
Also, batching doesn't work because MMLU prompts have different lengths.
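For anyone hitting the same two problems, here is a minimal sketch of common workarounds applied outside deepeval. It assumes you control your model's generate step. `extract_choice` pulls a single A-D letter out of a long generation, and `left_pad` pads tokenized prompts of different lengths to a common length and builds an attention mask. Left padding keeps each row's last real token at the end, which decoder-only generation relies on. Both helper names are hypothetical and not part of deepeval's API, and the regex patterns are assumptions about typical output formats.

```python
import re
from typing import List, Optional, Tuple


def extract_choice(generation: str) -> Optional[str]:
    """Hypothetical helper: recover a single MMLU answer letter from free-form output."""
    text = generation.strip()
    # Case 1: output starts with the letter, e.g. "C", "C. Paris", "(B)", "A)".
    # The letter must be followed by ".", ")", ":" or end of text, so "A good..." is rejected.
    m = re.match(r"\(?([ABCD])(?:[.):]|$)", text)
    if m:
        return m.group(1)
    # Case 2: the letter follows a phrase like "answer is" or "Answer:".
    m = re.search(r"[Aa]nswer\s*(?:is|:)\s*\(?([ABCD])\b", text)
    return m.group(1) if m else None


def left_pad(batch: List[List[int]], pad_id: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Hypothetical helper: left-pad token id lists to equal length and return attention masks."""
    if not batch:
        return [], []
    max_len = max(len(seq) for seq in batch)
    padded, masks = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        padded.append([pad_id] * n_pad + seq)       # pad tokens go on the left
        masks.append([0] * n_pad + [1] * len(seq))  # 0 = padding, 1 = real token
    return padded, masks
```

With Hugging Face tokenizers, setting `tokenizer.padding_side = "left"` gives the same padding behaviour without a hand-written helper.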

penguine-ip · 2025-01-22T22:22:31Z

Hey @AMindToThink, can you please create a quick PR for this?

AMindToThink · 2025-01-25T03:11:38Z

I opened a pull request that fixes one location with the wrong MMLUTask import. There may be others.

I am unable to fix the other issues (the long outputs and the batching problem).

AMindToThink closed this as completed Jan 25, 2025

AMindToThink reopened this Jan 25, 2025
