reproduce llama 3 evals #2557
Comments
Hi @baberabb, do you plan to add the benchmark that can reproduce the GPQA result as well?
added as
Also, @baberabb, do you know why max_gen_toks is set to 10 for mmlu_llama here: https://github.com/EleutherAI/lm-evaluation-harness/blob/main/lm_eval/tasks/llama3/instruct/mmlu/_continuation_template_yaml? It only needs to generate at most 2 tokens, right? (Which would also make this test run faster.)
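For reference, generation limits in lm-evaluation-harness task configs live under `generation_kwargs` in the task YAML. Below is a minimal sketch of what such a block typically looks like, following the harness's `generate_until` schema; it is not the verbatim contents of the linked template.

```yaml
# Sketch of a generate_until task config (not the verbatim linked template).
output_type: generate_until
generation_kwargs:
  max_gen_toks: 10   # the generation budget under discussion
  do_sample: false   # greedy decoding
```

Lowering `max_gen_toks` to 2 would bound generation more tightly for single-letter answers, at the risk of truncating any model that emits extra tokens before the answer.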
Also, @baberabb, do you know if the mmlu_llama with
The prompt should work well with most instruct models, I think. Have you looked at the outputs (saved with )? The generation tokens are 10 as that's what's used in the llama evals (they do not provide the stop tokens, though, but I added ).
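To illustrate the stop-token point: in the harness's YAML schema, stop sequences go in the `until` list under `generation_kwargs`, so generation normally halts well before the `max_gen_toks` cap. A sketch follows; the specific stop string is an assumption, since the llama eval details don't list theirs.

```yaml
# Sketch: stop sequences end generation early even with max_gen_toks: 10.
generation_kwargs:
  until:
    - "\n"           # assumed stop sequence; not confirmed by the llama evals
  max_gen_toks: 10   # upper bound, rarely reached once a stop sequence fires
```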
Llama 3.{1,2,3} have released most of their eval details on HF, and some in this repo as well. It would be great if we could upstream them here. I'm adding some in #2556.