Trying to reproduce MATH benchmark + inconsistency between llama docs #315

Open
fzyzcjy opened this issue Aug 13, 2024 · 0 comments

fzyzcjy commented Aug 13, 2024

Hi, thanks for the great open-source model! I am trying to reproduce the MATH benchmark, but I currently only reach 50.9% (averaged over 10 runs) instead of the 51.9% reported by Meta. Is a gap of this size normal, or am I doing something wrong? In particular, it would be great to know the exact prompts and templates used to evaluate MATH.
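
For context, this is roughly the harness I am using. Everything here is my own guess at the setup (the prompt wording, `\boxed{}` answer extraction, and exact-match scoring), not the official evaluation code:

```python
# Minimal sketch of my MATH evaluation loop. The prompt template and the
# exact-match scoring are assumptions on my side, not Meta's setup.

PROMPT_TEMPLATE = (
    "Solve the following math problem step by step. "
    "Put your final answer inside \\boxed{{}}.\n\nProblem: {problem}\nSolution:"
)

def extract_boxed(text: str):
    """Return the contents of the last \\boxed{...}, handling nested braces."""
    idx = text.rfind("\\boxed{")
    if idx == -1:
        return None
    start, depth = idx + len("\\boxed{"), 0
    for i in range(start, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            if depth == 0:
                return text[start:i]
            depth -= 1
    return None

def score(problems, generate):
    """`problems`: list of {"problem": ..., "answer": ...} dicts;
    `generate`: any callable mapping a prompt string to model output text."""
    correct = 0
    for item in problems:
        output = generate(PROMPT_TEMPLATE.format(problem=item["problem"]))
        pred = extract_boxed(output)
        correct += int(pred is not None and pred.strip() == item["answer"].strip())
    return correct / len(problems)
```

A 1-point gap could plausibly come from the answer-matching rule alone (e.g. `1/2` vs `\frac{1}{2}`), which is why the exact template and normalization matter.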

I also seem to have found an inconsistency between the Llama docs: https://github.com/meta-llama/llama3/blob/main/eval_details.md says "4-shot", while the table at https://ai.meta.com/blog/meta-llama-3-1/ says "0-shot CoT". Were the reported MATH numbers obtained 4-shot or 0-shot?
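
To make the question concrete, these are the two prompt shapes I understand the docs to describe. Both templates are my own guesses, since neither doc gives the exact wording:

```python
def build_prompt(problem: str, exemplars=None) -> str:
    """Build a 0-shot CoT prompt (exemplars=None) or a k-shot prompt.

    For the 4-shot setting, `exemplars` would hold four worked MATH
    examples as {"problem": ..., "solution": ...} dicts (my choice of
    examples; the docs do not say which ones were used).
    """
    parts = []
    for ex in exemplars or []:
        parts.append(f"Problem: {ex['problem']}\nSolution: {ex['solution']}\n")
    parts.append(
        "Solve the following problem step by step and put the final "
        f"answer in \\boxed{{}}.\nProblem: {problem}\nSolution:"
    )
    return "\n".join(parts)
```

Knowing which of the two settings produced the reported 51.9% would already narrow down where my 50.9% comes from.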
