Replication of Still 1.5 Preview AIME: 32% AIME not 39% #22

Open
michaelzhiluo opened this issue Jan 31, 2025 · 1 comment


michaelzhiluo commented Jan 31, 2025

I ran Still 1.5 Preview on AIME with 16 samples per problem and got 31% Pass@1 (averaged over the 16 passes), using temperature 0.6 and top_p=0.95 (both as in DeepSeek).

Are there other parameters that need to be tuned to replicate the reported 39%?

@michaelzhiluo
Author

Update: We tried a 32K context (the max context length for Qwen) and again evaluated over 16 passes:

Total problems: 30
Pass@1 average over 16 runs: 32.71%
Pass@16: 73.33%
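
For readers comparing numbers, here is a minimal sketch of how the two metrics above are commonly computed from a per-problem, per-run correctness matrix. It is not the evaluation script used in this issue; the function names and the `synthetic_matrix` helper are hypothetical. The synthetic data is made up so that its totals match the reported figures (157 correct out of 30 × 16 = 480 samples, and 22 of 30 problems solved at least once, which is what 32.71% and 73.33% work out to if every sample was scored). It does not reflect the actual per-problem results.

```python
from typing import List


def pass_at_1_avg(correct: List[List[bool]]) -> float:
    """Average Pass@1 over runs: compute accuracy for each run, then take the mean."""
    n_problems = len(correct)
    n_runs = len(correct[0])
    per_run_acc = [
        sum(correct[p][r] for p in range(n_problems)) / n_problems
        for r in range(n_runs)
    ]
    return sum(per_run_acc) / n_runs


def pass_at_k_any(correct: List[List[bool]]) -> float:
    """Pass@k with k equal to the number of runs: solved if any run is correct."""
    return sum(any(runs) for runs in correct) / len(correct)


def synthetic_matrix() -> List[List[bool]]:
    """Hypothetical data: 30 problems x 16 runs, 157 correct samples in total."""
    # Number of correct runs per problem (assumed, for illustration only):
    # 3 problems with 8 correct, 19 with 7 correct, 8 never solved.
    counts = [8] * 3 + [7] * 19 + [0] * 8
    return [[i < c for i in range(16)] for c in counts]


if __name__ == "__main__":
    m = synthetic_matrix()
    print(f"Total problems: {len(m)}")
    print(f"Pass@1 average over 16 runs: {100 * pass_at_1_avg(m):.2f}%")
    print(f"Pass@16: {100 * pass_at_k_any(m):.2f}%")
```

Because every problem has the same number of runs, averaging per-run accuracy gives the same value as the fraction of all samples that are correct. Differences in answer extraction or truncation at the maximum generation length would change the entries of the matrix, not these formulas.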

michaelzhiluo changed the title from "Replication of Still 1.5 Preview AIME Results" to "Replication of Still 1.5 Preview AIME: 32% AIME not 39%" on Feb 3, 2025