Reproducing Humaneval Benchmark Results #11

Open
YSLIU627 opened this issue Aug 6, 2024 · 0 comments
YSLIU627 commented Aug 6, 2024

Hi, we re-ran the training phase and evaluated the trained model with the evaluation script in your repo. However, we found a performance gap on the HumanEval benchmark between our trained model and the score released in the paper. We also evaluated the released model from Hugging Face and report the results as follows.
[Screenshot 20240806-135618: table of HumanEval results]
Here the first column is the score released in the paper, the second column is the evaluation result of the released model, and the last column is the evaluation result of our re-trained model. We did not modify any hyper-parameters before training, and the loss curve of our re-trained model is identical to the one you released in issue #6. We are not sure whether you evaluated the model saved at the end of training or an intermediate checkpoint (for example, checkpoint-2000). We would greatly appreciate your help!
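For reference, this is roughly how we load a specific checkpoint before running the evaluation script. This is only a minimal sketch with placeholder paths (the directory names below are assumptions, not the actual ones from this repo), shown to clarify the final-model vs. intermediate-checkpoint question:

```python
# Hypothetical sketch: load either the final model or an intermediate
# Trainer checkpoint (e.g. checkpoint-2000) and then point the repo's
# HumanEval evaluation script at it. All paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

final_path = "output/final"                    # model saved at the end of training (assumed path)
intermediate_path = "output/checkpoint-2000"   # intermediate checkpoint (assumed path)

# Swap intermediate_path for final_path to compare the two.
model = AutoModelForCausalLM.from_pretrained(intermediate_path)
tokenizer = AutoTokenizer.from_pretrained(intermediate_path)
# ...then run the evaluation script against this directory.
```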
