Commit message from CommitPack unused? #23

Closed
SeanHeelan opened this issue Dec 2, 2023 · 1 comment

Comments

@SeanHeelan

Hey folks,

Nice work! Something came to mind as I browsed your code: it looks like you only used the commit subject from CommitPack when training OctoCoder. Is that correct? (I'm inferring this from what you said in #9 about the format, and from the contents of this repository.)

Did you guys experiment with using the commit message as well? Or was there a reason you decided not to use it?

Thanks!

@Muennighoff
Collaborator

Yes, that's correct. The reason is that the message is usually exactly the same as the subject. When it differs, it often includes external references, which we don't want.

You can browse samples here: https://huggingface.co/datasets/bigcode/commitpackft/viewer/kotlin?row=1
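
For anyone who wants to check the subject/message overlap on their own sample, here is a minimal sketch. It is not the project's code: the field names `subject` and `message`, the helper `summarize`, and the `EXTERNAL_REF` regex (URLs or `#123`-style issue numbers) are assumptions made for illustration, and the three rows are made-up examples rather than real dataset entries.

```python
import re

# Assumed heuristic for "external references": URLs or issue/PR numbers like #123.
EXTERNAL_REF = re.compile(r"https?://\S+|#\d+")


def summarize(rows):
    """Count rows where the message equals the subject, and split the rest
    by whether the message matches the external-reference heuristic."""
    counts = {"same": 0, "differs_with_ref": 0, "differs_other": 0}
    for row in rows:
        subject = row["subject"].strip()
        message = row["message"].strip()
        if message == subject:
            counts["same"] += 1
        elif EXTERNAL_REF.search(message):
            counts["differs_with_ref"] += 1
        else:
            counts["differs_other"] += 1
    return counts


# Made-up rows, only to show the shape of the input (hypothetical data).
sample = [
    {"subject": "Fix typo in README", "message": "Fix typo in README"},
    {"subject": "Handle null input", "message": "Handle null input\n\nFixes #42"},
    {"subject": "Refactor parser", "message": "Refactor parser\n\nSplit the tokenizer into its own module."},
]

print(summarize(sample))
```

The same function could be pointed at any iterable of dict-like rows, provided the rows expose those two fields under those names.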
