
Experiments with Larger Vocabularies for Llama 2 Models? #2

Open
wdlctc opened this issue Oct 11, 2024 · 1 comment

Comments


wdlctc commented Oct 11, 2024

Thank you for this interesting study on vocabulary scaling laws.

I'm curious if you ran any experiments comparing the performance of Llama 2 models with larger vocabularies as predicted by your approaches - specifically Llama 2 7B with a 57K vocabulary, Llama 2 13B with a 79K vocabulary, and Llama 2 70B with a 216K vocabulary.

If so, how did the results compare to the original Llama 2 models with 32K vocabularies?
If not, do you have plans to conduct such experiments in future work? Is the bottleneck the GPU memory wall?

This isn't shown in the paper, but if memory is the bottleneck, I think I can help with this issue.
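As a rough back-of-envelope sketch (assuming Llama 2 7B's hidden size of 4096, untied input/output embeddings, and fp16 weights; these numbers are my own estimates, not from the paper), growing the vocabulary from 32K to the predicted 57K would add roughly:

```python
def extra_vocab_memory(hidden_size, old_vocab, new_vocab,
                       bytes_per_param=2, tied=False):
    """Extra parameters and weight memory from enlarging the vocabulary."""
    matrices = 1 if tied else 2          # input embedding + (untied) LM head
    extra_params = matrices * (new_vocab - old_vocab) * hidden_size
    return extra_params, extra_params * bytes_per_param / 2**30  # params, GiB

params, gib = extra_vocab_memory(hidden_size=4096,   # Llama 2 7B hidden dim
                                 old_vocab=32_000,   # original vocabulary
                                 new_vocab=57_000)   # predicted vocabulary
print(f"extra params: {params / 1e6:.0f}M, extra fp16 weights: {gib:.2f} GiB")
# ~205M extra parameters, ~0.38 GiB of fp16 weights; optimizer states and
# activations during training would add several times more on top of this.
```

So the raw weight growth looks modest, though the larger softmax over the output vocabulary also increases activation memory during training.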

It would be valuable to see empirical validation of your predictions on these widely-used model scales. Thank you!

@SivilTaram
Collaborator

Hello @wdlctc, thank you for your interest in our work! We appreciate your inquiry regarding experiments on 7B-level models. Due to budget constraints, we haven't been able to conduct these specific experiments yet. However, we will provide more insights on 7B-level models in the camera-ready version of our paper. We'd be very grateful for any sponsorship opportunities that could support these experiments. Thanks!
