
I guess mamba.step could be deleted if selective_scan_fn can accept ssm_state as an input param. #233

Open
agiwave opened this issue Mar 10, 2024 · 4 comments · May be fixed by #488


agiwave commented Mar 10, 2024

There might be some benefits:
1. The code could be simpler.
2. Inference could be faster.
3. Inference could accept multiple tokens at once this way.
There is some reference code here: https://github.com/agiwave/Mamba/blob/main/Mamba.py
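
A minimal sketch of what that might look like (the function name and shapes are illustrative, not the actual mamba_ssm kernel API): a selective scan that takes an optional ssm_state and returns the updated state, so prefill over many tokens and step-by-step decoding share one code path.

```python
import torch

def selective_scan_with_state(u, delta, A, B, C, D, ssm_state=None):
    """Hypothetical state-accepting scan (reference loop, not a fused kernel).

    u, delta: (batch, d_inner, seqlen); A: (d_inner, d_state);
    B, C: (batch, d_state, seqlen); D: (d_inner,);
    ssm_state: (batch, d_inner, d_state), carried across calls.
    """
    batch, d_inner, seqlen = u.shape
    d_state = A.shape[1]
    if ssm_state is None:
        ssm_state = u.new_zeros(batch, d_inner, d_state)
    ys = []
    for t in range(seqlen):
        dt = delta[:, :, t].unsqueeze(-1)       # (batch, d_inner, 1)
        dA = torch.exp(dt * A)                  # discretized state transition
        dB = dt * B[:, :, t].unsqueeze(1)       # (batch, d_inner, d_state)
        ssm_state = ssm_state * dA + dB * u[:, :, t].unsqueeze(-1)
        y = (ssm_state @ C[:, :, t].unsqueeze(-1)).squeeze(-1) + D * u[:, :, t]
        ys.append(y)
    return torch.stack(ys, dim=-1), ssm_state
```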

tridao (Collaborator) commented Mar 10, 2024

Yep, in some sense step is a specialized version of forward that accepts ssm_state and only moves by one step.
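
To make that equivalence concrete, assuming the hypothetical state-accepting scan sketched above, step would reduce to a single-token call that threads the state through:

```python
# Hypothetical: step as a seqlen-1 call of the state-accepting scan above.
def step(u_t, delta_t, A, B_t, C_t, D, ssm_state):
    # u_t, delta_t: (batch, d_inner); B_t, C_t: (batch, d_state)
    y, ssm_state = selective_scan_with_state(
        u_t.unsqueeze(-1), delta_t.unsqueeze(-1), A,
        B_t.unsqueeze(-1), C_t.unsqueeze(-1), D, ssm_state)
    return y.squeeze(-1), ssm_state
```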

agiwave (Author) commented Mar 11, 2024

Mamba is really great work on model size control. On this point, here is another thought. The Mamba vocab embedding is 50280 × 1024 (370m model) ≈ 50M parameters. But does it really need d_model = 1024 to encode a word? My assumption is: the vocab dim could be reduced to 128 (or even 64) to encode a word, while the latent dim (d_model) can stay at 1024 or more to carry information beyond the word itself. So there could be an in_proj and out_proj mapping between vocab_dim and d_model. Maybe that also gives us:
1. Sharing the vocab embedding between different-size models: 370m, 790m, ...
2. The vocab embedding would be cut from 50280 × 1024 to 50280 × 128 (or maybe 250000 × 128 would be even better).
3. Training could be a little faster, because out_proj can focus on the word-encoding information and strip out the high-level encoding information (noise for the training loss at the last step).
I can't experiment with LLMs due to resource limits, so this can only be an assumption. Feel free to ignore it :)
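
A minimal sketch of the proposed factorization (class and method names are hypothetical, not from the mamba_ssm codebase); this resembles the factorized embedding parameterization used in ALBERT:

```python
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """Embed at a small vocab_dim, project up to d_model, and mirror on output."""

    def __init__(self, vocab_size=50280, vocab_dim=128, d_model=1024):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, vocab_dim)       # 50280*128 ~ 6.4M
        self.in_proj = nn.Linear(vocab_dim, d_model, bias=False)   # 128*1024 ~ 0.13M
        self.out_proj = nn.Linear(d_model, vocab_dim, bias=False)  # 1024*128 ~ 0.13M

    def encode(self, input_ids):
        # tokens -> (..., d_model) for the backbone
        return self.in_proj(self.embedding(input_ids))

    def decode(self, hidden_states):
        # (..., d_model) -> vocab logits, tying the head to the embedding
        return self.out_proj(hidden_states) @ self.embedding.weight.t()
```

Compared with a plain 50280 × 1024 embedding (≈ 51.5M parameters), this costs about 50280 × 128 + 2 × 128 × 1024 ≈ 6.7M, and the small embedding table could in principle be shared across model sizes.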

agiwave (Author) commented Mar 30, 2024

Maybe the version above can help a lot on this point. I don't know whether the maintainers will take it into consideration, so I tried it myself :)


AnaRhisT94 commented Jul 18, 2024

OK, I finally made it work: #477 (with GPU support).

mzusman linked a pull request Jul 24, 2024 that will close this issue