I guess mamba.step could be deleted if selective_scan_fn can accept ssm_state as an input param. #233
Yep, in some sense.
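For illustration, here is a minimal pure-PyTorch sketch of what the proposal could look like: a selective scan that takes the previous ssm_state and returns the updated one. The function name selective_scan_with_state and its exact signature are my own assumptions, not the library's API; the real fused selective_scan_fn exposes a return_last_state flag but (at the time of this issue) no initial-state input.

```python
import torch

def selective_scan_with_state(u, delta, A, B, C, D, ssm_state=None):
    """Naive (non-fused) selective scan that threads ssm_state through.

    Hypothetical sketch; shapes follow the reference implementation:
      u, delta: (batch, d_inner, L)    A: (d_inner, d_state)
      B, C: (batch, d_state, L)        D: (d_inner,)
      ssm_state: (batch, d_inner, d_state), or None for a fresh sequence
    """
    batch, d_inner, L = u.shape
    d_state = A.shape[1]
    x = ssm_state if ssm_state is not None else u.new_zeros(batch, d_inner, d_state)
    ys = []
    for t in range(L):
        dt = delta[:, :, t].unsqueeze(-1)                    # (batch, d_inner, 1)
        dA = torch.exp(dt * A)                               # discretized A
        dBu = dt * B[:, None, :, t] * u[:, :, t, None]       # discretized B * input
        x = dA * x + dBu                                     # recurrent state update
        y = (x * C[:, None, :, t]).sum(-1) + D * u[:, :, t]  # readout + skip term
        ys.append(y)
    return torch.stack(ys, dim=-1), x                        # (batch, d_inner, L), final state
```

With such a function, mamba.step would reduce to calling the same scan with L = 1 and the cached state, which is essentially what the step path does internally.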
Mamba really does great work on controlling model size, and on this point here is another thought. The Mamba vocab embedding is 50280 × 1024 (for the 370m model), i.e. about 50M parameters. But does it really need d_model = 1024 to encode a word? My assumption is that the vocab dim could be reduced to 128 (or even 64) to encode a word, while the latent dim (d_model) stays at 1024 or more to carry information beyond the word itself. So there could be an in_proj and an out_proj mapping between vocab_dim and d_model. Maybe that would also bring some other benefits.
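For what it's worth, a sketch of that factorization could look like the following (the names vocab_dim, FactorizedVocabEmbedding and the tied output head are my own assumptions, similar in spirit to ALBERT-style factorized embeddings, not anything from this repo):

```python
import torch.nn as nn

class FactorizedVocabEmbedding(nn.Module):
    """Embed tokens in a small vocab_dim, then project up to d_model.

    Parameter count: vocab_size*vocab_dim + 2*vocab_dim*d_model,
    e.g. 50280*128 + 2*128*1024 ~ 6.7M instead of 50280*1024 ~ 51.5M.
    """
    def __init__(self, vocab_size=50280, vocab_dim=128, d_model=1024):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, vocab_dim)
        self.in_proj = nn.Linear(vocab_dim, d_model, bias=False)
        self.out_proj = nn.Linear(d_model, vocab_dim, bias=False)

    def forward(self, input_ids):      # (batch, L) -> (batch, L, d_model)
        return self.in_proj(self.embed(input_ids))

    def logits(self, hidden):          # (batch, L, d_model) -> (batch, L, vocab_size)
        # Tie the output head back to the embedding table, as Mamba ties lm_head.
        return self.out_proj(hidden) @ self.embed.weight.T
```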
Maybe the version above can help a lot on this point. I don't know whether the repo owner will take it into consideration, so I tried it myself :).
OK, I finally made it work: #477 (with GPU support).
Maybe there are some benefits:
1. The code could be simpler.
2. Inference could be faster.
3. Inference could accept multiple tokens at once this way (see the usage sketch after this list).
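To make point 3 concrete, here is a hypothetical usage of the selective_scan_with_state sketch from earlier in the thread: one state-threading function covers both multi-token prefill and single-token decoding, with no separate step() path.

```python
import torch

# Toy dimensions; A is kept negative so the recurrence stays stable.
batch, d_inner, d_state = 2, 16, 4
A = -torch.rand(d_inner, d_state)
D = torch.rand(d_inner)

def rand_inputs(L):
    """Random stand-ins for the input-dependent scan parameters."""
    u = torch.randn(batch, d_inner, L)
    delta = torch.rand(batch, d_inner, L)   # dt > 0, as after softplus
    B = torch.randn(batch, d_state, L)
    C = torch.randn(batch, d_state, L)
    return u, delta, B, C

# Multi-token prefill in a single call:
u, delta, B, C = rand_inputs(L=8)
y, ssm_state = selective_scan_with_state(u, delta, A, B, C, D, ssm_state=None)

# Single-step decoding is the same call with L == 1 and the cached state:
u, delta, B, C = rand_inputs(L=1)
y, ssm_state = selective_scan_with_state(u, delta, A, B, C, D, ssm_state=ssm_state)
```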
There is some reference code here: https://github.com/agiwave/Mamba/blob/main/Mamba.py