
I guess mamba.step could be deleted if selective_scan_fn can accept ssm_state as an input param. #233

Open
agiwave opened this issue Mar 10, 2024 · 4 comments · May be fixed by #488


agiwave commented Mar 10, 2024

There might be some benefits:
1. The code could be simpler.
2. Inference could be faster.
3. Inference could accept multiple tokens at once this way.
There is some reference code here: https://github.com/agiwave/Mamba/blob/main/Mamba.py
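
A minimal sketch of what that might look like (the function name and shapes are illustrative, not the actual mamba_ssm kernel API): a selective scan that takes an optional ssm_state and returns the updated state, so prefill over many tokens and step-by-step decoding share one code path.

```python
import torch

def selective_scan_with_state(u, delta, A, B, C, D, ssm_state=None):
    """Hypothetical state-accepting scan (reference loop, not a fused kernel).

    u, delta: (batch, d_inner, seqlen); A: (d_inner, d_state);
    B, C: (batch, d_state, seqlen); D: (d_inner,);
    ssm_state: (batch, d_inner, d_state), carried across calls.
    """
    batch, d_inner, seqlen = u.shape
    d_state = A.shape[1]
    if ssm_state is None:
        ssm_state = u.new_zeros(batch, d_inner, d_state)
    ys = []
    for t in range(seqlen):
        dt = delta[:, :, t].unsqueeze(-1)       # (batch, d_inner, 1)
        dA = torch.exp(dt * A)                  # discretized state transition
        dB = dt * B[:, :, t].unsqueeze(1)       # (batch, d_inner, d_state)
        ssm_state = ssm_state * dA + dB * u[:, :, t].unsqueeze(-1)
        y = (ssm_state @ C[:, :, t].unsqueeze(-1)).squeeze(-1) + D * u[:, :, t]
        ys.append(y)
    return torch.stack(ys, dim=-1), ssm_state
```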

tridao (Collaborator) commented Mar 10, 2024

Yep, in some sense step is a specialized version of forward that accepts ssm_state and only moves by one step.
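
To make that equivalence concrete, assuming the hypothetical state-accepting scan sketched above, step would reduce to a single-token call that threads the state through:

```python
# Hypothetical: step as a seqlen-1 call of the state-accepting scan above.
def step(u_t, delta_t, A, B_t, C_t, D, ssm_state):
    # u_t, delta_t: (batch, d_inner); B_t, C_t: (batch, d_state)
    y, ssm_state = selective_scan_with_state(
        u_t.unsqueeze(-1), delta_t.unsqueeze(-1), A,
        B_t.unsqueeze(-1), C_t.unsqueeze(-1), D, ssm_state)
    return y.squeeze(-1), ssm_state
```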

agiwave (Author) commented Mar 11, 2024

Mamba is really great work on model size control. On this point, here is another thought. The Mamba vocab embedding is 50280 × 1024 (370m model) ≈ 50M parameters. But does it really need d_model = 1024 to encode a word? My assumption is: the vocab dim could be reduced to 128 (or even 64) to encode a word, while the latent dim (d_model) can stay at 1024 or more to carry information beyond the word itself. So there could be an in_proj and out_proj mapping between vocab_dim and d_model. Maybe that also gives us:
1. Sharing the vocab embedding between different-size models: 370m, 790m, ...
2. The vocab embedding would be cut from 50280 × 1024 to 50280 × 128 (or maybe 250000 × 128 would be even better).
3. Training could be a little faster, because out_proj can focus on the word-encoding information and strip out the high-level encoding information (noise for the training loss at the last step).
I can't experiment with LLMs due to resource limits, so this can only be an assumption. Feel free to ignore it :)
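
A minimal sketch of the proposed factorization (class and method names are hypothetical, not from the mamba_ssm codebase); this resembles the factorized embedding parameterization used in ALBERT:

```python
import torch.nn as nn

class FactorizedEmbedding(nn.Module):
    """Embed at a small vocab_dim, project up to d_model, and mirror on output."""

    def __init__(self, vocab_size=50280, vocab_dim=128, d_model=1024):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, vocab_dim)       # 50280*128 ~ 6.4M
        self.in_proj = nn.Linear(vocab_dim, d_model, bias=False)   # 128*1024 ~ 0.13M
        self.out_proj = nn.Linear(d_model, vocab_dim, bias=False)  # 1024*128 ~ 0.13M

    def encode(self, input_ids):
        # tokens -> (..., d_model) for the backbone
        return self.in_proj(self.embedding(input_ids))

    def decode(self, hidden_states):
        # (..., d_model) -> vocab logits, tying the head to the embedding
        return self.out_proj(hidden_states) @ self.embedding.weight.t()
```

Compared with a plain 50280 × 1024 embedding (≈ 51.5M parameters), this costs about 50280 × 128 + 2 × 128 × 1024 ≈ 6.7M, and the small embedding table could in principle be shared across model sizes.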

agiwave (Author) commented Mar 30, 2024

Maybe the version above can help a lot on this point. I don't know whether the maintainers will take it into consideration, so I tried it myself :)


AnaRhisT94 commented Jul 18, 2024

OK, I finally made it work: #477 (with GPU support).

mzusman linked a pull request Jul 24, 2024 that will close this issue