Some details about mamba2-2.7b are not provided in the paper. Could you share the nheads used for mamba2-2.7b, as well as d_head, the number of training steps, the learning rate, and how num_attention_heads should be set? If possible, please let me know. Thank you very much! It would also be great if weight files in HF format could be provided.
One more question: how should the number of attention and MLP layers be set?
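For context, here is a minimal sketch of how I currently assume the head-related sizes fit together, based on the defaults I see in the reference Mamba2 code (expand = 2, headdim = 64) and a guessed d_model of 2560 for the 2.7b config; none of these values are confirmed by the paper, so please correct me if any of them are wrong:

```python
# Hypothetical mamba2-2.7b dimensions (assumed, not confirmed by the paper)
d_model = 2560           # assumed model width for the 2.7b config
expand = 2               # default expansion factor in the reference Mamba2 block
headdim = 64             # default per-head dimension (d_head) in the reference code

d_inner = expand * d_model       # 5120: inner SSM width
nheads = d_inner // headdim      # 80: is this the intended nheads?

print(f"d_inner={d_inner}, nheads={nheads}, headdim={headdim}")
```

If nheads is instead chosen independently of headdim for the 2.7b model, it would help to know which of these values is fixed first.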