In both Mamba and Mamba2, the model uses an associative scan or a similar concept. An associative scan speeds up a 1-D scan because the recurrent data dependency otherwise prevents parallelism. However, the operands in both SSM and SSD are not 1-D, so we can exploit parallelism in other dimensions, such as the head dimension or d_state. The associative scan also introduces additional FLOPs because of the extra multiplication in its binary operation. Are there any results that demonstrate the effectiveness of the associative scan?
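To make the "additional multiplication" concrete, here is a small worked form of the scan operator. The scalar notation $h_t = a_t h_{t-1} + b_t$ is a simplification assumed for illustration, not the exact parameterization used in Mamba or Mamba2.

$$
h_t = a_t\, h_{t-1} + b_t \qquad \text{(sequential step: 1 multiply, 1 add)}
$$

$$
(a_i, b_i) \bullet (a_j, b_j) = (a_j a_i,\; a_j b_i + b_j) \qquad \text{(combine step: 2 multiplies, 1 add)}
$$

The operator $\bullet$ is associative, which is what allows a tree-shaped evaluation, but each combine also has to carry the product $a_j a_i$ forward, and that is the extra multiply relative to the sequential step.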
Devil-SX changed the title from "Is the associative scan really works?" to "Does the associative scan really work?" on Dec 14, 2024.
You can also do a linear scan if you have enough parallelism in other dimensions (batch, heads). It's an empirical question: you can implement both and see which one is faster. It also depends on the hardware; e.g., on TPU there's less parallelism, and the Google DM folks found linear scan to be just as fast (https://arxiv.org/abs/2402.19427).
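As a rough illustration of the "implement both" suggestion, here is a minimal pure-Python sketch for the scalar recurrence h_t = a_t * h_{t-1} + b_t with h_0 = 0. The function names (`linear_scan`, `combine`, `associative_scan`) are hypothetical, and the Hillis-Steele style pass structure is chosen only for brevity; it is not meant to reflect how the Mamba kernels are actually written.

```python
def linear_scan(a, b, h0=0.0):
    """Sequential recurrence: one multiply and one add per step."""
    out, h = [], h0
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def combine(left, right):
    """Associative operator on (a, b) pairs: apply `left`, then `right`.

    Costs two multiplies and one add, versus one multiply and one add
    for a step of the sequential recurrence.
    """
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def associative_scan(a, b):
    """Inclusive scan in about log2(n) passes (Hillis-Steele style).

    Every element inside a pass is independent of the others, so a
    pass could run in parallel; here it is a plain list comprehension.
    Assumes an initial state of zero.
    """
    elems = list(zip(a, b))
    n = len(elems)
    step = 1
    while step < n:
        elems = [
            elems[i] if i < step else combine(elems[i - step], elems[i])
            for i in range(n)
        ]
        step *= 2
    # With h_0 = 0, the b-component of each prefix is the hidden state h_t.
    return [b_t for _, b_t in elems]
```

Both functions should return the same sequence up to floating-point rounding. This variant does O(n log n) combines in total, so it only pays off when the sequence dimension is where the parallelism has to come from; with enough independent rows (batch, heads, d_state), running `linear_scan` per row in parallel may be just as fast, which is the empirical question above.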