Does the associative scan really work? #646

Open
Devil-SX opened this issue Dec 14, 2024 · 1 comment

Devil-SX commented Dec 14, 2024

Both Mamba and Mamba2 use an associative scan (or a closely related concept). An associative scan speeds up a 1-D scan because the recurrent data dependency in a sequential scan limits parallelism. However, the operands in both SSM and SSD are not 1-D, so parallelism can also be exploited along other dimensions, such as the head dimension or d_state. In addition, the associative scan introduces extra FLOPs because of the additional multiplication in its binary operator. Are there any results that demonstrate the effectiveness of the associative scan?
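To make the FLOP argument concrete, here is a minimal pure-Python sketch. It assumes the scalar, per-channel form of the recurrence, h_t = a_t * h_{t-1} + b_t with h_0 = 0 (i.e. an elementwise state transition). The names `combine`, `sequential_scan` and `associative_scan` are hypothetical, and the associative version uses a Hillis-Steele scan, which is not necessarily the variant used in the Mamba kernels. Both functions return the same states and also count multiplications.

```python
from typing import List, Tuple

# (a, b) represents the affine map h -> a * h + b for one time step.
Pair = Tuple[float, float]


def combine(left: Pair, right: Pair) -> Pair:
    """Associative operator: apply `left` first, then `right`.

    right(left(h)) = a_r * (a_l * h + b_l) + b_r
                   = (a_l * a_r) * h + (a_r * b_l + b_r)
    Costs 2 multiplies + 1 add, versus 1 multiply + 1 add for a sequential step.
    """
    a_l, b_l = left
    a_r, b_r = right
    return a_l * a_r, a_r * b_l + b_r


def sequential_scan(a: List[float], b: List[float]) -> Tuple[List[float], int]:
    """Linear scan of h_t = a_t * h_{t-1} + b_t with h_0 = 0; also counts multiplies."""
    h = 0.0
    states, mults = [], 0
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t  # one dependent step per element
        mults += 1
        states.append(h)
    return states, mults


def associative_scan(a: List[float], b: List[float]) -> Tuple[List[float], int]:
    """Hillis-Steele inclusive scan built from `combine`; also counts multiplies.

    Needs ceil(log2(L)) dependent passes; the combines inside one pass only read
    the previous pass's values, so on parallel hardware they could run concurrently.
    """
    elems: List[Pair] = list(zip(a, b))
    mults = 0
    offset = 1
    while offset < len(elems):
        elems = [
            combine(elems[i - offset], elems[i]) if i >= offset else elems[i]
            for i in range(len(elems))
        ]
        mults += 2 * (len(elems) - offset)
        offset *= 2
    # With h_0 = 0, applying each prefix map to 0 leaves only its b component.
    return [b_t for _, b_t in elems], mults


if __name__ == "__main__":
    a = [0.5, 2.0, 1.0, 3.0]
    b = [1.0, 1.0, 2.0, 0.5]
    print(sequential_scan(a, b))   # ([1.0, 3.0, 5.0, 15.5], 4)
    print(associative_scan(a, b))  # ([1.0, 3.0, 5.0, 15.5], 10)
```

Even for a length-4 sequence the associative version needs 10 multiplies versus 4. In exchange, it has only ceil(log2 L) dependent passes instead of L dependent steps, which is the whole point when the time axis is the only source of parallelism.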

Devil-SX changed the title "Is the associative scan really works?" to "Does the associative scan really work?" Dec 14, 2024
tridao (Collaborator) commented Dec 14, 2024

You can also do a linear scan if you have enough parallelism in other dimensions (batch, heads). It's an empirical question: you can implement both and see which one is faster. It also depends on the hardware; e.g., on TPU there's less parallelism, and the Google DM folks found the linear scan to be just as fast (https://arxiv.org/abs/2402.19427).
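The "enough parallelism in other dimensions" argument can be illustrated with a toy work/depth model. This is a sketch under loud assumptions: every combine costs one unit, the hardware has a hypothetical number of `lanes` that each perform one combine per round, and memory traffic, kernel launches and the extra multiply inside the associative operator are all ignored. `channels` stands for the number of independent recurrences (batch × heads × any other independent state dimensions). The functions are hypothetical and only count rounds; they do not run a scan.

```python
def ceil_div(n: int, d: int) -> int:
    """Integer ceiling of n / d for positive d."""
    return -(-n // d)


def linear_scan_rounds(seq_len: int, channels: int, lanes: int) -> int:
    """Rounds for a sequential (linear) scan under the toy model.

    There are seq_len - 1 dependent steps along time; within each step the
    `channels` independent recurrences are updated in parallel across lanes.
    """
    return max(seq_len - 1, 0) * ceil_div(channels, lanes)


def associative_scan_rounds(seq_len: int, channels: int, lanes: int) -> int:
    """Rounds for a Hillis-Steele associative scan under the same toy model.

    There are ceil(log2(seq_len)) dependent passes; the pass with offset d
    performs seq_len - d independent combines per channel.
    """
    rounds = 0
    offset = 1
    while offset < seq_len:
        rounds += ceil_div(channels * (seq_len - offset), lanes)
        offset *= 2
    return rounds


if __name__ == "__main__":
    # Hypothetical sizes: sequence length 1024 on 1024 lanes.
    for channels in (4096, 1):
        print(channels,
              linear_scan_rounds(1024, channels, 1024),
              associative_scan_rounds(1024, channels, 1024))
    # 4096 4092 36868
    # 1 1023 10
```

With 4096 independent channels the lanes are already saturated, so the associative scan's extra combines translate directly into extra rounds (36868 vs 4092). With a single channel the picture flips (10 vs 1023). Real kernels are usually limited by memory bandwidth rather than arithmetic, so this model only shows the direction of the trade-off; benchmarking both on the target hardware remains the reliable way to decide.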
