Question about support for sequence parallelism #176
Hi,
I recently learnt about this selective SSM architecture, and it is awesome!
But I have a question. We know that the Transformer architecture supports sequence parallelism, so does Mamba (a potential alternative to the Transformer) support sequence parallelism as well?

Comments
In general, yes.

Thanks for your timely response!

Nothing is built in, but it will be implemented in the future.

Got it. Thanks!

I'm looking into implementing this. Has any work been done yet?

Hello, my colleagues and I have already implemented this feature. You can try my PR or my colleagues' PR.

Hi @zigzagcai, thanks for your reply. The PR looks very interesting, but I think you may be referring to a different kind of sequence parallelism than the one referenced above from Megatron-LM. Rather than your implementation, where a batch of sequences is aggregated into a single sequence for computation, I'm referring to splitting one massive sequence across multiple GPUs. Please correct me if I'm wrong, though. Specifically, I'm working on implementing the kind of context parallelism referenced in the Mamba 2 paper (https://arxiv.org/pdf/2405.21060) in #664.
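For anyone following along, here is a minimal single-process sketch of why that kind of context parallelism is viable for an SSM. This is my own illustration, not code from either PR: it simulates the ranks with a Python loop and uses a plain linear recurrence h[t] = a[t] * h[t-1] + b[t] as a stand-in for the selective scan. The point is that each chunk of the sequence only needs the final hidden state of the previous chunk, so per-chunk scans plus one boundary hand-off reproduce the full-sequence scan.

```python
# Hypothetical sketch: chunked SSM-style scan, simulating context parallelism
# on one process. In a real setup each chunk lives on its own GPU and the
# boundary state is passed with a send/recv between ranks.
import torch

def scan(a, b, h0):
    """Sequential scan h[t] = a[t] * h[t-1] + b[t]; returns all states."""
    hs, h = [], h0
    for t in range(a.shape[0]):
        h = a[t] * h + b[t]
        hs.append(h)
    return torch.stack(hs)

torch.manual_seed(0)
L, d, world_size = 12, 4, 3            # sequence length, state dim, fake "GPUs"
a, b = torch.rand(L, d), torch.randn(L, d)
h0 = torch.zeros(d)

# Reference: scan the full sequence as if on a single device.
full = scan(a, b, h0)

# "Context parallel": split the sequence into one chunk per rank; each rank
# scans its chunk locally and hands its final state to the next rank.
outs, h = [], h0
for ca, cb in zip(a.chunk(world_size), b.chunk(world_size)):
    out = scan(ca, cb, h)              # local scan on this rank's chunk
    h = out[-1]                        # boundary state for the next rank
    outs.append(out)

assert torch.allclose(full, torch.cat(outs))
print("chunked scan matches full scan")
```

The hand-off is sequential here, but the same boundary states can be computed with a parallel prefix over chunk summaries, which is what makes the approach scale across devices rather than just pipelining them.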