Question about support for sequence parallel #176

Open
zigzagcai opened this issue Feb 19, 2024 · 7 comments

Comments

@zigzagcai
Contributor

zigzagcai commented Feb 19, 2024

Hi,

I recently learned about this selective SSM architecture, and it is awesome!
But I have a question. We know that the Transformer architecture supports sequence parallelism. Does Mamba (a potential alternative to the Transformer) support sequence parallelism as well?

@tridao
Collaborator

tridao commented Feb 19, 2024

In general, yes.
Which flavor of sequence parallelism are you referring to? The one in Megatron-LM?

@zigzagcai
Contributor Author

zigzagcai commented Feb 19, 2024

> In general, yes. Which flavor of sequence parallelism are you referring to? The one in Megatron-LM?

Thanks for your timely response!
Yes, I am referring to the one in Megatron-LM. I am wondering whether Mamba has built-in support for this kind of sequence parallelism, or whether we need to implement it manually.
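
For context, the Megatron-LM flavor shards activations along the sequence dimension in the norm/dropout regions, turning the tensor-parallel all-reduce into an all-gather before the mixer and a reduce-scatter after it. Below is a minimal sketch of that pattern with torch.distributed, assuming a generic norm and mixer; the function and argument names are hypothetical, not the mamba repo's API:

```python
# Hypothetical sketch of Megatron-LM-style sequence parallelism around one
# block: the norm runs on a sequence shard, activations are all-gathered
# along the sequence dimension before the tensor-parallel mixer, and the
# mixer's partial outputs are reduce-scattered back into sequence shards.
import torch
import torch.distributed as dist

def sequence_parallel_block(x_shard, norm, mixer, group=None):
    """x_shard: (seq_len // world_size, batch, d_model) local shard."""
    world_size = dist.get_world_size(group)
    y = norm(x_shard)  # elementwise over tokens, so a shard suffices
    # All-gather the shards so the mixer sees the full sequence.
    gathered = [torch.empty_like(y) for _ in range(world_size)]
    dist.all_gather(gathered, y, group=group)
    out = mixer(torch.cat(gathered, dim=0))  # partial result per TP rank
    # Reduce-scatter sums the partial results across ranks and reshards
    # the output along the sequence dimension (all-reduce + scatter).
    out_shard = torch.empty_like(x_shard)
    dist.reduce_scatter(
        out_shard,
        [c.contiguous() for c in out.chunk(world_size, dim=0)],
        group=group,
    )
    return out_shard
```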

@tridao
Collaborator

tridao commented Feb 19, 2024

Nothing is built-in, but it'll be implemented in the future.

@zigzagcai
Contributor Author

Got it. Thanks!

@josiahbjorgaard

I'm looking into implementing this. Has any work been done on it yet?

@zigzagcai
Contributor Author

zigzagcai commented Sep 13, 2024

> I'm looking into implementing this. Has any work been done on it yet?

Hello, my colleagues and I have already implemented this feature.

You can try my PR or my colleagues' PR.

@josiahbjorgaard

josiahbjorgaard commented Sep 19, 2024

Hi @zigzagcai, thanks for your reply. The PR looks very interesting, but I think you may be referring to a different kind of sequence parallelism than the Megatron-LM one referenced above. Rather than your implementation, where a batch of sequences is aggregated into a single sequence for computation, I'm referring to splitting a single massive sequence across multiple GPUs. Please correct me if I'm wrong, though.

Specifically, I'm working on implementing the kind of context parallelism referenced in the Mamba-2 paper in #664 (https://arxiv.org/pdf/2405.21060).
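
The core difficulty with splitting one long sequence across GPUs is that the SSM recurrence carries state across chunk boundaries. Below is a minimal sketch of the idea for a scalar linear recurrence h_t = a * h_{t-1} + x_t, using a serial point-to-point state hand-off between ranks; all names are hypothetical, and this is not the implementation in #664 or the PRs above. The Mamba-2 paper's chunked formulation instead computes the inter-chunk state pass with a parallel scan rather than this serial hand-off.

```python
# Hypothetical sketch: context parallelism for the linear recurrence
# h_t = a * h_{t-1} + x_t, with the sequence split into one chunk per rank.
# Each rank scans its chunk assuming a zero incoming state, then receives
# the true incoming state from the previous rank and corrects its outputs
# (the recurrence is linear in the initial state, so the correction is
# just a decayed additive term).
import torch
import torch.distributed as dist

def context_parallel_scan(x: torch.Tensor, a: float) -> torch.Tensor:
    """x: (local_seq_len, d) chunk of the full sequence held by this rank."""
    rank, world_size = dist.get_rank(), dist.get_world_size()
    L = x.shape[0]

    # Local scan with zero initial state.
    h = torch.zeros_like(x[0])
    out = torch.empty_like(x)
    for t in range(L):
        h = a * h + x[t]
        out[t] = h

    # Receive the true state entering this chunk from the previous rank.
    h_in = torch.zeros_like(h)
    if rank > 0:
        dist.recv(h_in, src=rank - 1)

    # Correct: true h_t = local h_t + a^(t+1) * h_in.
    decay = a ** torch.arange(1, L + 1, device=x.device, dtype=x.dtype)
    out = out + decay.unsqueeze(-1) * h_in

    # Hand the corrected final state to the next rank.
    if rank < world_size - 1:
        dist.send(out[-1].contiguous(), dst=rank + 1)
    return out
```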
