How to use Tutel on Megatron-DeepSpeed #207
Do you mean Megatron and DeepSpeed separately, or both of them working together?
@ghostplant Can Tutel work with either Megatron or DeepSpeed individually?
Yes. Tutel is just an MoE layer implementation that is pluggable into any distributed framework. The way for another framework to use the Tutel MoE layer is to pass its distributed process group properly, e.g.:

    my_processing_group = deepspeed.new_group(..)
    moe_layer = tutel_moe.moe_layer(
        ..,
        group=my_processing_group
    )

If no other framework is available, Tutel itself also provides a one-line initialization to generate the groups you need, which works for both distributed GPU (i.e. nccl) and distributed CPU (i.e. gloo):

    from tutel import system
    parallel_env = system.init_data_model_parallel(backend='nccl' if args.device == 'cuda' else 'gloo')
    # pick whichever group matches your parallelism setup:
    my_processing_group = parallel_env.data_group   # or parallel_env.model_group / parallel_env.global_group
    ...
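For reference, here is a minimal end-to-end sketch that combines the two pieces above: Tutel's one-line group initialization plus a `moe_layer` built on that group. The argument names (`gate_type`, the `experts` dictionary keys, `parallel_env.local_device`) follow Tutel's README and helloworld examples, and the sizes are placeholders; treat this as an assumption-laden sketch rather than the exact API of every Tutel version.

    # Sketch: a Tutel MoE layer wired to a process group (run under torchrun).
    # Keyword arguments assumed from Tutel's README example; adjust to the
    # Tutel version you have installed.
    import torch
    from tutel import system
    from tutel import moe as tutel_moe

    # Tutel's own one-line initialization; substitute a DeepSpeed/Megatron
    # group here instead if you are embedding the layer in those frameworks.
    parallel_env = system.init_data_model_parallel(
        backend='nccl' if torch.cuda.is_available() else 'gloo')
    device = parallel_env.local_device              # device chosen by the helper (assumed attribute)
    my_processing_group = parallel_env.data_group   # or model_group / global_group

    model_dim, hidden_size, num_local_experts = 1024, 4096, 2   # placeholder sizes

    moe_layer = tutel_moe.moe_layer(
        gate_type={'type': 'top', 'k': 2},          # top-2 gating
        model_dim=model_dim,
        experts={
            'type': 'ffn',
            'count_per_node': num_local_experts,
            'hidden_size_per_expert': hidden_size,
            'activation_fn': lambda x: torch.nn.functional.relu(x),
        },
        group=my_processing_group,                  # the process group discussed above
    ).to(device)

    x = torch.randn(8, 512, model_dim, device=device)   # (batch, seq, model_dim)
    y = moe_layer(x)                                     # output has the same shape as x

When embedding the layer inside Megatron or DeepSpeed, the main Tutel-specific choice is which `group` you pass; expert parameters also need to be excluded from the framework's data-parallel all-reduce (Tutel's examples mark them via a `scan_expert_func` callback).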
Thanks for your prompt response!
Can Tutel be used with Megatron-DeepSpeed?