Hello,
In Megatron-DeepSpeed I implemented two MoE variants for comparison: one subclassing Megatron's SwitchMLP and one subclassing SmartMoE's MegatronMLP.
The model is a 1.3B-parameter GPT; each implementation was run with 2 experts and without expert parallelism, and the model structure looks correct in both cases.
smart-moe: MegatronMLP
megatron-lm: SwitchMLP
With the same dataset and batch size, SwitchMLP comes out faster than MegatronMLP, at roughly 10.x vs. 8.x TFLOPS respectively.
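For reference, what I am measuring is essentially the forward throughput of the MoE MLP block. Below is a minimal, self-contained sketch of that kind of measurement; `ToyTop1MoE`, the layer sizes, and the `measure_tflops` helper are hypothetical stand-ins written for illustration, not the actual SwitchMLP / MegatronMLP code from either repository.

```python
# Hypothetical standalone throughput sketch, NOT code from Megatron-LM or SmartMoE.
# Layer structure, hidden/FFN sizes, and helper names are assumptions for illustration.
import time

import torch
import torch.nn as nn


class ToyTop1MoE(nn.Module):
    """Toy stand-in for a SwitchMLP/MegatronMLP-style block: top-1 routing over 2 experts."""

    def __init__(self, hidden=2048, ffn=8192, num_experts=2):
        super().__init__()
        self.hidden, self.ffn = hidden, ffn
        self.router = nn.Linear(hidden, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, ffn), nn.GELU(), nn.Linear(ffn, hidden))
             for _ in range(num_experts)]
        )

    def forward(self, x):  # x: [num_tokens, hidden]
        expert_idx = self.router(x).argmax(dim=-1)   # top-1 routing decision per token
        out = torch.empty_like(x)
        for i, expert in enumerate(self.experts):    # dispatch tokens expert by expert
            mask = expert_idx == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out


@torch.no_grad()
def measure_tflops(layer, tokens=8192, iters=20):
    """Time the forward pass and convert it to TFLOPS (expert GEMMs only)."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.half if device == "cuda" else torch.float
    layer = layer.to(device=device, dtype=dtype)
    x = torch.randn(tokens, layer.hidden, device=device, dtype=dtype)

    for _ in range(3):                               # warm-up iterations
        layer(x)
    if device == "cuda":
        torch.cuda.synchronize()

    start = time.time()
    for _ in range(iters):
        layer(x)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = (time.time() - start) / iters

    # With top-1 routing every token goes through exactly one expert,
    # i.e. two GEMMs (hidden->ffn and ffn->hidden), 2*m*k*n FLOPs each.
    flops = 4 * tokens * layer.hidden * layer.ffn
    return flops / elapsed / 1e12


if __name__ == "__main__":
    print(f"toy top-1 MoE forward: {measure_tflops(ToyTop1MoE()):.2f} TFLOPS")
```

The 10.x vs. 8.x numbers above come from the full Megatron-DeepSpeed training runs, not from this sketch; the sketch is only meant to make the comparison setup concrete.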
In the paper you compared against DeepSpeed-MoE and the previous version of FastMoE. I would like to ask whether you have also benchmarked MoE performance against Megatron-LM.