SmartMoE performance issue #1

Open

cccc0der opened this issue Aug 10, 2023 · 0 comments

Comments

@cccc0der

Hello,

In Megatron-DeepSpeed, I separately extended Megatron's SwitchMLP and SmartMoE's MegatronMLP so I could compare the two implementations.

The model is a 1.3B-parameter GPT. Both implementations were configured with 2 experts and without expert parallelism, and the printed model structures look correct.

smart-moe: MegatronMLP (model-structure screenshot)

megatron-lm: SwitchMLP (model-structure screenshot)

With the same dataset and batch size, SwitchMLP comes out faster than MegatronMLP: roughly 10.x TFLOPS versus 8.x TFLOPS.

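In case it helps isolate where the gap comes from, here is a minimal micro-benchmark sketch in plain PyTorch. It is not the actual SwitchMLP or MegatronMLP code; the module, sizes, and FLOP estimate below are placeholder assumptions of mine. Timing each implementation's expert MLP in isolation like this could show whether the difference lies in the expert computation itself or in the gating/dispatch path.

```python
# Hypothetical isolation benchmark: time a stand-in expert feed-forward block
# and convert the measured time into achieved TFLOP/s. None of the names below
# come from Megatron-LM or SmartMoE; they are placeholders for illustration.
import time
import torch
import torch.nn as nn
import torch.nn.functional as F

class ExpertFFN(nn.Module):
    """Stand-in for one expert's MLP: hidden -> 4*hidden -> hidden with GeLU."""
    def __init__(self, hidden_size):
        super().__init__()
        self.fc1 = nn.Linear(hidden_size, 4 * hidden_size)
        self.fc2 = nn.Linear(4 * hidden_size, hidden_size)

    def forward(self, x):
        return self.fc2(F.gelu(self.fc1(x)))

def bench_ms(module, x, warmup=5, iters=20):
    """Average forward+backward time per iteration in milliseconds."""
    for _ in range(warmup):
        module(x).sum().backward()
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        module(x).sum().backward()
    torch.cuda.synchronize()
    return (time.time() - start) / iters * 1e3

if __name__ == "__main__":
    hidden, tokens = 2048, 8192   # rough GPT-1.3B hidden size, made-up token count
    x = torch.randn(tokens, hidden, device="cuda", requires_grad=True)
    ffn = ExpertFFN(hidden).cuda()
    ms = bench_ms(ffn, x)
    # Forward FLOPs of the two linears: 2 * tokens * (h*4h + 4h*h) = 16 * tokens * h^2;
    # backward is roughly 2x the forward, so fwd+bwd is about 3x the forward count.
    flops = 3 * 16 * tokens * hidden * hidden
    print(f"{ms:.2f} ms/iter  ->  ~{flops / (ms * 1e-3) / 1e12:.2f} TFLOP/s")
```

If both expert MLPs reach similar TFLOP/s in isolation, the 10.x vs. 8.x gap at the model level would point at the routing/communication code rather than the MLP kernels.
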
In the paper you compared against DeepSpeed-MoE and the previous version of FastMoE. I would like to ask whether you have also done a performance comparison against Megatron-LM's MoE implementation.
