Hello,
In Megatron-DeepSpeed I implemented two MoE variants for comparison: one subclassing Megatron's SwitchMLP and one subclassing SmartMoE's MegatronMLP.
The model is a 1.3B-parameter GPT; each implementation was run with 2 experts and without expert parallelism, and the model structure looks correct in both cases.
smart-moe: MegatronMLP
megatron-lm: SwitchMLP
With the same dataset and batch size, SwitchMLP comes out faster than MegatronMLP, at roughly 10.x vs. 8.x TFLOPS respectively.
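For reference, what I am measuring is essentially the forward throughput of the MoE MLP block. Below is a minimal, self-contained sketch of that kind of measurement; `ToyTop1MoE`, the layer sizes, and the `measure_tflops` helper are hypothetical stand-ins written for illustration, not the actual SwitchMLP / MegatronMLP code from either repository.

```python
# Hypothetical standalone throughput sketch, NOT code from Megatron-LM or SmartMoE.
# Layer structure, hidden/FFN sizes, and helper names are assumptions for illustration.
import time

import torch
import torch.nn as nn


class ToyTop1MoE(nn.Module):
    """Toy stand-in for a SwitchMLP/MegatronMLP-style block: top-1 routing over 2 experts."""

    def __init__(self, hidden=2048, ffn=8192, num_experts=2):
        super().__init__()
        self.hidden, self.ffn = hidden, ffn
        self.router = nn.Linear(hidden, num_experts)
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(hidden, ffn), nn.GELU(), nn.Linear(ffn, hidden))
             for _ in range(num_experts)]
        )

    def forward(self, x):  # x: [num_tokens, hidden]
        expert_idx = self.router(x).argmax(dim=-1)   # top-1 routing decision per token
        out = torch.empty_like(x)
        for i, expert in enumerate(self.experts):    # dispatch tokens expert by expert
            mask = expert_idx == i
            if mask.any():
                out[mask] = expert(x[mask])
        return out


@torch.no_grad()
def measure_tflops(layer, tokens=8192, iters=20):
    """Time the forward pass and convert it to TFLOPS (expert GEMMs only)."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.half if device == "cuda" else torch.float
    layer = layer.to(device=device, dtype=dtype)
    x = torch.randn(tokens, layer.hidden, device=device, dtype=dtype)

    for _ in range(3):                               # warm-up iterations
        layer(x)
    if device == "cuda":
        torch.cuda.synchronize()

    start = time.time()
    for _ in range(iters):
        layer(x)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = (time.time() - start) / iters

    # With top-1 routing every token goes through exactly one expert,
    # i.e. two GEMMs (hidden->ffn and ffn->hidden), 2*m*k*n FLOPs each.
    flops = 4 * tokens * layer.hidden * layer.ffn
    return flops / elapsed / 1e12


if __name__ == "__main__":
    print(f"toy top-1 MoE forward: {measure_tflops(ToyTop1MoE()):.2f} TFLOPS")
```

The 10.x vs. 8.x numbers above come from the full Megatron-DeepSpeed training runs, not from this sketch; the sketch is only meant to make the comparison setup concrete.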
In the paper you compared against DeepSpeed-MoE and the previous version of FastMoE. I would like to ask whether you have also benchmarked MoE performance against Megatron-LM.