Hi, yes! The Megatron-LM binding is just what I used for experiments. You should be able to use the dMoE layer from other codebases relatively easily. Is there a particular codebase you have in mind for the integration?
Here are a couple other repos that integrate it. These both actually do something a bit more complicated than necessary because they wanted to add features specific to their frameworks.
I have an internal codebase with a pretty vanilla decoder-only transformer. I am hoping to swap its FFN out for a dMoE. Thank you for these pointers - it seems like a simpler version of the nanotron example is what I'll try to implement!
Is it possible to import the dMoE layer itself into another training script and train without going through Megatron?
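To make the shape of the integration concrete, here is a minimal sketch of a vanilla pre-norm decoder block where the dense FFN slot is filled by an MoE module. The `ToyMoE` class below is a hypothetical stand-in written for illustration (a naive top-k token-choice router, not MegaBlocks' dropless kernels); in an actual integration you would construct the real dMoE layer from the megablocks package at the marked line instead. Class names, dimensions, and the overall block layout here are assumptions, not code from any of the repos discussed above.

```python
import torch
import torch.nn as nn


class ToyMoE(nn.Module):
    """Naive top-k token-choice MoE FFN, for illustration only.
    A real integration would replace this whole module with the
    dropless MoE layer from the megablocks package."""

    def __init__(self, d_model, d_ff, num_experts=4, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):
        # Flatten (batch, seq, d_model) into a stream of tokens.
        tokens = x.reshape(-1, x.shape[-1])
        logits = self.router(tokens)                              # (tokens, experts)
        weights, idx = logits.softmax(-1).topk(self.top_k, dim=-1)
        weights = weights / weights.sum(-1, keepdim=True)         # renormalize top-k
        out = torch.zeros_like(tokens)
        # Dense loop over experts -- simple but slow; the point of the
        # dropless-MoE kernels is to do this dispatch efficiently.
        for e, expert in enumerate(self.experts):
            for k in range(self.top_k):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(tokens[mask])
        return out.reshape_as(x)


class DecoderBlock(nn.Module):
    """Vanilla pre-norm decoder-only block with the FFN swapped for an MoE."""

    def __init__(self, d_model=64, n_heads=4, d_ff=128):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        # This is the one-line swap: dense MLP -> MoE layer.
        self.ffn = ToyMoE(d_model, d_ff)

    def forward(self, x):
        h = self.ln1(x)
        causal = torch.triu(torch.ones(x.shape[1], x.shape[1], dtype=torch.bool), 1)
        a, _ = self.attn(h, h, h, attn_mask=causal)
        x = x + a
        return x + self.ffn(self.ln2(x))


x = torch.randn(2, 8, 64)
y = DecoderBlock()(x)
print(tuple(y.shape))  # (2, 8, 64) -- drop-in shape-compatible with a dense FFN
```

Since the MoE is shape-compatible with the dense FFN it replaces, the rest of the training script (optimizer, loss, data loading) stays unchanged; the main extra consideration in a real setup is collecting the router's load-balancing loss each step.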