I am training a ~520M model, but I have found that the MegaBlocks MoE version uses substantially more memory and takes longer to train than a dense model of corresponding size. I am using a model embedding dimension of 1536. The MoE model has 48 experts with 8 active and an expert size of 128. I set the load-balancing loss weight to 0.001.
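For reference, here is a rough per-layer FFN parameter comparison under the numbers above (d_model = 1536, 48 experts of hidden size 128, top-k = 8). The two-matrix expert MLP shape and the 4x dense expansion are assumptions for illustration, not values stated in the issue:

```python
# Back-of-the-envelope FFN parameter comparison for the configuration described above.
# Assumptions (not from the issue): each expert is a two-matrix MLP
# (d_model -> expert_hidden -> d_model), and the dense baseline uses a 4x expansion.

d_model = 1536
num_experts = 48
top_k = 8
expert_hidden = 128

# MoE layer: every expert's weights are resident in memory, regardless of top_k.
moe_params_per_expert = 2 * d_model * expert_hidden      # up- and down-projection
moe_params_total = num_experts * moe_params_per_expert   # ~18.9M resident per layer
moe_params_active = top_k * moe_params_per_expert        # ~3.1M used per token

# Dense baseline with an assumed 4x FFN expansion.
dense_hidden = 4 * d_model
dense_params = 2 * d_model * dense_hidden                # ~18.9M per layer

print(f"MoE params per layer:    {moe_params_total:,}")
print(f"MoE params active/token: {moe_params_active:,}")
print(f"Dense params per layer:  {dense_params:,}")
```

Under these assumptions the MoE layer keeps roughly the same resident parameter count as the dense FFN while only ~1/6 of it is exercised per token, and routing buffers, token permutation, and the load-balancing loss add activation overhead on top, so higher memory than the dense baseline is plausible even before any kernel-level differences.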
samuelwheeler changed the title from "MOE uses much more memory than dense model and is substantially slower" to "MOE uses more memory than dense model and is substantially slower" on Mar 3, 2025.
samuelwheeler changed the title from "MOE uses more memory than dense model and is substantially slower" to "MOE uses more memory than dense model and is slower" on Mar 3, 2025.