Release 0.3.5

@workingloong released this 29 Mar 07:02
· 577 commits to master since this release

Features:

  • Flash checkpoint supports saving and loading Megatron-LM MOE models. #1042
  • Add APIs for extending the node-check module to support nodes with different chips. #1023
  • Automatically mark the node as unschedulable if the node fails. #1025

BugFix:

  • Fix the MNIST DDP example so that it saves and loads checkpoints. #1051
  • Fix the DDP checkpoint name. #1034