Skip to content

Release 0.3.0

Compare
Choose a tag to compare
@workingloong workingloong released this 03 Jan 06:54
· 754 commits to master since this release

Features:

  • Flash Checkpoint to asynchronously persist checkpoint to storage.
  • Flash Checkpoint recovers failure in memory.
  • Flash Checkpoint supports DDP/FSDP/DeepSpeed/Megatron
  • Node detection supports NPU.

Examples

  • The example of training nanoGPT using DeepSpeed.
  • The example to save/load sharding FSDP checkpoint.