Release 0.3.0
Features:
- Flash Checkpoint persists checkpoints to storage asynchronously.
- Flash Checkpoint recovers from failures using the checkpoint held in memory.
- Flash Checkpoint supports DDP, FSDP, DeepSpeed, and Megatron.
- Node detection supports NPU.
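
The asynchronous persistence and in-memory recovery described above can be illustrated with a minimal sketch. This is not the Flash Checkpoint API: the `AsyncCheckpointer` class, its method names, and the use of `pickle` are all hypothetical, chosen only to show the general pattern of copying state into memory quickly and writing it to storage in a background thread.

```python
import copy
import os
import pickle
import tempfile
import threading


class AsyncCheckpointer:
    """Hypothetical helper: snapshot state in memory, persist in the background."""

    def __init__(self, path):
        self.path = path
        self._snapshot = None  # latest in-memory copy of the state
        self._writer = None    # background thread performing the disk write

    def save(self, state):
        # The caller is blocked only for the in-memory copy.
        snapshot = copy.deepcopy(state)
        self.wait()  # let any previous write finish before starting a new one
        self._snapshot = snapshot
        self._writer = threading.Thread(target=self._persist, args=(snapshot,))
        self._writer.start()

    def _persist(self, snapshot):
        # Write to a temporary file, then rename, so a crash mid-write
        # never leaves a truncated checkpoint at self.path.
        directory = os.path.dirname(os.path.abspath(self.path))
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(snapshot, f)
        os.replace(tmp_path, self.path)

    def wait(self):
        # Block until the pending background write (if any) has completed.
        if self._writer is not None:
            self._writer.join()
            self._writer = None

    def load(self):
        # Prefer the in-memory snapshot; fall back to storage otherwise.
        if self._snapshot is not None:
            return copy.deepcopy(self._snapshot)
        with open(self.path, "rb") as f:
            return pickle.load(f)
```

In this sketch a training loop would call `save(state)` and continue immediately, while a restarted process with no snapshot in memory would fall back to the file on storage through `load()`.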
Examples:
- An example of training nanoGPT with DeepSpeed.
- An example of saving and loading sharded FSDP checkpoints.