# v0.6.0: Checkpointing and bfloat16 support
This release adds support for bfloat16 mixed precision training (requires PyTorch >= 1.10) and a brand-new checkpoint utility to help with resuming interrupted training runs. The documentation also gets a completely revamped frontend.
## Checkpoints
Save the current state of all your objects (models, optimizers, RNG states) with `accelerator.save_state(path_to_checkpoint)`, and reload everything by calling `accelerator.load_state(path_to_checkpoint)`; see the sketch after the list below.
- Add in checkpointing capability by @muellerzr in #255
- Implementation of saving and loading custom states by @muellerzr in #270
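For example, a minimal save-and-resume sketch (the toy model, optimizer, and checkpoint path are illustrative placeholders):

```python
import torch
from accelerate import Accelerator

# Illustrative toy objects; substitute your own model and optimizer.
model = torch.nn.Linear(8, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

accelerator = Accelerator()
model, optimizer = accelerator.prepare(model, optimizer)

# Save model weights, optimizer state, and RNG states under one directory.
accelerator.save_state("checkpoints/step_100")

# Later, e.g. after an interruption, restore everything in place.
accelerator.load_state("checkpoints/step_100")
```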
## BFloat16 support
Accelerate now supports bfloat16 mixed precision training. As a result, the old `--fp16` argument has been deprecated in favor of the more generic `--mixed_precision`; see the sketch after the list below.
- Add bfloat16 support #243 by @ikergarcia1996 in #247
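A minimal sketch of opting in from code, assuming the `Accelerator` constructor mirrors the CLI flag with a `mixed_precision` argument:

```python
from accelerate import Accelerator

# Request bfloat16 autocasting; the code-side equivalent of launching with
# `accelerate launch --mixed_precision bf16 train.py`.
# bf16 requires PyTorch >= 1.10 and hardware that supports it.
accelerator = Accelerator(mixed_precision="bf16")
```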
## New env subcommand
You can now type `accelerate env` to get a copy-pastable summary of your environment and default configuration. Very convenient when opening a new issue!
## New doc frontend
The documentation has been switched to the new Hugging Face frontend, like Transformers and Datasets.
## What's Changed
- Fix send_to_device with non-tensor data by @sgugger in #177
- Handle UserDict in all utils by @sgugger in #179
- Use collections.abc.Mapping to handle both the dict and the UserDict types by @mariosasko in #180
- fix: use `store_true` on argparse in nlp example by @monologg in #183
- Update README.md by @TevenLeScao in #187
- Add signature check for `set_to_none` in `Optimizer.zero_grad` by @sgugger in #189
- fix typo in code snippet by @MrZilinXiao in #199
- Add high-level API reference to README by @Chris-hughes10 in #204
- fix rng_types in accelerator by @s-kumano in #206
- Pass along drop_last in DispatchDataLoader by @sgugger in #212
- Rename state to avoid name conflicts with pytorch's Optimizer class. by @yuxinyuan in #224
- Fix lr scheduler num samples by @sgugger in #227
- Add customization point for init_process_group kwargs by @sgugger in #228
- Fix typo in installation docs by @jaketae in #234
- make deepspeed optimizer match parameters of passed optimizer by @jmhessel in #246
- Upgrade black to version ~=22.0 by @LysandreJik in #250
- add support of gather_object by @ZhiyuanChen in #238
- Add launch flags --module and --no_python (#256) by @parameter-concern in #258
- Accelerate + Animus/Catalyst = 🚀 by @Scitator in #249
- Add `debug_launcher` by @sgugger in #259
- enhance compatibility of honor type by @ZhiyuanChen in #241
- Add a flag to use CPU only in the config by @sgugger in #263
- Basic fixes for DeepSpeed by @sgugger in #264
- Ability to set the seed with randomness from inside Accelerate by @muellerzr in #266
- Don't use dispatch_batches when torch is < 1.8.0 by @sgugger in #269
- Make accelerated model with AMP possible to pickle by @BenjaminBossan in #274
- Contributing guide by @LysandreJik in #254
- replace texts and link (master -> main) by @johnnv1 in #282
- Use workflow from doc-builder by @sgugger in #275
- Pass along execution info to the exit of autocast by @sgugger in #284
## New Contributors
- @mariosasko made their first contribution in #180
- @monologg made their first contribution in #183
- @TevenLeScao made their first contribution in #187
- @MrZilinXiao made their first contribution in #199
- @Chris-hughes10 made their first contribution in #204
- @s-kumano made their first contribution in #206
- @yuxinyuan made their first contribution in #224
- @jaketae made their first contribution in #234
- @jmhessel made their first contribution in #246
- @ikergarcia1996 made their first contribution in #247
- @ZhiyuanChen made their first contribution in #238
- @parameter-concern made their first contribution in #258
- @Scitator made their first contribution in #249
- @muellerzr made their first contribution in #255
- @BenjaminBossan made their first contribution in #274
- @johnnv1 made their first contribution in #280
Full Changelog: v0.5.1...v0.6.0