
Releases: huggingface/accelerate

v0.7.1 Patch release

29 Apr 13:16

  • Fix FSDP config in cluster #331
  • Add guards for batch size finder #334
  • Patch fix for infinite loop #335

v0.7.0: Logging API, FSDP, batch size finder and examples revamp

28 Apr 17:14

Logging API

Use any of your favorite logging libraries (TensorBoard, Wandb, CometML...) with just a few lines of code inside your training scripts with Accelerate. All details are in the documentation.
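For instance, here is a minimal sketch of the tracking API (the tracker choice, project name, metric name, and hyperparameters below are placeholders; see the documentation for the exact arguments):

```python
from accelerate import Accelerator

# Pick the tracker(s) to report to; "tensorboard" is one of the supported options.
accelerator = Accelerator(log_with="tensorboard")

# Start the trackers, passing a run name and (optionally) the hyperparameters.
accelerator.init_trackers("my_project", config={"learning_rate": 3e-4})

for step in range(10):
    loss = 1.0 / (step + 1)  # placeholder for the real training loss
    accelerator.log({"train_loss": loss}, step=step)

# Close all trackers cleanly at the end of training.
accelerator.end_training()
```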

Support for FSDP (Fully Sharded Data Parallel)

PyTorch recently released a new model wrapper for sharded DDP training called FSDP. This release adds support for it (note that it doesn't work with mixed precision yet). See all caveats in the documentation.
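Nothing FSDP-specific is needed in the training script itself: FSDP is turned on through accelerate config. Below is a minimal sketch of such a (standard) script with a placeholder model and data, assuming FSDP was selected in the config questionnaire; check the documentation for FSDP-specific caveats such as the order in which objects should be prepared:

```python
import torch
from accelerate import Accelerator

# When FSDP is selected in `accelerate config`, `prepare` wraps the model accordingly.
accelerator = Accelerator()

model = torch.nn.Linear(128, 2)  # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
dataset = torch.utils.data.TensorDataset(
    torch.randn(64, 128), torch.randint(0, 2, (64,))
)
dataloader = torch.utils.data.DataLoader(dataset, batch_size=8)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, labels in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)
    optimizer.step()
```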

Batch size finder

Say goodbye to CUDA OOM errors with the new find_executable_batch_size decorator. Just decorate your training function and pick a starting batch size, then let Accelerate do the rest.

  • Add a memory-aware decorator for CUDA OOM avoidance by @muellerzr in #324
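A minimal sketch of the decorator in use (the exact import path may differ between versions, and the inner loop is a placeholder; the decorated function must take the batch size as its first argument and is retried with a smaller value whenever a CUDA OOM error is raised):

```python
from accelerate import Accelerator
from accelerate.utils import find_executable_batch_size

accelerator = Accelerator()

@find_executable_batch_size(starting_batch_size=128)
def train(batch_size):
    # Release memory held by a previous (failed) attempt before retrying.
    accelerator.free_memory()
    print(f"Trying batch size {batch_size}")
    # ... build the dataloaders with `batch_size`, call `accelerator.prepare`,
    # and run the training loop here ...

train()
```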

Examples revamp

The Accelerate examples are now split in two: the base folder contains simple NLP and computer vision examples, as well as complete versions incorporating all features. You can also browse the examples in the by_feature subfolder, which shows exactly what code to add for each given feature (checkpointing, tracking, cross-validation, etc.).


Full Changelog: v0.6.0...v0.7.0

v0.6.2: Fix launcher with mixed precision

31 Mar 13:28

The launcher was ignoring the mixed precision attribute of the config since v0.6.0. This patch fixes that.

v0.6.1: Hot fix

18 Mar 21:47

Patches an issue with mixed precision (see #286)

v0.6.0: Checkpointing and bfloat16 support

18 Mar 13:47

This release adds support for bfloat16 mixed precision training (requires PyTorch >= 1.10) and a brand-new checkpoint utility to help with resuming interrupted training runs. The documentation frontend has also been completely revamped.

Checkpoints

Save the current state of all your objects (models, optimizers, RNG states) with accelerator.save_state(path_to_checkpoint) and reload everything by calling accelerator.load_state(path_to_checkpoint).
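A minimal sketch with placeholder objects and a placeholder checkpoint path:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

model = torch.nn.Linear(8, 2)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = accelerator.prepare(model, optimizer)

# ... training steps ...

# Save models, optimizers and RNG states into a folder.
accelerator.save_state("checkpoints/step_1000")

# Later (for instance after an interruption), restore everything and resume.
accelerator.load_state("checkpoints/step_1000")
```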

BFloat16 support

Accelerate now supports bfloat16 mixed precision training. As a result, the old --fp16 argument has been deprecated and replaced by the more generic --mixed_precision.
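The same choice can be made directly in code on the Accelerator; a minimal sketch, assuming the mixed_precision argument mirrors the command-line flag:

```python
from accelerate import Accelerator

# Equivalent of passing `--mixed_precision bf16` on the command line;
# other accepted values are "fp16" and "no". bf16 requires PyTorch >= 1.10.
accelerator = Accelerator(mixed_precision="bf16")
```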

New env subcommand

You can now type accelerate env to have a copy-pastable summary of your environment and default configuration. Very convenient when opening a new issue!

New doc frontend

The documentation has been switched to the new Hugging Face frontend, like Transformers and Datasets.

  • Convert documentation to the new front by @sgugger in #271


Full Changelog: v0.5.1...v0.6.0

v0.5.1: Patch release

27 Sep 15:05

Fixes the following two bugs:

  • convert_to_fp32 returned booleans instead of tensors #173
  • Wrong dataloader length when dispatch_batches=True #175

v0.5.0 Dispatch batches from main DataLoader

23 Sep 14:38

This release introduces support for iterating through a DataLoader only on the main process, which then dispatches the batches to all processes.

Dispatch batches from main DataLoader

The motivation behind this comes from dataset streaming, which introduces two difficulties:

  • some elements of the dataset might time out, and those timeouts may differ between the launched processes, so it's impossible to guarantee the data is iterated through the same way on each process;
  • when using an IterableDataset, each process goes through the whole dataset and therefore applies the preprocessing to every element, which can slow down training.

This new feature is activated by default for all IterableDataset instances.
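The behavior can also be controlled explicitly through the Accelerator; here is a minimal sketch with a placeholder streaming dataset (leaving dispatch_batches at its default only enables dispatching for IterableDataset):

```python
from torch.utils.data import DataLoader, IterableDataset
from accelerate import Accelerator

class StreamingDataset(IterableDataset):
    """Placeholder for a real streaming dataset."""

    def __iter__(self):
        for i in range(100):
            yield i

# With dispatch_batches=True, only the main process iterates over the
# DataLoader; the resulting batches are then sent to the other processes.
accelerator = Accelerator(dispatch_batches=True)

dataloader = DataLoader(StreamingDataset(), batch_size=8)
dataloader = accelerator.prepare(dataloader)

for batch in dataloader:
    pass  # training step goes here
```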

Various fixes

v0.4.0 Experimental DeepSpeed and multi-node CPU support

10 Aug 09:46

This release adds experimental support for DeepSpeed. While the basics are there to support ZeRO-2, ZeRO-3, as well as CPU and NVMe offload, the API might evolve a bit as we polish it in the near future.

It also adds support for multi-node CPU training. In both cases, filling out the questionnaire output by accelerate config and then launching your script with accelerate launch is enough; there are no changes in the main API.

DeepSpeed support

  • Add DeepSpeed support #82 (@vasudevgupta7)
  • DeepSpeed documentation #140 (@sgugger)

Multi-node CPU support

  • Add distributed multi-node cpu only support (MULTI_CPU) #63 (@ddkalamk)

Various fixes

v0.3.0 Notebook launcher and multi-node training

29 Apr 15:45

Notebook launcher

After doing all the data preprocessing in your notebook, you can launch your training loop using the new notebook_launcher functionality. This is especially useful for Colab or Kaggle with TPUs! Here is an example on Colab (don't forget to select a TPU runtime).

This launcher also works if you have multiple GPUs on your machine. You just have to pass along num_processes=your_number_of_gpus in the call to notebook_launcher.
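A minimal sketch (the training function body is a placeholder):

```python
from accelerate import notebook_launcher

def training_function():
    # The whole training code (Accelerator creation, prepare, loop, ...) goes here.
    print("Training...")

# On a multi-GPU machine, pass the number of GPUs explicitly;
# on Colab/Kaggle TPUs this can typically be left at its default.
notebook_launcher(training_function, num_processes=2)
```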

Multi-node training

Our multi-node training test setup was flawed, and previous releases of 🤗 Accelerate did not work for multi-node distributed training. This is all fixed now, and we have added more robust tests!

Various bug fixes

v0.2.1: Patch release

19 Apr 17:32

Fixes a bug that prevented a config from being loaded with accelerate launch.