
Releases: huggingface/accelerate

v0.4.0 Experimental DeepSpeed and multi-node CPU support

10 Aug 09:46

This release adds support for DeepSpeed. While the basics are there to support ZeRO-2, ZeRO-3, as well as CPU and NVMe offload, the API might evolve a little bit as we polish it in the near future.

It also adds support for multi-node CPU training. In both cases, filling in the questionnaire output by accelerate config and then launching your script with accelerate launch is enough: there are no changes in the main API.
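Since the main API is unchanged, a script like the minimal sketch below runs on DeepSpeed or multi-node CPU once it is launched through accelerate config and accelerate launch. The file name train.py, the tiny model, and the random dataset are just placeholders for illustration:

```python
# train.py -- configure once with `accelerate config`, then run `accelerate launch train.py`
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

# Placeholder model and data; replace with your own.
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
dataset = TensorDataset(torch.randn(64, 16), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)

# prepare() wraps the objects for whatever setup was picked in `accelerate config`.
model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for inputs, labels in dataloader:
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    accelerator.backward(loss)
    optimizer.step()
```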

DeepSpeed support

  • Add DeepSpeed support #82 (@vasudevgupta7)
  • DeepSpeed documentation #140 (@sgugger)

Multinode CPU support

  • Add distributed multi-node cpu only support (MULTI_CPU) #63 (@ddkalamk)

Various fixes

v0.3.0 Notebook launcher and multi-node training

29 Apr 15:45

Notebook launcher

After doing all the data preprocessing in your notebook, you can launch your training loop using the new notebook_launcher functionality. This is especially useful for Colab or Kaggle with TPUs! Here is an example on Colab (don't forget to select a TPU runtime).

This launcher also works if you have multiple GPUs on your machine. You just have to pass along num_processes=your_number_of_gpus in the call to notebook_launcher.
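As a rough sketch, a notebook cell could look like the following; training_function is a placeholder for your own loop, and num_processes=8 assumes a Colab/Kaggle TPU runtime (use your number of GPUs on a multi-GPU machine):

```python
from accelerate import Accelerator, notebook_launcher

def training_function():
    # Build the Accelerator (and your model/dataloaders) inside this function:
    # it is executed once per spawned process.
    accelerator = Accelerator()
    print(f"Process {accelerator.process_index} of {accelerator.num_processes}")

# Spawns the processes and runs training_function in each of them.
notebook_launcher(training_function, num_processes=8)
```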

Multi-node training

Our multi-node training test setup was flawed, so previous releases of 🤗 Accelerate did not work for multi-node distributed training. This is all fixed now, and we have added more robust tests!

Various bug fixes

v0.2.1: Patch release

19 Apr 17:32

Fixes a bug that prevented loading a config file with accelerate launch.

v0.2.0 SageMaker launcher

15 Apr 16:00

SageMaker launcher

It's now possible to launch your training script on AWS instances using SageMaker via accelerate launch.

Kwargs handlers

To customize how the different objects used for mixed precision or distributed training are instantiated, a new API called KwargsHandler has been added. It lets the user pass along the kwargs that will be forwarded to those objects when they are used (and they are ignored when those objects are not part of the current setup, so the same script can still run on any kind of setup).
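For instance, here is a sketch using the DistributedDataParallelKwargs handler; the find_unused_parameters flag is just an illustrative choice:

```python
from accelerate import Accelerator, DistributedDataParallelKwargs

# Ask torch.nn.parallel.DistributedDataParallel to look for unused parameters.
# This handler is simply ignored when the script runs on a single device.
ddp_kwargs = DistributedDataParallelKwargs(find_unused_parameters=True)
accelerator = Accelerator(kwargs_handlers=[ddp_kwargs])
```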

Pad across processes

Trying to gather tensors that are not of the same size across processes resulted in a process hang. A new method, Accelerator.pad_across_processes, has been added to help with that (see the sketch after the list below).

  • Add utility to pad tensor across processes to max length #19 (@sgugger)
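A rough sketch of how it combines with gather when each process holds tensors of different lengths; the shapes and pad_index used here are only illustrative:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# Each process may produce a tensor with a different length along dim 1,
# e.g. generated token ids of varying sequence length.
predictions = torch.randint(0, 100, (4, 10 + accelerator.process_index))

# Pad to the maximum length across processes, then gather without hanging.
predictions = accelerator.pad_across_processes(predictions, dim=1, pad_index=0)
all_predictions = accelerator.gather(predictions)
```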

Various bug fixes

v0.1.0 Initial release

05 Mar 21:59

Initial release of 🤗 Accelerate. Check out the main README or the docs to learn more about it!