Releases: huggingface/accelerate

v0.17.1: Patch release

13 Mar 21:02
c266cf0

v0.17.0: PyTorch 2.0 support, Process Control Enhancements, TPU pod support and FP8 mixed precision training

09 Mar 18:22
1a63f7d

PyTorch 2.0 support

This release fully supports the upcoming PyTorch 2.0 release. You can choose whether or not to use torch.compile, and customize its options either through accelerate config or via a TorchDynamoPlugin.
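As a rough sketch, the relevant section of the file produced by accelerate config looks like this (key names shown here reflect one version of the config format and may vary slightly between releases):

```yaml
# dynamo section of the Accelerate config file (illustrative)
dynamo_config:
  dynamo_backend: INDUCTOR       # the torch.compile backend to use
  dynamo_mode: default           # compile mode, e.g. default or reduce-overhead
  dynamo_use_fullgraph: false    # whether to require a single graph
  dynamo_use_dynamic: false      # whether to enable dynamic shapes
```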

Process Control Enhancements

This release adds a new PartialState, which contains most of the capabilities of the AcceleratorState but is designed for the user to build process control mechanisms around. With this, users also no longer need to guard calls with if accelerator.state.is_main_process when utilizing classes such as the Tracking API, as these now automatically run only on the main process by default.
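The idea can be sketched in plain Python. This is a toy stand-in, not PartialState's real API: the class name, fields, and decorator below are illustrative only.

```python
import functools

class ProcessState:
    """Toy sketch of a state object that knows its own process index."""
    def __init__(self, process_index, num_processes):
        self.process_index = process_index
        self.num_processes = num_processes

    @property
    def is_main_process(self):
        # Process 0 is conventionally the main process.
        return self.process_index == 0

    def on_main_process(self, func):
        """Wrap `func` so it runs only on the main process, no-op elsewhere."""
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if self.is_main_process:
                return func(*args, **kwargs)
        return wrapper

# Usage: a tracker-style call gated automatically, no manual `if` needed.
state = ProcessState(process_index=0, num_processes=2)
log = state.on_main_process(lambda msg: f"logged: {msg}")
print(log("loss=0.1"))  # returns "logged: loss=0.1" only on process 0
```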

  • Refactor process executors to be in AcceleratorState by @muellerzr in #1039

TPU Pod Support (Experimental)

Launching from TPU pods is now supported; please see this issue for more information.

FP8 mixed precision training (Experimental)

This release adds experimental support for FP8 mixed precision training, which requires the transformer-engine library as well as a Hopper GPU (or higher).

What's new?

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @Yard1
    • Refactor launch for greater extensibility (#1123)

v0.16.0: Improved and Interactive Documentation, DataLoader Improvements

31 Jan 19:53
36beea9

New code exploration doc tool

A new interactive tool has been introduced to the documentation to help users quickly learn how to utilize features of the framework before diving into more details on them.

Not only does it provide a code diff, but it also includes an explanation and links to more resources the user can check out to learn more.

Try it out today in the docs

Skip batches in dataloaders

When resuming training, you can more efficiently skip batches in your dataloader with the new skip_first_batches function (also available as a method on your Accelerator).
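The behavior can be mimicked in plain Python. This is only a sketch of the idea; the real skip_first_batches also keeps sampling and shuffling state consistent with the interrupted run.

```python
from itertools import islice

def skip_first_batches(dataloader, num_batches):
    """Sketch: yield batches starting after the first `num_batches`,
    without materializing the skipped ones."""
    return islice(iter(dataloader), num_batches, None)

# Resuming a run that had already consumed the first 2 batches:
batches = [[0, 1], [2, 3], [4, 5], [6, 7]]
resumed = list(skip_first_batches(batches, 2))
print(resumed)  # [[4, 5], [6, 7]]
```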

DeepSpeed integration enhancements:

A new ZeRO-3 init context manager is added to give users granular control in situations involving nested/multiple models, and DeepSpeed config file support has been refactored to remove ambiguity between it and the Accelerate config.

This release also adds support for auto entries in the DeepSpeed config file, which are filled in via the accelerate launch command. Try it out today by referring to the section Things to note when using DeepSpeed Config File.

  • ds zero-3 init context manager by @pacman100 in #932
  • raise error for duplicate accelerate config values when using deepspeed_config_file by @pacman100 in #941

What's new?

v0.15.0: Pytorch 2.0 stack support

02 Dec 16:03
cf22df9

PyTorch 2.0 stack support

We are very excited about the newly announced PyTorch 2.0 stack. You can try it with Accelerate on any model by using the dynamo_backend argument of the Accelerator, or by setting it when filling out your config with accelerate config.

Note that to get the best performance, we recommend:

  • using an Ampere GPU (or more recent)
  • sticking to fixed shapes for now

New CLI commands

  • Added two new commands, accelerate config update and accelerate config default. The first updates a config file to include the latest keys added in later releases of Accelerate, and the second creates a default configuration file automatically, mimicking the write_default_config() introduced in #851 and #853 by @muellerzr
  • Also introduced a filterable help for accelerate launch that shows only the options relevant to the chosen flags; for example, accelerate launch --multi_gpu will show the launch parameters relevant to multi-GPU training.

What's new?

Significant community contributions

The following contributors have made significant changes to the library over the last release:

v0.14.0: Megatron-LM integration and support for PyTorch 1.13

08 Nov 19:36
4e2c511

Megatron LM integration

Accelerate now supports Megatron-LM for the three model classes (BERT, GPT-2 and T5). You can learn more in the documentation.

  • Megatron-LM integration by @pacman100 in #667
  • ensure megatron is 2.2.0+ by @jeffra in #755
  • updating docs to use fork of megatron-lm and minor example/docs fix by @pacman100 in #766
  • adding support to return logits and generate for Megatron-LM GPT models by @pacman100 in #819

PyTorch 1.13 support

Fixes a bug that returned SIGKILL errors on Windows.

Kaggle support with the notebook_launcher

With Kaggle now providing instances with two T4 GPUs, Accelerate can leverage them to do multi-GPU training from the notebook.

What's new?

Significant community contributions

The following contributors have made significant changes to the library over the last release:

v0.13.2 Patch release

17 Oct 15:13
8d0a3ee

v0.13.1 Patch release

07 Oct 16:33
0f3828a

v0.13.0 Launcher update (multinode and GPU selection) and multiple bug fixes

05 Oct 18:47
a54cd0a

Better multinode support in the launcher

The accelerate launch command did not work well for distributed training using several machines. This is fixed in this version.

Launch training on specific GPUs only

Instead of prefixing your launch command with CUDA_VISIBLE_DEVICES=xxx, you can now specify the GPUs you want to use in your Accelerate config.
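A sketch of what this looks like in the config file, here selecting GPUs 0 and 2 (the key names shown are illustrative of this release's format and may vary between versions):

```yaml
# instead of CUDA_VISIBLE_DEVICES=0,2 on the command line
distributed_type: MULTI_GPU
gpu_ids: 0,2
num_processes: 2
```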

Better tracebacks and rich support

Tracebacks are now cleaned up to avoid printing the same error several times, and rich is integrated as an optional dependency.

What's new?

v0.12.0 New doc, gather_for_metrics, balanced device map and M1 support

04 Aug 13:14

New documentation

The whole documentation has been revamped, just go look at it here!

New gather_for_metrics method

When doing distributed evaluation, the dataloader loops back to the beginning of the dataset so that the total number of samples is a round multiple of the number of processes. This makes the predictions slightly longer than the dataset, which used to require some truncation. This is all done behind the scenes now if you replace the gather you did in evaluation with gather_for_metrics.
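The truncation being done behind the scenes can be sketched in plain Python. The function name and inputs below are illustrative, not the real API, which operates on tensors across processes:

```python
def gather_for_metrics_sketch(per_process_preds, dataset_len):
    """Sketch: concatenate predictions from all processes, then drop
    the duplicated samples the dataloader added to fill the last batch."""
    gathered = [p for preds in per_process_preds for p in preds]
    return gathered[:dataset_len]

# 2 processes, dataset of 5 samples: the dataloader looped back and
# duplicated the first sample so each process could get 3 samples.
preds = [[10, 11, 12], [13, 14, 10]]
print(gather_for_metrics_sketch(preds, 5))  # [10, 11, 12, 13, 14]
```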

Balanced device maps

When loading big models for inference, device_map="auto" used to fill the GPUs sequentially, making it hard to use a batch size > 1. It now balances the weights evenly across the GPUs, so if you have more GPU memory than the model needs, you can run predictions with a bigger batch size!
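A toy sketch of the balancing idea, assuming a simplified model where each layer has a known size (the real algorithm also accounts for each GPU's actual free memory and module structure):

```python
def balanced_device_map(layer_sizes, num_gpus):
    """Walk layers in order, moving to the next GPU once the running
    total exceeds an even share of the model, so weights are spread
    across devices rather than filling GPU 0 first."""
    share = sum(layer_sizes) / num_gpus
    device_map, gpu, used = {}, 0, 0
    for i, size in enumerate(layer_sizes):
        if used + size > share and gpu < num_gpus - 1:
            gpu, used = gpu + 1, 0
        device_map[f"layer.{i}"] = gpu
        used += size
    return device_map

# 4 equal-sized layers over 2 GPUs: two layers per device.
print(balanced_device_map([1, 1, 1, 1], 2))
```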

M1 GPU support

Accelerate now supports M1 GPUs, to learn more about how to setup your environment, see the documentation.

What's new?

Significant community contributions

The following contributors have made significant changes to the library over the last release:

  • @sywangyi
    • ccl version check and import different module according to version (#567)
    • set default num_cpu_threads_per_process to improve oob performance (#562)
    • fix some parameter setting does not work for CPU DDP and bf16 fail in… (#527)
  • @ZhiyuanChen
    • add on_main_process decorators (#488)

v0.11.0 Gradient Accumulation and SageMaker Data Parallelism

18 Jul 13:02
eebeb59

Gradient Accumulation

Accelerate now handles gradient accumulation if you want: just pass gradient_accumulation_steps=xxx when instantiating the Accelerator and put your training loop step under a with accelerator.accumulate(model): block. Accelerate will then handle the loss re-scaling and gradient accumulation for you, avoiding slowdowns in distributed training by syncing gradients only when you actually want to step. More details in the documentation.
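The bookkeeping being taken off your hands can be sketched in plain Python. The function below is a toy stand-in, not Accelerate's API; real training would also skip gradient synchronization on the intermediate batches:

```python
def train_with_accumulation(batch_losses, accumulation_steps):
    """Sketch: scale each loss by the accumulation factor and only take
    an optimizer step every `accumulation_steps` batches."""
    steps_taken, accumulated_grad = 0, 0.0
    for i, loss in enumerate(batch_losses, start=1):
        accumulated_grad += loss / accumulation_steps  # loss re-scaling
        if i % accumulation_steps == 0:
            steps_taken += 1        # optimizer.step() would happen here
            accumulated_grad = 0.0  # optimizer.zero_grad() equivalent
    return steps_taken

# 4 batches with accumulation over 2 -> only 2 optimizer steps.
print(train_with_accumulation([1.0, 2.0, 3.0, 4.0], accumulation_steps=2))  # 2
```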

  • Add gradient accumulation doc by @muellerzr in #511
  • Make gradient accumulation work with dispatched dataloaders by @muellerzr in #510
  • Introduce automatic gradient accumulation wrapper + fix a few test issues by @muellerzr in #484

Support for SageMaker Data parallelism

Accelerate now supports SageMaker's specific brand of data parallelism.

  • SageMaker enhancements to allow custom docker image, input channels referring to s3/remote data locations and metrics logging by @pacman100 in #504
  • SageMaker DP Support by @pacman100 in #494

What's new?