v0.23.0: Model Memory Estimation tool, Breakpoint API, Multi-Node Notebook Launcher Support, and more!
Model Memory Estimator
A new model memory estimation tool has been added to help calculate how much memory is needed for inference. It does not download the pretrained weights, and uses `init_empty_weights` to stay memory efficient during the calculation.
Usage directions:

```bash
accelerate estimate-memory {model_name} --library {library_name} --dtypes fp16 int8
```

Or:

```python
from accelerate.commands.estimate import estimate_command_parser, estimate_command, gather_data

parser = estimate_command_parser()
args = parser.parse_args(["bert-base-cased", "--dtypes", "float32"])
output = gather_data(args)
```
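Under the hood only the model's configuration is needed, never its weights. Below is a minimal sketch of the idea, not the tool's actual implementation; the model name and fp32 math are purely illustrative:

```python
# Build the model on the "meta" device so no weights are downloaded or allocated.
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("bert-base-cased")  # config only, no weights
with init_empty_weights():
    model = AutoModel.from_config(config)

# Parameters are meta tensors: shapes and dtypes are known but no memory is used,
# so their sizes can be summed cheaply.
total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"~{total_bytes / 1024**2:.1f} MB for the weights in fp32")
```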
🤗 Hub is a first-class citizen
We've made the `huggingface_hub` library a first-class citizen of the framework! While this is mainly for the model memory estimation tool, it opens the door for further integrations should they be wanted.
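In practice this means Hub utilities are available anywhere in Accelerate without extra installs. As a small illustration (not tied to any specific Accelerate API), here is the kind of metadata lookup the estimator relies on:

```python
# Query model metadata from the Hub without downloading any weights.
from huggingface_hub import model_info

info = model_info("bert-base-cased")
print(info.pipeline_tag)
print([f.rfilename for f in info.siblings][:5])  # first few files in the repo
```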
Accelerator Enhancements:
- `gather_for_metrics` will now also de-dupe for non-tensor objects. See #1937
- `mixed_precision="bf16"` support on NPU devices. See #1949
- New `breakpoint` API to help break out of a loop when a condition is met on a single process. See #1940
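A rough sketch of the breakpoint API in a training loop. The `set_breakpoint()`/`check_breakpoint()` method names are assumed from #1940; verify them against the `Accelerator` docs for your installed version:

```python
# Rough sketch; set_breakpoint()/check_breakpoint() are assumed from #1940.
import torch
from accelerate import Accelerator

accelerator = Accelerator()

for step in range(1000):
    loss = torch.rand(1)  # stand-in for a real training step
    # Flag the stop condition on the single process that observes it...
    if accelerator.is_main_process and loss.item() < 0.01:
        accelerator.set_breakpoint()
    # ...then break on *all* processes once any process has flagged it.
    if accelerator.check_breakpoint():
        break
```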
Notebook Launcher Enhancements:
- The notebook launcher now supports launching across multiple nodes! See #1913
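A rough sketch of a two-node launch from a notebook. The `num_nodes`, `node_rank`, and `master_addr` arguments are assumptions based on #1913; check the `notebook_launcher` docs for your version:

```python
# Rough sketch; num_nodes/node_rank/master_addr are assumed from #1913.
from accelerate import notebook_launcher

def training_loop():
    ...  # your usual Accelerator-based training function goes here

# Run in the notebook on the first node; on the second node use node_rank=1.
notebook_launcher(
    training_loop,
    num_processes=8,          # number of processes (see docs for multi-node semantics)
    num_nodes=2,
    node_rank=0,
    master_addr="10.0.0.1",   # example address of the rank-0 node
)
```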
FSDP Enhancements:
- Activation checkpointing is now natively supported in the framework (a sketch follows below). See #1891
- `torch.compile` support was fixed. See #1919
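For activation checkpointing, a minimal sketch assuming the `activation_checkpointing` flag added to the FSDP plugin in #1891 (verify the exact field name in your version):

```python
# Minimal sketch; the activation_checkpointing field is assumed from #1891.
from accelerate import Accelerator, FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(activation_checkpointing=True)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```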
DeepSpeed Enhancements:
- XPU/ccl support (#1827)
- Easier gradient accumulation support: simply set `gradient_accumulation_steps` to `"auto"` in your DeepSpeed config, and Accelerate will use the value passed to `Accelerator` instead (see the sketch after this list) (#1901)
- Support for custom schedulers and DeepSpeed optimizers (#1909)
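A minimal sketch of the `"auto"` gradient accumulation flow, assuming a local `ds_config.json` whose DeepSpeed config sets `"gradient_accumulation_steps": "auto"` (the file name and step count are placeholders):

```python
# Minimal sketch; ds_config.json is a placeholder path containing
# "gradient_accumulation_steps": "auto" in its DeepSpeed config.
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

ds_plugin = DeepSpeedPlugin(hf_ds_config="ds_config.json")
# Per #1901, the "auto" entry is resolved from the value passed to Accelerator.
accelerator = Accelerator(deepspeed_plugin=ds_plugin, gradient_accumulation_steps=4)
```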
What's Changed
- Update release instructions by @sgugger in #1877
- fix detach_hook by @SunMarc in #1880
- Enable power users to bypass device_map="auto" training block by @muellerzr in #1881
- Introduce model memory estimator by @muellerzr in #1876
- Update with new url for explore by @muellerzr in #1884
- Enable a token to be used by @muellerzr in #1886
- Add doc on model memory usage by @muellerzr in #1887
- Add hub as core dep by @muellerzr in #1885
- update import of deepspeed integration from transformers by @pacman100 in #1894
- Final nits on model util by @muellerzr in #1896
- Fix nb launcher test by @muellerzr in #1899
- Add FSDP activation checkpointing feature by @arde171 in #1891
- Solve at least one failing test by @muellerzr in #1898
- Deepspeed integration for XPU/ccl by @abhilash1910 in #1827
- Add PR template by @muellerzr in #1906
- deepspeed grad_acc_steps fixes by @pacman100 in #1901
- Skip pypi transformers until release by @muellerzr in #1911
- Fix docker images by @muellerzr in #1910
- Use hosted CI runners for building docker images by @muellerzr in #1915
- fix: add debug argument to sagemaker configuration by @maximegmd in #1904
- improve help info when run `accelerate config` on npu by @statelesshz in #1895
- support logging with mlflow in case of mlflow-skinny installed by @ghtaro in #1874
- More CI fun - run all test parts always by @muellerzr in #1916
- Expose auto in dataclass by @muellerzr in #1914
- Add support for deepspeed optimizer and custom scheduler by @pacman100 in #1909
- reduce gradient first for XLA when unscaling the gradients in mixed precision training with AMP. by @statelesshz in #1926
- Check for invalid keys by @muellerzr in #1935
- clean num devices by @SunMarc in #1936
- Bring back pypi to runners by @muellerzr in #1939
- Support multi-node notebook launching by @ggaaooppeenngg in #1913
- fix the fsdp docs by @pacman100 in #1947
- Fix docs by @ggaaooppeenngg in #1951
- Protect tensorflow dependency by @SunMarc in #1959
- fix safetensor saving by @SunMarc in #1954
- FIX: patch_environment restores pre-existing environment variables when finished by @BenjaminBossan in #1960
- Better guards for slow imports by @muellerzr in #1963
- [`Tests`] Finish all todos by @younesbelkada in #1957
- Rm strtobool by @muellerzr in #1964
- Implementing gather_for_metrics with dedup for non tensor objects by @Lorenzobattistela in #1937
- add bf16 mixed precision support for NPU by @statelesshz in #1949
- Introduce breakpoint API by @muellerzr in #1940
- fix torch compile with FSDP by @pacman100 in #1919
- Add `force_hooks` to `dispatch_model` by @austinapatel in #1969
- update FSDP and DeepSpeed docs by @pacman100 in #1973
- Flex fix patch for accelerate by @abhilash1910 in #1972
- Remove checkpoints only on main process by @Kepnu4 in #1974
New Contributors
- @arde171 made their first contribution in #1891
- @maximegmd made their first contribution in #1904
- @ghtaro made their first contribution in #1874
- @ggaaooppeenngg made their first contribution in #1913
- @Lorenzobattistela made their first contribution in #1937
- @austinapatel made their first contribution in #1969
- @Kepnu4 made their first contribution in #1974
Full Changelog: v0.22.0...v0.23.0