v0.23.0: Model Memory Estimation tool, Breakpoint API, Multi-Node Notebook Launcher Support, and more!
Model Memory Estimator
A new model memory estimation tool has been added to help calculate how much memory is needed for inference. It does not download the pretrained weights, and uses `init_empty_weights` to stay memory efficient during the calculation.
Usage directions:

```bash
accelerate estimate-memory {model_name} --library {library_name} --dtypes fp16 int8
```

Or:

```python
from accelerate.commands.estimate import estimate_command_parser, estimate_command, gather_data

parser = estimate_command_parser()
args = parser.parse_args(["bert-base-cased", "--dtypes", "float32"])
output = gather_data(args)
```
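Under the hood only the model's configuration is needed, never its weights. Below is a minimal sketch of the idea, not the tool's actual implementation; the model name and fp32 math are purely illustrative:

```python
# Build the model on the "meta" device so no weights are downloaded or allocated.
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModel

config = AutoConfig.from_pretrained("bert-base-cased")  # config only, no weights
with init_empty_weights():
    model = AutoModel.from_config(config)

# Parameters are meta tensors: shapes and dtypes are known but no memory is used,
# so their sizes can be summed cheaply.
total_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"~{total_bytes / 1024**2:.1f} MB for the weights in fp32")
```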
🤗 Hub is a first-class citizen
We've made the `huggingface_hub` library a first-class citizen of the framework! While this is mainly for the model memory estimation tool, it opens the door for further integrations should they be wanted.
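In practice this means Hub utilities are available anywhere in Accelerate without extra installs. As a small illustration (not tied to any specific Accelerate API), here is the kind of metadata lookup the estimator relies on:

```python
# Query model metadata from the Hub without downloading any weights.
from huggingface_hub import model_info

info = model_info("bert-base-cased")
print(info.pipeline_tag)
print([f.rfilename for f in info.siblings][:5])  # first few files in the repo
```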
Accelerator Enhancements:
- `gather_for_metrics` will now also de-dupe for non-tensor objects. See #1937
- `mixed_precision="bf16"` support on NPU devices. See #1949
- New `breakpoint` API to help break out of a loop when a condition is met on a single process. See #1940
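A rough sketch of the breakpoint API in a training loop. The `set_breakpoint()`/`check_breakpoint()` method names are assumed from #1940; verify them against the `Accelerator` docs for your installed version:

```python
# Rough sketch; set_breakpoint()/check_breakpoint() are assumed from #1940.
import torch
from accelerate import Accelerator

accelerator = Accelerator()

for step in range(1000):
    loss = torch.rand(1)  # stand-in for a real training step
    # Flag the stop condition on the single process that observes it...
    if accelerator.is_main_process and loss.item() < 0.01:
        accelerator.set_breakpoint()
    # ...then break on *all* processes once any process has flagged it.
    if accelerator.check_breakpoint():
        break
```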
Notebook Launcher Enhancements:
- The notebook launcher now supports launching across multiple nodes! See #1913
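A rough sketch of a two-node launch from a notebook. The `num_nodes`, `node_rank`, and `master_addr` arguments are assumptions based on #1913; check the `notebook_launcher` docs for your version:

```python
# Rough sketch; num_nodes/node_rank/master_addr are assumed from #1913.
from accelerate import notebook_launcher

def training_loop():
    ...  # your usual Accelerator-based training function goes here

# Run in the notebook on the first node; on the second node use node_rank=1.
notebook_launcher(
    training_loop,
    num_processes=8,          # number of processes (see docs for multi-node semantics)
    num_nodes=2,
    node_rank=0,
    master_addr="10.0.0.1",   # example address of the rank-0 node
)
```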
FSDP Enhancements:
- Activation checkpointing is now natively supported in the framework (a sketch follows below). See #1891
- `torch.compile` support was fixed. See #1919
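For activation checkpointing, a minimal sketch assuming the `activation_checkpointing` flag added to the FSDP plugin in #1891 (verify the exact field name in your version):

```python
# Minimal sketch; the activation_checkpointing field is assumed from #1891.
from accelerate import Accelerator, FullyShardedDataParallelPlugin

fsdp_plugin = FullyShardedDataParallelPlugin(activation_checkpointing=True)
accelerator = Accelerator(fsdp_plugin=fsdp_plugin)
```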
DeepSpeed Enhancements:
- XPU/ccl support (#1827)
- Easier gradient accumulation support: simply set `gradient_accumulation_steps` to `"auto"` in your DeepSpeed config, and Accelerate will use the value passed to `Accelerator` instead (see the sketch after this list) (#1901)
- Support for custom schedulers and DeepSpeed optimizers (#1909)
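A minimal sketch of the `"auto"` gradient accumulation flow, assuming a local `ds_config.json` whose DeepSpeed config sets `"gradient_accumulation_steps": "auto"` (the file name and step count are placeholders):

```python
# Minimal sketch; ds_config.json is a placeholder path containing
# "gradient_accumulation_steps": "auto" in its DeepSpeed config.
from accelerate import Accelerator
from accelerate.utils import DeepSpeedPlugin

ds_plugin = DeepSpeedPlugin(hf_ds_config="ds_config.json")
# Per #1901, the "auto" entry is resolved from the value passed to Accelerator.
accelerator = Accelerator(deepspeed_plugin=ds_plugin, gradient_accumulation_steps=4)
```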
What's Changed
- Update release instructions by @sgugger in #1877
- fix detach_hook by @SunMarc in #1880
- Enable power users to bypass device_map="auto" training block by @muellerzr in #1881
- Introduce model memory estimator by @muellerzr in #1876
- Update with new url for explore by @muellerzr in #1884
- Enable a token to be used by @muellerzr in #1886
- Add doc on model memory usage by @muellerzr in #1887
- Add hub as core dep by @muellerzr in #1885
- update import of deepspeed integration from transformers by @pacman100 in #1894
- Final nits on model util by @muellerzr in #1896
- Fix nb launcher test by @muellerzr in #1899
- Add FSDP activation checkpointing feature by @arde171 in #1891
- Solve at least one failing test by @muellerzr in #1898
- Deepspeed integration for XPU/ccl by @abhilash1910 in #1827
- Add PR template by @muellerzr in #1906
- deepspeed grad_acc_steps fixes by @pacman100 in #1901
- Skip pypi transformers until release by @muellerzr in #1911
- Fix docker images by @muellerzr in #1910
- Use hosted CI runners for building docker images by @muellerzr in #1915
- fix: add debug argument to sagemaker configuration by @maximegmd in #1904
- improve help info when run `accelerate config` on npu by @statelesshz in #1895
- support logging with mlflow in case of mlflow-skinny installed by @ghtaro in #1874
- More CI fun - run all test parts always by @muellerzr in #1916
- Expose auto in dataclass by @muellerzr in #1914
- Add support for deepspeed optimizer and custom scheduler by @pacman100 in #1909
- reduce gradient first for XLA when unscaling the gradients in mixed precision training with AMP. by @statelesshz in #1926
- Check for invalid keys by @muellerzr in #1935
- clean num devices by @SunMarc in #1936
- Bring back pypi to runners by @muellerzr in #1939
- Support multi-node notebook launching by @ggaaooppeenngg in #1913
- fix the fsdp docs by @pacman100 in #1947
- Fix docs by @ggaaooppeenngg in #1951
- Protect tensorflow dependency by @SunMarc in #1959
- fix safetensor saving by @SunMarc in #1954
- FIX: patch_environment restores pre-existing environment variables when finished by @BenjaminBossan in #1960
- Better guards for slow imports by @muellerzr in #1963
- [`Tests`] Finish all todos by @younesbelkada in #1957
- Rm strtobool by @muellerzr in #1964
- Implementing gather_for_metrics with dedup for non tensor objects by @Lorenzobattistela in #1937
- add bf16 mixed precision support for NPU by @statelesshz in #1949
- Introduce breakpoint API by @muellerzr in #1940
- fix torch compile with FSDP by @pacman100 in #1919
- Add `force_hooks` to `dispatch_model` by @austinapatel in #1969
- update FSDP and DeepSpeed docs by @pacman100 in #1973
- Flex fix patch for accelerate by @abhilash1910 in #1972
- Remove checkpoints only on main process by @Kepnu4 in #1974
New Contributors
- @arde171 made their first contribution in #1891
- @maximegmd made their first contribution in #1904
- @ghtaro made their first contribution in #1874
- @ggaaooppeenngg made their first contribution in #1913
- @Lorenzobattistela made their first contribution in #1937
- @austinapatel made their first contribution in #1969
- @Kepnu4 made their first contribution in #1974
Full Changelog: v0.22.0...v0.23.0