v0.8.0: Big model inference
Big model inference
To handle very large models, new functionality has been added in Accelerate:
- a context manager to initialize empty models
- a function to load a sharded checkpoint directly on the right devices
- a set of custom hooks that allow execution of a model split on different devices, as well as CPU or disk offload
- a method that automatically infers a device map for a given model, maximizing available GPU memory, then RAM, before resorting to disk offload as a last resort
- a function that wraps the last three blocks in one simple call, `load_checkpoint_and_dispatch` (see the sketch after this list)
See more in the documentation.
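As a quick illustration, here is a minimal sketch combining the pieces above, assuming a `transformers` model; the model name and checkpoint path are placeholders, not part of this release:

```python
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

# 1. Instantiate the model inside the context manager: all weights live
#    on the "meta" device, so no RAM is consumed.
config = AutoConfig.from_pretrained("gpt2")  # placeholder model
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# 2. Load a sharded checkpoint and dispatch the layers across devices.
#    device_map="auto" fills the GPUs first, then CPU RAM, and offloads
#    the rest to disk as a last resort.
model = load_checkpoint_and_dispatch(
    model, "path/to/sharded_checkpoint", device_map="auto"  # placeholder path
)
```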
What's new
- Create peak_memory_uasge_tracker.py by @pacman100 in #336
- Fixed a typo to enable running accelerate correctly by @Idodox in #339
- Introduce multiprocess logger by @muellerzr in #337 (see the sketch after this list)
- Refactor utils into its own module by @muellerzr in #340
- Improve num_processes question in CLI by @muellerzr in #343
- Handle Manual Wrapping in FSDP. Minor fix of fsdp example. by @pacman100 in #342
- Better prompt for number of training devices by @muellerzr in #344
- Fix prompt for num_processes by @pacman100 in #347
- Fix sample calculation in examples by @muellerzr in #352
- Fixing metric eval in distributed setup by @pacman100 in #355
- DeepSpeed and FSDP plugin support through script by @pacman100 in #356
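As a sketch of the multiprocess logger from #337 (the messages shown are illustrative, and the default of logging on the main process only is an assumption based on the usual usage pattern):

```python
from accelerate import Accelerator
from accelerate.logging import get_logger

accelerator = Accelerator()  # the logger relies on the Accelerator state
logger = get_logger(__name__)

# Emitted once, on the main process only (the default), instead of
# being duplicated by every process.
logger.info("Training started")

# Pass main_process_only=False to log on every process.
logger.info("Per-process message", main_process_only=False)
```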