v0.8.0: Big model inference
Big model inference
To handle very large models, new functionality has been added in Accelerate:
- a context manager to initialize empty models
- a function to load a sharded checkpoint directly on the right devices
- a set of custom hooks that allow execution of a model split on different devices, as well as CPU or disk offload
- a method that automatically infers a device map for a given model, maximizing available GPU memory, then RAM, before resorting to disk offload as a last resort
- a function that wraps the last three blocks in one simple call, `load_checkpoint_and_dispatch` (see the sketch after this list)
See more in the documentation.
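As a quick illustration, here is a minimal sketch combining the pieces above, assuming a `transformers` model; the model name and checkpoint path are placeholders, not part of this release:

```python
from accelerate import init_empty_weights, load_checkpoint_and_dispatch
from transformers import AutoConfig, AutoModelForCausalLM

# 1. Instantiate the model inside the context manager: all weights live
#    on the "meta" device, so no RAM is consumed.
config = AutoConfig.from_pretrained("gpt2")  # placeholder model
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# 2. Load a sharded checkpoint and dispatch the layers across devices.
#    device_map="auto" fills the GPUs first, then CPU RAM, and offloads
#    the rest to disk as a last resort.
model = load_checkpoint_and_dispatch(
    model, "path/to/sharded_checkpoint", device_map="auto"  # placeholder path
)
```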
What's new
- Create peak_memory_uasge_tracker.py by @pacman100 in #336
- Fixed a typo to enable running accelerate correctly by @Idodox in #339
- Introduce multiprocess logger by @muellerzr in #337 (see the sketch after this list)
- Refactor utils into its own module by @muellerzr in #340
- Improve num_processes question in CLI by @muellerzr in #343
- Handle Manual Wrapping in FSDP. Minor fix of fsdp example. by @pacman100 in #342
- Better prompt for number of training devices by @muellerzr in #344
- Fix prompt for num_processes by @pacman100 in #347
- Fix sample calculation in examples by @muellerzr in #352
- Fixing metric eval in distributed setup by @pacman100 in #355
- DeepSpeed and FSDP plugin support through script by @pacman100 in #356
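As a sketch of the multiprocess logger from #337 (the messages shown are illustrative, and the default of logging on the main process only is an assumption based on the usual usage pattern):

```python
from accelerate import Accelerator
from accelerate.logging import get_logger

accelerator = Accelerator()  # the logger relies on the Accelerator state
logger = get_logger(__name__)

# Emitted once, on the main process only (the default), instead of
# being duplicated by every process.
logger.info("Training started")

# Pass main_process_only=False to log on every process.
logger.info("Per-process message", main_process_only=False)
```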