
v0.8.0: Big model inference

@sgugger released this 12 May 15:01

Big model inference

To handle very large models, new functionality has been added in Accelerate:

  • a context manager to initialize empty models
  • a function to load a sharded checkpoint directly on the right devices
  • a set of custom hooks that allow execution of a model split on different devices, as well as CPU or disk offload
  • a magic method that automatically determines a device map for a given model, maximizing available GPU space and RAM before falling back to disk offload as a last resort
  • a function that combines the three previous features in a single call (load_checkpoint_and_dispatch); see the sketch after this list
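
As an illustration, here is a minimal sketch of how these pieces fit together. The model definition and checkpoint folder below are placeholders, not part of the release; the entry points used are `init_empty_weights` and `load_checkpoint_and_dispatch` from Accelerate:

```python
import torch.nn as nn

from accelerate import init_empty_weights, load_checkpoint_and_dispatch

# Build the model skeleton without allocating memory for its weights
# (any torch.nn.Module works; this toy stack stands in for a big model).
with init_empty_weights():
    model = nn.Sequential(*(nn.Linear(4096, 4096) for _ in range(48)))

# Load a sharded checkpoint and dispatch each layer to the best device:
# device_map="auto" fills the GPUs first, then CPU RAM, and falls back
# to disk offload as a last resort.
model = load_checkpoint_and_dispatch(
    model,
    "path/to/sharded_checkpoint",  # placeholder checkpoint folder
    device_map="auto",
)
```

For finer control, the building blocks can also be used individually: `infer_auto_device_map` computes a placement for a model, and `dispatch_model` attaches the hooks that move inputs and offloaded weights across devices at execution time.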

See more in the documentation

What's new