v5.6.0rc1

Pre-release
@psychedelicious psychedelicious released this 07 Jan 09:30
· 189 commits to main since this release

This release brings two major improvements to Invoke's memory management: partial model loading (aka Low-VRAM mode) and dynamic memory limits.

Memory Management Improvements

Thanks to @RyanJDick for designing and implementing these improved memory management features!

Partial Model Loading (Low-VRAM mode)

Invoke's previous "all or nothing" model loading strategy required your GPU to have enough VRAM to hold whole models during generation.

As a result, as image generation models grew in size and auxiliary models (e.g. ControlNet) became critical to workflows, Invoke's VRAM requirements grew along with them. These increased VRAM requirements have prevented many of our users from running Invoke with the latest and greatest models.

Partial model loading allows Invoke to load only the parts of the model that are actively being used onto the GPU, substantially reducing Invoke's VRAM requirements.

  • Applies to systems with a CUDA device.
  • Enables large models to run with limited GPU VRAM (e.g. the full 24GB FLUX dev model on an 8GB GPU).
  • When models are too large to fit on the GPU, they will be partially offloaded to RAM. The model weights are still streamed to the GPU for fast inference. Inference speed won't be as fast as when a model is fully loaded, but will be much faster than running on the CPU.
  • The recommended minimum CUDA GPU size is 8GB. An 8GB GPU should now be capable of running all models supported by Invoke (even the full 24GB FLUX models with ControlNet).
  • If there is sufficient demand, we could probably support 4GB cards in the future by moving the VAE decoding operation fully to the CPU.
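The idea behind partial loading can be illustrated with a small sketch. This is not Invoke's actual implementation: the `LoadPlan` type, the `plan_partial_load` function, the greedy strategy, and the layer sizes are all hypothetical, chosen only to show how a model larger than the VRAM budget might be split into parts that stay on the GPU and parts that stay in RAM and are streamed over when needed.

```python
from dataclasses import dataclass


@dataclass
class LoadPlan:
    resident: list[str]  # layers kept on the GPU for the whole run
    streamed: list[str]  # layers held in RAM and copied to the GPU when needed


def plan_partial_load(layer_sizes_gb: dict[str, float], vram_budget_gb: float) -> LoadPlan:
    """Greedily keep layers on the GPU until the VRAM budget is used up.

    Hypothetical helper for illustration only; Invoke's real strategy may differ.
    """
    resident: list[str] = []
    streamed: list[str] = []
    used_gb = 0.0
    for name, size_gb in layer_sizes_gb.items():
        if used_gb + size_gb <= vram_budget_gb:
            resident.append(name)
            used_gb += size_gb
        else:
            streamed.append(name)
    return LoadPlan(resident, streamed)


# Assumed example: a 24GB model made of 24 blocks of 1GB each,
# with 6GB of VRAM left over for weights on an 8GB card.
layers = {f"block_{i}": 1.0 for i in range(24)}
plan = plan_partial_load(layers, vram_budget_gb=6.0)
print(len(plan.resident), "resident,", len(plan.streamed), "streamed")
```

In this toy setup most of the model lives in RAM, which is why inference is slower than with a fully loaded model but still runs on the GPU.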

Dynamic Memory Limits

Previously, the amount of RAM and VRAM used for model caching was set to hard limits. Now, the amount of RAM and VRAM used is adjusted dynamically based on what's available.

For most users, this will result in more effective use of their RAM/VRAM without having to tune configuration values.

Users can expect:

  • Faster average model load times on systems with extra memory
  • Fewer out-of-memory errors when combined with Partial Model Loading
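To make the difference between a hard limit and a dynamic limit concrete, here is a minimal sketch. The formula, the function name `dynamic_cache_limit_gb`, and the `headroom_gb` parameter are assumptions made for illustration; they are not taken from Invoke's code, which may compute its limits differently.

```python
def dynamic_cache_limit_gb(
    free_gb: float, cache_in_use_gb: float, headroom_gb: float = 2.0
) -> float:
    """Return how large the model cache may be right now.

    Hypothetical heuristic: the cache may occupy what it already holds plus
    whatever memory is currently free, minus some headroom reserved for
    other work. It never returns a negative limit.
    """
    return max(0.0, cache_in_use_gb + free_gb - headroom_gb)


# With plenty of free memory the cache is allowed to grow...
print(dynamic_cache_limit_gb(free_gb=10.0, cache_in_use_gb=4.0))
# ...and when memory gets tight the limit shrinks, so models are evicted sooner.
print(dynamic_cache_limit_gb(free_gb=1.0, cache_in_use_gb=4.0))
```

A hard limit would return the same number in both cases, regardless of how much memory the system actually has to spare.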

Enabling Partial Model Loading and Dynamic Memory Limits

Partial Model Loading is disabled by default. To enable it, set enable_partial_loading: true in your invokeai.yaml:

enable_partial_loading: true

This is highly recommended for users with limited VRAM. Users with 24GB+ of VRAM may prefer to leave this option disabled to guarantee that models are fully loaded and run at full speed.

Dynamic memory limits are enabled by default, but can be overridden by setting ram or vram in your invokeai.yaml.

# Override the dynamic cache limits to ram=6GB and vram=20GB.
ram: 6
vram: 20

🚨 Note: Users who previously set ram or vram in their invokeai.yaml will need to delete these overrides in order to benefit from the new dynamic memory limits.
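If you are not sure whether your config still carries old overrides, a short script can check for them. The sketch below is a hypothetical helper, not part of Invoke: it assumes `ram` and `vram` appear as top-level keys in invokeai.yaml (as in the example above) and scans the text line by line rather than using a YAML parser, so it only needs the Python standard library.

```python
import re


def find_memory_overrides(config_text: str) -> dict[str, str]:
    """Return any top-level `ram` / `vram` settings found in the config text.

    Hypothetical helper: a simple line scan, not a full YAML parser.
    Commented-out or indented lines are ignored.
    """
    found: dict[str, str] = {}
    for line in config_text.splitlines():
        match = re.match(r"^(ram|vram)\s*:\s*([^#]*)", line)
        if match:
            found[match.group(1)] = match.group(2).strip()
    return found


sample = """\
# Override the dynamic cache limits to ram=6GB and vram=20GB.
enable_partial_loading: true
ram: 6
vram: 20
"""
print(find_memory_overrides(sample))
```

To check a real install, read your own invokeai.yaml into a string and pass it to `find_memory_overrides`; an empty result means no overrides are set and the dynamic limits apply.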

All Changes

  • Added support for partial model loading.
  • Added support for dynamic memory limits.
  • Fixed issue where excessively long board names could cause performance issues.
  • Reworked error handling when installing models from a URL.
  • Fixed link to Scale setting's support docs.
  • Tidied some unused variables. Thanks @rikublock!
  • Added typegen check to CI pipeline. Thanks @rikublock!
  • Added stereogram nodes to Community Nodes docs. Thanks @simonfuhrmann!
  • Updated installation-related docs (quick start, manual install, dev install).

Installing and Updating

The new Invoke Launcher is the recommended way to install, update and run Invoke. It takes care of a lot of details for you (like installing the right version of Python) and runs Invoke as a desktop application.

Follow the Quick Start guide to get started with the launcher.

If you already have the launcher, you can use it to update your existing install.

We've just updated the launcher to v1.2.0 with a handful of fixes. To update the launcher itself, download the latest version from the quick start guide. The download links there are kept up to date.

Legacy Scripts (not recommended!)

We recommend using the launcher, as described in the previous section!

To install or update with the outdated legacy scripts 😱, download the latest legacy scripts and follow the legacy scripts instructions.

Full Changelog: v5.5.0...v5.6.0rc1