
Troubleshooting

ArrowM edited this page Mar 18, 2023 · 3 revisions

OOM

To get the most out of your training, a card with at least 12GB of VRAM is recommended.
Currently, only GPUs with 10GB or more of VRAM are supported.

Low VRAM

Settings known to use more VRAM

  • High Batch Size
  • Set Gradients to None When Zeroing
  • Use EMA
  • Full Precision
  • Default Memory attention
  • Cache Latents
  • Text Encoder training

Settings that lower VRAM usage

  • Low Batch Size
  • Gradient Checkpointing
  • fp16/bf16 precision
  • xformers/flash_attention
  • Step Ratio of Text Encoder Training 0 (no text encoder)
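To see why "Full Precision" costs more VRAM than "fp16/bf16 precision", here is a rough back-of-the-envelope sketch. It only counts the memory needed to store the weights (fp32 uses 4 bytes per value, fp16 and bf16 use 2). The parameter count of about 860 million is an assumption chosen to roughly match a Stable Diffusion 1.x UNet. Activations, gradients, optimizer state and framework overhead are ignored, so real VRAM usage is much higher.

```python
# Rough estimate of the memory needed just to hold model weights at a given precision.
# Assumption: weights only. Activations, gradients, optimizer state and
# framework overhead are NOT included, so real VRAM use is much higher.

BYTES_PER_VALUE = {
    "fp32": 4,  # full precision
    "fp16": 2,  # half precision
    "bf16": 2,  # bfloat16, same storage size as fp16
}


def weight_memory_gib(num_params: int, precision: str) -> float:
    """Return the memory (GiB) needed to store num_params values at the given precision."""
    return num_params * BYTES_PER_VALUE[precision] / 1024**3


if __name__ == "__main__":
    # Assumed parameter count: roughly the size of a Stable Diffusion 1.x UNet.
    params = 860_000_000
    for p in ("fp32", "fp16", "bf16"):
        print(f"{p}: {weight_memory_gib(params, p):.2f} GiB")
```

Halving the bytes per value halves the weight memory. An EMA copy of the weights would add roughly the same amount again, which is one reason "Use EMA" appears in the list of settings that use more VRAM.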

Overtraining

WIP

Debugging

Here's a bunch of random stuff I added that seemed useful, but didn't seem to fit anywhere else.
Preview Prompts - Returns a JSON string of the prompts that will be used for training. It's not pretty, but you can tell whether things are going to work right.
Generate Sample Image - Generates a sample using the seed and prompt specified below.
Sample Prompt - The prompt describing what the sample should show.
Sample Seed - The seed to use for your sample. Leave it at -1 to use a random seed.
Train Imagic Only - Imagic is basically DreamBooth, but it uses only one image and is significantly faster.
If using Imagic, the first image in the first concept's Instance Data Dir will be used for training.
See https://github.com/ShivamShrirao/diffusers/tree/main/examples/imagic for more details.
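The "-1 means random" convention for Sample Seed can be pictured as a small helper. This is a hypothetical sketch, not the extension's code: the name `resolve_seed` and the unsigned 32-bit seed range are assumptions; the only behaviour taken from the text above is that -1 selects a random seed while any other value is used as-is.

```python
import random
from typing import Optional

RANDOM_SEED = -1       # sentinel meaning "pick a seed for me"
MAX_SEED = 2**32 - 1   # assumption: seeds are kept in the unsigned 32-bit range


def resolve_seed(seed: int, rng: Optional[random.Random] = None) -> int:
    """Hypothetical helper: return seed unchanged, or draw a random one for the -1 sentinel."""
    if seed != RANDOM_SEED:
        return seed  # a fixed seed gives a reproducible sample
    if rng is None:
        rng = random.Random()
    return rng.randint(0, MAX_SEED)


if __name__ == "__main__":
    print(resolve_seed(1234))  # fixed seed is passed through
    print(resolve_seed(-1))    # a different random seed on each run
```

Printing or logging the seed that was actually drawn is what lets you regenerate a sample you liked from a random run.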

Batching and Grad

Batch size

A higher batch size increases speed, but requires more VRAM.
A higher batch size might need a higher learning rate.

Grad Accumulation

Gradient accumulation of 3 should, on paper, be similar to batch size 3.
Grad 3 batch 1 will run 3 batches of size 1, but only apply the learning (the weight update) at the end of the third iteration.
It will run at the same speed as batch 1, but should give the training result of batch 3.
So grad 3 batch 1 has an equivalent batch size of 3, training-wise.
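The "on paper" equivalence can be checked on a toy problem. This sketch uses a made-up one-weight model (prediction = w * x, mean squared error loss) rather than anything from the extension: three micro-batches of size 1, each gradient scaled by 1/3 and summed before a single weight update, give the same step as one batch of size 3. It assumes the batch splits evenly into micro-batches and that nothing in the model depends on batch statistics.

```python
# Toy model: prediction = w * x, loss = mean squared error over a batch of (x, y) pairs.


def grad_mse(w, batch):
    """Gradient of the mean squared error of w*x vs y over a batch of (x, y) pairs."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def step_full_batch(w, batch, lr):
    """One optimizer step using the whole batch at once (e.g. batch 3, grad 1)."""
    return w - lr * grad_mse(w, batch)


def step_accumulated(w, batch, lr, micro_batch_size):
    """One optimizer step using gradient accumulation (e.g. batch 1, grad 3).

    Assumes len(batch) is divisible by micro_batch_size.
    """
    accum_steps = len(batch) // micro_batch_size
    accumulated = 0.0
    for i in range(accum_steps):
        micro = batch[i * micro_batch_size:(i + 1) * micro_batch_size]
        # Scale each micro-batch gradient so the sum equals the full-batch mean.
        # Note that w is NOT changed inside this loop.
        accumulated += grad_mse(w, micro) / accum_steps
    # The weight is only updated after all micro-batches ("apply the learning at the end").
    return w - lr * accumulated


if __name__ == "__main__":
    data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
    print(step_full_batch(0.5, data, lr=0.01))
    print(step_accumulated(0.5, data, lr=0.01, micro_batch_size=1))
```

The two printed values match up to floating-point rounding. The accumulated version only ever holds one sample's activations at a time, which is where the VRAM saving comes from, and it does three forward/backward passes per update, which is why it is slower than a real batch of 3.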

Equivalent Batch Size

Grad 3 batch 2 => equivalent batch size 6 (batch size × grad accumulation).
Gradient accumulation allows you to replicate the results of high batch sizes (think of a 48+ GB graphics card) in a low-VRAM environment.
The trade-off is speed.
!!You want the equivalent batch size to divide the number of training images with no remainder!!
For example, for 77 images with no class images, your only batch options are:

Batch Size   Grad Size   Equivalent Batch Size
1            1           1
1            7           7
1            11          11
1            77          77
7            1           7
7            11          77
11           1           11
11           7           77
77           1           77
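The table above can be generated for any image count. This is a minimal sketch, assuming (as in the 77-image example) that only instance images are counted and no class images are used; `batch_options` is a hypothetical helper name. It keeps every (batch size, grad size) pair whose product, the equivalent batch size, divides the image count with no remainder.

```python
def batch_options(num_images: int):
    """List (batch_size, grad_size, equivalent) combos whose equivalent
    batch size divides num_images with no remainder."""
    options = []
    for batch in range(1, num_images + 1):
        # Only consider grad sizes that keep the equivalent batch <= num_images.
        for grad in range(1, num_images // batch + 1):
            equivalent = batch * grad
            if num_images % equivalent == 0:
                options.append((batch, grad, equivalent))
    return options


if __name__ == "__main__":
    for batch, grad, equivalent in batch_options(77):
        print(batch, grad, equivalent)
```

For 77 images this prints the same nine rows as the table, in the same order.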

Batch size suggestions

If speed is the main focus and VRAM is plentiful, go for the highest batch size you are able to run (leaving no remainder).
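Picking that batch size comes down to finding the largest divisor of your image count that still fits on your card. A minimal sketch: `largest_batch_without_remainder` is a hypothetical helper, and `max_batch_that_fits` is an input you have to find by trial on your own GPU, since this code cannot measure VRAM.

```python
def largest_batch_without_remainder(num_images: int, max_batch_that_fits: int) -> int:
    """Return the largest batch size <= max_batch_that_fits that divides num_images evenly."""
    if num_images < 1 or max_batch_that_fits < 1:
        raise ValueError("num_images and max_batch_that_fits must be at least 1")
    # Walk down from the largest usable size; 1 always divides, so this always returns.
    for batch in range(min(max_batch_that_fits, num_images), 0, -1):
        if num_images % batch == 0:
            return batch
    return 1


if __name__ == "__main__":
    print(largest_batch_without_remainder(77, 10))   # 10 fits, but only 7 divides 77
    print(largest_batch_without_remainder(150, 16))
```

With 77 images and room for a batch of 10, the answer is 7, because 8, 9 and 10 would all leave a remainder.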

High Batch Size

Training at high batch sizes (or equivalent) will produce a model that assimilates the features of the instance images more deeply.
This is good for style, but might result in weird generations.
[Image: elephant at equivalent batch size 150 (15*10), trained at 1.5e-4, NO TENC (no text encoder training); 150 captioned images, no class images used]

Low Batch Size

Low batch size (or equivalent) will produce images that usually maintain a higher integrity.
This is good when training on pictures of yourself, or on a specific object.