This repository has been archived by the owner on Oct 22, 2023. It is now read-only.

ZeroDivisionError: division by zero #95

Open
NoteToSelfFindGoodNickname opened this issue Feb 1, 2023 · 1 comment


@NoteToSelfFindGoodNickname

Environment name is set as "ST" as per environment.yaml
anaconda3/miniconda3 detected in C:\Users\tomwe\miniconda3
Starting conda environment "ST" from C:\Users\tomwe\miniconda3
warning: redirecting to https://github.com/devilismyfriend/StableTuner.git/
Latest git hash: ef51982

(ST) C:\Users\tomwe\st4>accelerate "launch" "--mixed_precision=fp16" "scripts/trainer.py" "--attention=xformers" "--model_variant=base" "--normalize_masked_area_loss" "--unmasked_probability=0.0" "--max_denoising_strength=1.0" "--disable_cudnn_benchmark" "--use_text_files_as_captions" "--sample_step_interval=50" "--pretrained_model_name_or_path=stabilityai/stable-diffusion-2-1-base" "--pretrained_vae_name_or_path=" "--output_dir=models/iconex" "--seed=3434554" "--resolution=512" "--train_batch_size=24" "--num_train_epochs=100" "--mixed_precision=fp16" "--use_bucketing" "--aspect_mode=dynamic" "--aspect_mode_action_preference=add" "--use_8bit_adam" "--gradient_checkpointing" "--gradient_accumulation_steps=1" "--learning_rate=3e-6" "--lr_warmup_steps=0" "--lr_scheduler=constant" "--train_text_encoder" "--concepts_list=stabletune_concept_list.json" "--num_class_images=200" "--save_every_n_epoch=5" "--n_save_sample=2" "--sample_height=512" "--sample_width=512" "--dataset_repeats=1" "--add_sample_prompt=an apple by iconex" "--sample_on_training_start"
The following values were not passed to accelerate launch and had defaults used instead:
--num_processes was set to a value of 1
--num_machines was set to a value of 1
--dynamo_backend was set to a value of 'no'
To avoid this warning pass in values for each of the problematic parameters or run accelerate config.
Booting Up StableTuner
Please wait a moment as we load up some stuff...
You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
For effortless bug reporting copy-paste your error into this form: https://docs.google.com/forms/d/e/1FAIpQLScPB8emS3Thkp66nvqwmjTEgxp8Y9ufuWTzFyr9kJ5AoI47dQ/viewform?usp=sf_link

CUDA SETUP: Loading binary C:\Users\tomwe\miniconda3\envs\ST\lib\site-packages\bitsandbytes\libbitsandbytes_cuda116.dll...
C:\Users\tomwe\miniconda3\envs\ST\lib\site-packages\diffusers\configuration_utils.py:195: FutureWarning: It is deprecated to pass a pretrained model name or path to from_config.If you were trying to load a scheduler, please use <class 'diffusers.schedulers.scheduling_ddpm.DDPMScheduler'>.from_pretrained(...) instead. Otherwise, please make sure to pass a configuration dictionary instead. This functionality will be removed in v1.0.0.
deprecate("config-passed-as-path", "1.0.0", deprecation_message, standard_warn=False)
Creating Auto Bucketing Dataloader
Rounded resolution to: 512
Preloading images...
** Processing C:/Users/tomwe/Desktop/auswahl: 100%|█████████████████████████████| 165/165 [00:00<00:00, 10560.81it/s]
** Number of buckets: 1
** Bucket (512, 512) found 35 images, will drop 11 images due to batch size 24
Number of image-caption pairs: 24

** Validation Set: val, steps: 1, repeats: 1

Loading Latent Cache from models\iconex\logs\latent_cache
Latents are ready.
Traceback (most recent call last):
File "C:\Users\tomwe\st4\scripts\trainer.py", line 2902, in <module>
main()
File "C:\Users\tomwe\st4\scripts\trainer.py", line 2216, in main
args.num_train_epochs = math.ceil(args.max_train_steps / num_update_steps_per_epoch)
ZeroDivisionError: division by zero
Traceback (most recent call last):
File "C:\Users\tomwe\miniconda3\envs\ST\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\tomwe\miniconda3\envs\ST\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "C:\Users\tomwe\miniconda3\envs\ST\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "C:\Users\tomwe\miniconda3\envs\ST\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
args.func(args)
File "C:\Users\tomwe\miniconda3\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 1104, in launch_command
simple_launcher(args)
File "C:\Users\tomwe\miniconda3\envs\ST\lib\site-packages\accelerate\commands\launch.py", line 567, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['C:\Users\tomwe\miniconda3\envs\ST\python.exe', 'scripts/trainer.py', '--attention=xformers', '--model_variant=base', '--normalize_masked_area_loss', '--unmasked_probability=0.0', '--max_denoising_strength=1.0', '--disable_cudnn_benchmark', '--use_text_files_as_captions', '--sample_step_interval=50', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-2-1-base', '--pretrained_vae_name_or_path=', '--output_dir=models/iconex', '--seed=3434554', '--resolution=512', '--train_batch_size=24', '--num_train_epochs=100', '--mixed_precision=fp16', '--use_bucketing', '--aspect_mode=dynamic', '--aspect_mode_action_preference=add', '--use_8bit_adam', '--gradient_checkpointing', '--gradient_accumulation_steps=1', '--learning_rate=3e-6', '--lr_warmup_steps=0', '--lr_scheduler=constant', '--train_text_encoder', '--concepts_list=stabletune_concept_list.json', '--num_class_images=200', '--save_every_n_epoch=5', '--n_save_sample=2', '--sample_height=512', '--sample_width=512', '--dataset_repeats=1', '--add_sample_prompt=an apple by iconex', '--sample_on_training_start']' returned non-zero exit status 1

@NoteToSelfFindGoodNickname

There seems to be a bug in how the step count is recalculated after images are dropped because of the batch size.
For example, I had 29 images with a batch size of 24, so 5 images were dropped.
Once I changed the batch size to 29, the error went away.
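
For anyone hitting the same traceback, here is a minimal sketch of the arithmetic behind the failing line and one way to guard it. Only the final `math.ceil(max_train_steps / num_update_steps_per_epoch)` expression comes from the traceback. The helper names, the error message, and the assumption that `num_update_steps_per_epoch` is derived from the number of batches the dataloader yields are hypothetical, not taken from `trainer.py`.

```python
import math


def update_steps_per_epoch(num_batches: int, grad_accum_steps: int = 1) -> int:
    """Optimizer updates per epoch for a dataloader that yields num_batches batches.

    Assumption: the trainer computes its divisor roughly like this.
    """
    return math.ceil(num_batches / grad_accum_steps)


def epochs_needed(max_train_steps: int, steps_per_epoch: int) -> int:
    """Same division as the failing line, with an explicit check for an empty epoch."""
    if steps_per_epoch <= 0:
        raise ValueError(
            "No full batches to train on: lower train_batch_size or add images."
        )
    return math.ceil(max_train_steps / steps_per_epoch)


# Hypothetical numbers: 5 images with a batch size of 24 leave no complete batch.
num_batches = 5 // 24  # incomplete batches are dropped, so this is 0
steps = update_steps_per_epoch(num_batches)

try:
    epochs_needed(100, steps)
except ValueError as err:
    print(err)  # a readable message instead of ZeroDivisionError
```

With a check like this the run would stop with a message that points at the batch size rather than at the division itself.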
