
Merge pull request #9 from Linaqruf/experimental
Publish Kohya Trainer V8
Linaqruf authored Dec 16, 2022
2 parents b422fdb + c897f87 commit ae2149d
Showing 23 changed files with 6,027 additions and 2,218 deletions.
38 changes: 36 additions & 2 deletions README.md
@@ -1,4 +1,4 @@
-# Kohya Trainer V6 - VRAM 12GB
+# Kohya Trainer V8 - VRAM 12GB
### The Best Way for People Without Good GPUs to Fine-Tune the Stable Diffusion Model

This notebook has been adapted for use in Google Colab based on the [Kohya Guide](https://note.com/kohya_ss/n/nbf7ce8d80f29#c9d7ee61-5779-4436-b4e6-9053741c46bb). <br>
@@ -13,7 +13,33 @@ You can find the latest update to the notebook [here](https://github.com/Linaqru
- By default, the Text Encoder is not trained when fine-tuning the entire model, but an option to train it is available.
- Training can be made even more flexible than with DreamBooth by preparing a sufficient number of images (several hundred or more appears to be desirable).

## Run locally
Please refer to [bmaltais's repositories](https://github.com/bmaltais) if you want to run the scripts locally from your terminal:
- bmaltais's [kohya_ss](https://github.com/bmaltais/kohya_ss) (dreambooth)
- bmaltais's [kohya_diffusers_fine_tuning](https://github.com/bmaltais/kohya_diffusers_fine_tuning)
- bmaltais's [kohya_diffusion](https://github.com/bmaltais/kohya_diffusion) (gen_img_diffusers)

## Original posts for each dedicated script:
- [gen_img_diffusers](https://note.com/kohya_ss/n/n2693183a798e)
- [merge_vae](https://note.com/kohya_ss/n/nf5893a2e719c)
- [convert_diffusers20_original_sd](https://note.com/kohya_ss/n/n374f316fe4ad)
- [detect_face_rotate](https://note.com/kohya_ss/n/nad3bce9a3622)
- [diffusers_fine_tuning](https://note.com/kohya_ss/n/nbf7ce8d80f29)
- [train_db_fixed](https://note.com/kohya_ss/n/nee3ed1649fb6)
- [merge_block_weighted](https://note.com/kohya_ss/n/n9a485a066d5b)

## Change Logs:

##### v8 (13/12):
- Added support for training with fp16 gradients (experimental feature). This allows training with 8GB VRAM on SD1.x. See "Training with fp16 gradients (experimental feature)" for details; a sketch of a typical invocation follows this list.
- Updated WD14Tagger script to automatically download weights.
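
A minimal sketch of what an fp16-gradient run might look like; the script name (`fine_tune.py`) and the `--full_fp16` flag are assumptions here, so defer to the "Training with fp16 gradients (experimental feature)" section for the exact arguments:

```bash
# assumed invocation -- script name and flags may differ, see the fp16 section for details
accelerate launch --mixed_precision fp16 fine_tune.py \
  --pretrained_model_name_or_path model.ckpt \
  --full_fp16 \
  --train_batch_size 1
```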

##### v7 (7/12):
- Requires Diffusers 0.10.2 (0.10.0 or later will work, but there are reported issues with 0.10.0 so we recommend using 0.10.2). To update, run `pip install -U diffusers[torch]==0.10.2` in your virtual environment.
- Added support for Diffusers 0.10 (uses code in Diffusers for `v-parameterization` training and also supports `safetensors`).
- Added support for accelerate 0.15.0.
- Added support for multiple teacher data folders. For caption and tag preprocessing, use the `--full_path` option. The arguments for the cleaning script have also changed; see "Caption and Tag Preprocessing" for details. An illustrative set of preprocessing commands follows this list.
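
As an illustration of the multi-folder workflow, captions from each teacher folder can be merged into a single metadata file; the script name (`merge_captions_to_metadata.py`) and the paths below are assumptions based on the preprocessing docs:

```bash
# assumed usage: merge captions from two teacher folders into one metadata file,
# recording full image paths so entries from different folders don't collide
python merge_captions_to_metadata.py --full_path train_data1 meta_cap.json
python merge_captions_to_metadata.py --full_path --in_json meta_cap.json train_data2 meta_cap.json
```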

##### v6 (6/12):
- Temporary fix for an error when saving in the .safetensors format with some models. If you experienced this error with v5, please try v6.

@@ -44,6 +70,14 @@ You can find the latest update to the notebook [here](https://github.com/Linaqru
- Fixed a bug that caused data to be shuffled twice.
- Corrected spelling mistakes in the options for each script.

## Conclusion
> While Stable Diffusion fine tuning is typically based on CompVis, using Diffusers as a base allows for efficient and fast fine tuning with less memory usage. We have also added support for the features proposed by Novel AI, so we hope this article will be useful for those who want to fine tune their models.
— kohya_ss

## Credit
-[Kohya](https://twitter.com/kohya_ss) | Just for my part
+[Kohya](https://twitter.com/kohya_ss) | [Lopho](https://github.com/lopho/stable-diffusion-prune) for prune script | Just for my part

93 changes: 93 additions & 0 deletions convert_diffusers20_original_sd/convert_diffusers20_original_sd.py
@@ -0,0 +1,93 @@
# convert Diffusers v1.x/v2.0 model to original Stable Diffusion
# v1: initial version
# v2: support safetensors
# v3: fix to support another format
# v4: support safetensors in Diffusers

import argparse
import os
import torch
from diffusers import StableDiffusionPipeline

import model_util


def convert(args):
  # check arguments
  load_dtype = torch.float16 if args.fp16 else None

  save_dtype = None
  if args.fp16:
    save_dtype = torch.float16
  elif args.bf16:
    save_dtype = torch.bfloat16
  elif args.float:
    save_dtype = torch.float

  is_load_ckpt = os.path.isfile(args.model_to_load)
  is_save_ckpt = len(os.path.splitext(args.model_to_save)[1]) > 0

  assert not is_load_ckpt or args.v1 != args.v2, "v1 or v2 is required to load checkpoint"
  assert is_save_ckpt or args.reference_model is not None, "reference model is required to save as Diffusers"

  # load the model
  msg = "checkpoint" if is_load_ckpt else ("Diffusers" + (" as fp16" if args.fp16 else ""))
  print(f"loading {msg}: {args.model_to_load}")

  if is_load_ckpt:
    v2_model = args.v2
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(v2_model, args.model_to_load)
  else:
    pipe = StableDiffusionPipeline.from_pretrained(args.model_to_load, torch_dtype=load_dtype, tokenizer=None, safety_checker=None)
    text_encoder = pipe.text_encoder
    vae = pipe.vae
    unet = pipe.unet

    if args.v1 == args.v2:
      # neither or both of --v1/--v2 given: auto-detect the version from the U-Net config
      v2_model = unet.config.cross_attention_dim == 1024
      print("checking model version: model is " + ('v2' if v2_model else 'v1'))
    else:
      v2_model = args.v2  # exactly one of --v1/--v2 was given

  # convert and save
  msg = ("checkpoint" + ("" if save_dtype is None else f" in {save_dtype}")) if is_save_ckpt else "Diffusers"
  print(f"converting and saving as {msg}: {args.model_to_save}")

  if is_save_ckpt:
    original_model = args.model_to_load if is_load_ckpt else None
    key_count = model_util.save_stable_diffusion_checkpoint(v2_model, args.model_to_save, text_encoder, unet,
                                                            original_model, args.epoch, args.global_step, save_dtype, vae)
    print(f"model saved. total converted state_dict keys: {key_count}")
  else:
    print(f"copy scheduler/tokenizer config from: {args.reference_model}")
    model_util.save_diffusers_checkpoint(v2_model, args.model_to_save, text_encoder, unet, args.reference_model, vae, args.use_safetensors)
    print("model saved.")

if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument("--v1", action='store_true',
                      help='load v1.x model (v1 or v2 is required to load checkpoint)')
  parser.add_argument("--v2", action='store_true',
                      help='load v2.0 model (v1 or v2 is required to load checkpoint)')
  parser.add_argument("--fp16", action='store_true',
                      help='load as fp16 (Diffusers only) and save as fp16 (checkpoint only)')
  parser.add_argument("--bf16", action='store_true', help='save as bf16 (checkpoint only)')
  parser.add_argument("--float", action='store_true',
                      help='save as float/float32 (checkpoint only)')
  parser.add_argument("--epoch", type=int, default=0, help='epoch value to write to checkpoint')
  parser.add_argument("--global_step", type=int, default=0,
                      help='global_step value to write to checkpoint')
  parser.add_argument("--reference_model", type=str, default=None,
                      help="reference Diffusers model to copy the scheduler/tokenizer from; required when saving as Diffusers")
  parser.add_argument("--use_safetensors", action='store_true',
                      help="use safetensors format to save the Diffusers model (for checkpoints, the format is determined by the file extension)")

  parser.add_argument("model_to_load", type=str, default=None,
                      help="model to load: checkpoint file or Diffusers model's directory")
  parser.add_argument("model_to_save", type=str, default=None,
                      help="model to save: saved as a checkpoint if the path has an extension, otherwise as a Diffusers model directory")

  args = parser.parse_args()
  convert(args)
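
For reference, the script can be driven in both directions; the file and directory names below are placeholders, and `runwayml/stable-diffusion-v1-5` is only an illustrative choice of reference model:

```bash
# checkpoint (v1.x) -> Diffusers directory; --reference_model supplies the scheduler/tokenizer configs
python convert_diffusers20_original_sd.py --v1 \
  --reference_model runwayml/stable-diffusion-v1-5 \
  model.ckpt converted_diffusers

# Diffusers directory -> fp16 checkpoint (the extension on the output path selects checkpoint format)
python convert_diffusers20_original_sd.py --fp16 converted_diffusers model-fp16.ckpt
```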