big update

NeoGraph-K · NeoGraph-K · commit 9d95db0e3b19 · 2023-04-28T02:05:19.000+09:00
diff --git a/README.md b/README.md
@@ -1,42 +1,80 @@
 # sd-webui-ddsd
-A script that searches for specific keywords, inpaints them, and then upscales them
+An extension that automatically performs post-processing (touch-up) work.
 
 ## What is
 ### Upscale
-Upscaling an image by a specific factor. Utilizes a tiled approach to scale with less memory
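+A tool that splits the image into tiles of a given size and upscales each tile separately, which keeps VRAM usage low during upscaling.
+#### Upscale How to use
+1. Select the upscaler model used to enlarge the image
+2. Select the scale factor
+3. Set the width and height to the largest image size you can generate in a single pass (to keep generation as fast as possible)
+4. Check the `before running` option
+    1. When checked, upscaling runs first, which improves inpainting quality. However, inpainting then requires more VRAM.
+5. Generate!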
 ### Detect Detailer
-Inpainting with additional prompts after mask search with specific keywords. Add counts separated by semicolons
+A tool that searches the image for specific keywords and then inpaints the detected areas.
 #### Detect Detailer How to use
-0. Enable Inpaint Inner(or Outer) Mask Area(Use I2I Only)
-    1. When using the inpaint inner option, the mask is created only inside the inpaint mask.
-    2. When using the inpaint outer option, the mask is created only outside the inpaint mask.
-1. Input dino prompt
-    1. Inpaint the dino prompt multiple times, separated by tabs.
-    2. Additional options can be controlled.
-    3. Each dino prompt can be calculated with AND, OR, XOR, NOR, and NAND gates.
-        1. face OR (body NAND outfit) -> Create a body mask that does not overlap with the outfit. And composited with a face mask.
-        2. Use parentheses sparingly. Parentheses operations consume more VRAM because they generate masks in advance.
-    4. Option values of each dino prompt can be entered by separating them with colons.
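+0. Limit the inpainting range (I2I only)
+    1. The Inner option searches only inside the area painted for I2I inpainting
+    2. The Outer option searches only outside the area painted for I2I inpainting
+1. Write the search keywords
+    1. Enter the keywords to search for (face, person, etc.)
+        1. Keywords can also be phrases (happy face, running dog)
+        2. Keywords can be separated with `.` (face. arm, face. chest)
+    2. Additional options are available for search keywords
+        1. Use `<area:type>` to search a specific region
+            1. Region types are left, right, top, bottom, and all
+        2. Use `<file:filename>` to use a mask loaded from a specific file
+            1. Mask files are located in `models/ddsdmask`
+        3. Use `<model:type>` to search with a specific detection model
+            1. `type` is `face_media_full`, `face_media_short`, or the name of a model file (without extension)
+            2. Model files are located in `models/yolo`
+        4. Besides type1 and type2, dilation and confidence can also be given, as in `<type1:type2:dilation:confidence>`
+            1. `confidence` is only used by the `model` type
+    3. Detected regions can be combined with gate operators such as AND, OR, XOR, NAND, and NOR
+        1. face OR (body NAND outfit) -> `body NAND outfit` inside the parentheses is evaluated first, and the result is then ORed with face
+        2. Use as few parentheses as possible; heavy use consumes more VRAM.
+        3. Operations are evaluated sequentially from left to right.
+    4. Each search keyword accepts several inline options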
         1. face:0:0.4:4 OR outfit:2:0.5:8
-        2. Each option, in order, is prompt, detection level (0-2:default 0), box threshold (0-1:default 0.3), and dilation value (0-128:default 8).
-        3. You can omit it if you wish. Replace with default value if omitted.
-2. Input positive prompt
-    1. Inpaint the positive prompt multiple times, separated by semicolons.
-3. Input negative prompt
-    1. Inpaint the negative prompt multiple times, separated by semicolons.
-4. Check the option to separate and inpaint the unconnected mask.
-    1. When separating and inpainting, the number of inpaintings increases. But quality rises.
-5. Select a small area of pixels to remove from the inpainting area when inpainting by isolation.
-6. Generate!
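+        2. In order: the search prompt, SAM detection level (0-2), sensitivity/box threshold (0-1), and dilation value (0-512)
+        3. Omitted values fall back to their defaults
+2. Enter the positive prompt
+    1. The positive prompt used during inpainting
+3. Enter the negative prompt
+    1. The negative prompt used during inpainting
+4. Adjust Denoising, CFG, Steps, Clip skip, Ckpt, and VAE
+    1. These options control the inpainting pass
+5. Check the Split Mask option
+    1. When checked, disconnected mask regions are inpainted separately.
+        1. Separate inpainting improves quality, but the extra inpainting passes slow down generation.
+6. Check the Remove Area option
+    1. Only works when Split Mask is enabled
+    2. During split inpainting, regions smaller than a set size are excluded from inpainting
+7. Generate!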
+### Postprocessing
+Post-processing applied to the final generated image.
+#### Postprocessing How to use
+1. Select the post-processing to apply
+2. Generate!
+### Watermark
+Adds your own identifying mark (signature) to the final generated image.
+#### Watermark How to use
+1. Select the type of mark to add (text or image)
+2. Enter the text or choose the image
+3. Set its size and position
+4. Use Padding to set how far it sits from that position
+5. Use Alpha to set its transparency
+6. Generate!
 ## Installation
-1. Download [CUDA](https://developer.nvidia.com/cuda-toolkit-archive) and [cuDNN](https://developer.nvidia.com/rdp/cudnn-archive)
-    1. You need current CUDA and cuDNN version
-    2. This is [CUDA 117](https://drive.google.com/file/d/1HRTOLTB44-pRcrwIw9lQak2OC2ohNle3/view?usp=share_link) and [cuDNN](https://drive.google.com/file/d/1QcgaxUra0WnCWrCLjsWp_QKw1PKcvqpj/view?usp=share_link)
-    3. After installing CUDA, overwrite cuDNN in the folder where you installed CUDA
-    4. Easy install support version. (torch == 1.13.1+cu117, torch==2.0.0+cu117 , torch==2.0.0+cu118)
-2. Install from the extensions tab with url `https://github.com/NeoGraph-K/sd-webui-ddsd`
-3. Start Sd web UI
-4. It takes some time to install sam model and dino model
+1. Download [CUDA](https://developer.nvidia.com/cuda-toolkit-archive) and [cuDNN](https://developer.nvidia.com/rdp/cudnn-archive)
+    1. Install the `CUDA` and `cuDNN` versions that match your WebUI
+        1. Google Drive links for convenient download: [CUDA 117](https://drive.google.com/file/d/1HRTOLTB44-pRcrwIw9lQak2OC2ohNle3/view?usp=share_link) and [cuDNN](https://drive.google.com/file/d/1QcgaxUra0WnCWrCLjsWp_QKw1PKcvqpj/view?usp=share_link)
+    2. After installing `CUDA`, copy `cuDNN` over the `CUDA` installation folder
+    3. Some versions support Easy Install, which needs neither `CUDA` nor `cuDNN`.
+        1. Supported versions: torch==1.13.1+cu117, torch==2.0.0+cu117, torch==2.0.0+cu118
+2. Install from the Extensions tab using `https://github.com/NeoGraph-K/sd-webui-ddsd`, or download it and extract it into `extensions/`
+3. Fully restart the WebUI
 
 ## Credits
 
@@ -51,3 +89,5 @@ IDEA-Research/[GroundingDINO](https://github.com/IDEA-Research/GroundingDINO)
 IDEA-Research/[Grounded-Segment-Anything](https://github.com/IDEA-Research/Grounded-Segment-Anything)
 
 continue-revolution/[sd-webui-segment-anything](https://github.com/continue-revolution/sd-webui-segment-anything)
+
+Bing-su/[adetailer](https://github.com/Bing-su/adetailer)
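The keyword syntax described in the README above (`face:0:0.4:4 OR outfit:2:0.5:8`, evaluated left to right with gate operators) can be illustrated with a minimal Python sketch. This is not the extension's parser. `parse_keyword`, `clamp_convert`, `combine`, and `evaluate` are hypothetical names. The defaults come from the previous README text. The masks are plain boolean NumPy arrays standing in for SAM/GroundingDINO output. NAND follows the old README's description ("a body mask that does not overlap with the outfit"). NOR is omitted because its exact behaviour is not described.

```python
import numpy as np

# Assumed defaults, taken from the previous README text
# (detection level 0, box threshold 0.3, dilation 8).
DEFAULTS = (0, 0.3, 8)


def clamp_convert(value, cast, default, lo, hi):
    """Hypothetical helper: cast value, fall back to default, clamp to [lo, hi]."""
    try:
        result = cast(value)
    except (TypeError, ValueError):
        return default
    return max(lo, min(hi, result))


def parse_keyword(token):
    """Split 'prompt:level:threshold:dilation'; omitted fields use DEFAULTS."""
    parts = [p.strip() for p in token.split(':')]
    parts += [None] * (4 - len(parts))  # pad missing fields with None
    prompt, level, threshold, dilation = parts[:4]
    return (
        prompt,
        clamp_convert(level, int, DEFAULTS[0], 0, 2),
        clamp_convert(threshold, float, DEFAULTS[1], 0.0, 1.0),
        clamp_convert(dilation, int, DEFAULTS[2], 0, 512),
    )


def combine(left, op, right):
    """Combine two boolean masks.

    NAND follows the old README's description (the left mask minus its
    overlap with the right mask). NOR is left out on purpose.
    """
    if op == 'AND':
        return left & right
    if op == 'OR':
        return left | right
    if op == 'XOR':
        return left ^ right
    if op == 'NAND':
        return left & ~right
    raise ValueError(f'unsupported operator: {op}')


def evaluate(tokens, masks):
    """Evaluate ['a', 'OP', 'b', 'OP', 'c', ...] strictly left to right."""
    result = masks[tokens[0]]
    for i in range(1, len(tokens), 2):
        result = combine(result, tokens[i], masks[tokens[i + 1]])
    return result


if __name__ == '__main__':
    print(parse_keyword('face:0:0.4:4'))  # ('face', 0, 0.4, 4)
    masks = {
        'face': np.array([True, False, False, False]),
        'body': np.array([False, True, True, False]),
        'outfit': np.array([False, False, True, True]),
    }
    # A parenthesised group is evaluated first and stored under a new key.
    masks['group'] = evaluate(['body', 'NAND', 'outfit'], masks)
    print(evaluate(['face', 'OR', 'group'], masks))  # [ True  True False False]
```

Storing each parenthesised result as an extra mask mirrors why the README advises few parentheses: every group keeps one more full-size mask alive.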
diff --git a/install.py b/install.py
@@ -72,8 +72,11 @@ def install_groundingdino():
 
 with open(req_file) as file:
     for lib in file:
+        version = None
         lib = lib.strip()
         lib = 'skimage' if lib == 'scikit-image' else lib
+        if '==' in lib:
+            lib, version = [x.strip() for x in lib.split('==')]
         if not launch.is_installed(lib):
             if lib == 'pycocotools':
                 install_pycocotools()
@@ -90,6 +93,7 @@ def install_groundingdino():
                     f'sd-webui-ddsd requirement: pillow_lut'
                 )
             else:
+                lib = lib if version is None else lib + '==' + version
                 launch.run_pip(
                     f'install {lib}',
                     f'sd-webui-ddsd requirement: {lib}'
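The `install.py` change strips an optional `==version` pin before the `launch.is_installed` check and re-attaches it when calling pip. That lets the new `ultralytics==8.0.87` and `mediapipe==0.9.3.0` entries be installed at those exact versions. The following standalone sketch covers only the parsing step. It uses hypothetical function names and does not depend on WebUI's `launch` module. Splitting with `maxsplit=1` is a small hardening over the commit's plain `split('==')`.

```python
def split_requirement(line):
    """Return (name, version) for 'name==version', else (name, None)."""
    line = line.strip()
    if '==' in line:
        # maxsplit=1 keeps the unpacking safe even if '==' appears twice
        name, version = [part.strip() for part in line.split('==', 1)]
        return name, version
    return line, None


def pip_spec(name, version):
    """Rebuild the argument handed to `pip install`, re-attaching the pin."""
    return name if version is None else f'{name}=={version}'


if __name__ == '__main__':
    for req in ['scipy', 'ultralytics==8.0.87', 'mediapipe==0.9.3.0']:
        name, version = split_requirement(req)
        # `name` alone is what an "is it installed?" check would look up.
        print(name, '->', pip_spec(name, version))
```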
diff --git a/requirements.txt b/requirements.txt
@@ -3,4 +3,6 @@ segment_anything
 groundingdino
 scipy
 scikit-image
-pillow_lut
+pillow_lut
+ultralytics==8.0.87
+mediapipe==0.9.3.0
diff --git a/scripts/ddsd.py b/scripts/ddsd.py
@@ -25,6 +25,7 @@
 grounding_models_path = os.path.join(models_path, "grounding")
 sam_models_path = os.path.join(models_path, "sam")
 lut_models_path = os.path.join(models_path, 'lut')
+yolo_models_path = os.path.join(models_path, 'yolo')
 ddsd_config_path = os.path.join(os.path.dirname(os.path.dirname(__file__)),'config')
 
 ckpt_model_name_pattern = re.compile('([\\w\\.\\[\\]\\\\\\+\\(\\)]+)\\s*\\[.*\\]')
@@ -56,6 +57,12 @@ def modeltitle(path, shorthash):
         return models
         
 def startup():
+    if (len(list_models(yolo_models_path, '.pth')) == 0) and (len(list_models(yolo_models_path, '.pt')) == 0):
+        print("No detection yolo models found, downloading...")
+        load_file_from_url('https://huggingface.co/Bingsu/adetailer/resolve/main/face_yolov8m.pt',yolo_models_path)
+        load_file_from_url('https://huggingface.co/Bingsu/adetailer/resolve/main/face_yolov8n.pt',yolo_models_path)
+        load_file_from_url('https://huggingface.co/Bingsu/adetailer/resolve/main/face_yolov8s.pt',yolo_models_path)
+        
     if (len(list_models(grounding_models_path, '.pth')) == 0):
         print("No detection groundingdino models found, downloading...")
         load_file_from_url('https://huggingface.co/ShilongLiu/GroundingDINO/resolve/main/groundingdino_swint_ogc.pth',grounding_models_path)
@@ -870,7 +877,8 @@ def postprocess(self, p, res, *args, **kargs):
         self.change_vae_model(self.vae)
         opts.CLIP_stop_at_last_layers = self.clip_skip
         if len(self.image_results) < 1: return
-        if p.n_iter > 1 or p.batch_size > 1:
+        final_count = len(res.images)
+        if (p.n_iter > 1 or p.batch_size > 1) and final_count != p.n_iter * p.batch_size:
             grid = res.images[0]
             res.images = res.images[1:]
             grid_texts = res.infotexts[0]
@@ -879,7 +887,7 @@ def postprocess(self, p, res, *args, **kargs):
         res.images = [image for sub in images for image in sub]
         infos = [[info] * (len(masks) + 1) for masks, info in zip(self.image_results, res.infotexts)]
         res.infotexts = [info for sub in infos for info in sub]
-        if p.n_iter > 1 or p.batch_size > 1:
+        if (p.n_iter > 1 or p.batch_size > 1) and final_count != p.n_iter * p.batch_size:
             res.images = [grid] + res.images
             res.infotexts = [grid_texts] + res.infotexts
     
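`postprocess` now treats the first result as a grid only when `res.images` holds more entries than `n_iter * batch_size`. This presumably prevents mis-slicing in runs where WebUI returns no grid. The following pure-Python sketch reproduces the resulting infotext bookkeeping. It assumes one infotext per result image, and `expand_infotexts` is a hypothetical helper rather than code from the extension.

```python
def expand_infotexts(infotexts, mask_counts, n_iter, batch_size):
    """Mimic the commit's postprocess bookkeeping for infotexts.

    mask_counts[i] is the number of mask images emitted for image i; each
    image keeps its own infotext plus one copy per mask.
    """
    batched = n_iter > 1 or batch_size > 1
    # A grid is assumed present (at index 0) only when there is an extra entry.
    has_grid = batched and len(infotexts) != n_iter * batch_size
    grid_text = None
    if has_grid:
        grid_text, infotexts = infotexts[0], infotexts[1:]
    expanded = [
        info
        for count, info in zip(mask_counts, infotexts)
        for _ in range(count + 1)
    ]
    if has_grid:
        expanded = [grid_text] + expanded
    return expanded


if __name__ == '__main__':
    # Batch of 2 with a grid; the first image produced one mask.
    print(expand_infotexts(['grid', 'a', 'b'], [1, 0], 1, 2))
    # ['grid', 'a', 'a', 'b']
```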
diff --git a/scripts/ddsd_bs.py b/scripts/ddsd_bs.py
@@ -0,0 +1,72 @@
+from __future__ import annotations
+
+import os
+import torch
+
+import mediapipe as mp
+import numpy as np
+
+from PIL import Image, ImageDraw
+from ultralytics import YOLO
+
+from modules import safe
+from modules.shared import cmd_opts
+from modules.paths import models_path
+
+yolo_models_path = os.path.join(models_path, 'yolo')
+
+def mediapipe_face_detect(image, model_type, confidence):
+    width, height = image.size
+    image_np = np.array(image)
+    
+    mp_face_detection = mp.solutions.face_detection
+    with mp_face_detection.FaceDetection(model_selection=model_type, min_detection_confidence=confidence) as face_detector:
+        predictor = face_detector.process(image_np)
+    
+    if predictor.detections is None: return None
+    
+    bboxes = []
+    for detection in predictor.detections:
+        
+        bbox = detection.location_data.relative_bounding_box
+        x1 = bbox.xmin * width
+        y1 = bbox.ymin * height
+        x2 = x1 + bbox.width * width
+        y2 = y1 + bbox.height * height
+        bboxes.append([x1,y1,x2,y2])
+    
+    return create_mask_from_bbox(image, bboxes)
+
+def ultralytics_predict(image, model_type, confidence, device):
+    models = [os.path.join(yolo_models_path,x) for x in os.listdir(yolo_models_path) if (x.endswith('.pt') or x.endswith('.pth')) and os.path.splitext(os.path.basename(x))[0].upper() == model_type]
+    if len(models) == 0: return None
+    model = YOLO(models[0])
+    predictor = model(image, conf=confidence, show_labels=False, device=device)
+    bboxes = predictor[0].boxes.xyxy.cpu().numpy()
+    if bboxes.size == 0: return None
+    bboxes = bboxes.tolist()
+    return create_mask_from_bbox(image, bboxes)
+
+def create_mask_from_bbox(image, bboxes):
+    mask = Image.new('L', image.size, 0)
+    draw = ImageDraw.Draw(mask)
+    for bbox in bboxes:
+        draw.rectangle(bbox, fill=255)
+    return np.array(mask)
+        
+def bs_model(image, model_type, confidence):
+    print(model_type, confidence)
+    image = Image.fromarray(image)
+    orig = torch.load
+    torch.load = safe.unsafe_torch_load
+    if model_type == 'FACE_MEDIA_FULL':
+        mask = mediapipe_face_detect(image, 1, confidence)
+    elif model_type == 'FACE_MEDIA_SHORT':
+        mask = mediapipe_face_detect(image, 0, confidence)
+    else:
+        device = ''
+        if getattr(cmd_opts, 'lowvram', False) or getattr(cmd_opts, 'medvram', False):
+            device = 'cpu'
+        mask = ultralytics_predict(image, model_type, confidence, device)
+    torch.load = orig
+    return mask
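`mediapipe_face_detect` reads each detection's `relative_bounding_box`, whose `xmin`, `ymin`, `width`, and `height` are fractions of the image size. It scales them to pixel corners, and `create_mask_from_bbox` fills those boxes with 255. The sketch below reproduces the same two steps without mediapipe or `ImageDraw`, using hypothetical function names and NumPy slicing for the fill. Clipping boxes to the frame is a defensive addition in this sketch, not something the commit does. Because pixels are filled from `floor(x1)` up to `ceil(x2)`, edges may differ by a pixel from the `ImageDraw.rectangle` call above.

```python
import math

import numpy as np


def relative_to_xyxy(xmin, ymin, box_w, box_h, width, height):
    """Relative box (fractions of the image) -> absolute (x1, y1, x2, y2),
    using the same arithmetic as mediapipe_face_detect."""
    x1 = xmin * width
    y1 = ymin * height
    return x1, y1, x1 + box_w * width, y1 + box_h * height


def mask_from_boxes(size, boxes):
    """Rasterise (x1, y1, x2, y2) boxes into a uint8 mask of 0/255.

    size is (width, height), like PIL's Image.size; boxes are clipped
    to the image before filling.
    """
    width, height = size
    mask = np.zeros((height, width), dtype=np.uint8)
    for x1, y1, x2, y2 in boxes:
        left, top = max(0, math.floor(x1)), max(0, math.floor(y1))
        right, bottom = min(width, math.ceil(x2)), min(height, math.ceil(y2))
        if right > left and bottom > top:
            mask[top:bottom, left:right] = 255
    return mask


if __name__ == '__main__':
    box = relative_to_xyxy(0.25, 0.5, 0.5, 0.25, 100, 40)
    print(box)  # (25.0, 20.0, 75.0, 30.0)
    print(int(mask_from_boxes((100, 40), [box]).sum()) // 255)  # 500 pixels
```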
diff --git a/scripts/ddsd_utils.py b/scripts/ddsd_utils.py
@@ -7,6 +7,7 @@
 from glob import glob
 from PIL import Image, ImageDraw, ImageFont
 from scripts.ddsd_sam import sam_predict, clear_cache, dilate_mask
+from scripts.ddsd_bs import bs_model
 from modules.devices import torch_gc
 from skimage import measure
 
@@ -75,10 +76,11 @@ def dino_detect_from_prompt(prompt:str, detailer_sam_model, detailer_dino_model,
     if inpaint_mask_mode == 'Outer': return cv2.bitwise_and(result, cv2.bitwise_not(image_mask))
     return None
     
-def dino_prompt_token_file(prompt:str, image_np_zero):
-    usage_type, usage, dilation = prompt_spliter(prompt, ':', 3)
+def dino_prompt_token_file(prompt:str, image_np_zero, image_np_rgb):
+    usage_type, usage, dilation, confidence = prompt_spliter(prompt, ':', 4)
     usage_type = usage_type.upper()
     usage = usage.upper()
+    confidence = try_convert(confidence, float, 0.3, 0, 1)
     if usage_type == 'AREA':
         if usage == 'LEFT':
             image_np_zero[:,:image_np_zero.shape[1] // 2] = 255
@@ -100,6 +102,10 @@ def dino_prompt_token_file(prompt:str, image_np_zero):
             h, w = image_np_zero.shape[:2]
             image = image.resize((w, h))
             image_np_zero = np.array(image)
+    if usage_type == 'MODEL':
+        mask = bs_model(image_np_rgb, usage, confidence)
+        if mask is None: return image_np_zero
+        image_np_zero = mask
     return dilate_mask(image_np_zero, try_convert(dilation, int, 2, 0, 512))
 
 def dino_prompt_detector(prompt:str, model_set, image_set):
@@ -128,7 +134,7 @@ def dino_prompt_detector(prompt:str, model_set, image_set):
                                     try_convert(sam_level.strip(), int, 0, 0, 2))
                     if left is None: left = image_set[3].copy()
                 else:
-                    left = dino_prompt_token_file(match.group(1), image_set[3].copy())
+                    left = dino_prompt_token_file(match.group(1), image_set[3].copy(), image_set[2].copy())
             else:
                 left = result_group[left.strip()]
         if not isinstance(right, np.ndarray):
@@ -143,7 +149,7 @@ def dino_prompt_detector(prompt:str, model_set, image_set):
                                     try_convert(sam_level.strip(), int, 0, 0, 2))
                     if right is None: right = image_set[3].copy()
                 else:
-                    right = dino_prompt_token_file(match.group(1), image_set[3].copy())
+                    right = dino_prompt_token_file(match.group(1), image_set[3].copy(), image_set[2].copy())
             else:
                 right = result_group[right.strip()]
         spliter[:3] = [combine_masks(left, operator, right)]
@@ -159,7 +165,7 @@ def dino_prompt_detector(prompt:str, model_set, image_set):
                                     try_convert(sam_level.strip(), int, 0, 0, 2))
         if target is None: return image_set[3].copy()
     else:
-        target = dino_prompt_token_file(match.group(1), image_set[3].copy())
+        target = dino_prompt_token_file(match.group(1), image_set[3].copy(), image_set[2].copy())
     return target
 
 def mask_spliter_and_remover(mask, area):
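After this change, `dino_prompt_token_file` splits a token into four fields: type, usage, dilation, and confidence. It parses `confidence` with a 0.3 default clamped to 0-1, and for `AREA` it fills part of a zero mask. For `MODEL` it calls `bs_model` on the RGB image, keeping the zero mask when nothing is detected. The sketch below covers the splitting and `AREA` steps only, with hypothetical names and no dilation. `split_token` is a stand-in for the extension's `prompt_spliter`. Only the `LEFT` branch is visible in the diff, so the other regions are filled by analogy.

```python
import numpy as np


def split_token(token, count=4):
    """Split 'type:usage:dilation:confidence', padding missing fields with None
    (a stand-in for the extension's prompt_spliter)."""
    parts = [p.strip() for p in token.split(':', count - 1)]
    return parts + [None] * (count - len(parts))


def area_mask(shape, area):
    """Zero mask with 255 over the requested region.

    Only LEFT appears in the diff; the other regions are assumed by analogy.
    """
    mask = np.zeros(shape[:2], dtype=np.uint8)
    h, w = mask.shape
    region = {
        'LEFT': (slice(None), slice(None, w // 2)),
        'RIGHT': (slice(None), slice(w // 2, None)),
        'TOP': (slice(None, h // 2), slice(None)),
        'BOTTOM': (slice(h // 2, None), slice(None)),
        'ALL': (slice(None), slice(None)),
    }.get(area.upper())
    if region is not None:  # unknown areas leave the mask empty
        mask[region] = 255
    return mask


if __name__ == '__main__':
    kind, usage, dilation, confidence = split_token('area:left:4')
    if kind.upper() == 'AREA':
        print(area_mask((4, 6), usage))
```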