Repaint and LaMa Demo (IDEA-Research#259)
* refine
* add lama demo
* refine readme
* update
* update
* add repaint
Showing 10 changed files with 321 additions and 9 deletions.

**playground/LaMa/README.md** (new file, 87 lines)
## LaMa: Resolution-robust Large Mask Inpainting with Fourier Convolutions

:grapes: [[Official Project Page](https://advimman.github.io/lama-project/)] :apple: [[LaMa Cleaner](https://github.com/Sanster/lama-cleaner)]

We use the well-organized [lama-cleaner](https://github.com/Sanster/lama-cleaner) codebase to keep the demo code simple for users.

<div align="center">

![](https://user-images.githubusercontent.com/61612323/230764397-1a17aa34-3646-4529-845d-17f02694e4c3.png)

</div>

## Abstract

> Modern image inpainting systems, despite the significant progress, often struggle with large missing areas, complex geometric structures, and high-resolution images. We find that one of the main reasons for that is the lack of an effective receptive field in both the inpainting network and the loss function. To alleviate this issue, we propose a new method called large mask inpainting (LaMa). LaMa is based on: a new inpainting network architecture that uses fast Fourier convolutions, which have the image-wide receptive field; a high receptive field perceptual loss; and large training masks, which unlock the potential of the first two components. Our inpainting network improves the state-of-the-art across a range of datasets and achieves excellent performance even in challenging scenarios, e.g. completion of periodic structures. Our model generalizes surprisingly well to resolutions that are higher than those seen at train time, and achieves this at lower parameter & compute costs than the competitive baselines.
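
To give a feel for the "image-wide receptive field" mentioned above, here is a toy sketch of the spectral branch of a fast Fourier convolution in PyTorch (our own simplification; the paper's full FFC block also has a local convolutional branch, normalization, and activations). A pointwise convolution applied in the frequency domain mixes information across the entire image at once:

```python
import torch
import torch.nn as nn

class ToySpectralConv(nn.Module):
    """Toy sketch: a 1x1 conv in the Fourier domain has a global receptive field."""
    def __init__(self, channels: int):
        super().__init__()
        # Real and imaginary parts are stacked along the channel dimension.
        self.conv = nn.Conv2d(channels * 2, channels * 2, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        freq = torch.fft.rfft2(x, norm="ortho")          # (b, c, h, w//2 + 1), complex
        freq = torch.cat([freq.real, freq.imag], dim=1)  # (b, 2c, h, w//2 + 1), real
        freq = self.conv(freq)                           # mix all frequencies pointwise
        real, imag = freq.chunk(2, dim=1)
        return torch.fft.irfft2(torch.complex(real, imag), s=(h, w), norm="ortho")
```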

## Table of Contents
- [Installation](#installation)
- [LaMa Demos](#lama-demos)
  - [LaMa Demo with lama-cleaner](#lama-demo-with-lama-cleaner)
  - [LaMa with SAM](#lama-with-sam)

## TODO
- [x] LaMa Demo with lama-cleaner
- [x] LaMa with SAM
- [ ] LaMa with GroundingDINO
- [ ] LaMa with Grounded-SAM
## Installation
We use lama-cleaner for this demo; install it as follows:
```bash
pip install lama-cleaner
```
Please refer to [lama-cleaner](https://github.com/Sanster/lama-cleaner) for more details.
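
As an aside, lama-cleaner also ships an interactive web UI, handy for quick code-free experiments (command taken from the lama-cleaner README):

```bash
lama-cleaner --model=lama --device=cpu --port=8080
```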

Then install Grounded-SAM following the [Grounded-SAM Installation](https://github.com/IDEA-Research/Grounded-Segment-Anything#installation) instructions for the extension demos.
## LaMa Demos
Here we provide the demos for `LaMa`.

### LaMa Demo with lama-cleaner

```bash
cd playground/LaMa
python lama_inpaint_demo.py
```

Thanks to the well-organized lama-cleaner API, this demo takes only about 20 lines of code. The result will be saved as `lama_inpaint_demo.jpg`:

<div align="center">

| Input Image | Mask | Inpaint Output |
|:----:|:----:|:----:|
| ![](https://raw.githubusercontent.com/Sanster/lama-cleaner/main/assets/dog.jpg) | ![](https://user-images.githubusercontent.com/3998421/202105351-9fcc4bf8-129d-461a-8524-92e4caad431f.png) | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/lama/lama_inpaint_demo.jpg?raw=true) |

</div>

### LaMa with SAM

```bash
cd playground/LaMa
python sam_lama.py
```

**Tips**
To get better inpainting results, **dilate the mask first** so that it is slightly larger and fully covers the region to be removed (many thanks to [Inpaint-Anything](https://github.com/geekyutao/Inpaint-Anything) and [Tao Yu](https://github.com/geekyutao) for this tip); see the sketch below.
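
As a minimal sketch of that step (the same helper appears in `sam_lama.py` further down), dilation with a square kernel grows the mask by roughly half the kernel size in every direction:

```python
import cv2
import numpy as np

def dilate_mask(mask: np.ndarray, dilate_factor: int = 15) -> np.ndarray:
    """Grow a binary {0, 255} mask so inpainting fully covers the object."""
    kernel = np.ones((dilate_factor, dilate_factor), np.uint8)
    return cv2.dilate(mask, kernel, iterations=1)
```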

The `original mask` and `dilated mask` are shown below:

<div align="center">

| Mask | Dilated Mask |
|:---:|:---:|
| ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/lama/mask.png?raw=true) | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/lama/dilated_mask.png?raw=true) |

</div>

The inpainting result will be saved as `sam_lama_demo.jpg`:

| Input Image | SAM Output | Dilated Mask | LaMa Inpaint |
|:---:|:---:|:---:|:---:|
| ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/paint_by_example/input_image.png?raw=true) | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/lama/mask.png?raw=true) | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/lama/dilated_mask.png?raw=true) | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/lama/sam_lama_demo.jpg?raw=true) |

**playground/LaMa/lama_inpaint_demo.py** (new file, 25 lines)
```python
import cv2
import numpy as np
import PIL.Image
import PIL.ImageOps
import requests

from lama_cleaner.model.lama import LaMa
from lama_cleaner.schema import Config


def download_image(url):
    image = PIL.Image.open(requests.get(url, stream=True).raw)
    image = PIL.ImageOps.exif_transpose(image)  # respect EXIF orientation
    image = image.convert("RGB")
    return image


img_url = "https://raw.githubusercontent.com/Sanster/lama-cleaner/main/assets/dog.jpg"
mask_url = "https://user-images.githubusercontent.com/3998421/202105351-9fcc4bf8-129d-461a-8524-92e4caad431f.png"

image = np.asarray(download_image(img_url))
mask = np.asarray(download_image(mask_url).convert("L"))  # single-channel mask

# Set the device to "cuda" for faster inference.
model = LaMa("cpu")
result = model(
    image,
    mask,
    Config(
        hd_strategy="Original",             # inpaint at the original resolution
        ldm_steps=20,
        hd_strategy_crop_margin=128,
        hd_strategy_crop_trigger_size=800,
        hd_strategy_resize_limit=800,
    ),
)
cv2.imwrite("lama_inpaint_demo.jpg", result)
```
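
A note on the `Config` above: `hd_strategy="Original"` inpaints at full resolution. lama-cleaner also offers `"Crop"` and `"Resize"` strategies for large images (per its `HDStrategy` schema); a sketch of the crop variant:

```python
from lama_cleaner.schema import Config

# Sketch: inpaint only a crop around the mask once the image exceeds 800 px.
crop_config = Config(
    hd_strategy="Crop",
    ldm_steps=20,
    hd_strategy_crop_margin=128,        # context kept around the mask crop
    hd_strategy_crop_trigger_size=800,  # cropping kicks in above this size
    hd_strategy_resize_limit=800,
)
```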

**playground/LaMa/sam_lama.py** (new file, 96 lines)
```python
# Requires lama-cleaner plus segment-anything (installed with Grounded-SAM).

import cv2
import numpy as np
import PIL.Image
import PIL.ImageOps
import requests
from PIL import Image

from segment_anything import sam_model_registry, SamPredictor

from lama_cleaner.model.lama import LaMa
from lama_cleaner.schema import Config


"""
Step 1: Download and preprocess the demo image
"""
def download_image(url):
    image = PIL.Image.open(requests.get(url, stream=True).raw)
    image = PIL.ImageOps.exif_transpose(image)  # respect EXIF orientation
    image = image.convert("RGB")
    return image


img_url = "https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/paint_by_example/input_image.png?raw=true"

init_image = download_image(img_url)
init_image = np.asarray(init_image)


"""
Step 2: Initialize the SAM and LaMa models
"""

DEVICE = "cuda:1"

# SAM
SAM_ENCODER_VERSION = "vit_h"
SAM_CHECKPOINT_PATH = "/comp_robot/rentianhe/code/Grounded-Segment-Anything/sam_vit_h_4b8939.pth"
sam = sam_model_registry[SAM_ENCODER_VERSION](checkpoint=SAM_CHECKPOINT_PATH).to(device=DEVICE)
sam_predictor = SamPredictor(sam)
sam_predictor.set_image(init_image)

# LaMa
model = LaMa(DEVICE)


"""
Step 3: Get a mask with SAM from a prompt (box or point) and inpaint the masked region with LaMa.
"""

input_point = np.array([[350, 256]])  # (x, y) pixel coordinates on the object
input_label = np.array([1])  # 1 = positive (foreground) point

masks, _, _ = sam_predictor.predict(
    point_coords=input_point,
    point_labels=input_label,
    multimask_output=False
)
masks = masks.astype(np.uint8) * 255  # bool -> {0, 255}


"""
Step 4: Dilate the mask to make it more suitable for LaMa inpainting.
The idea behind dilating the mask is to cover a slightly larger region, which inpaints better.
Borrowed from Inpaint-Anything: https://github.com/geekyutao/Inpaint-Anything/blob/main/utils/utils.py#L18
"""

def dilate_mask(mask, dilate_factor=15):
    mask = mask.astype(np.uint8)
    mask = cv2.dilate(
        mask,
        np.ones((dilate_factor, dilate_factor), np.uint8),
        iterations=1
    )
    return mask


def save_array_to_img(img_arr, img_p):
    Image.fromarray(img_arr.astype(np.uint8)).save(img_p)


# [1, 512, 512] -> [512, 512]: save the first (and only) mask
save_array_to_img(masks[0], "./mask.png")

mask = dilate_mask(masks[0], dilate_factor=15)
save_array_to_img(mask, "./dilated_mask.png")


"""
Step 5: Run the LaMa inpainting model
"""
result = model(
    init_image,
    mask,
    Config(
        hd_strategy="Original",
        ldm_steps=20,
        hd_strategy_crop_margin=128,
        hd_strategy_crop_trigger_size=800,
        hd_strategy_resize_limit=800,
    ),
)
cv2.imwrite("sam_lama_demo.jpg", result)
```
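
Step 3 of the script above mentions box prompts as an alternative; with `SamPredictor`, a box is given in XYXY pixel coordinates. A sketch reusing the `sam_predictor` from the script (the box values here are illustrative, not tuned for the demo image):

```python
import numpy as np

# Illustrative XYXY box around the object, replacing the point prompt.
input_box = np.array([250, 150, 450, 360])

masks, _, _ = sam_predictor.predict(
    box=input_box,
    multimask_output=False,
)
```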

**playground/RePaint/README.md** (new file, 55 lines)
## RePaint: Inpainting using Denoising Diffusion Probabilistic Models

:grapes: [[Official Project Page](https://github.com/andreas128/RePaint)]

<div align="center">

![](https://user-images.githubusercontent.com/11280511/236838708-26978a0d-0a04-4e24-a8f5-a4a7b1a58757.png)

</div>

## Abstract

> Free-form inpainting is the task of adding new content to an image in the regions specified by an arbitrary binary mask. Most existing approaches train for a certain distribution of masks, which limits their generalization capabilities to unseen mask types. Furthermore, training with pixel-wise and perceptual losses often leads to simple textural extensions towards the missing areas instead of semantically meaningful generation. In this work, we propose RePaint: A Denoising Diffusion Probabilistic Model (DDPM) based inpainting approach that is applicable to even extreme masks. We employ a pretrained unconditional DDPM as the generative prior. To condition the generation process, we only alter the reverse diffusion iterations by sampling the unmasked regions using the given image information. Since this technique does not modify or condition the original DDPM network itself, the model produces high-quality and diverse output images for any inpainting form. We validate our method for both faces and general-purpose image inpainting using standard and extreme masks. RePaint outperforms state-of-the-art autoregressive and GAN approaches for at least five out of six mask distributions.
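
The conditioning trick described above can be made concrete with a short schematic (our own sketch with illustrative names, not the `diffusers` internals): at every reverse step, the known region is re-noised from the original image and stitched together with the freshly denoised unknown region. The paper's resampling loop, controlled by `jump_length`/`jump_n_sample` in the demo below, repeatedly jumps back and forth in time to harmonize the two regions and is omitted here.

```python
import torch

def repaint_reverse_step(model, scheduler, x_t, x0_known, known_mask, t):
    # Schematic RePaint step; known_mask is 1 where pixels are given.
    # 1) Forward-diffuse the known pixels of the original image to this step's noise level.
    noise = torch.randn_like(x0_known)
    x_known = scheduler.add_noise(x0_known, noise, t)
    # 2) One ordinary DDPM reverse (denoising) step on the current sample.
    eps = model(x_t, t).sample
    x_unknown = scheduler.step(eps, t, x_t).prev_sample
    # 3) Stitch: given pixels come from (1), generated pixels from (2).
    return known_mask * x_known + (1 - known_mask) * x_unknown
```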

## Table of Contents
- [Installation](#installation)
- [RePaint Demos](#repaint-demos)
  - [RePaint Diffuser Demos](#repaint-diffuser-demos)

## TODO
- [x] RePaint Diffuser Demo
- [ ] RePaint with SAM
- [ ] RePaint with GroundingDINO
- [ ] RePaint with Grounded-SAM

## Installation
We use RePaint with diffusers; install diffusers as follows:
```bash
pip install diffusers==0.16.1
```
Then install Grounded-SAM following the [Grounded-SAM Installation](https://github.com/IDEA-Research/Grounded-Segment-Anything#installation) instructions for the extension demos.

## RePaint Demos
Here we provide the demos for `RePaint`.

### RePaint Diffuser Demos
```bash
cd playground/RePaint
python repaint.py
```
**Notes:** set `cache_dir` to save the pretrained weights to a specific folder. The inpainting result will be saved as `repaint_demo.jpg`:

<div align="center">

| Input Image | Mask | Inpaint Result |
|:----:|:----:|:----:|
| ![](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png) | ![](https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/mask_256.png) | ![](https://github.com/IDEA-Research/detrex-storage/blob/main/assets/grounded_sam/repaint/repaint_demo.jpg?raw=true) |

</div>

**playground/RePaint/repaint.py** (new file, 40 lines)
```python
from io import BytesIO

import torch

import PIL.Image
import requests
from diffusers import RePaintPipeline, RePaintScheduler


def download_image(url):
    response = requests.get(url)
    return PIL.Image.open(BytesIO(response.content)).convert("RGB")


img_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/celeba_hq_256.png"
mask_url = "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/repaint/mask_256.png"

# Load the original image and the mask as PIL images
original_image = download_image(img_url).resize((256, 256))
mask_image = download_image(mask_url).resize((256, 256))

# Load the RePaint scheduler and pipeline based on a pretrained DDPM model
DEVICE = "cuda:1"
CACHE_DIR = "/comp_robot/rentianhe/weights/diffusers/"
scheduler = RePaintScheduler.from_pretrained("google/ddpm-ema-celebahq-256", cache_dir=CACHE_DIR)
pipe = RePaintPipeline.from_pretrained("google/ddpm-ema-celebahq-256", scheduler=scheduler, cache_dir=CACHE_DIR)
pipe = pipe.to(DEVICE)

generator = torch.Generator(device=DEVICE).manual_seed(0)
output = pipe(
    image=original_image,
    mask_image=mask_image,
    num_inference_steps=250,
    eta=0.0,           # amount of extra noise per step; 0.0 keeps steps deterministic
    jump_length=10,    # how far RePaint jumps back in time when resampling
    jump_n_sample=10,  # how many times each region is resampled per jump
    generator=generator,
)
inpainted_image = output.images[0]
inpainted_image.save("./repaint_demo.jpg")
```