WIP: onnx support #1

Draft · wants to merge 1 commit into `main`
Conversation

@chuanli11 commented Sep 16, 2022

Add ONNX support to the sd_image model.

  • Convert the sd_image model to ONNX:

```shell
python scripts/convert_sd_image_checkpoint_to_onnx.py \
    --model_path <path-to-pytorch-ckpt-or-huggingface-url> \
    --output_path <path-to-output-onnx-model>
```
  • Test the ONNX model:

```python
from pathlib import Path

from PIL import Image

from lambda_diffusers import StableDiffusionImageEmbedOnnxPipeline

pipe = StableDiffusionImageEmbedOnnxPipeline.from_pretrained(
    "path-to-output-onnx-model",
    revision="onnx",
    provider="CUDAExecutionProvider",  # or "CPUExecutionProvider" to run on CPU
)

# Generate num_samples variations of the input image.
im = Image.open("path-to-your-input-image")
num_samples = 2
images = pipe(num_samples * [im])["sample"]

# Save the generated images to the current directory.
base_path = Path("./")
base_path.mkdir(exist_ok=True, parents=True)
for idx, im in enumerate(images):
    im.save(base_path / f"{idx:06}_onnx.png")
```

@chuanli11 commented Sep 16, 2022

This is not ready to be merged because the ONNX model does not speed things up on GPU. This is not specific to sd_image; the reference Hugging Face model behaves the same way. See issue here.

The ONNX model does reduce iteration time by ~35% when running on CPU. However, it is still too slow to be practical (GPU can be two orders of magnitude faster).

| Model format | CPU | CUDA (RTX 8000) | CUDA (RTX 8000) + autocast |
|---|---|---|---|
| PyTorch | 10.16 s/it | 4.57 it/s | 8.92 it/s |
| ONNX | 6.70 s/it | 2.23 it/s | N/A |
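As a quick sanity check (not part of the PR), the "~35%" figure can be derived from the CPU column of the table. Note the CPU column is in s/it (lower is better) while the CUDA columns are in it/s (higher is better):

```python
# Figures taken from the benchmark table above.
pytorch_cpu = 10.16  # s/it
onnx_cpu = 6.70      # s/it

# Relative reduction in seconds per iteration on CPU.
reduction = 1 - onnx_cpu / pytorch_cpu
print(f"CPU iteration-time reduction: {reduction:.0%}")  # → 34%

# Equivalent throughputs, for comparison with the CUDA columns:
print(f"PyTorch CPU: {1 / pytorch_cpu:.2f} it/s")  # → 0.10 it/s
print(f"ONNX CPU:    {1 / onnx_cpu:.2f} it/s")     # → 0.15 it/s
```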

@chuanli11 commented Sep 16, 2022

Hugging Face diffusers has some issues with the onnxruntime-gpu installation. See the discussions here and here.

@chuanli11 commented Sep 16, 2022

The current ONNX pipeline only uses ONNX for the VAE and UNet, keeping the other parts of the model as PyTorch checkpoints.

This is because:

  • `get_image_features` is not an `nn.Module`, so it doesn't work with `onnx_export`. A similar discussion can be found here. Replacing it with `CLIPVisionModel` made the export succeed, but inference then failed with `TypeError: forward() takes 1 positional argument but 2 were given`.
  • The `safety_checker` ONNX model doesn't work with batch sizes > 1. I haven't looked into the reason.
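One conventional workaround for the first point (a hypothetical sketch, not code from this PR) is to wrap the non-module method in an `nn.Module` so the exporter has a `forward()` to trace:

```python
import torch
import torch.nn as nn


class ImageEncoderWrapper(nn.Module):
    """Hypothetical wrapper: torch.onnx.export traces an nn.Module's
    forward(), but CLIP's get_image_features is a plain method. Wrapping
    the model exposes that method as a traceable forward()."""

    def __init__(self, clip_model):
        super().__init__()
        self.clip_model = clip_model

    def forward(self, pixel_values):
        # Delegate to the non-module method so the trace captures it.
        return self.clip_model.get_image_features(pixel_values=pixel_values)
```

A wrapper like this could then be passed to `torch.onnx.export` in place of the raw CLIP model; whether it avoids the `forward()` TypeError described above would need to be verified.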

However, running these modules in PyTorch (instead of ONNX) doesn't seem to have much impact on speed, since the most expensive computation is the diffusion step (UNet). The numbers for the sd_image model (the table above) match those for the reference Hugging Face model (table below), even though every module in the Hugging Face model can be converted to ONNX.

| Model format | CPU | CUDA (RTX 8000) | CUDA (RTX 8000) + autocast |
|---|---|---|---|
| PyTorch | 10.16 s/it | 4.56 it/s | 8.78 it/s |
| ONNX | 6.64 s/it | 2.21 it/s | N/A |
