WIP: onnx support #1

Draft · wants to merge 1 commit into `main`
Conversation

@chuanli11 commented Sep 16, 2022

Add ONNX support to the sd_image model.

  • Convert the sd_image model to ONNX:

```shell
python scripts/convert_sd_image_checkpoint_to_onnx.py \
    --model_path <path-to-pytorch-ckpt-or-huggingface-url> \
    --output_path <path-to-output-onnx-model>
```
  • Test the ONNX model:

```python
from pathlib import Path

from PIL import Image

from lambda_diffusers import StableDiffusionImageEmbedOnnxPipeline

pipe = StableDiffusionImageEmbedOnnxPipeline.from_pretrained(
    "path-to-output-onnx-model",
    revision="onnx",
    provider="CUDAExecutionProvider",  # or "CPUExecutionProvider" to run on CPU
)

# Generate num_samples variations of the input image.
im = Image.open("path-to-your-input-image")
num_samples = 2
images = pipe(num_samples * [im])["sample"]

# Save the generated images to the current directory.
base_path = Path("./")
base_path.mkdir(exist_ok=True, parents=True)
for idx, im in enumerate(images):
    im.save(base_path / f"{idx:06}_onnx.png")
```

@chuanli11 commented Sep 16, 2022

This is not ready to be merged because the ONNX model does not speed things up on GPU. This is not specific to sd_image; the reference Hugging Face model behaves the same way. See issue here.

The ONNX model does reduce iteration time by ~35% when running on CPU. However, it is still too slow to be practical (GPU can be two orders of magnitude faster).

| Model format | CPU | CUDA (RTX 8000) | CUDA (RTX 8000) + autocast |
|---|---|---|---|
| PyTorch | 10.16 s/it | 4.57 it/s | 8.92 it/s |
| ONNX | 6.70 s/it | 2.23 it/s | N/A |
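As a quick sanity check (not part of the PR), the "~35%" figure can be derived from the CPU column of the table. Note the CPU column is in s/it (lower is better) while the CUDA columns are in it/s (higher is better):

```python
# Figures taken from the benchmark table above.
pytorch_cpu = 10.16  # s/it
onnx_cpu = 6.70      # s/it

# Relative reduction in seconds per iteration on CPU.
reduction = 1 - onnx_cpu / pytorch_cpu
print(f"CPU iteration-time reduction: {reduction:.0%}")  # → 34%

# Equivalent throughputs, for comparison with the CUDA columns:
print(f"PyTorch CPU: {1 / pytorch_cpu:.2f} it/s")  # → 0.10 it/s
print(f"ONNX CPU:    {1 / onnx_cpu:.2f} it/s")     # → 0.15 it/s
```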

@chuanli11 commented Sep 16, 2022

Hugging Face diffusers has some issues with the onnxruntime-gpu installation. See the discussions here and here.

@chuanli11 commented Sep 16, 2022

The current ONNX pipeline only uses ONNX for the VAE and UNet, keeping the other parts of the model as PyTorch checkpoints.

This is because:

  • `get_image_features` is not an `nn.Module`, so it doesn't work with `onnx_export`. A similar discussion can be found here. Replacing it with `CLIPVisionModel` made the export succeed, but inference then failed with `TypeError: forward() takes 1 positional argument but 2 were given`.
  • The `safety_checker` ONNX model doesn't work with batch sizes > 1. I haven't looked into the reason.
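One conventional workaround for the first point (a hypothetical sketch, not code from this PR) is to wrap the non-module method in an `nn.Module` so the exporter has a `forward()` to trace:

```python
import torch
import torch.nn as nn


class ImageEncoderWrapper(nn.Module):
    """Hypothetical wrapper: torch.onnx.export traces an nn.Module's
    forward(), but CLIP's get_image_features is a plain method. Wrapping
    the model exposes that method as a traceable forward()."""

    def __init__(self, clip_model):
        super().__init__()
        self.clip_model = clip_model

    def forward(self, pixel_values):
        # Delegate to the non-module method so the trace captures it.
        return self.clip_model.get_image_features(pixel_values=pixel_values)
```

A wrapper like this could then be passed to `torch.onnx.export` in place of the raw CLIP model; whether it avoids the `forward()` TypeError described above would need to be verified.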

However, running these modules in PyTorch (instead of ONNX) doesn't seem to have much impact on speed, since the most expensive computation is the diffusion step (UNet). The numbers for the sd_image model (the table above) match those for the reference Hugging Face model (table below), even though every module in the Hugging Face model can be converted to ONNX.

| Model format | CPU | CUDA (RTX 8000) | CUDA (RTX 8000) + autocast |
|---|---|---|---|
| PyTorch | 10.16 s/it | 4.56 it/s | 8.78 it/s |
| ONNX | 6.64 s/it | 2.21 it/s | N/A |
