forked from openvinotoolkit/openvino.genai

Commit 41f1e7b (parent: b11f0d9): LoRA in Text2ImagePipeline (openvinotoolkit#911)
Co-authored-by: Ilya Lavrenov <[email protected]>
28 changed files, with 318 additions and 111 deletions.
# Text to Image C++ Generation Pipeline

Examples in this folder showcase inference of text-to-image models like Stable Diffusion 1.5, 2.1, and LCM. The application intentionally has few configuration options, to encourage the reader to explore and modify the source code, for example by changing the inference device to GPU. The sample features `ov::genai::Text2ImagePipeline` and uses a text prompt as its input source.

There are two sample files:
- [`main.cpp`](./main.cpp) demonstrates basic usage of the text to image pipeline
- [`lora.cpp`](./lora.cpp) shows how to apply LoRA adapters to the pipeline

Users can change the sample code and play with the following generation parameters:

- Change the width or height of the generated image
- Generate multiple images per prompt
- Adjust the number of inference steps
- Play with [guidance scale](https://huggingface.co/spaces/stabilityai/stable-diffusion/discussions/9) (read [more details](https://arxiv.org/abs/2207.12598))
- (SD 1.x, 2.x only) Add a negative prompt when the guidance scale is > 1
- Apply multiple different LoRA adapters and mix them with different blending coefficients

## Download and convert the models and tokenizers

The `--upgrade-strategy eager` option is needed to ensure `optimum-intel` is upgraded to the latest version.

It's not required to install [../../requirements.txt](../../requirements.txt) for deployment if the model has already been exported.

```sh
pip install --upgrade-strategy eager -r ../../requirements.txt
optimum-cli export openvino --model dreamlike-art/dreamlike-anime-1.0 --task stable-diffusion --weight-format fp16 dreamlike_anime_1_0_ov/FP16
```

## Run

`stable_diffusion ./dreamlike_anime_1_0_ov/FP16 'cyberpunk cityscape like Tokyo New York with tall buildings at dusk golden hour cinematic lighting'`

### Examples

Prompt: `cyberpunk cityscape like Tokyo New York with tall buildings at dusk golden hour cinematic lighting`

![](./512x512.bmp)
## Supported models

Models can be downloaded from [HuggingFace](https://huggingface.co/models). This sample can run the following list of models, but is not limited to them:

- [botp/stable-diffusion-v1-5](https://huggingface.co/botp/stable-diffusion-v1-5)
- [stabilityai/stable-diffusion-2](https://huggingface.co/stabilityai/stable-diffusion-2)
- [stabilityai/stable-diffusion-2-1](https://huggingface.co/stabilityai/stable-diffusion-2-1)
- [dreamlike-art/dreamlike-anime-1.0](https://huggingface.co/dreamlike-art/dreamlike-anime-1.0)
- [SimianLuo/LCM_Dreamshaper_v7](https://huggingface.co/SimianLuo/LCM_Dreamshaper_v7)

## Run with optional LoRA adapters

LoRA adapters can be connected to the pipeline to modify generated images toward a certain style, level of detail, or quality. Adapters are supported in the Safetensors format and can be downloaded from public sources like [Civitai](https://civitai.com) or [HuggingFace](https://huggingface.co/models), or trained by the user. Only adapters compatible with the base model should be used. A weighted blend of multiple adapters can be applied by specifying multiple adapter files with corresponding alpha parameters on the command line. Check the `lora.cpp` source code to learn how to enable adapters and specify them in each `generate` call.

Here is an example of how to run the sample with a single adapter. First, download the adapter file from the https://civitai.com/models/67927/soulcard page manually and save it as `soulcard.safetensors`, or download it from the command line:

`wget -O soulcard.safetensors https://civitai.com/api/download/models/72591`

Then run the `lora_stable_diffusion` executable:

`./lora_stable_diffusion dreamlike_anime_1_0_ov/FP16 'curly-haired unicorn in the forest, anime, line' soulcard.safetensors 0.7`

The sample generates two images from the same prompt:
- `lora.bmp` with adapters applied
- `baseline.bmp` without adapters applied

Check the difference:

With adapter | Without adapter
:---:|:---:
![](./lora.bmp) | ![](./baseline.bmp)

## Note

- An image generated with HuggingFace / Optimum Intel is not the same as the one generated by this C++ sample:

  C++ random generation with MT19937 differs from `numpy.random.randn()` and `diffusers.utils.randn_tensor`, so it is expected that the Python and C++ versions produce different images, because the latent tensors are initialized differently. Users can implement their own random generator derived from `ov::genai::Generator` and pass it to the `Text2ImagePipeline::generate` method.
// Copyright (C) 2023-2024 Intel Corporation
// SPDX-License-Identifier: Apache-2.0

#include <cstdlib>
#include <iostream>

#include "openvino/genai/text2image/pipeline.hpp"

#include "imwrite.hpp"

int main(int argc, char* argv[]) try {
    OPENVINO_ASSERT(argc >= 3 && (argc - 3) % 2 == 0, "Usage: ", argv[0], " <MODEL_DIR> '<PROMPT>' [<LORA_SAFETENSORS> <ALPHA> ...]");

    const std::string models_path = argv[1], prompt = argv[2];
    const std::string device = "CPU";  // GPU, NPU can be used as well

    ov::genai::AdapterConfig adapter_config;
    // Multiple LoRA adapters applied simultaneously are supported; parse them
    // all, with their corresponding alphas, from the command-line parameters:
    for (int i = 0; i < (argc - 3) / 2; ++i) {
        ov::genai::Adapter adapter(argv[3 + 2 * i]);
        float alpha = std::atof(argv[3 + 2 * i + 1]);
        adapter_config.add(adapter, alpha);
    }

    // LoRA adapters passed to the constructor are activated by default in subsequent generate() calls
    ov::genai::Text2ImagePipeline pipe(models_path, device, ov::genai::adapters(adapter_config));

    std::cout << "Generating image with LoRA adapters applied, resulting image will be in lora.bmp\n";
    ov::Tensor image = pipe.generate(prompt,
        ov::genai::random_generator(std::make_shared<ov::genai::CppStdGenerator>(42)),
        ov::genai::width(512),
        ov::genai::height(896),
        ov::genai::num_inference_steps(20));
    imwrite("lora.bmp", image, true);

    std::cout << "Generating image without LoRA adapters applied, resulting image will be in baseline.bmp\n";
    image = pipe.generate(prompt,
        ov::genai::adapters(),  // passing adapters in generate() overrides adapters set in the constructor; adapters() means no adapters
        ov::genai::random_generator(std::make_shared<ov::genai::CppStdGenerator>(42)),
        ov::genai::width(512),
        ov::genai::height(896),
        ov::genai::num_inference_steps(20));
    imwrite("baseline.bmp", image, true);

    return EXIT_SUCCESS;
} catch (const std::exception& error) {
    try {
        std::cerr << error.what() << '\n';
    } catch (const std::ios_base::failure&) {}
    return EXIT_FAILURE;
} catch (...) {
    try {
        std::cerr << "Non-exception object thrown\n";
    } catch (const std::ios_base::failure&) {}
    return EXIT_FAILURE;
}