This repository is the PyTorch implementation of the paper:
Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models (CVPR 2024)
Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi, Leonid Sigal
The code is based on the [Stable Diffusion repository](https://github.com/CompVis/stable-diffusion).
The PH2P code modifies the following modules:
- `ddpm.py`: adds the LBFGS optimizer and saves the prompts after each iteration of the optimization (see the sketch after this list).
- Patches the `Clip_transformer` to `localclip_transformer.LocalCustomTokenEmbedding`, which implements the projection algorithm (also covered in the sketch below).
- Specify the model path in `main_textual_inversion.py`.
- `embedding_matrix.pt` contains the embeddings for the CLIP vocabulary (`vocab.json`).
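
The sketch below illustrates, under loose assumptions, how these pieces might fit together: continuous prompt embeddings are optimized with LBFGS against the diffusion objective, while each forward pass projects them onto their nearest neighbours in the CLIP vocabulary (the rows of `embedding_matrix.pt`), here via a straight-through estimator. `diffusion_loss` and `decode_tokens` are hypothetical placeholders for the repo's DDPM denoising loss and tokenizer decoding; this is not the repo's exact code.

```python
import torch
import torch.nn.functional as F

# Vocabulary embeddings: one row per CLIP token (from embedding_matrix.pt).
embedding_matrix = torch.load("embedding_matrix.pt")  # [vocab_size, dim]

# Learnable continuous embeddings for an 8-token prompt,
# initialised from random vocabulary entries.
init_ids = torch.randint(len(embedding_matrix), (8,))
prompt_embeds = torch.nn.Parameter(embedding_matrix[init_ids].clone())
optimizer = torch.optim.LBFGS([prompt_embeds], lr=0.1, max_iter=10)

def project(embeds):
    """Nearest-neighbour projection (cosine) onto the vocabulary."""
    e = F.normalize(embeds, dim=-1)
    v = F.normalize(embedding_matrix, dim=-1)
    ids = (e @ v.T).argmax(dim=-1)      # [8] discrete token ids
    return ids, embedding_matrix[ids]   # ids and their embeddings

for step in range(100):
    def closure():
        optimizer.zero_grad()
        _, projected = project(prompt_embeds)
        # Straight-through: the forward pass sees the projected (discrete)
        # embeddings, while the gradient flows to the continuous ones.
        hard = prompt_embeds + (projected - prompt_embeds).detach()
        loss = diffusion_loss(hard)  # placeholder: DDPM denoising loss
        loss.backward()
        return loss
    optimizer.step(closure)
    ids, _ = project(prompt_embeds)
    # Decode and save the current prompt after each iteration,
    # mirroring the ddpm.py change above.
    print(step, decode_tokens(ids))  # placeholder: tokenizer decoding
```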
Download the Stable Diffusion v1.4 or v1.5 checkpoint and save it as `models/ldm/stable-diffusion-v1/model.ckpt`.
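
For reference, a minimal sketch of loading this checkpoint the way the CompVis Stable Diffusion codebase does, assuming the standard `v1-inference.yaml` config (`main_textual_inversion.py` may differ in the details):

```python
import torch
from omegaconf import OmegaConf
from ldm.util import instantiate_from_config

config = OmegaConf.load("configs/stable-diffusion/v1-inference.yaml")
ckpt = torch.load("models/ldm/stable-diffusion-v1/model.ckpt", map_location="cpu")
model = instantiate_from_config(config.model)  # builds the LatentDiffusion model
model.load_state_dict(ckpt["state_dict"], strict=False)
model = model.cuda().eval()
```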
To run prompt inversion, specify the image path in `inversion_config.json`.
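A minimal config might look like the following; the key name `image_path` is an assumption, so check the `inversion_config.json` shipped with the repo for the exact schema:

```json
{
  "image_path": "examples/target.png"
}
```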
Then run:

```
python main_textual_inversion.py
```
The prompts will be saved in `./logs_forward_pass/`.
The best prompt for a given image is the one whose generated image has the highest CLIP similarity to the target image.
This step additionally requires `transformers` 4.25.1 and `diffusers` 0.12.1.

```
python get_best_text.py
```
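
A sketch of this selection criterion, assuming image-image cosine similarity with a `transformers` CLIP model (the checkpoint name, `clip_similarity` helper, and `candidates` mapping are illustrative, not the repo's code):

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_similarity(target_path, generated_path):
    """Cosine similarity between the CLIP features of two images."""
    images = [Image.open(p).convert("RGB") for p in (target_path, generated_path)]
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    return (feats[0] @ feats[1]).item()

# candidates: {prompt: path to the image generated from that prompt}
# best = max(candidates, key=lambda p: clip_similarity("target.png", candidates[p]))
```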
```bibtex
@inproceedings{ph2p2024cvpr,
  title     = {Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models},
  author    = {Shweta Mahajan and Tanzila Rahman and Kwang Moo Yi and Leonid Sigal},
  booktitle = {CVPR},
  year      = {2024}
}
```