Support for premade checkpoints #24
OK, did some further investigation. These are all based on the default inference configs set inside …

The inference config files: …

These lines read the config in and load/call the modules listed in the config (classes inside CompVis/stable-diffusion).
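For reference, that loading step boils down to a dynamic import driven by the config's `target` field. A simplified sketch of CompVis's `ldm.util.instantiate_from_config` (paraphrased from memory, not a verbatim copy):

```python
import importlib

def instantiate_from_config(config):
    # config looks like:
    #   {"target": "ldm.models.diffusion.ddpm.LatentDiffusion", "params": {...}}
    module_path, class_name = config["target"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_path), class_name)
    # Instantiate the class with the params from the config, if any.
    return cls(**config.get("params", {}))
```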
Inside this repo we have src/pipelines/stable_diffusion.rs, which has loaders for the VAE, UNet, Autoencoder, Scheduler, and CLIP (CLIP for 1.x and OpenCLIP for 2.x). Is this something where the LatentDiffusion model needs to be made to support this?
OK, so it seems these checkpoint files have all the components inside, and are named in a way that lets them be rebuilt from the Python modules. I took some time reading through the diffusers library some more (I wasn't using it before, mostly just the CompVis forks). HF diffusers has … To put these together, they have diffusion pipelines. This allows various diffusion pipelines to be made, including the Stable Diffusion pipeline, which brings these parts together. This library has a stable_diffusion pipeline as well.

So, to get premade checkpoints that include all these various components into the pipeline, there needs to be a translation process that converts all the named parameters in the checkpoint to the right components (autoencoder, unet). I started trying to map the names to a possible internal representation, but I don't have a clear enough picture yet. For example, … might be inside this data structure:

```rust
UNetMidBlock2DCrossAttn {
    attn_resnets: vec![(
        SpatialTransformer {
            transformer_blocks: vec![BasicTransformerBlock {
                attn1,
                ff,
                attn2: CrossAttention {
                    to_q,
                    to_k,
                    to_v,
                    to_out: nn::Linear {
                        ws: Tensor,
                        bs: Option<Tensor>,
                    },
                },
                norm1,
                norm2,
                norm3,
            }, ...]
        },
        _,
    )]
}
```

But a few of the conversions are less clear. It's a little closer in my head now, but let me know if you have any opinions about this thought process. It seems doable, but translating between the libraries and between Python and Rust is taking me some time to understand.
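To make the idea concrete, the translation I am picturing is just a renaming table plus one pass over the state dict. A minimal sketch; both key names below are illustrative guesses, not a worked-out mapping:

```python
# Illustrative only: the CompVis-style source key and the target key are
# guesses at the shape of the mapping, not a verified table.
RENAMES = {
    "model.diffusion_model.middle_block.1.transformer_blocks.0.attn2.to_q.weight":
        "unet.mid_block.attentions.0.transformer_blocks.0.attn2.to_q.weight",
}

def rename_state_dict(state_dict, renames):
    # Keys without an entry in the table pass through unchanged.
    return {renames.get(key, key): value for key, value in state_dict.items()}
```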
I read a helpful post about Latent Diffusion that documented the CompVis version and highlighted what certain things are in the checkpoint files. Now it all seems to match up. The only thing that's a little off is that there is no resnet inside the CompVis version, but there is one inside the local unet.

- … matches up with the CLIP model
- … matches the …
- … matches …

Now, looking at that, how would we actually load this in? We are using …
I am not sure I understand the full context here, but my guess is that the easiest approach would be to perform the renaming on the Python side and write separate … files.
That sounds good! I will write a Python script to convert it over to the various … files.
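Roughly what I have in mind (the component prefixes are the standard top-level groups in CompVis-style checkpoints; the output file names are placeholders):

```python
import numpy as np
import torch

# Standard top-level parameter groups in a CompVis-style checkpoint,
# mapped to placeholder output names for the individual component files.
COMPONENTS = {
    "first_stage_model.": "vae",
    "model.diffusion_model.": "unet",
    "cond_stage_model.": "clip",
}

ckpt = torch.load("model.ckpt", map_location="cpu")
state = ckpt.get("state_dict", ckpt)

for prefix, name in COMPONENTS.items():
    # Strip the prefix so each component file has self-contained names.
    part = {
        key[len(prefix):]: value.numpy()
        for key, value in state.items()
        if key.startswith(prefix)
    }
    np.savez(f"{name}.npz", **part)
```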
I was working through making a conversion script when I realized that someone might have already made one. So I went to Google and found that diffusers has conversion scripts for the various formats. The scripts there should be able to convert any of the other versions into a diffusers version. Since we want individual files, I will be trying to get the script to run in one step instead of two (instead of making a …).
The conversion scripts take your checkpoint and extract it into a Hugging Face diffusers format with many folders and configuration JSON files.

Which then has the extracted …

**Downside to this approach**

It creates files we don't need in many cases, like the text_encoder (general Stable Diffusion uses CLIP or OpenCLIP) and the safety_checker (which detects NSFW and other potentially harmful content), which can add another couple of gigs of files. It also makes a two-step process: first produce the Hugging Face version, then convert the … For me, I need the fp16 unet to fit onto my GPU, so this would require another step of converting an fp32 model over, which I haven't gotten working yet.

**Upside**

Many models seem to use this conversion to put them up on Hugging Face, so we can just download these files from their extraction and use the unet and vae parts. Then it's a matter of converting to …

So for me the biggest issue at the moment is converting from fp32 to fp16; the rest seems doable. After I get this working, I will write a little walkthrough for other users.
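For the fp16 step, something like this is what I have been attempting (the input path is where the diffusers conversion puts the extracted unet weights; the output name is a placeholder):

```python
import numpy as np
import torch

# Load the unet weights that the diffusers conversion script extracted.
state = torch.load("unet/diffusion_pytorch_model.bin", map_location="cpu")

# Halve every tensor so the unet fits in GPU memory, then re-export.
half = {key: value.half().numpy() for key, value in state.items()}
np.savez("unet_fp16.npz", **half)
```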
Would it be possible to use models based on the CompVis style used by stabilityai and supported in HF diffusers? My personal goals are: …

I tried the following to convert the file over … and got the names of the tensors using the tensor tools. Maybe these can be extracted and compiled back together?
Full list: analog-diffusion-1.0.ot.log
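For comparison, the same kind of name listing can be produced on the .ckpt side with plain PyTorch (the checkpoint file name here is just an example):

```python
import torch

# Print every tensor name and shape in the original checkpoint, to line
# the names up against the .ot dump attached above.
ckpt = torch.load("analog-diffusion-1.0.ckpt", map_location="cpu")
state = ckpt.get("state_dict", ckpt)
for name, tensor in sorted(state.items()):
    print(name, tuple(tensor.shape))
```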
Thanks!