Feature/ Better LoRA : Dropout, Conv2d #133

cloneofsimo · 2023-01-13T21:05:34Z

So unlike classical LLMs, LDM also has other many modules as well. Arguably, it seems like many "important" features come from Resnet features. This is clearly demonstrated by, for example, plug-and-play prior

Check out the details on this paper : https://pnp-diffusion.github.io/

Natural question to ask is : Does dreambooth yield fine-grained level of details, such as eyes or skins, because it is able to tune resnets? Is Q, K, V, O simply not enough?

In this PR I will try to answer this questions with bunch of experiments.

Here are some initial results :

This result is LoRA rank 4, with "CrossAttention", "Attention", "GEGLU" Replaced

Now, this is LoRA rank 1, with "ResnetBlock2D", "CrossAttention", "Attention", "GEGLU" Replaced. Now, number of parameters are now about as 2 times higher : as high as 9.7MB. Of which, only 2MB is LoRA of "CrossAttention", "Attention", "GEGLU", so I suspect that if it helps, we might make LoRA of TR rank 4 and LoRA of Resnet rank 1.

All trained with same number of steps, sampled with same parameters. This case, it looks like its a tie. I'll try on other models as well. I think it is a good time to start implementing fidelity metrics as well, instead of CLIP alignment score.

v0.1.1

brian6091 · 2023-01-13T22:30:49Z

Excellent!

cloneofsimo · 2023-01-14T00:05:56Z

Been testing with rank 4 of resnet as well : but the result seems only marginally better. Of course this is totally subjective so it just doesn't worth the size increase.
Next step would be compare all sets : MLP, Attention, Resnet, train them one at a time to compare the results.

hafriedlander · 2023-01-14T01:22:44Z

Definitely interesting experiments.

What parameter are we optimizing for, total binary size, training speed, multiple concept accuracy, something else?

(The reason I ask is my first thought was "how does this compare to just rank 16 on CrossAttention / Attention / GEGLU")

Is there some way we can auto-detect when the rank is insufficient? (Like maybe a flat gradient when training while still having a high error rate).

hafriedlander · 2023-01-14T01:25:33Z

Also, just from a end-user POV, the greatest strength of LoRA for me is the easy adjust-ability of strength of the various adjustments. It'd be interesting to see how useful post-training adjustment of individual model LoRA weights was.

cloneofsimo · 2023-01-14T02:01:11Z

So I think it is kindof constraint optimization at this point. We just don't want output too large, but quality as dreambooth as high as possible. But the objective is also mixed. distortion, perceptual fidelity, and editability is all the performance we desire, but they have tradeoff relationship. perceptual fidelity is also kinda ill-defined, as CLIP score doesn't seem to represent that very well, unlike custom diffusion or textual inversion paper would like to suggest.

cloneofsimo · 2023-01-14T02:03:17Z

It seems like what many people are looking for can be simply described : identity preservation + editability.
Community seems to have high preference for usage on faces, and thus facial identity preservation is one thing they really want.
Second is that it has to have high editability as well, which I think CLIP score can well describe.

cloneofsimo · 2023-01-14T02:09:04Z

Now @brian6091 and I had an idea on making the LoRA on the resent part only on the upsample unet layers, because downsample parts are kinda used to compress representation and not to generate with fidelity. So we'll see if it works better. @brian6091 are you going to leave a PR or it just on your own thing?

cloneofsimo · 2023-01-14T02:09:39Z

I've added dropout as well so this PR is no longer only about conv layer. I'll rename it

cloneofsimo · 2023-01-14T02:32:48Z

Looks like adding dropout helps! Just like in the paper, I've used dropout with 0.1. Same seed, everything.

ExponentialML · 2023-01-14T04:47:07Z

Your work and contributions are very underrated. Great stuff @cloneofsimo!

Edit: I'm doing some testing of my own with these changes, and Better LoRA is a very big understatement.
I genuinely believe that you are well on your way to making this the new standard of fine tuning SD. Well done.

brian6091 · 2023-01-14T06:37:06Z

Yeah, I think the ability to do ablation experiments will be super interesting. There are subjective differences between the different components (CrossAttention, FFN for example) that may be hard to capture with objective metrics (but come out with more complex prompting). But from the end-user perspective I think being able to define your own objective and having the tools to achieve that is ideal.

brian6091 · 2023-01-14T06:38:32Z

I'll work the scale/nonlinearity code back in once you've stabilized this (PR #111). Also subtle effects, but worth it IMO for the trivial cost.

brian6091 · 2023-01-14T06:40:50Z

Now @brian6091 and I had an idea on making the LoRA on the resent part only on the upsample unet layers, because downsample parts are kinda used to compress representation and not to generate with fidelity. So we'll see if it works better. @brian6091 are you going to leave a PR or it just on your own thing?

I can leave a PR.

cloneofsimo · 2023-01-15T03:28:47Z

Done various experiments, but more needs to be done. I've got mixed results in my case, so i'll add these options as optional for now. using dropout does make a difference though.

cloneofsimo · 2023-01-15T04:35:33Z

Note:

I've found that having resnet trained requires very low learning rate : something like 5e-6 for me.

okaris · 2023-01-16T09:13:03Z

@cloneofsimo First of all thank you for creating this repo. I've been tinkering with LoRA but can't say it's faster than (let alone twice as fast) than Dreambooth. Could you share the args you used for the above examples please?

cloneofsimo · 2023-01-16T10:17:32Z

Is this related to conv loras? Or just lora pti in general? @okaris

okaris · 2023-01-16T11:34:39Z

The comment, lora pti in general. I can open a new issue for that. The question, I'd really like to know the settings you used for the above trainings and how long they took. Thanks!

cloneofsimo · 2023-01-16T11:49:01Z

It's been a while since I've evaluated time to train these models, but these take < 6min in general. I think they aren't as fast as the previous ones (currently in training scripts folder), because lora_pti is not optimized for speed and memory since no 8bit adam + xformers are tested. It is the textual inversion part that takes a long time since they are currently done with full precision.

cloneofsimo · 2023-01-16T11:50:04Z

I am continuing to optimize for perceptual performance first, and README is bit misguiding because they were not based on lora_pti scripts. Better fix that.

ExponentialML · 2023-01-17T05:18:06Z

Hey @cloneofsimo. Is nn.Conv2d needed here when using extended unet?

lora/lora_diffusion/lora.py

Line 565 in 583b1e7

model, target_replace_module, search_class=[nn.Linear]

cloneofsimo · 2023-01-17T05:32:26Z

Ah, these + also other tools are currently unsupported for lora conv2d. Rests are coming in as a feature soon.
Meanwhile, patch_pipe will take care with no problem

ExponentialML · 2023-01-17T05:50:13Z

Sweet, thanks!

cloneofsimo and others added 4 commits January 10, 2023 05:15

Merge pull request #130 from cloneofsimo/develop

e19f6ae

v0.1.1

feat : adds conv2d support

7bb97a3

conv2d as extended target replace on PTI

5062fd4

git update

7dca434

cloneofsimo changed the base branch from master to develop January 13, 2023 21:09

cloneofsimo self-assigned this Jan 13, 2023

cloneofsimo requested a review from hafriedlander January 13, 2023 21:12

scale conv2d as well

204d6ed

adds dropout

583b1e7

cloneofsimo changed the title ~~Feat : LoRA for convolutional layer as well~~ Feature/ Better LoRA : Dropout, Conv2d Jan 14, 2023

cloneofsimo added 2 commits January 15, 2023 03:27

minor bug fix

4e2612b

version up

770b97a

cloneofsimo merged commit be1ef54 into develop Jan 15, 2023

cloneofsimo deleted the convlora branch January 16, 2023 00:47

ExponentialML mentioned this pull request Jan 30, 2023

Update LoRA with Dropout & Conv2d Support d8ahazard/sd_dreambooth_extension#880

Merged

cloneofsimo mentioned this pull request Mar 8, 2023

Some thoughts, sharing my work here KohakuBlueleaf/LyCORIS#12

Closed

kovalexal mentioned this pull request May 17, 2023

LoRA for Conv2d layer, script to convert kohya_ss LoRA to PEFT huggingface/peft#461

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature/ Better LoRA : Dropout, Conv2d #133

Feature/ Better LoRA : Dropout, Conv2d #133

cloneofsimo commented Jan 13, 2023 •

edited

Loading

brian6091 commented Jan 13, 2023

cloneofsimo commented Jan 14, 2023

hafriedlander commented Jan 14, 2023

hafriedlander commented Jan 14, 2023

cloneofsimo commented Jan 14, 2023

cloneofsimo commented Jan 14, 2023

cloneofsimo commented Jan 14, 2023

cloneofsimo commented Jan 14, 2023

cloneofsimo commented Jan 14, 2023

ExponentialML commented Jan 14, 2023 •

edited

Loading

brian6091 commented Jan 14, 2023 •

edited

Loading

brian6091 commented Jan 14, 2023 •

edited

Loading

brian6091 commented Jan 14, 2023

cloneofsimo commented Jan 15, 2023

cloneofsimo commented Jan 15, 2023

okaris commented Jan 16, 2023

cloneofsimo commented Jan 16, 2023

okaris commented Jan 16, 2023

cloneofsimo commented Jan 16, 2023

cloneofsimo commented Jan 16, 2023

ExponentialML commented Jan 17, 2023 •

edited

Loading

cloneofsimo commented Jan 17, 2023

ExponentialML commented Jan 17, 2023

Feature/ Better LoRA : Dropout, Conv2d #133

Feature/ Better LoRA : Dropout, Conv2d #133

Conversation

cloneofsimo commented Jan 13, 2023 • edited Loading

brian6091 commented Jan 13, 2023

cloneofsimo commented Jan 14, 2023

hafriedlander commented Jan 14, 2023

hafriedlander commented Jan 14, 2023

cloneofsimo commented Jan 14, 2023

cloneofsimo commented Jan 14, 2023

cloneofsimo commented Jan 14, 2023

cloneofsimo commented Jan 14, 2023

cloneofsimo commented Jan 14, 2023

ExponentialML commented Jan 14, 2023 • edited Loading

brian6091 commented Jan 14, 2023 • edited Loading

brian6091 commented Jan 14, 2023 • edited Loading

brian6091 commented Jan 14, 2023

cloneofsimo commented Jan 15, 2023

cloneofsimo commented Jan 15, 2023

Note:

okaris commented Jan 16, 2023

cloneofsimo commented Jan 16, 2023

okaris commented Jan 16, 2023

cloneofsimo commented Jan 16, 2023

cloneofsimo commented Jan 16, 2023

ExponentialML commented Jan 17, 2023 • edited Loading

cloneofsimo commented Jan 17, 2023

ExponentialML commented Jan 17, 2023

cloneofsimo commented Jan 13, 2023 •

edited

Loading

ExponentialML commented Jan 14, 2023 •

edited

Loading

brian6091 commented Jan 14, 2023 •

edited

Loading

brian6091 commented Jan 14, 2023 •

edited

Loading

ExponentialML commented Jan 17, 2023 •

edited

Loading