Chroma support (pruned Flux model) #696
base: master
Conversation
Ok, running the Vulkan build with preview on, I can say there's something very wrong going on. Sometimes (very rarely, I got this like twice in a hundred tests) the output looks correct after the first step, then it turns to noise, then a full black image (probably NaN/inf). About half of the time it looks like noise from the first step already and then turns black after a few steps. The rest of the time it starts off with a black image and stays like that. It's extremely inconsistent. Edit: it seems inconsistent on CPU too, but it works more often |
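Black frames after a few steps usually mean non-finite values creeping into the latent. A quick way to confirm that, and to pinpoint the exact step where it happens, is to scan the latent buffer after every sampler step. This is just a standalone sketch (the buffer and step names are made up, not sd.cpp's actual internals):

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Returns true if any element of the latent buffer is NaN or +/-inf.
static bool has_non_finite(const std::vector<float>& latent) {
    for (float v : latent) {
        if (!std::isfinite(v)) {
            return true;
        }
    }
    return false;
}

// Example: call after every sampler step to find out exactly when it blows up.
static void check_step(int step, const std::vector<float>& latent) {
    if (has_non_finite(latent)) {
        fprintf(stderr, "step %d: latent contains NaN/inf\n", step);
    }
}
```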
Ran it on CUDA. Edit: worth noting that I do have d20f77f |
@Green-Sky Is it the same broken image every time you run it with the same settings, or is it inconsistent too? |
@stduhpf Ran it on CUDA too and I got this, it's inconsistent too; I ran it 10 times and got the same thing each time |
Yeah that's odd. Why would it vary? It's supposed to be deterministic |
I tried compiling it with ubsan and asan, but
and that's the first thing that happens. Looks like an upstream issue. Also, we should update ggml <.< Oh, and using the CPU backend, it crashes with Details
Thank you for trying that @Green-Sky. I believe it's working now. Vulkan backend, 16 steps with cfg (cfg_scale = 4), so 32 forward passes without anything breaking: |
Ok, it's very important to keep the distilled guidance scale at 0. For some reason the model still accepts it as an input, but it completely breaks apart if it's not zero (I double-checked it with ComfyUI, it's not an issue with my code). Maybe I should just force it to zero for Chroma to keep things simple? |
Go for it. Down the line we should put recommended/forced values into the gguf file. |
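If it does get forced, the guard could be as simple as the sketch below; the enum and function names here are placeholders rather than the actual sd.cpp symbols:

```cpp
// Hypothetical guard: Chroma misbehaves with any non-zero distilled guidance,
// so clamp it to zero regardless of what was passed on the command line.
enum class ModelFamily { Flux, Chroma, Other };  // placeholder, not sd.cpp's enum

static float effective_guidance(ModelFamily family, float requested) {
    if (family == ModelFamily::Chroma) {
        return 0.0f;  // any non-zero value breaks Chroma inference
    }
    return requested;
}
```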
The q4_k is hurting it somewhat, as expected.
edit: I used v33 |
Keeping the v32:
Which begs the question: should we consider making the convert tool "smarter", like it is in llama.cpp, with different quant types depending on the role of each tensor? |
We should. I find this model incredibly hard to prompt with the current sd.cpp quantization too, but that is probably the token padding / masking. This is supposed to be q5_k quality; normal Flux looks way better, even Flux Lite looks better. |
Hi @stduhpf and @Green-Sky, apologies for the question, but is this branch intended to work solely with vanilla master and master's version of ggml? I've been maintaining a little script for personal use that merges specific branches from here and there to get a more up-to-date build, and no matter what I do I can't get this branch to work, either on its own or with other branches. Here's the extract from my script that shows the current state of what I have generally been merging in (with a few of them commented out, as you can see, but I've included them anyway since they do work in some other combinations):

BRANCHES="zhouwg/sync_with_latest_ggml"
BRANCHES="${BRANCHES} wbruna/fix_sdxl_lora"
BRANCHES="${BRANCHES} stduhpf/sdxl-embd"
BRANCHES="${BRANCHES} stduhpf/tiled-vae-encode"
BRANCHES="${BRANCHES} stduhpf/imatrix"
BRANCHES="${BRANCHES} stduhpf/lcppT5"
BRANCHES="${BRANCHES} stduhpf/unchained"
BRANCHES="${BRANCHES} stduhpf/dt"
# BRANCHES="${BRANCHES} stduhpf/diffusers"
BRANCHES="${BRANCHES} stduhpf/override-te"
BRANCHES="${BRANCHES} stduhpf/concat-controls"
# BRANCHES="${BRANCHES} stduhpf/ip2p"
BRANCHES="${BRANCHES} ImKyra/master"
# BRANCHES="${BRANCHES} Green-Sky/large_file_hardening"
# BRANCHES="${BRANCHES} rmatif/sigmas"

Now, I already suspected that this branch would probably not work directly in my script, since you said, "I had to update GGML to get bf16 to work on vulkan", and that probably conflicts with

Now, Chroma being a Flux derivative, I should also mention that for a while now I've not been able to get Flux to work either; the problems are generally of the form:

A gajillion of these:

[ERROR] model.cpp:1938 - tensor 'first_stage_model.decoder.conv_in.bias' not in model file

or a gajillion of these (with the

[INFO ] model.cpp:1897 - unknown tensor 'model.diffusion_model.model.diffusion_model.double_blocks.0.img_attn.norm.key_norm.scale | f8_e4m3 | 1 [128, 1, 1, 1, 1]' in model file

(the doubled prefix probably indicating that something has automatically prefixed

followed by:

[ERROR] stable-diffusion.cpp:441 - load tensors from model loader failed

From looking at

And that's the point where I give up, since I don't know enough about the expected naming of the tensors.

Which also brings us to the elephant in the room that you're all too polite to talk about :-) : vis-a-vis #686, given that I'm probably not the only one with the above problems, I'm sure that nobody would take offence if a temporary (friendly, and prominently attributed) fork were created to contain suitably-approved merged and conflict-resolved branches from all the developers who have pending PRs, as well as lots of useful work that is currently being duplicated across lots of forks. |
@kgigitdev I feel you. For my use case I depend on the webserver API.
This does not seem to be in this PR.
If we end up doing this, I would ask @ggerganov to host that project at the ggml org. |
Hi @Green-Sky , Thanks for your swift answer.
Gosh, yes, I hadn't even thought of that option. I think I had subconsciously assumed that stable-diffusion.cpp was only ever on the periphery of the llama.cpp people, because llama == serious work whereas stable diffusion == frippery and frivolity. |
That probably means the VAE is missing, or that the tensors from the VAE can't be found because their names are not the ones expected.
Yes, the doubled prefix means the tensor names are already prefixed in the model file you're using, which means you should use |
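For reference, the doubled prefix itself is cheap to detect and collapse while converting tensor names; a rough sketch, not the actual model.cpp preprocessing:

```cpp
#include <string>

// If a tensor name starts with a duplicated "model.diffusion_model." prefix,
// collapse it back to a single prefix. Purely illustrative; the real name
// conversion in model.cpp is more involved.
static std::string collapse_doubled_prefix(std::string name) {
    const std::string prefix = "model.diffusion_model.";
    if (name.rfind(prefix + prefix, 0) == 0) {
        name.erase(0, prefix.size());  // drop one copy of the prefix
    }
    return name;
}
```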
FWIW, I've been working on an option for sd.cpp to choose the quant by tensor pattern, a la llama.cpp's overridetensors: e128cfa . The conversion itself already works; could be useful for testing. |
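The general shape of such an override mechanism, sketched independently of that commit (the names and the QuantType stand-in are illustrative; real code would use ggml's type enum):

```cpp
#include <string>
#include <vector>

// Illustrative stand-in for ggml_type.
enum class QuantType { Q4_K, Q5_K, Q8_0, F16 };

struct QuantOverride {
    std::string pattern;  // substring to look for in the tensor name
    QuantType   type;     // quant type to use when it matches
};

// Pick a quant type for a tensor: the first matching pattern wins,
// otherwise fall back to the global default.
static QuantType pick_quant(const std::string& tensor_name,
                            const std::vector<QuantOverride>& overrides,
                            QuantType default_type) {
    for (const auto& o : overrides) {
        if (tensor_name.find(o.pattern) != std::string::npos) {
            return o.type;
        }
    }
    return default_type;
}

// Example: keep norms and the distilled guidance layer at higher precision.
// static const std::vector<QuantOverride> overrides = {
//     {"norm", QuantType::F16},
//     {"distilled_guidance_layer", QuantType::Q8_0},
// };
```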
@LostRuins Are you using flash attention or not? Because after looking quickly into it, it seems like flash attention only works with either causal attention masks or no mask at all, and in the case of Chroma, the mask is weirdly shaped. |
That requires |
Then I'm clueless. |
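Put differently, the attention dispatch would need a guard along these lines, falling back to the regular masked-softmax path whenever the mask isn't one the fused kernel supports. This is an assumption based on the discussion above, not the actual backend logic:

```cpp
// Decide whether the fused flash-attention kernel is usable for this call.
// Assumption: it handles only "no mask" or a standard causal mask; Chroma's
// padding-derived mask is neither, so those calls must take the regular
// softmax(QK^T + mask)V path instead.
static bool can_use_flash_attn(bool has_mask, bool mask_is_causal) {
    return !has_mask || mask_is_causal;
}
```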
Alright, well, we'll go with whatever solution you think works best. I'm just rather surprised at how wonky the Chroma outputs are. Maybe it is just the model. But I think you'd agree that these are a big step down from Flux, subjectively speaking. |
Anyway I believe there's still something wrong, both with/without the final commit in this PR, but I can't put my finger on it. Sometimes the object in the image is very clear and looks fine, other times there are just weird artifacts in some way or another. I have no idea if it's just the model or something else is wrong. Sometimes the exact same prompt gives good results for one seed and bad results with another. The prompt for the above was "cat" with various seeds and no negative prompt at 512x512px |
As far as I'm concerned, this is an issue with the model itself. It doesn't seem to handle short prompts and/or low resolutions very well. |
Ah i see. I don't use comfy so I can't compare. In that case, cool. Should be good to go then |
I was going to say the same thing. Generally, when it struggles like that, it's most likely because it doesn't handle low resolution well; you can experience that with other models too |
Still 512x512, with an "AI upscaled" prompt (I used an LLM to come up with a prompt for a "cat" picture):
Hmm, actually I'm starting to think T5 might not be supposed to use masking by default. I thought it was, but it looks like the ComfyUI implementation doesn't use it. The model is still broken with short prompts and low resolutions when T5 masking is disabled, but not quite as much (just like the ComfyUI implementation). |
Will this change affect regular flux too? Or just chroma |
Just Chroma. Regular Flux doesn't use any kind of attention masking |
Applied your patch. Same mashed evil cat seed as #696 (comment), looks slightly less mashed now. But this cat still looks like it needs to visit a vet. Edit: I spoke too soon. Once I added "ugly" as a negative prompt I got this monstrosity. |
@LostRuins I'm confused. These look like this on my end (768x768, cfg scale 5, euler sampling, seed 777800, chroma-unlocked-v35-Q4_0.gguf (silveroxides), t5xxl_fp8_e4m3fn.safetensors). It's far from perfect either, but it's clearly not as broken. Could it be something weird happening on the koboldcpp end? Have you tried running it from the sd.cpp CLI? |
It's strange that the text on the sign is almost perfectly identical, but the cats' faces are completely different. |
Yeah. So strange. I am using the same HEAD commit as this, so perhaps there are some variances with either the quant/t5xxl I am using, or perhaps hardware-related precision errors? Strangely, all of these are related to Chroma. Here it is compared with Flux Dev 1.
Flux Dev 1 Result: Btw I did notify the creator of Chroma as feedback, and got this reply...
@stduhpf I'm planning to integrate your Chroma PR into koboldcpp soon; there's been quite a lot of back-and-forth with these different approaches. What do you think are the current best configs from what we've tried? |
I think the current default one (no T5 masking, DiT masking with 1 padding included) should be the best, but I'm not very confident about it anymore. |
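To make the "DiT masking with 1 padding included" wording concrete: the additive attention bias blanks out every padding position except the first one. A self-contained sketch of that mask construction (illustrative only, not the actual ggml graph code):

```cpp
#include <vector>

// Build an additive attention bias over n_tokens positions, of which the
// first n_valid are real tokens. All padding positions get a large negative
// bias except the first pad token, which stays visible. This mirrors the
// "DiT masking with 1 padding included" default discussed above.
static std::vector<float> build_pad_mask(int n_tokens, int n_valid) {
    const float neg_inf = -1e9f;
    std::vector<float> mask(n_tokens, 0.0f);
    for (int i = n_valid; i < n_tokens; ++i) {
        mask[i] = neg_inf;
    }
    if (n_valid < n_tokens) {
        mask[n_valid] = 0.0f;  // keep exactly one padding token attendable
    }
    return mask;
}
```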
@LostRuins , may I suggest targeting the bleedingedge release for Koboldcpp? Apart from Chroma support, it also includes many pending fixes for sd.cpp (like that VAE tile initialization bug), and both Koboldcpp and sd-server would benefit from any fixes needed for running with a persistent model in memory. |
Yes I have merged the VAE tile fix, what else is there? I don't see any |
Off the top of my head, there are also #484 and #681 (apart from stuff that you probably fixed directly on Koboldcpp already, like #658).
https://github.com/stduhpf/stable-diffusion.cpp/commits/bleedingedge/ , with several tagged releases. |
681 isn't needed for kobo because kobo doesn't load ckpt or diffusers models, only gguf and safetensors. I'd rather not rebase off a significantly different codebase, as it'll require a bunch of testing to ensure everything works, and I might have to deal with possible regressions. Instead, if there are any critical or important issues, I can patch those in manually. |
I gave an example on LostRuins/koboldcpp#1603, to avoid more unrelated discussions on this PR.
Fair enough. |
Alright I added the tiling fix |
// TODO: not hardcoded?
const int single_blocks_count = 38;
const int double_blocks_count = 19;
Looks like I'll have to implement this todo:
https://huggingface.co/lodestones/flux-ultra-lite
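One way to implement that TODO would be to derive the counts from the checkpoint itself by scanning tensor names for the highest block index, instead of hardcoding 38/19. A hedged sketch; the real loader would read the names from the gguf/safetensors metadata:

```cpp
#include <algorithm>
#include <cstdlib>
#include <string>
#include <vector>

// Count blocks by finding the highest index in names like
// "single_blocks.37.linear1.weight" and adding one.
// Returns 0 if no tensor with that prefix exists.
static int count_blocks(const std::vector<std::string>& tensor_names,
                        const std::string& block_prefix /* e.g. "single_blocks." */) {
    int max_index = -1;
    for (const auto& name : tensor_names) {
        size_t pos = name.find(block_prefix);
        if (pos == std::string::npos) {
            continue;
        }
        int index = std::atoi(name.c_str() + pos + block_prefix.size());
        max_index = std::max(max_index, index);
    }
    return max_index + 1;
}
```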
tensor 'model.diffusion_model.distilled_guidance_layer.in_proj.weight' has wrong shape in model file: got [32, 5120, 1, 1], expected [64, 5120, 1, 1]
That's odd. Is it using a different VAE?
Hmm, looks like it's not quite the same architecture as Chroma. There are similarities, but somehow the "time_in", "vector_in" and "guidance_in" are back, as well as the modulation for double blocks only. And the distilled_guidance_layer doesn't have the same input shape...
Gotta wait for more info then.
My current guess is that it's using the double blocks from flux lite and the single blocks from Chroma.
Nice. Hoping your earlier commits get merged by leejet soon.
https://huggingface.co/lodestones/Chroma
Chroma is a Flux model with the modulation layers pruned off, which gives it a smaller memory footprint. Unlike Flux, it doesn't use CLIP-L, only T5-XXL.
Usage
Make sure to use --guidance 0 (any other value for this arg seems to break inference) (closes #690).
Advanced usage
The following environment variables can be set to change the behavior: