
Chroma support (pruned Flux model) #696

Open

stduhpf wants to merge 18 commits into master

Conversation

stduhpf
Contributor

stduhpf commented May 28, 2025

https://huggingface.co/lodestones/Chroma

Chroma is a Flux model with the modulation layers pruned off, which gives it a smaller memory footprint. Unlike Flux, it doesn't use CLIP-L, only T5-XXL.

Usage

.\build\bin\Release\sd.exe --diffusion-model .\models\diffusion-models\chroma-unlocked-v33-Q5_0.gguf --t5xxl .\models\t5\t5xxl_q4_k.gguf --vae .\models\vae\flux\ae.f16.gguf -p "A Cute cat holding a sign that says: \`"Stable diffusion.cpp Now supports Chroma!\`"" --cfg-scale 4 --sampling-method euler --vae-tiling -W 1024 -H 1024

output

Advanced usage

The following environment variables can be set to change the behavior (see the sketch after this list):

  • SD_CHROMA_USE_T5_MASK (defaults to "OFF")
  • SD_CHROMA_USE_DIT_MASK (defaults to "ON")
  • SD_CHROMA_MASK_PAD_OVERRIDE (defaults to 1)
  • SD_CHROMA_ENABLE_GUIDANCE (defaults to "OFF"; setting it to "ON" without the --guidance 0 arg seems to break inference)
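
A minimal sketch (not the actual sd.cpp code) of how such an ON/OFF toggle could be read with a default value, for anyone wiring these variables up elsewhere:

#include <cstdlib>
#include <cstring>
#include <cstdio>

// returns true for "ON", false for anything else, and the default when unset
static bool env_flag(const char* name, bool default_value) {
    const char* v = std::getenv(name);
    if (v == nullptr) return default_value;
    return std::strcmp(v, "ON") == 0;
}

int main() {
    bool use_t5_mask  = env_flag("SD_CHROMA_USE_T5_MASK", false);  // default "OFF"
    bool use_dit_mask = env_flag("SD_CHROMA_USE_DIT_MASK", true);  // default "ON"
    std::printf("t5 mask: %d, dit mask: %d\n", use_t5_mask, use_dit_mask);
    return 0;
}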

(closes #690)

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

Huh, it seems to kind of work (slowly) on the CPU backend... The image is not good, but at least it kind of looks like the prompt if you squint your eyes. Maybe there's an issue with the Vulkan implementation of GGML, or there's something I'm doing that breaks when using the GPU?

Can someone test with CUDA?

prompt:

'Extreme close-up photograph of a single tiger eye, direct frontal view. The iris is very detailed and the pupil resembling a dark void. The word "Chroma" is across the lower portion of the image in large white stylized letters, with brush strokes 
resembling those made with Japanese calligraphy. Each strand of the thick fur is highly detailed and distinguishable. Natural lighting to capture authentic eye shine and depth.'

(--cfg-scale 1 --sampling-method euler --vae-tiling --steps 16 --guidance 0)
output
(same settings produce a black image on Vulkan)

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

Still not working on Vulkan, but at least you don't have to squint to see the CPU result:
output

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

Ok, running the Vulkan build with preview on, I can say there's something very wrong going on. Sometimes (very rarely, I got this like twice in a hundred tests) the output looks correct after the first step, then it turns to noise, then a full black image (Probably NaN/inf). About half of the time it looks like noise from the first step already and then turns black after a few steps. The rest of the time it starts off with a black image and stays like that. It's extremely inconsistent.

Edit: it seems inconsistent on CPU too, but it works more often

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

Yippie I finally got a non-black image with Vulkan
output

@Green-Sky
Contributor

Green-Sky commented May 30, 2025

Ran it on cuda.
output
(different prompt, but I don't think that's important 😄)

edit: worth noting that I do have d20f77f

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

@Green-Sky Is it the same broken image every time you run it with the same settings, or is it inconsistent too?

@rmatif

rmatif commented May 30, 2025

@stduhpf Ran it on CUDA too and this is what I got. It's broken too, but I ran it 10 times and got the same thing each time.

output_chroma

@Green-Sky
Contributor

It slightly varies.

trun2
trun1
test_run

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

Yeah that's odd. Why would it vary? It's supposed to be deterministic

@Green-Sky
Contributor

Green-Sky commented May 30, 2025

I tried compiling it with ubsan and asan, but

[DEBUG] stable-diffusion.cpp:165  - Using CUDA backend
ggml_cuda_init: failed to initialize CUDA: out of memory
ggml_backend_cuda_init: invalid device 0
[DEBUG] stable-diffusion.cpp:188  - Using CPU backend

and that's the first thing that happens. Looks like an upstream issue. Also, we should update ggml <.<

Oh, and using the CPU backend, it crashes with ERROR: AddressSanitizer: heap-use-after-free

Details
==84706==ERROR: AddressSanitizer: heap-use-after-free on address 0x61d000006e80 at pc 0x7f4358071892 bp 0x7ffc2a93a850 sp 0x7ffc2a93a010
READ of size 1376 at 0x61d000006e80 thread T0
    #0 0x7f4358071891 in __interceptor_memcpy (/nix/store/mhd0rk497xm0xnip7262xdw9bylvzh99-gcc-13.3.0-lib/lib/libasan.so.8+0x71891)
    #1 0x8217f7 in memcpy /nix/store/81awch8mhqanda1vy0c09bflgra4cxh0-glibc-2.40-66-dev/include/bits/string_fortified.h:29
    #2 0x8217f7 in ggml_backend_cpu_buffer_set_tensor /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/ggml/src/ggml-backend.cpp:1877
    #3 0x821fda in ggml_backend_tensor_set /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/ggml/src/ggml-backend.cpp:266
    #4 0x48eef9 in GGMLRunner::cpy_data_to_backend_tensor() /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/ggml_extend.hpp:1138
    #5 0x48efa6 in GGMLRunner::compute(std::function<ggml_cgraph* ()>, int, bool, ggml_tensor**, ggml_context*) /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/ggml_extend.hpp:1239
    #6 0x48f62d in Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >) /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/flux.hpp:1115
    #7 0x4987b6 in FluxModel::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, int, std::vector<ggml_tensor*, std::allocator<ggml_tensor*> >, float, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >) /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/diffusion_model.hpp:178
    #8 0x49ab92 in StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}::operator()(ggml_tensor*, float, int) const /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/stable-diffusion.cpp:883
    #9 0x49b309 in ggml_tensor* std::__invoke_impl<ggml_tensor*, StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}&, ggml_tensor*, float, int>(std::__invoke_other, StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}&, ggml_tensor*&&, float&&, int&&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/invoke.h:61
    #10 0x49b309 in std::enable_if<is_invocable_r_v<ggml_tensor*, StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}&, ggml_tensor*, float, int>, ggml_tensor*>::type std::__invoke_r<ggml_tensor*, StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}&, ggml_tensor*, float, int>(StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}&, ggml_tensor*&&, float&&, int&&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/invoke.h:114
    #11 0x49b309 in std::_Function_handler<ggml_tensor* (ggml_tensor*, float, int), StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*)::{lambda(ggml_tensor*, float, int)#1}>::_M_invoke(std::_Any_data const&, ggml_tensor*&&, float&&, int&&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/std_function.h:290
    #12 0x4801fa in std::function<ggml_tensor* (ggml_tensor*, float, int)>::operator()(ggml_tensor*, float, int) const /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/std_function.h:591
    #13 0x4c36ff in sample_k_diffusion /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/denoiser.hpp:543
    #14 0x4c63df in StableDiffusionGGML::sample(ggml_context*, ggml_tensor*, ggml_tensor*, SDCondition, SDCondition, ggml_tensor*, float, float, float, float, float, sample_method_t, std::vector<float, std::allocator<float> > const&, int, SDCondition, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*) /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/stable-diffusion.cpp:994
    #15 0x478088 in generate_image(sd_ctx_t*, ggml_context*, ggml_tensor*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, int, float, float, float, int, int, sample_method_t, std::vector<float, std::allocator<float> > const&, long, int, sd_image_t const*, float, float, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::vector<int, std::allocator<int> >, float, float, float, ggml_tensor*) /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/stable-diffusion.cpp:1454
    #16 0x478948 in txt2img /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/stable-diffusion.cpp:1601
    #17 0x4242aa in main /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/examples/cli/main.cpp:948
    #18 0x7f432f23227d in __libc_start_call_main (/nix/store/maxa3xhmxggrc5v2vc0c3pjb79hjlkp9-glibc-2.40-66/lib/libc.so.6+0x2a27d) (BuildId: ff927b1b82bf859074854af941360cb428b4c739)
    #19 0x7f432f232338 in __libc_start_main_alias_1 (/nix/store/maxa3xhmxggrc5v2vc0c3pjb79hjlkp9-glibc-2.40-66/lib/libc.so.6+0x2a338) (BuildId: ff927b1b82bf859074854af941360cb428b4c739)
    #20 0x40d324 in _start (/nix/store/7kva3sp08b4pl8ll40lchlnnqr061nqn-stable-diffusion.cpp/bin/sd+0x40d324)

0x61d000006e80 is located 0 bytes inside of 2048-byte region [0x61d000006e80,0x61d000007680)
freed by thread T0 here:
    #0 0x7f43580dda88 in operator delete(void*, unsigned long) (/nix/store/mhd0rk497xm0xnip7262xdw9bylvzh99-gcc-13.3.0-lib/lib/libasan.so.8+0xdda88)
    #1 0x4496fb in std::__new_allocator<float>::deallocate(float*, unsigned long) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/new_allocator.h:172
    #2 0x4e2d34 in Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}::operator()() const /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/flux.hpp:1112
    #3 0x4e2da7 in ggml_cgraph* std::__invoke_impl<ggml_cgraph*, Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&>(std::__invoke_other, Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/invoke.h:61
    #4 0x4e2da7 in std::enable_if<is_invocable_r_v<ggml_cgraph*, Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&>, ggml_cgraph*>::type std::__invoke_r<ggml_cgraph*, Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&>(Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/invoke.h:114
    #5 0x4e2da7 in std::_Function_handler<ggml_cgraph* (), Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}>::_M_invoke(std::_Any_data const&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/std_function.h:290

previously allocated by thread T0 here:
    #0 0x7f43580dcb88 in operator new(unsigned long) (/nix/store/mhd0rk497xm0xnip7262xdw9bylvzh99-gcc-13.3.0-lib/lib/libasan.so.8+0xdcb88)
    #1 0x44bb97 in std::__new_allocator<float>::allocate(unsigned long, void const*) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/new_allocator.h:151
    #2 0x4e2d34 in Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}::operator()() const /build/lmv0mga84ra6kphw68vscpbilwqpi3hb-source/flux.hpp:1112
    #3 0x4e2da7 in ggml_cgraph* std::__invoke_impl<ggml_cgraph*, Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&>(std::__invoke_other, Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/invoke.h:61
    #4 0x4e2da7 in std::enable_if<is_invocable_r_v<ggml_cgraph*, Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&>, ggml_cgraph*>::type std::__invoke_r<ggml_cgraph*, Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&>(Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/invoke.h:114
    #5 0x4e2da7 in std::_Function_handler<ggml_cgraph* (), Flux::FluxRunner::compute(int, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor*, ggml_tensor**, ggml_context*, std::vector<int, std::allocator<int> >)::{lambda()#1}>::_M_invoke(std::_Any_data const&) /nix/store/yg4ahy7gahx91nq80achmzilrjyv0scj-gcc-13.3.0/include/c++/13.3.0/bits/std_function.h:290

SUMMARY: AddressSanitizer: heap-use-after-free (/nix/store/mhd0rk497xm0xnip7262xdw9bylvzh99-gcc-13.3.0-lib/lib/libasan.so.8+0x71891) in __interceptor_memcpy
Shadow bytes around the buggy address:
  0x61d000006c00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x61d000006c80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x61d000006d00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x61d000006d80: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x61d000006e00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x61d000006e80:[fd]fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x61d000006f00: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x61d000006f80: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x61d000007000: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x61d000007080: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  0x61d000007100: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==84706==ABORTING

@Green-Sky
Contributor

Output with ubsan (no asan)

trun3

Since this looks like the Vulkan output, I'm guessing the issue is one and the same.

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

Thank you for trying that @Green-Sky . I believe it's working now.

Vulkan backend, 16 steps with cfg (cfg_scale = 4), so 32 forward passes without anything breaking:
output

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

OK, it's very important to keep the distilled guidance scale at 0. For some reason the model still accepts it as an input, but it completely breaks apart if it's not zero (I double-checked with ComfyUI; it's not an issue with my code). Maybe I should just force it to zero for Chroma to keep things simple?
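
A hypothetical sketch of that "force it to zero" idea (the names below, like sd_version_t and VERSION_CHROMA, are placeholders rather than actual sd.cpp symbols):

#include <cstdio>

enum sd_version_t { VERSION_FLUX, VERSION_CHROMA };

// clamp the distilled guidance scale to 0 for Chroma, warning if the user set it
static float effective_guidance(sd_version_t version, float requested) {
    if (version == VERSION_CHROMA && requested != 0.0f) {
        std::fprintf(stderr, "Chroma: ignoring distilled guidance %.2f, forcing 0\n", requested);
        return 0.0f;
    }
    return requested;
}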

@Green-Sky
Contributor

OK, it's very important to keep the distilled guidance scale at 0. For some reason the model still accepts it as an input, but it completely breaks apart if it's not zero (I double-checked with ComfyUI; it's not an issue with my code). Maybe I should just force it to zero for Chroma to keep things simple?

Go for it. Down the line we should put recommended/forced values into the gguf file.

@Green-Sky
Contributor

Green-Sky commented May 30, 2025

chroma1

The q4_k is hurting it somewhat, as expected.

--sampling-method euler --steps 16 --guidance 0 --cfg-scale 4 --diffusion-fa -W 1024 -H 1024

[DEBUG] ggml_extend.hpp:1174 - flux params backend buffer size =  4824.80 MB(VRAM) (643 tensors)
[DEBUG] ggml_extend.hpp:1126 - flux compute buffer size: 854.00 MB(VRAM)
[INFO ] stable-diffusion.cpp:1628 - txt2img completed in 208.26s

edit: I used v33

@stduhpf
Contributor Author

stduhpf commented May 30, 2025

The q4_k is hurting it somewhat, as expected.

Keeping the distilled_guidance_layer weights at high precision seems to help a lot. (For example, https://huggingface.co/silveroxides/Chroma-GGUF uses BF16 for distilled_guidance_layer, txt_in, img_in and final_layer, so compatibility with some GGML backends may be an issue; I had to update GGML to get BF16 to work on Vulkan.)

v32:

silveroxides q4_0 (+bf16) full q4_0 (requantized)
output output

Which begs the question: should we consider making the convert tool "smarter", like it is in llama.cpp, with different quant types depending on the role of each tensor?
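
As a rough illustration of what "smarter" per-tensor quant selection could look like (a sketch only, with a hypothetical pick_quant helper; not the convert tool's actual code), keeping the precision-sensitive tensors mentioned above at BF16:

#include <string>

enum class QuantType { Q4_0, Q5_0, Q8_0, BF16 };

// keep precision-sensitive tensors (as observed with Chroma) at BF16,
// quantize everything else with the requested base type
static QuantType pick_quant(const std::string& tensor_name, QuantType base) {
    static const char* keep_high_precision[] = {
        "distilled_guidance_layer", "txt_in", "img_in", "final_layer"
    };
    for (const char* pattern : keep_high_precision) {
        if (tensor_name.find(pattern) != std::string::npos) {
            return QuantType::BF16;
        }
    }
    return base;
}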

@Green-Sky
Contributor

Which begs the question: should we consider making the convert tool "smarter", like it is in llama.cpp, with different quant types depending on the role of each tensor?

We should.

I find this model with current sd.cpp quantization incredibly hard to prompt too, but that is probably the token padding / masking.

chroma_apls-q5_k

This is supposed to be q5_k quality; normal Flux looks way better, and even Flux Lite looks better.

@kgigitdev

Hi @stduhpf and @Green-Sky ,

Apologies for the question, but is this branch intended to work solely with vanilla master and master's version of ggml? I've been maintaining a little script for personal use that merges specific branches from here and there to get a more up-to-date build, and no matter what I do I can't get this branch to work, either on its own or with other branches.

Here's the extract from my script that shows the current state of what I have generally been merging in (with a few of them commented out, as you can see, but I've included them anyway since they do work in some other combinations):

BRANCHES="zhouwg/sync_with_latest_ggml"
BRANCHES="${BRANCHES} wbruna/fix_sdxl_lora"
BRANCHES="${BRANCHES} stduhpf/sdxl-embd"
BRANCHES="${BRANCHES} stduhpf/tiled-vae-encode"
BRANCHES="${BRANCHES} stduhpf/imatrix"
BRANCHES="${BRANCHES} stduhpf/lcppT5"
BRANCHES="${BRANCHES} stduhpf/unchained"
BRANCHES="${BRANCHES} stduhpf/dt"
# BRANCHES="${BRANCHES} stduhpf/diffusers"
BRANCHES="${BRANCHES} stduhpf/override-te"
BRANCHES="${BRANCHES} stduhpf/concat-controls"
# BRANCHES="${BRANCHES} stduhpf/ip2p"
BRANCHES="${BRANCHES} ImKyra/master"
# BRANCHES="${BRANCHES} Green-Sky/large_file_hardening"
# BRANCHES="${BRANCHES} rmatif/sigmas"

Now, I already suspected that this branch would probably not work directly in my script, since you said, "I had to update GGML to get bf16 to work on vulkan", and that probably conflicts with zhouwg/sync_with_latest_ggml. But in general, it has been harder and harder to get a working build with all the latest features and fixes in place.

Now, Chroma being a Flux derivative, I should also mention that for a while now I've not been able to get Flux to work either; the problems are generally of the form:

A gajillion of these:

[ERROR] model.cpp:1938 - tensor 'first_stage_model.decoder.conv_in.bias' not in model file

or a gajillion of these (with the --diffusion-model argument):

[INFO ] model.cpp:1897 - unknown tensor 'model.diffusion_model.model.diffusion_model.double_blocks.0.img_attn.norm.key_norm.scale | f8_e4m3 | 1 [128, 1, 1, 1, 1]' in model file

(the doubled prefix probably indicating that something has automatically prefixed model.diffusion_model. where one already existed).

followed by:

[ERROR] stable-diffusion.cpp:441  - load tensors from model loader failed

From looking at ModelLoader::get_sd_version(), it looks like there's a ton of heuristics to determine the model type based on the tensor names, but that's all probably doomed to fail anyway if the tensor loading has previously failed. Or, I'm missing a branch somewhere that adds new entries to enum SDVersion.

And that's the point where I give up, since I don't know enough about the expected naming of the tensors.

Which also brings us to the elephant in the room that you're all too polite to talk about :-) : vis-a-vis #686 , given that I'm probably not the only one with the above problems, I'm sure that nobody would take offence if a temporary (friendly, and prominently attributed) fork were created to contain suitably-approved merged and conflict-resolved branches from all the developers who have pending PRs, as well as lots of useful work that is currently being duplicated across lots of forks.

@Green-Sky
Contributor

@kgigitdev I feel you. For my use case I depend on the webserver API.
Can you open a separate issue for this? It seems to be a general Flux problem, like you said.

Now, I already suspected that this branch would probably not work directly in my script, since you said, "I had to update GGML to get bf16 to work on vulkan"

This does not seem to be in this PR.

Which also brings us to the elephant in the room that you're all too polite to talk about :-) : vis-a-vis #686 , given that I'm probably not the only one with the above problems, I'm sure that nobody would take offence if a temporary (friendly, and prominently attributed) fork were created to contain suitably-approved merged and conflict-resolved branches from all the developers who have pending PRs, as well as lots of useful work that is currently being duplicated across lots of forks.

If we ended up doing this, I would ask @ggerganov to host that project at the ggml org.
But let's wait a while longer and see what @leejet ends up doing.

@kgigitdev

Hi @Green-Sky ,

Thanks for your swift answer.

If we ended up doing this, I would ask @ggerganov to host that project at the ggml org.

Gosh, yes, I hadn't even thought of that option. I think I had subconsciously assumed that stable-diffusion.cpp was only ever on the periphery of the llama.cpp people, because llama == serious work whereas stable diffusion == frippery and frivolity.

@stduhpf
Contributor Author

stduhpf commented Jun 1, 2025

A gajillion of these:

[ERROR] model.cpp:1938 - tensor 'first_stage_model.decoder.conv_in.bias' not in model file

That probably means the VAE is missing, or that the tensors from the VAE can't be found because their names are not the ones expected.

or a gajillion of these (with the --diffusion-model argument):

[INFO ] model.cpp:1897 - unknown tensor 'model.diffusion_model.model.diffusion_model.double_blocks.0.img_attn.norm.key_norm.scale | f8_e4m3 | 1 [128, 1, 1, 1, 1]' in model file

(the doubled prefix probably indicating that something has automatically prefixed model.diffusion_model. where one already existed).

Yes, the doubled prefix means the tensor names are already prefixed in the model file you're using, which means you should use --model instead of --diffusion-model. This often happens with versions of Flux with a built-in VAE, in which case that would also fix the other issue.
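
For illustration, a tiny sketch (assumed, not the actual loader code) of how a doubled prefix like that arises, and how it can be avoided by only prepending the prefix when the stored name doesn't already carry it:

#include <string>

// Prepend "model.diffusion_model." only if the stored tensor name doesn't
// already start with it; otherwise you end up with the doubled prefix above.
static std::string canonical_name(const std::string& stored_name) {
    static const std::string prefix = "model.diffusion_model.";
    if (stored_name.rfind(prefix, 0) == 0) {
        return stored_name;           // already prefixed in the file
    }
    return prefix + stored_name;      // bare diffusion-model tensor name
}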

@stduhpf
Contributor Author

stduhpf commented Jun 1, 2025

With attention masking Without
output copy 6 output copy 7

Somehow the results I get with attention masking look a bit worse than what I had before without it. Maybe once I implement the mask modification to attend to some padding tokens, it will fix itself? For now, if you just want to generate better pictures, use an earlier commit.
The compute buffer size is also significantly increased, so I can no longer generate 1024x1024 with Vulkan (it reaches the allocation limit).

Edit: I fixed the compute buffer size issue, and after trying different prompts I'm not sure which is best actually. Maybe using masking is not so bad after all.
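
For reference, an illustrative sketch (assumed, not the actual implementation) of building an additive attention mask over a padded T5 sequence while leaving a few padding tokens attendable, in the spirit of SD_CHROMA_MASK_PAD_OVERRIDE:

#include <vector>
#include <limits>

// 0.0f means "attend", -inf means "masked out" (added to attention logits before softmax)
static std::vector<float> build_attn_mask(int seq_len, int n_real_tokens, int pad_to_keep) {
    std::vector<float> mask(seq_len, -std::numeric_limits<float>::infinity());
    int keep = n_real_tokens + pad_to_keep;
    if (keep > seq_len) keep = seq_len;
    for (int i = 0; i < keep; ++i) {
        mask[i] = 0.0f;
    }
    return mask;
}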

@wbruna

wbruna commented Jun 2, 2025

Which begs the question: should we consider making the convert tool "smarter", like it is in llama.cpp, with different quant types depending on the role of each tensor?

FWIW, I've been working on an option for sd.cpp to choose the quant by tensor pattern, a la llama.cpp's tensor overrides: e128cfa. The conversion itself already works; it could be useful for testing.

@stduhpf
Contributor Author

stduhpf commented Jun 9, 2025

@LostRuins Are you using flash attention or not?

Because after looking quickly into it, it seems like flash attention only works with either causal attention masks or no mask at all, and in the case of Chroma, the mask is weirdly shaped.

@LostRuins
Contributor

That requires SD_USE_FLASH_ATTENTION to be manually defined, right? If so, I am not using it.

@stduhpf
Contributor Author

stduhpf commented Jun 9, 2025

Then I'm clueless.

@LostRuins
Contributor

Alright, well, we'll go with whatever solution you think works best. I'm just rather surprised at how wonky the Chroma outputs are. Maybe it is just the model. But I think you'd agree that these are a big step down from Flux, subjectively speaking.

@LostRuins
Contributor

LostRuins commented Jun 14, 2025

image

Anyway I believe there's still something wrong, both with/without the final commit in this PR, but I can't put my finger on it. Sometimes the object in the image is very clear and looks fine, other times there are just weird artifacts in some way or another. I have no idea if it's just the model or something else is wrong. Sometimes the exact same prompt gives good results for one seed and bad results with another.

The prompt for the above was "cat" with various seeds and no negative prompt at 512x512px

@stduhpf
Contributor Author

stduhpf commented Jun 14, 2025

Anyway I believe there's still something wrong, both with/without the final commit in this PR, but I can't put my finger on it. Sometimes the object in the image is very clear and looks fine, other times there are just weird artifacts in some way or another. I have no idea if it's just the model or something else is wrong. Sometimes the exact same prompt gives good results for one seed and bad results with another.

The prompt for the above was "cat" with various seeds and no negative prompt at 512x512px

ComfyUI This PR
image output

As far as I'm concerned, this is an issue with the model itself. It doesn't seem to handle short prompts and/or low resolutions very well.

@LostRuins
Contributor

Ah, I see. I don't use Comfy so I can't compare. In that case, cool. Should be good to go then.

@rmatif

rmatif commented Jun 14, 2025

I was going to say the same thing. Generally, when it struggles like that, it's most likely because it doesn't handle low resolutions well; you can experience that with other models too.

@stduhpf
Contributor Author

stduhpf commented Jun 14, 2025

Still 512x512, with an "AI-upscaled" prompt (I used an LLM to come up with a prompt for a "cat" picture):

A majestic Maine Coon cat with thick, flowing brown tabby fur and piercing golden eyes, sitting regally on an ornate velvet cushion by a sunlit window. The warm afternoon light casts soft highlights on its fur, creating a luxurious texture. The background features a cozy Victorian-style room with bookshelves and a steaming teacup. Rendered in ultra-realistic 8K detail with subtle painterly touches, cinematic depth of field, and a dreamy atmosphere.

image

1024x1024 with simple "cat" prompt:
output

@stduhpf
Contributor Author

stduhpf commented Jun 14, 2025

Both changes:
output

@stduhpf
Contributor Author

stduhpf commented Jun 14, 2025

It seems to also be related to the attention masking. Looks like not being able to attend to enough tokens kind of breaks T5.

With masking (default) T5 masking only (disabled for DiT) DiT masking only (disabled for t5) No masking at all
image image dit_mask no_mask

@stduhpf
Contributor Author

stduhpf commented Jun 14, 2025

Hmm, actually I'm starting to think T5 might not be supposed to use masking by default. I thought it was, but it looks like the ComfyUI implementation doesn't use it. The model is still broken with short prompts and low resolutions when T5 masking is disabled, but not quite as much (just like the ComfyUI implementation).

@LostRuins
Contributor

Will this change affect regular Flux too? Or just Chroma?

@stduhpf
Contributor Author

stduhpf commented Jun 14, 2025

Just Chroma. Regular Flux doesn't use any kind of attention masking.

@LostRuins
Contributor

LostRuins commented Jun 14, 2025

777800-cat holding sign that says EVIL CAT

Applied your patch. Same mashed evil cat seed as #696 (comment), looks slightly less mashed now. But this cat still looks like it needs to visit a vet.

Edit: I spoke too soon. Once I added "ugly" as a negative prompt I got this monstrosity.

777800-cat holding sign that says EVIL CAT ### ugly

@stduhpf
Contributor Author

stduhpf commented Jun 14, 2025

@LostRuins I'm confused. These look like this on my end (768x768, cfg scale 5, euler sampling, seed 777800, chroma-unlocked-v35-Q4_0.gguf (silveroxides), t5xxl_fp8_e4m3fn.safetensors)

output copy
output

It's far from perfect either, but it's clearly not as broken. Could it be something weird happening on koboldcpp's end? Have you tried running it from the sd.cpp CLI?

@LostRuins
Contributor

I would say they are in the same ballpark. Your second image is basically what I am seeing too.

image

Yeah, I do run sd.cpp locally outside of kcpp, with the same results. Also, other models are unaffected. I don't really modify the backend/graph, so those parts should be the same.

@stduhpf
Contributor Author

stduhpf commented Jun 15, 2025

It's strange that the text on the sign is almost perfectly identical, but the cats' faces are completely different.

@LostRuins
Contributor

LostRuins commented Jun 15, 2025

Yeah. So strange. I am using the same HEAD commit as this, so perhaps there are some variances with either the quant/t5xxl I am using, or perhaps hardware-related precision errors? Strangely, all of these are related to Chroma.

Here's a comparison with Flux Dev 1.

  • Same T5XXL
  • Same VAE
  • Same prompt
  • Same seed, sampler settings, step count and prompt, with no negative prompt.
  • Added Flux Dev 1 Clip-L

Flux Dev 1 Result:

image

Btw I did notify the creator of Chroma as feedback, and got this reply...

That has nothing to do with the model and everything to do with the implementation in the PR.
If issues this severe were standard then don't you think that you would have seen more examples from other places of this?
btw if t5 in stable-diffusion.cpp previously did not require specific total chunk length or attention mask then adding attention mask is unnecessary
attention mask is only needed if one uses transformers default which pads the t5 model to 512.
by simply not padding it to 512 you effectively remove that part
also empty negatives produces artifacts since the model is dedistilled schnell from start and bfl did not use attention mask. So you either use negative to a reasonable token count or you pad negative to a set minimum if negative is empty.
Also single token prompting on a model with unpadded t5 encoder and no CLIP is obviously out of scope.

image
https://huggingface.co/silveroxides/Chroma-GGUF/discussions/8 they're on huggingface.
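
A sketch of the "pad the negative prompt to a set minimum" suggestion from that reply (assumed helper names; not how sd.cpp actually does it):

#include <vector>
#include <cstdint>

// If the negative prompt tokenizes to fewer than `min_len` tokens, pad it with
// the tokenizer's pad token so the unconditional branch still sees a reasonable length.
static void pad_negative_tokens(std::vector<int32_t>& tokens, size_t min_len, int32_t pad_token_id) {
    while (tokens.size() < min_len) {
        tokens.push_back(pad_token_id);
    }
}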

@LostRuins
Contributor

@stduhpf I'm planning to integrate your Chroma PR into koboldcpp soon; there's been quite a lot of back-and-forth with these different approaches. What do you think are the current best configs from what we've tried?

@stduhpf
Contributor Author

stduhpf commented Jun 18, 2025

@stduhpf I'm planning to integrate your Chroma PR into koboldcpp soon; there's been quite a lot of back-and-forth with these different approaches. What do you think are the current best configs from what we've tried?

I think the current default one (no T5 masking, DiT masking with 1 padding token included) should be the best, but I'm not very confident about it anymore.

@wbruna

wbruna commented Jun 18, 2025

@LostRuins , may I suggest targeting the bleedingedge release for Koboldcpp? Apart from Chroma support, it also includes many pending fixes for sd.cpp (like that VAE tile initialization bug), and both Koboldcpp and sd-server would benefit from any fixes needed for running with a persistent model in memory.

@LostRuins
Contributor

Yes, I have merged the VAE tile fix; what else is there? I don't see any bleedingedge branch, and everything else is mostly up to date.

@wbruna

wbruna commented Jun 19, 2025

Yes, I have merged the VAE tile fix; what else is there?

Off the top of my head, there are also #484 and #681 (apart from stuff that you probably fixed directly on Koboldcpp already, like #658).

I don't see any bleedingedge branch and everything else is mostly up to date

https://github.com/stduhpf/stable-diffusion.cpp/commits/bleedingedge/, with several tagged releases.

@LostRuins
Contributor

#681 isn't needed for kobo because kobo doesn't load ckpt or diffusers models, only gguf and safetensors.
#484 seems to already be working? Tiled VAE is working fine.

I'd rather not rebase off a significantly different codebase, as it'll require a bunch of testing to ensure everything works, and I might have to deal with possible regressions. Instead, if there are any critical or important issues, I can patch those in manually.

@wbruna

wbruna commented Jun 19, 2025

#681 isn't needed for kobo because kobo doesn't load ckpt or diffusers models, only gguf and safetensors. #484 seems to already be working? Tiled VAE is working fine.

I gave an example on LostRuins/koboldcpp#1603, to avoid more unrelated discussions on this PR.

I'd rather not rebase off a significantly different codebase, as it'll require a bunch of testing to ensure everything works, and I might have to deal with possible regressions. Instead, if there are any critical or important issues, I can patch those in manually.

Fair enough.

@LostRuins
Contributor

Alright, I added the tiling fix.

Comment on lines +283 to +285
// TODO: not hardcoded?
const int single_blocks_count = 38;
const int double_blocks_count = 19;
Contributor Author


Looks like I'll have to implement this TODO:
https://huggingface.co/lodestones/flux-ultra-lite
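
One possible approach to that TODO (a sketch under the assumption that block indices appear in the tensor names, not the actual code): derive the block counts from the names present in the checkpoint instead of hardcoding 19/38, so variants with fewer blocks still load.

#include <string>
#include <vector>
#include <cstdlib>

// Count "<prefix>N...." blocks by finding the highest index N among tensor names.
static int count_blocks(const std::vector<std::string>& tensor_names, const std::string& prefix) {
    int max_index = -1;
    for (const std::string& name : tensor_names) {
        if (name.rfind(prefix, 0) == 0) {  // name starts with prefix
            int idx = std::atoi(name.c_str() + prefix.size());
            if (idx > max_index) max_index = idx;
        }
    }
    return max_index + 1;  // blocks are numbered 0..N-1
}

// hypothetical usage, assuming names like "double_blocks.0.img_attn....":
//   int double_blocks_count = count_blocks(names, "double_blocks.");
//   int single_blocks_count = count_blocks(names, "single_blocks.");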

Contributor Author


tensor 'model.diffusion_model.distilled_guidance_layer.in_proj.weight' has wrong shape in model file: got [32, 5120, 1, 1], expected [64, 5120, 1, 1]

That's odd. Is it using a different VAE?

Contributor Author

@stduhpf commented Jun 25, 2025


Hmm, looks like it's not quite the same architecture as Chroma. There are similarities, but somehow the "time_in", "vector_in" and "guidance_in" are back, as well as the modulation for double blocks only. And the distilled_guidance_layer doesn't have the same input shape...

Contributor


Gotta wait for more info then.

Contributor Author


My current guess is that it's using the double blocks from flux lite and the single blocks from Chroma.

Contributor


Nice. Hoping your earlier commits get merged by leejet soon.

Successfully merging this pull request may close these issues.

Support other fine tuned flux models and gguf versions
7 participants