
Bug: Clip-L does absolutely nothing with Flux models. #396

Open
stduhpf opened this issue Sep 4, 2024 · 1 comment · May be fixed by #397
stduhpf (Contributor) commented Sep 4, 2024

I was testing out alternative clip models for Flux (https://huggingface.co/zer0int/CLIP-GmP-ViT-L-14), and noticed the outputs were oddly similar to the ones I was getting with the original clip-l.
It turns out the results are actually 100% identical (when using the same seed, of course), without a single bit of difference. Using the same models with ComfyUI does lead to noticeably different outputs (with a fixed seed).

| | clip-L | ViT-L-14 |
|---|---|---|
| Dev q3_k (sd.cpp) | test-dev-clip | test-dev-vit |
| Schnell q3_k (sd.cpp) | test-schnell-clip | test-schnell-vit |
| Schnell fp8 (ComfyUI) | ComfyUI_00515_ | ComfyUI_00514_ |
| Schnell q3_k (ComfyUI) | ComfyUI_00517_ | ComfyUI_00516_ |

As a side note, this could explain why I felt that the generations I get with sd.cpp look less detailed and more artifact-prone than the ones from ComfyUI at the same quants.

Green-Sky (Contributor) commented:

You are right, it is commented out and just set to 0.

```cpp
// auto it = std::find(chunk_tokens.begin(), chunk_tokens.end(), clip_l_tokenizer.EOS_TOKEN_ID);
// max_token_idx = std::min<size_t>(std::distance(chunk_tokens.begin(), it), chunk_tokens.size() - 1);
// clip_l->compute(n_threads,
//                 input_ids,
//                 0,
//                 NULL,
//                 max_token_idx,
//                 true,
//                 &pooled,
//                 work_ctx);
// clip_l.transformer.text_model.text_projection no in file, ignore
// TODO: use torch.eye(embed_dim) as default clip_l.transformer.text_model.text_projection
pooled = ggml_new_tensor_1d(work_ctx, GGML_TYPE_F32, 768);
ggml_set_f32(pooled, 0.f);
```

stduhpf linked a pull request Sep 4, 2024 that will close this issue