
feat: introduce GGMLBlock and implement SVD (Broken) #159

Merged 6 commits into master on Feb 24, 2024

Conversation

leejet (Owner) commented Jan 27, 2024

Over the past few weeks, I've been working on this PR in my free time. It introduces GGMLBlock, which makes it easier to implement neural networks: in most cases it's straightforward to translate an nn.Module into the corresponding GGMLBlock. I have implemented the majority of the building blocks for SVD and the SVD pipeline, except for VScalingWithEDMcNoise, which is also relatively simple to implement.

However, I've started to feel fatigued. ggml's batch inference implementation has issues for certain operators, and although I've addressed some of them in my https://github.com/leejet/ggml/tree/batch-inference branch, they're not entirely resolved. There are also situations where some operators produce NaN, and those need to be fixed as well. If I have time in the future, I'll continue addressing these ggml issues, but for now I'll be allocating my free time to other tasks, as I've already invested a considerable amount of effort in this PR over the past few weeks.

Perhaps I'll merge this PR first, even though the SVD support is broken, because it introduces GGMLBlock, which makes it convenient to implement neural networks with ggml. The batch inference test results are documented in the comments of the test functions in unet.hpp/vae.hpp; take a look if you're interested.

Amin456789 commented Jan 27, 2024

This is amazing news, thank you so much for your hard work, leejet. I can't wait for you guys to fix SVD so I can try it here.

On a side note: the convert feature still doesn't seem to work for quantizing below fp16. I tried converting SDXL Turbo to q5_1 and it didn't generate images (I mentioned this in the converting-safetensors issue). Could you please fix it? It would be very useful to convert the SVD model to q4_1, for example. Converting on the fly works, but converting to GGUF doesn't.

FSSRepo (Contributor) commented Jan 27, 2024

Batch inference will only be worthwhile when you have a lot of VRAM. For now, I think we'll be able to perform a single UNet computation with a batch of conditionals c = [c, uc].

FSSRepo (Contributor) commented Jan 27, 2024

I was planning to refactor Stable Diffusion to have an API similar to llama.cpp, and also to support offloading, computing SD and ControlNet on the GPU with low VRAM. However, upon reviewing this refactoring, I think it's better to just extend what I need to make a web UI work.

Cyberhan123 (Contributor) commented

> I was planning to refactor Stable Diffusion to have an API similar to llama.cpp, and also to support offloading, computing SD and ControlNet on the GPU with low VRAM. However, upon reviewing this refactoring, I think it's better to just extend what I need to make a web UI work.

@FSSRepo Basically the PR I'm implementing: #157
I split the loading logic of clip, vae, and unet, and then added the set_options API. I think we're doing the same thing.

Cyberhan123 (Contributor) commented Jan 28, 2024

It's understandable that problems will arise; ggml is a library built mainly to support llama.cpp. However, for me, ggml has several advantages over PyTorch that can't be ignored: it's small enough (the PyTorch CUDA dependency is about 1 GB), it supports quantization very well, and it supports ROCm on Windows.

Cyberhan123 (Contributor) commented

GGMLBlock is very educational, and this implementation is great.

leejet (Owner, Author) commented Jan 28, 2024

> Batch inference will only be worthwhile when you have a lot of VRAM. I think we'll be able to perform a single UNet computation with a batch of conditionals c = [c, uc].

For SVD, batch inference is a must: ne3 is actually batch_size * num_video_frames.

leejet (Owner, Author) commented Jan 28, 2024

> On a side note: the convert feature still doesn't seem to work for quantizing below fp16. I tried converting SDXL Turbo to q5_1 and it didn't generate images. Converting on the fly works, but converting to GGUF doesn't.

Did you use the fp16-fix VAE?

Cyberhan123 (Contributor) commented Jan 28, 2024

>> On a side note: the convert feature still doesn't seem to work for quantizing below fp16. I tried converting SDXL Turbo to q5_1 and it didn't generate images.
>
> Did you use the fp16-fix VAE?

I found a problem while using it just now, regarding seeds: if the seed is 42, the generated pictures are correct, but if the seed is random, the pictures often come out very strange. I don't know much about the behavior in PyTorch.

seed 42: [image]

random seed: [image]

Cyberhan123 (Contributor) commented

What shocks me is that for a 768x768 image (SDXL Turbo) on a 7900 XTX, a single sampling step takes only 0.35 s. It seems the performance bottleneck lies in the decoding operation.

Amin456789 commented Jan 28, 2024

I'm using TAESD-XL and it works great; with it I don't need the fp16-fix VAE, and generated images look great with 1 step and the LCM sampler for SDXL Turbo. However, what I meant was converting models to smaller q4_1 GGUF files with the -m convert command, which gave me errors after generating an image (the same as the converting-safetensors issue, if I remember correctly). Converting on the fly and generating images works great, but -m convert --type with a quantization like q4_1 somehow corrupts the model (SDXL Turbo, in my case).

leejet (Owner, Author) commented Jan 29, 2024

> But if the seed is random, the pictures often come out very strange.

@Cyberhan123 I got the same result in sd-webui using seed 297003140.

leejet (Owner, Author) commented Feb 24, 2024

I will merge this PR even though the SVD support is broken, because it introduces GGMLBlock, which makes it convenient to implement neural networks with ggml. I have other changes that rely on GGMLBlock, such as adding support for Stable Cascade. I'll try to fix the SVD issue later if I have time.

leejet merged commit b636886 into master on Feb 24, 2024. 7 checks passed.