
Problem with RAM usage #3

Open
Arnaud3013 opened this issue Oct 1, 2024 · 1 comment
Arnaud3013 commented Oct 1, 2024

Really useful extension. On dev NF4 (RTX 4070, 9500 MB max VRAM set for the model) there's a great speedup when the model doesn't change between generations.
Previously:

100%|████████████████████████████████████████████████████████████████████| 5/5 [00:07<00:00, 1.60s/it]
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.██████████| 5/5 [00:07<00:00, 1.29s/it]
[Unload] Trying to free 15315.57 MB for cuda:0 with 0 models keep loaded ... Current free memory is 4472.06 MB ... Unload model KModel Current free memory is 10860.76 MB ... Unload model IntegratedAutoencoderKL Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 11020.64 MB, Model Require: 9641.98 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: -1402.34 MB, CPU Swap Loaded (blocked method): 2886.75 MB, GPU Loaded: 6899.98 MB
Moving model(s) has taken 3.62 seconds
Distilled CFG Scale: 3.5
[Unload] Trying to free 10901.84 MB for cuda:0 with 0 models keep loaded ... Current free memory is 3613.65 MB ... Unload model JointTextEncoder Done.
[Memory Management] Target: KModel, Free GPU: 11015.63 MB, Model Require: 6246.80 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: 1987.83 MB, All loaded to GPU.
Moving model(s) has taken 3.24 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:07<00:00, 1.47s/it]

Now:

100%|████████████████████████████████████████████████████████████████████| 5/5 [00:08<00:00, 1.76s/it]
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.██████████| 5/5 [00:08<00:00, 1.30s/it]
[Unload] Trying to free 10901.84 MB for cuda:0 with 0 models keep loaded ... Current free memory is 10352.01 MB ... Current free memory is 10352.01 MB ... Unload model IntegratedAutoencoderKL Done.
[Memory Management] Target: KModel, Free GPU: 10511.88 MB, Model Require: 6246.80 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: 1484.08 MB, All loaded to GPU.
Moving model(s) has taken 1.41 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:07<00:00, 1.47s/it]

On dev Q8 gguf (same gpu)
Previously:

100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.23s/it]
[Unload] Trying to free 15315.57 MB for cuda:0 with 0 models keep loaded ... Current free memory is 10861.01 MB ... Unload model IntegratedAutoencoderKL Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 11020.89 MB, Model Require: 9641.98 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: -1402.09 MB, CPU Swap Loaded (blocked method): 2886.75 MB, GPU Loaded: 6899.98 MB
Moving model(s) has taken 1.75 seconds
Distilled CFG Scale: 3.5
[Unload] Trying to free 18536.36 MB for cuda:0 with 0 models keep loaded ... Current free memory is 3613.90 MB ... Unload model JointTextEncoder Done.
[Memory Management] Target: KModel, Free GPU: 11015.88 MB, Model Require: 12119.51 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: -3884.63 MB, CPU Swap Loaded (blocked method): 5202.00 MB, GPU Loaded: 6917.51 MB
Moving model(s) has taken 4.48 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.24s/it]

After:

100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.23s/it]
Distilled CFG Scale: 3.5
[Unload] Trying to free 18536.36 MB for cuda:0 with 0 models keep loaded ... Current free memory is 10354.01 MB ... Unload model IntegratedAutoencoderKL Current free memory is 10513.88 MB ... Done.
[Memory Management] Target: KModel, Free GPU: 10513.88 MB, Model Require: 12119.51 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: -4386.63 MB, CPU Swap Loaded (blocked method): 5680.12 MB, GPU Loaded: 6439.38 MB
Moving model(s) has taken 2.62 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.24s/it]

But it seems to have some issues with memory management. When I change models, as in an X/Y/Z plot, and run some generations/tests, my memory usage explodes. VRAM is constant, no issue there, capped at 9000 MB on my 4070, but RAM is another story. I've set up some virtual memory to be sure I can handle the model in Q8_0: 32 GB of physical RAM and 60 GB of virtual RAM on an NVMe SSD. Without the extension there are no issues; RAM usage sometimes goes up to 55 GB but no higher. With the extension it very often goes up to 90 GB and crashes Forge. It seems that when changing models, something is loaded again and again on each model load and is never cleaned up.
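One way to confirm this pattern is to log the process's resident memory around each model switch. The following is a minimal, stdlib-only diagnostic sketch, not part of Forge or this extension; where you call it (e.g. before and after each X/Y/Z plot cell) is up to you.

```python
# Hypothetical diagnostic helper: track host RAM use around model switches.
# Standard library only; the `resource` module is available on Linux/macOS.
import resource

def log_ram(tag: str) -> float:
    """Print and return the peak resident set size in GB.

    Note: on Linux ru_maxrss is reported in KB (on macOS it is bytes,
    so divide by 1024**3 there instead).
    """
    rss_gb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024**2
    print(f"[{tag}] peak RSS: {rss_gb:.2f} GB")
    return rss_gb

# Sketch of intended use around a (hypothetical) checkpoint switch:
# log_ram("before switch")
# ... change model in the UI / X/Y/Z plot step ...
# log_ram("after switch")
```

If the value climbs by roughly one full checkpoint per switch and never falls, that matches "loaded again and again and never cleaned up".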

Juqowel (Owner) commented Oct 2, 2024

Unfortunately I can't control Forge's internal memory management.
It loads each checkpoint's integrated T5 individually if you don't specify an external one. Use a separate CLIP/T5.

#4356

I don't have any memory problems with this setup. If you already use separate encoders, let me know which checkpoints you use so I can reproduce it.
I also recommend avoiding the "CPU Swap Loaded (blocked method)" message. Q4 is best for 12 GB. NF4 is trash because of its very noticeable square pattern and other quality issues.
