
Problem with RAM usage #3

Open
Arnaud3013 opened this issue Oct 1, 2024 · 1 comment
Arnaud3013 commented Oct 1, 2024

Really useful extension. On dev NF4 (RTX 4070, 9500 MB max VRAM set for the model) there's a great speedup when the model doesn't change between generations.
Previously:

100%|████████████████████████████████████████████████████████████████████| 5/5 [00:07<00:00, 1.60s/it]
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.██████████| 5/5 [00:07<00:00, 1.29s/it]
[Unload] Trying to free 15315.57 MB for cuda:0 with 0 models keep loaded ... Current free memory is 4472.06 MB ... Unload model KModel Current free memory is 10860.76 MB ... Unload model IntegratedAutoencoderKL Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 11020.64 MB, Model Require: 9641.98 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: -1402.34 MB, CPU Swap Loaded (blocked method): 2886.75 MB, GPU Loaded: 6899.98 MB
Moving model(s) has taken 3.62 seconds
Distilled CFG Scale: 3.5
[Unload] Trying to free 10901.84 MB for cuda:0 with 0 models keep loaded ... Current free memory is 3613.65 MB ... Unload model JointTextEncoder Done.
[Memory Management] Target: KModel, Free GPU: 11015.63 MB, Model Require: 6246.80 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: 1987.83 MB, All loaded to GPU.
Moving model(s) has taken 3.24 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:07<00:00, 1.47s/it]

Now:

100%|████████████████████████████████████████████████████████████████████| 5/5 [00:08<00:00, 1.76s/it]
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.██████████| 5/5 [00:08<00:00, 1.30s/it]
[Unload] Trying to free 10901.84 MB for cuda:0 with 0 models keep loaded ... Current free memory is 10352.01 MB ... Current free memory is 10352.01 MB ... Unload model IntegratedAutoencoderKL Done.
[Memory Management] Target: KModel, Free GPU: 10511.88 MB, Model Require: 6246.80 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: 1484.08 MB, All loaded to GPU.
Moving model(s) has taken 1.41 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:07<00:00, 1.47s/it]

On dev Q8 gguf (same gpu)
Previously:

100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.23s/it]
[Unload] Trying to free 15315.57 MB for cuda:0 with 0 models keep loaded ... Current free memory is 10861.01 MB ... Unload model IntegratedAutoencoderKL Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 11020.89 MB, Model Require: 9641.98 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: -1402.09 MB, CPU Swap Loaded (blocked method): 2886.75 MB, GPU Loaded: 6899.98 MB
Moving model(s) has taken 1.75 seconds
Distilled CFG Scale: 3.5
[Unload] Trying to free 18536.36 MB for cuda:0 with 0 models keep loaded ... Current free memory is 3613.90 MB ... Unload model JointTextEncoder Done.
[Memory Management] Target: KModel, Free GPU: 11015.88 MB, Model Require: 12119.51 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: -3884.63 MB, CPU Swap Loaded (blocked method): 5202.00 MB, GPU Loaded: 6917.51 MB
Moving model(s) has taken 4.48 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.24s/it]

After:

100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.23s/it]
Distilled CFG Scale: 3.5
[Unload] Trying to free 18536.36 MB for cuda:0 with 0 models keep loaded ... Current free memory is 10354.01 MB ... Unload model IntegratedAutoencoderKL Current free memory is 10513.88 MB ... Done.
[Memory Management] Target: KModel, Free GPU: 10513.88 MB, Model Require: 12119.51 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: -4386.63 MB, CPU Swap Loaded (blocked method): 5680.12 MB, GPU Loaded: 6439.38 MB
Moving model(s) has taken 2.62 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.24s/it]

But it seems to have some issues with memory management. When I change models, as in an X/Y/Z plot, and run some generations/tests, my memory usage explodes. VRAM is constant, no issue there, capped at 9000 MB on my 4070, but RAM is another story. I've set up some virtual memory to be sure I can handle the model in Q8_0: 32 GB of physical RAM and 60 GB of virtual RAM on an NVMe SSD. Without the extension there are no issues; RAM usage sometimes goes up to 55 GB but no higher. With the extension it very often goes up to 90 GB and crashes Forge. It seems that when changing models, something is loaded again and again on each model load and is never cleaned up.
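One way to confirm this pattern is to log the process's resident memory around each model switch. The following is a minimal, stdlib-only diagnostic sketch, not part of Forge or this extension; where you call it (e.g. before and after each X/Y/Z plot cell) is up to you.

```python
# Hypothetical diagnostic helper: track host RAM use around model switches.
# Standard library only; the `resource` module is available on Linux/macOS.
import resource

def log_ram(tag: str) -> float:
    """Print and return the peak resident set size in GB.

    Note: on Linux ru_maxrss is reported in KB (on macOS it is bytes,
    so divide by 1024**3 there instead).
    """
    rss_gb = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024**2
    print(f"[{tag}] peak RSS: {rss_gb:.2f} GB")
    return rss_gb

# Sketch of intended use around a (hypothetical) checkpoint switch:
# log_ram("before switch")
# ... change model in the UI / X/Y/Z plot step ...
# log_ram("after switch")
```

If the value climbs by roughly one full checkpoint per switch and never falls, that matches "loaded again and again and never cleaned up".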

Juqowel (Owner) commented Oct 2, 2024

Unfortunately I can't control Forge's internal memory management.
It loads each checkpoint's integrated T5 individually if you don't specify an external one. Use a separate CLIP/T5.

#4356

I don't have any memory problems with this setup. If you already use separate encoders, let me know which checkpoints you use so I can reproduce it.
I also recommend avoiding the "CPU Swap Loaded (blocked method)" message. Q4 is best for 12 GB. NF4 is trash because of its very noticeable square pattern and other quality issues.
