Really useful extension. On dev nf4 (RTX 4070, max VRAM for models set to 9500 MB) there's a great speedup when no model change happens between generations.
Previously:
100%|████████████████████████████████████████████████████████████████████| 5/5 [00:07<00:00, 1.60s/it]
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.██████████| 5/5 [00:07<00:00, 1.29s/it]
[Unload] Trying to free 15315.57 MB for cuda:0 with 0 models keep loaded ... Current free memory is 4472.06 MB ... Unload model KModel Current free memory is 10860.76 MB ... Unload model IntegratedAutoencoderKL Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 11020.64 MB, Model Require: 9641.98 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: -1402.34 MB, CPU Swap Loaded (blocked method): 2886.75 MB, GPU Loaded: 6899.98 MB
Moving model(s) has taken 3.62 seconds
Distilled CFG Scale: 3.5
[Unload] Trying to free 10901.84 MB for cuda:0 with 0 models keep loaded ... Current free memory is 3613.65 MB ... Unload model JointTextEncoder Done.
[Memory Management] Target: KModel, Free GPU: 11015.63 MB, Model Require: 6246.80 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: 1987.83 MB, All loaded to GPU.
Moving model(s) has taken 3.24 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:07<00:00, 1.47s/it]
Now:
100%|████████████████████████████████████████████████████████████████████| 5/5 [00:08<00:00, 1.76s/it]
Skipping unconditional conditioning when CFG = 1. Negative Prompts are ignored.██████████| 5/5 [00:08<00:00, 1.30s/it]
[Unload] Trying to free 10901.84 MB for cuda:0 with 0 models keep loaded ... Current free memory is 10352.01 MB ... Current free memory is 10352.01 MB ... Unload model IntegratedAutoencoderKL Done.
[Memory Management] Target: KModel, Free GPU: 10511.88 MB, Model Require: 6246.80 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: 1484.08 MB, All loaded to GPU.
Moving model(s) has taken 1.41 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:07<00:00, 1.47s/it]
On dev Q8 gguf (same gpu)
Previously:
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.23s/it]
[Unload] Trying to free 15315.57 MB for cuda:0 with 0 models keep loaded ... Current free memory is 10861.01 MB ... Unload model IntegratedAutoencoderKL Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 11020.89 MB, Model Require: 9641.98 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: -1402.09 MB, CPU Swap Loaded (blocked method): 2886.75 MB, GPU Loaded: 6899.98 MB
Moving model(s) has taken 1.75 seconds
Distilled CFG Scale: 3.5
[Unload] Trying to free 18536.36 MB for cuda:0 with 0 models keep loaded ... Current free memory is 3613.90 MB ... Unload model JointTextEncoder Done.
[Memory Management] Target: KModel, Free GPU: 11015.88 MB, Model Require: 12119.51 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: -3884.63 MB, CPU Swap Loaded (blocked method): 5202.00 MB, GPU Loaded: 6917.51 MB
Moving model(s) has taken 4.48 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.24s/it]
After:
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.23s/it]
Distilled CFG Scale: 3.5
[Unload] Trying to free 18536.36 MB for cuda:0 with 0 models keep loaded ... Current free memory is 10354.01 MB ... Unload model IntegratedAutoencoderKL Current free memory is 10513.88 MB ... Done.
[Memory Management] Target: KModel, Free GPU: 10513.88 MB, Model Require: 12119.51 MB, Previously Loaded: 0.00 MB, Inference Require: 2781.00 MB, Remaining: -4386.63 MB, CPU Swap Loaded (blocked method): 5680.12 MB, GPU Loaded: 6439.38 MB
Moving model(s) has taken 2.62 seconds
100%|████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:11<00:00, 2.24s/it]
But it seems to have some issues with memory management. When I change models, e.g. in an x/y/z plot, and run some generations/tests, my memory usage explodes. VRAM stays constant, no issue there (capped at 9000 MB on my 4070), but RAM is another story. I've set up extra virtual memory to be sure I can handle the model in q8_0: 32 GB of physical RAM plus 60 GB of virtual memory on an NVMe SSD. Without the extension there are no issues; RAM usage sometimes climbs to 55 GB but no further. With the extension it very often climbs to 90 GB and crashes Forge. It seems that when switching models, something is loaded again on each model load and is never cleaned up.
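To confirm whether RAM really grows monotonically across model switches (rather than just spiking during a load), one option is to log the process's peak resident memory before and after each checkpoint change. This is a minimal diagnostic sketch, assuming a Unix-like host where the stdlib `resource` module is available (on Windows, a third-party tool such as psutil would be needed instead); the "switch checkpoint" step is a placeholder for the actual Forge action:

```python
# Hedged diagnostic sketch: track peak resident set size (RSS) around a
# model switch to distinguish a transient load spike from a true leak.
import resource

def peak_rss_gb() -> float:
    """Peak RSS of this process so far, in GB (on Linux, ru_maxrss is in KB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024**2

before = peak_rss_gb()
# ... switch checkpoint and run one generation here (placeholder) ...
after = peak_rss_gb()
print(f"peak RSS before: {before:.2f} GB, after: {after:.2f} GB")
```

If the "after" value keeps climbing on every switch and never drops back, that supports the leak hypothesis; a one-time rise that plateaus would instead point at a transient load spike.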
Unfortunately, I can't control Forge's internal memory management.
It reloads each integrated T5 individually if you don't specify an external one. Use a separate CLIP/T5.
I don't have any memory problems with that setup. If you're already using separate encoders, let me know which checkpoints you use so I can reproduce.
I also recommend avoiding the "CPU Swap Loaded (blocked method)" path. Q4 is the best fit for 12 GB. NF4 is trash because of its very noticeable square pattern and other quality issues.