
feat: latest comfyui; fix: better GPU utilization for SD15 #270

Merged: tazlin merged 50 commits into main on Sep 23, 2024
Conversation


@tazlin tazlin commented Aug 26, 2024

Changes/fixes:

  • Significant improvements to crash recovery.
    • The worker will no longer crash when there are no jobs for long periods of time.
    • The main process is much more capable of recovering from a sub-process crash.
    • The worker now detects more deadlocks (ordinarily impossible, but they may arise from difficult-to-reproduce edge cases) and attempts to recover.
  • Additional log messages and warnings in certain situations, along with recommendations for resolving them.
    • The periodic status message now prints more information by default, with greater clarity as to its meaning.
    • If the worker pauses job pops (for example, because too many jobs failed consecutively), every status message will include a warning that this is happening.
    • If several minutes pass with no jobs, the worker warns that offering more models can potentially prevent this.
  • Flux support
    • Add "Flux.1-Schnell fp8 (Compact)" to your models_to_load to offer it (see the configuration sketch below).
  • Updates README.md with additional information about worker configuration.
  • Updates bridgeData_template.yaml for clarity and for the new configuration options.
  • Adds the configuration options extra_slow_worker, limit_max_steps, unload_from_vram_often, and high_memory_mode (also shown in the sketch below).
    • See the updated template and the README.md "Suggested settings" section for more information.
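
A minimal bridgeData.yaml sketch showing where the new model and options fit. The values below are illustrative placeholders, not recommendations; consult the updated bridgeData_template.yaml and the README.md "Suggested settings" section for actual guidance.

```yaml
# Illustrative excerpt only; see bridgeData_template.yaml for the full file.
models_to_load:
  - "Flux.1-Schnell fp8 (Compact)"  # Flux model enabled by this PR

# New options introduced by this PR (example values, not recommendations):
extra_slow_worker: false
limit_max_steps: false
unload_from_vram_often: false
high_memory_mode: false
```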

Relies on:

tazlin and others added 30 commits September 23, 2024 10:37
Despite the name, `--novram` still allows the GPU to be used. However, comfyui uses this flag to much more aggressively avoid leaving tensors in VRAM. I am hoping this will reduce VRAM OOMs and/or shared memory usage (on Windows).
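
As a rough illustration of how such a flag might be wired in: `--novram` is a real ComfyUI launch flag, but the function below and its parameter are hypothetical, not this PR's actual code.

```python
# Hypothetical sketch: choosing ComfyUI's VRAM strategy flag at launch time.
def build_comfyui_args(aggressive_vram_release: bool) -> list[str]:
    args = ["python", "main.py"]
    if aggressive_vram_release:
        # Despite the name, --novram still uses the GPU; it tells ComfyUI to
        # avoid keeping tensors resident in VRAM between operations.
        args.append("--novram")
    return args
```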
With some recent comfyui changes, it appears the logic prior to this commit was not aggressive enough to avoid OOMs when relying on comfyui's internal decision making alone.

This commit causes the worker to unload models from VRAM immediately after an inference result (if it is not about to be used) and right before post processing.

Post-processing as implemented today almost always overestimates the amount of free VRAM and tends to cause OOMs or shared memory usage (on Windows), so more proactively unloading the model should help minimize that problem.
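
A minimal sketch of that unload strategy, assuming the worker can call into ComfyUI's model management module between pipeline stages; the helper name is illustrative, though both calls are real ComfyUI APIs.

```python
import comfy.model_management as mm

def release_vram() -> None:
    """Aggressively drop loaded models out of VRAM between pipeline stages."""
    mm.unload_all_models()  # unload every currently loaded model from VRAM
    mm.soft_empty_cache()   # release cached allocator blocks back to the GPU
```

Such a helper would be called right after an inference result (when the model is not about to be reused) and again right before post-processing begins.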
The worker seems to be holding onto too much system RAM on average. I previously relied on comfyui internals to handle this implicitly, but recent changes seem to have broken some of the assumptions I was making. This is a purposely over-zealous attempt to keep system RAM usage down.
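
One over-zealous approach to trimming resident memory between jobs might look like the following; this is an assumption about the general technique, not the PR's exact code.

```python
import ctypes
import gc
import sys

def trim_process_memory() -> None:
    gc.collect()  # free unreachable Python objects (e.g. large result buffers)
    if sys.platform == "linux":
        # Ask glibc to return freed heap pages to the OS; without this, the
        # resident set can stay inflated even after garbage collection.
        ctypes.CDLL("libc.so.6").malloc_trim(0)
```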
Redefines the broken existing `high_memory_mode` to leverage the recent memory management extension
This will clarify when situations such as the shared model manager failing to load or no models being found occur (e.g., when download_models.py hasn't been run).
tazlin and others added 16 commits September 23, 2024 10:37
- More fallback logic for when jobs have been popped and processes are available but nothing is happening.

- Resolves certain problems with the unresponsiveness-detection logic:
  - The case where it ended all jobs after a long period of "No Job" messages from the server followed by successful pops.
  - It now no longer shuts down in error while processes are restarting.
Tracks the time spent without any available jobs. This will help worker operators identify potential issues with their configuration. A warning will be logged if the worker spends more than 5 minutes without any jobs, suggesting possible actions to increase job demand.
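
A minimal sketch of that tracking, assuming a main-loop structure and loguru logging (which the worker uses); the class and method names here are illustrative.

```python
import time
from loguru import logger

NO_JOB_WARNING_SECONDS = 5 * 60  # warn after 5 minutes without any jobs

class JoblessTimer:
    def __init__(self) -> None:
        self._last_job_time = time.monotonic()
        self._warned = False

    def job_received(self) -> None:
        # Reset the clock (and the warning) whenever a job arrives.
        self._last_job_time = time.monotonic()
        self._warned = False

    def check(self) -> None:
        idle = time.monotonic() - self._last_job_time
        if idle > NO_JOB_WARNING_SECONDS and not self._warned:
            logger.warning(
                "No jobs received for over 5 minutes. "
                "Offering more models may increase job demand."
            )
            self._warned = True  # warn once per idle period, not every loop
```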
@tazlin tazlin marked this pull request as ready for review September 23, 2024 14:37

tazlin commented Sep 23, 2024

@CodiumAI-Agent /review


CodiumAI-Agent commented Sep 23, 2024

PR Reviewer Guide 🔍

(Review updated until commit f89812c)

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Key issues to review

Performance Concern
The method remove_maintenance is added to remove maintenance mode from a worker. However, the method makes synchronous network calls (simple_client.worker_details_by_name and simple_client.worker_modify) within an asynchronous context. This could block the event loop and affect the performance of the application. Consider refactoring these calls to be asynchronous or running them in a separate thread.
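
One way to address this, as a hedged sketch: move the blocking SDK calls onto a worker thread with asyncio.to_thread so the event loop stays responsive. The two client methods are the ones named above; the surrounding function and argument shapes are assumptions for illustration.

```python
import asyncio

async def remove_maintenance(simple_client, worker_name: str):
    # worker_details_by_name and worker_modify are synchronous network calls;
    # asyncio.to_thread runs each in a thread so the event loop is not blocked.
    details = await asyncio.to_thread(
        simple_client.worker_details_by_name, worker_name
    )
    await asyncio.to_thread(simple_client.worker_modify, details)
```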

Redundant Code
The method _receive_and_handle_control_message contains a condition to check if message.control_flag is HordeControlFlag.START_INFERENCE and then preloads a model if not already active. However, the method preload_model is called again inside the condition, which seems redundant and could lead to unnecessary preloading of the model. This could be optimized to avoid potential performance issues.
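
A small sketch of the guard this comment suggests, using the symbols named above; the attribute names and surrounding structure are assumptions, not the PR's actual control flow.

```python
# Sketch only: preload exactly once, and only if the model is not yet active.
def handle_control_message(self, message) -> None:
    if message.control_flag == HordeControlFlag.START_INFERENCE:
        if message.model_name not in self.active_models:
            self.preload_model(message.model_name)  # no redundant second call
        self.start_inference(message)
```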

Configuration Overlap
The start_inference_process function has parameters low_memory_mode, high_memory_mode, and very_high_memory_mode that could potentially overlap in functionality. This might lead to confusing behavior depending on how these flags are set. It's recommended to clarify the precedence and interaction of these modes in the documentation or refactor the approach to handle memory management settings more clearly.
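
One way to make the overlapping flags unambiguous is to resolve them into a single mode with explicit precedence, as in this sketch (not the PR's actual code):

```python
from enum import Enum

class MemoryMode(Enum):
    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"
    VERY_HIGH = "very_high"

def resolve_memory_mode(low: bool, high: bool, very_high: bool) -> MemoryMode:
    # Highest setting wins; LOW is honored only when no higher flag is set.
    if very_high:
        return MemoryMode.VERY_HIGH
    if high:
        return MemoryMode.HIGH
    if low:
        return MemoryMode.LOW
    return MemoryMode.NORMAL
```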

@tazlin tazlin merged commit fbc3c46 into main Sep 23, 2024
3 checks passed
@CodiumAI-Agent

Persistent review updated to latest commit f89812c
