
feat: latest comfyui; fix: better GPU utilization for SD15 #270

Merged: tazlin merged 50 commits into main on Sep 23, 2024
Conversation


@tazlin tazlin commented Aug 26, 2024

Changes/fixes:

  • Significant improvements to crash recovery.
    • The worker will no longer crash when there are no jobs for long periods of time.
    • The main process is much more capable of recovering from a sub-process crash.
    • The worker now detects more deadlocks (ordinarily impossible, but they may arise from difficult-to-reproduce edge cases) and attempts to recover.
  • Additional log messages and warnings in certain situations, along with recommendations for resolving them.
    • The periodic status message now prints more information by default, with greater clarity as to its meaning.
    • If the worker pauses job pops (for example, because too many jobs failed consecutively), every status message will include a warning that this is happening.
    • If several minutes pass with no jobs, the worker warns that offering more models can potentially prevent this.
  • Flux support
    • Add "Flux.1-Schnell fp8 (Compact)" to your models_to_load to offer it (see the configuration sketch below).
  • Updates README.md with additional information about worker configuration.
  • Updates bridgeData_template.yaml for clarity and for the new configuration options.
  • Adds the configuration options extra_slow_worker, limit_max_steps, unload_from_vram_often, and high_memory_mode (also shown in the sketch below).
    • See the updated template and the README.md "Suggested settings" section for more information.
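
A minimal bridgeData.yaml sketch showing where the new model and options fit. The values below are illustrative placeholders, not recommendations; consult the updated bridgeData_template.yaml and the README.md "Suggested settings" section for actual guidance.

```yaml
# Illustrative excerpt only; see bridgeData_template.yaml for the full file.
models_to_load:
  - "Flux.1-Schnell fp8 (Compact)"  # Flux model enabled by this PR

# New options introduced by this PR (example values, not recommendations):
extra_slow_worker: false
limit_max_steps: false
unload_from_vram_often: false
high_memory_mode: false
```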

Relies on:

tazlin and others added 30 commits September 23, 2024 10:37
Despite the name, `--novram` still allows the GPU to be used. However, comfyui uses this flag to much more aggressively avoid leaving tensors in VRAM. I am hoping this will reduce VRAM OOMs and/or shared memory usage (on Windows).
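
As a rough illustration of how such a flag might be wired in: `--novram` is a real ComfyUI launch flag, but the function below and its parameter are hypothetical, not this PR's actual code.

```python
# Hypothetical sketch: choosing ComfyUI's VRAM strategy flag at launch time.
def build_comfyui_args(aggressive_vram_release: bool) -> list[str]:
    args = ["python", "main.py"]
    if aggressive_vram_release:
        # Despite the name, --novram still uses the GPU; it tells ComfyUI to
        # avoid keeping tensors resident in VRAM between operations.
        args.append("--novram")
    return args
```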
With some recent comfyui changes, it appears the logic prior to this commit was not aggressive enough to avoid OOMs when relying on comfyui's internal decision making alone.

This commit causes the worker to unload models from VRAM immediately after an inference result (if it is not about to be used) and right before post processing.

Post-processing as implemented today almost always overestimates the amount of free VRAM and tends to cause OOMs or shared memory usage (on Windows), so more proactively unloading the model should help minimize that problem.
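
A minimal sketch of that unload strategy, assuming the worker can call into ComfyUI's model management module between pipeline stages; the helper name is illustrative, though both calls are real ComfyUI APIs.

```python
import comfy.model_management as mm

def release_vram() -> None:
    """Aggressively drop loaded models out of VRAM between pipeline stages."""
    mm.unload_all_models()  # unload every currently loaded model from VRAM
    mm.soft_empty_cache()   # release cached allocator blocks back to the GPU
```

Such a helper would be called right after an inference result (when the model is not about to be reused) and again right before post-processing begins.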
The worker seems to be holding onto too much system RAM on average. I previously relied on comfyui internals to handle this implicitly, but recent changes seem to have broken some of the assumptions I was making. This is a purposely over-zealous attempt to keep system RAM usage down.
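
One over-zealous approach to trimming resident memory between jobs might look like the following; this is an assumption about the general technique, not the PR's exact code.

```python
import ctypes
import gc
import sys

def trim_process_memory() -> None:
    gc.collect()  # free unreachable Python objects (e.g. large result buffers)
    if sys.platform == "linux":
        # Ask glibc to return freed heap pages to the OS; without this, the
        # resident set can stay inflated even after garbage collection.
        ctypes.CDLL("libc.so.6").malloc_trim(0)
```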
Redefines the broken existing `high_memory_mode` to leverage the recent memory management extension
This will clarify when situations such as the shared model manager failing to load or no models being found occur (e.g., when download_models.py hasn't been run).
tazlin and others added 16 commits September 23, 2024 10:37
- More fallback logic for when jobs have been popped and processes are available but nothing is happening.

- Resolves certain problems with the unresponsiveness-detection logic:
  - The case where it ended all jobs after a long period of "No Job" messages from the server followed by successful pops.
  - It now no longer shuts down in error while processes are restarting.
Tracks the time spent without any available jobs. This will help worker operators identify potential issues with their configuration. A warning will be logged if the worker spends more than 5 minutes without any jobs, suggesting possible actions to increase job demand.
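
A minimal sketch of that tracking, assuming a main-loop structure and loguru logging (which the worker uses); the class and method names here are illustrative.

```python
import time
from loguru import logger

NO_JOB_WARNING_SECONDS = 5 * 60  # warn after 5 minutes without any jobs

class JoblessTimer:
    def __init__(self) -> None:
        self._last_job_time = time.monotonic()
        self._warned = False

    def job_received(self) -> None:
        # Reset the clock (and the warning) whenever a job arrives.
        self._last_job_time = time.monotonic()
        self._warned = False

    def check(self) -> None:
        idle = time.monotonic() - self._last_job_time
        if idle > NO_JOB_WARNING_SECONDS and not self._warned:
            logger.warning(
                "No jobs received for over 5 minutes. "
                "Offering more models may increase job demand."
            )
            self._warned = True  # warn once per idle period, not every loop
```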
@tazlin tazlin marked this pull request as ready for review September 23, 2024 14:37

tazlin commented Sep 23, 2024

@CodiumAI-Agent /review


CodiumAI-Agent commented Sep 23, 2024

PR Reviewer Guide 🔍

(Review updated until commit f89812c)

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Key issues to review

Performance Concern
The method remove_maintenance is added to remove maintenance mode from a worker. However, the method makes synchronous network calls (simple_client.worker_details_by_name and simple_client.worker_modify) within an asynchronous context. This could block the event loop and affect the performance of the application. Consider refactoring these calls to be asynchronous or running them in a separate thread.
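
One way to address this, as a hedged sketch: move the blocking SDK calls onto a worker thread with asyncio.to_thread so the event loop stays responsive. The two client methods are the ones named above; the surrounding function and argument shapes are assumptions for illustration.

```python
import asyncio

async def remove_maintenance(simple_client, worker_name: str):
    # worker_details_by_name and worker_modify are synchronous network calls;
    # asyncio.to_thread runs each in a thread so the event loop is not blocked.
    details = await asyncio.to_thread(
        simple_client.worker_details_by_name, worker_name
    )
    await asyncio.to_thread(simple_client.worker_modify, details)
```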

Redundant Code
The method _receive_and_handle_control_message contains a condition to check if message.control_flag is HordeControlFlag.START_INFERENCE and then preloads a model if not already active. However, the method preload_model is called again inside the condition, which seems redundant and could lead to unnecessary preloading of the model. This could be optimized to avoid potential performance issues.
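
A small sketch of the guard this comment suggests, using the symbols named above; the attribute names and surrounding structure are assumptions, not the PR's actual control flow.

```python
# Sketch only: preload exactly once, and only if the model is not yet active.
def handle_control_message(self, message) -> None:
    if message.control_flag == HordeControlFlag.START_INFERENCE:
        if message.model_name not in self.active_models:
            self.preload_model(message.model_name)  # no redundant second call
        self.start_inference(message)
```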

Configuration Overlap
The start_inference_process function has parameters low_memory_mode, high_memory_mode, and very_high_memory_mode that could potentially overlap in functionality. This might lead to confusing behavior depending on how these flags are set. It's recommended to clarify the precedence and interaction of these modes in the documentation or refactor the approach to handle memory management settings more clearly.
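
One way to make the overlapping flags unambiguous is to resolve them into a single mode with explicit precedence, as in this sketch (not the PR's actual code):

```python
from enum import Enum

class MemoryMode(Enum):
    LOW = "low"
    NORMAL = "normal"
    HIGH = "high"
    VERY_HIGH = "very_high"

def resolve_memory_mode(low: bool, high: bool, very_high: bool) -> MemoryMode:
    # Highest setting wins; LOW is honored only when no higher flag is set.
    if very_high:
        return MemoryMode.VERY_HIGH
    if high:
        return MemoryMode.HIGH
    if low:
        return MemoryMode.LOW
    return MemoryMode.NORMAL
```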

@tazlin tazlin merged commit fbc3c46 into main Sep 23, 2024
3 checks passed
@CodiumAI-Agent

Persistent review updated to latest commit f89812c
