This is a regression present since at least #309, reported by a beta tester of that branch. The worker can enter a state in which two models are repeatedly loaded and unloaded on a process that is not running inference. The reported scenario was:
max_threads == 2
flux fp8 was running at the time (other inference was blocked by the large model blocking logic)
The two queued models were alternately loaded into a single process.
Likely candidates for root cause include the keep_single_inference(...) and the get_next_job_and_process(...) functions.
See this log excerpt:
reGen | 2024-11-02 19:20:44.556 | INFO | [HWRPM]:preload_models:2002 - Already preloading 1 models, waiting for one to finish before preloading AlbedoBase XL (SDXL)
reGen | 2024-11-02 19:20:45.568 | INFO | [HWRPM]:receive_and_handle_process_messages:1748 - Process 2 moved model Juggernaut XL to system RAM. Loading took 1.14 seconds.
reGen | 2024-11-02 19:20:45.669 | INFO | [HWRPM]:receive_and_handle_process_messages:1752 - Process 2 unloaded model Juggernaut XL
reGen | 2024-11-02 19:20:45.772 | INFO | [HWRPM]:preload_models:2002 - Already preloading 1 models, waiting for one to finish before preloading Juggernaut XL
reGen | 2024-11-02 19:20:46.684 | INFO | [HWRPM]:receive_and_handle_process_messages:1748 - Process 2 moved model AlbedoBase XL (SDXL) to system RAM. Loading took 1.11 seconds.
reGen | 2024-11-02 19:20:46.887 | INFO | [HWRPM]:receive_and_handle_process_messages:1752 - Process 2 unloaded model AlbedoBase XL (SDXL)
reGen | 2024-11-02 19:20:46.990 | INFO | [HWRPM]:preload_models:2002 - Already preloading 1 models, waiting for one to finish before preloading AlbedoBase XL (SDXL)
reGen | 2024-11-02 19:20:47.702 | INFO | [HWRPM]:print_status_method:4061 - Process info:
reGen | 2024-11-02 19:20:47.702 | INFO | [HWRPM]:print_status_method:4063 - Process 0: (SAFETY) WAITING_FOR_JOB
reGen | 2024-11-02 19:20:47.702 | INFO | [HWRPM]:print_status_method:4063 - Process 1 (INFERENCE_STARTING): (Flux.1-Schnell fp16 (Compact) [last event: 171.25 secs ago: START_INFERENCE]
reGen | 2024-11-02 19:20:47.702 | INFO | [HWRPM]:print_status_method:4063 - Process 2 (PRELOADING_MODEL): (Juggernaut XL [last event: 0.81 secs ago: PRELOAD_MODEL]
reGen | 2024-11-02 19:20:47.702 | INFO | [HWRPM]:print_status_method:4063 - Process 3 (WAITING_FOR_JOB): (AbsoluteReality [last event: 302.43 secs ago: START_INFERENCE]
reGen | 2024-11-02 19:20:47.702 | INFO | [HWRPM]:print_status_method:4066 - dreamer_name: worker-23423432432432 | (v9.2.0) | horde user: worker-23423432432432#312152 | num_models: 187 | max_power: 64 (1448x1448) | max_threads: 2 | queue_size: 1 | safety_on_gpu: True
reGen | 2024-11-02 19:20:47.703 | INFO | [HWRPM]:print_status_method:4121 - Jobs: <aa1e6144-8234-4a9d-9b56-a3bfef10fee6: Flux.1-Schnell fp16 (Compact)>, <e6319011-a7b0-43d1-9e75-dea10e0022c1: Juggernaut XL>, <31dcabdb-10b1-4248-9a25-531e52af9029: AlbedoBase XL (SDXL)>
reGen | 2024-11-02 19:20:47.703 | INFO | [HWRPM]:print_status_method:4129 - Active models: {'Flux.1-Schnell fp16 (Compact)', 'Juggernaut XL', 'AbsoluteReality'}
reGen | 2024-11-02 19:20:47.703 | SUCCESS | [HWRPM]:print_status_method:4145 - Session job info: currently popped: 3 (eMPS: 112) | submitted: 23 | faulted: 0 | slow_jobs: 0 | process_recoveries: 0 | 0.00 seconds without jobs
reGen | 2024-11-02 19:20:47.804 | INFO | [HWRPM]:_process_control_loop:3862 - Blocking further inference because batch or slow_model inference in process.
reGen | 2024-11-02 19:20:48.006 | INFO | [HWRPM]:receive_and_handle_process_messages:1748 - Process 2 moved model Juggernaut XL to system RAM. Loading took 1.14 seconds.
reGen | 2024-11-02 19:20:48.109 | INFO | [HWRPM]:receive_and_handle_process_messages:1752 - Process 2 unloaded model Juggernaut XL
reGen | 2024-11-02 19:20:48.211 | INFO | [HWRPM]:preload_models:2002 - Already preloading 1 models, waiting for one to finish before preloading Juggernaut XL
reGen | 2024-11-02 19:20:49.430 | INFO | [HWRPM]:receive_and_handle_process_messages:1748 - Process 2 moved model AlbedoBase XL (SDXL) to system RAM. Loading took 1.36 seconds.
reGen | 2024-11-02 19:20:49.533 | INFO | [HWRPM]:receive_and_handle_process_messages:1752 - Process 2 unloaded model AlbedoBase XL (SDXL)
reGen | 2024-11-02 19:20:49.635 | INFO | [HWRPM]:preload_models:2002 - Already preloading 1 models, waiting for one to finish before preloading AlbedoBase XL (SDXL)
reGen | 2024-11-02 19:20:50.650 | INFO | [HWRPM]:receive_and_handle_process_messages:1748 - Process 2 moved model Juggernaut XL to system RAM. Loading took 1.13 seconds.
reGen | 2024-11-02 19:20:50.754 | INFO | [HWRPM]:receive_and_handle_process_messages:1752 - Process 2 unloaded model Juggernaut XL
reGen | 2024-11-02 19:20:50.856 | INFO | [HWRPM]:preload_models:2002 - Already preloading 1 models, waiting for one to finish before preloading Juggernaut XL
reGen | 2024-11-02 19:20:51.867 | INFO | [HWRPM]:receive_and_handle_process_messages:1748 - Process 2 moved model AlbedoBase XL (SDXL) to system RAM. Loading took 1.09 seconds.
reGen | 2024-11-02 19:20:51.970 | INFO | [HWRPM]:receive_and_handle_process_messages:1752 - Process 2 unloaded model AlbedoBase XL (SDXL)
reGen | 2024-11-02 19:20:52.073 | INFO | [HWRPM]:preload_models:2002 - Already preloading 1 models, waiting for one to finish before preloading AlbedoBase XL (SDXL)
reGen | 2024-11-02 19:20:53.083 | INFO | [HWRPM]:receive_and_handle_process_messages:1748 - Process 2 moved model Juggernaut XL to system RAM. Loading took 1.16 seconds.
reGen | 2024-11-02 19:20:53.186 | INFO | [HWRPM]:receive_and_handle_process_messages:1752 - Process 2 unloaded model Juggernaut XL
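The pattern in the excerpt above (one process unloading two different models in strict alternation) can be detected mechanically when triaging similar reports. The following is a minimal sketch, not part of the worker: the function name `find_thrashing` and the `min_unloads` threshold are hypothetical, and the regular expression assumes the "Process N unloaded model NAME" wording shown in the log lines above.

```python
import re
from collections import defaultdict

# Assumes the "Process N unloaded model NAME" message format seen in the excerpt.
UNLOAD_RE = re.compile(r"Process (\d+) unloaded model (.+?)\s*$")


def find_thrashing(lines, min_unloads=4):
    """Return {process_id: sorted model names} for processes whose unloads
    keep switching between different models (hypothetical helper)."""
    unloads = defaultdict(list)
    for line in lines:
        match = UNLOAD_RE.search(line)
        if match:
            unloads[int(match.group(1))].append(match.group(2))

    thrashing = {}
    for process_id, models in unloads.items():
        # Count how often an unloaded model differs from the one unloaded before it.
        switches = sum(1 for a, b in zip(models, models[1:]) if a != b)
        if len(models) >= min_unloads and switches >= min_unloads - 1:
            thrashing[process_id] = sorted(set(models))
    return thrashing
```

Running `find_thrashing(open("worker.log"))` on a saved log (the file name is only an example) would list any process showing this alternation. The threshold is arbitrary and may need tuning, since a healthy worker also unloads models occasionally.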