Surprisingly, when I tested `safetensors.torch.load_file(<file_path>)` on its own, I didn't see multiple processes. Only one process (a single PID) read the file sequentially, which is fast because GCSFuse is optimized for this sequential read pattern.
I also tried `safetensors.torch.load(open(<file_path>, 'rb').read())`, as described in comfyanonymous/ComfyUI#1992 (comment). It didn't make much difference.
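For context, `safetensors.torch.load(...)` in the call above receives a buffer that was filled by one `read()` of the whole file, so the file is fetched front to back before any tensor is touched. The stdlib-only sketch below illustrates the on-disk layout that gets parsed afterwards (an 8-byte little-endian header length, a JSON header whose entries carry `dtype`, `shape` and `data_offsets`, then the raw tensor bytes). It is not the library's implementation, and `parse_safetensors` is a hypothetical helper name.

```python
import json
import struct
from typing import Dict, Tuple


def parse_safetensors(buf: bytes) -> Tuple[Dict[str, dict], Dict[str, bytes]]:
    """Split an in-memory .safetensors buffer into header entries and raw tensor bytes.

    Hypothetical helper illustrating the file layout; not the safetensors library's code.
    """
    # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
    (header_len,) = struct.unpack("<Q", buf[:8])
    header = json.loads(buf[8 : 8 + header_len].decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata, not a tensor
    data_start = 8 + header_len
    tensors = {}
    for name, info in header.items():
        # data_offsets are relative to the start of the data section.
        begin, end = info["data_offsets"]
        tensors[name] = buf[data_start + begin : data_start + end]
    return header, tensors
```

Usage would be `with open(path, "rb") as f: header, tensors = parse_safetensors(f.read())`, where the single `f.read()` is the one large sequential read.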
Describe the bug
When I use `StableDiffusionPipeline.from_single_file` to load a safetensors model, loading is extremely slow if the file is served from GCSFuse (https://cloud.google.com/storage/docs/cloud-storage-fuse/overview). The reason is that the loader creates multiple processes, but they all share the same fd and its file handle. Because each process reads a different offset of the file, GCSFuse performs very poorly: the reads look like random reads jumping between offsets.
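To make the "random read" effect concrete, here is a toy model in plain Python (not GCSFuse or diffusers code). It assumes, hypothetically, that N workers each read one contiguous slice of the file in fixed-size chunks and that their requests reach the single shared handle round-robin; `worker_requests`, `split_file`, `interleave` and `count_jumps` are illustrative names. Each worker is sequential on its own, yet the merged stream seen by the one handle jumps on almost every request.

```python
"""Toy model of interleaved per-process reads as seen by one shared file handle.

Hypothetical assumptions: each worker owns one contiguous slice and reads it in
fixed-size chunks; requests are merged round-robin onto a single handle.
"""
from typing import List, Tuple

Request = Tuple[int, int]  # (offset, length)


def worker_requests(start: int, end: int, chunk: int) -> List[Request]:
    """Sequential chunked reads covering [start, end)."""
    return [(off, min(chunk, end - off)) for off in range(start, end, chunk)]


def split_file(size: int, workers: int, chunk: int) -> List[List[Request]]:
    """Give each worker one contiguous slice of a file of `size` bytes."""
    per_worker = -(-size // workers)  # ceiling division
    return [
        worker_requests(w * per_worker, min(size, (w + 1) * per_worker), chunk)
        for w in range(workers)
        if w * per_worker < size
    ]


def interleave(streams: List[List[Request]]) -> List[Request]:
    """Merge per-worker streams round-robin, as the shared handle would see them."""
    merged: List[Request] = []
    for i in range(max(len(s) for s in streams)):
        for s in streams:
            if i < len(s):
                merged.append(s[i])
    return merged


def count_jumps(requests: List[Request]) -> int:
    """Count requests that do not start where the previous request ended."""
    return sum(
        1
        for (prev_off, prev_len), (off, _) in zip(requests, requests[1:])
        if off != prev_off + prev_len
    )


if __name__ == "__main__":
    size, chunk = 64 * 1024 * 1024, 1024 * 1024  # 64 MiB file, 1 MiB reads
    sequential = worker_requests(0, size, chunk)
    shared = interleave(split_file(size, workers=4, chunk=chunk))
    print("one reader, jumps:", count_jumps(sequential))
    print("four readers on one handle, jumps:", count_jumps(shared))
```

Real process scheduling is not strictly round-robin, so actual jump counts will differ; the model only shows that per-worker sequential access does not survive being merged onto one handle.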
My question is: why do the loading processes share the same fd in the first place? Since `mmap` is already used, even if the processes did not share the same fd, the kernel would still map each process's virtual memory back to the same page cache, so there is no need to share the fd across processes.

If they don't share the fd, GCSFuse will perform much better. Therefore, can we disable the fd sharing?
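A minimal sketch of the alternative proposed above, in plain Python rather than diffusers or safetensors internals: every worker process calls `open()` itself, so it gets its own fd and file handle, and then maps the file with `mmap`. The names `split_ranges`, `read_range` and `checksum_ranges`, and the choice of hashing each slice as the per-worker "work", are hypothetical and only stand in for real tensor loading.

```python
import hashlib
import mmap
import os
from concurrent.futures import ProcessPoolExecutor
from typing import List, Tuple


def split_ranges(size: int, workers: int) -> List[Tuple[int, int]]:
    """Split [0, size) into at most `workers` contiguous (start, end) ranges."""
    if size <= 0:
        return []
    step = -(-size // workers)  # ceiling division
    return [(s, min(s + step, size)) for s in range(0, size, step)]


def read_range(path: str, start: int, end: int) -> Tuple[int, str]:
    """Hash bytes [start, end) of `path` through a handle private to this process."""
    # A fresh open() here gives this process its own fd and file handle
    # instead of one inherited from the parent.
    with open(path, "rb") as f:
        # Read-only mapping of the whole file; the kernel can still back it with
        # the same page-cache pages that other processes map.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return start, hashlib.sha256(mm[start:end]).hexdigest()


def checksum_ranges(path: str, workers: int = 4) -> List[Tuple[int, str]]:
    """Hash one contiguous slice per worker process (hypothetical workload)."""
    ranges = split_ranges(os.path.getsize(path), workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(read_range, path, s, e) for s, e in ranges]
        return [fut.result() for fut in futures]
```

For example, `checksum_ranges("/mnt/gcs/model.safetensors")` (a hypothetical GCSFuse mount path) would hash four slices in four processes, each through its own handle. Because workers are started by `ProcessPoolExecutor`, call it from a script guarded by `if __name__ == "__main__":` on platforms that spawn rather than fork.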
Reproduction
Serve the model file through GCSFuse and load it with `StableDiffusionPipeline.from_single_file`.
Logs
No response
System Info
N/A
Who can help?
@yiyixuxu @asomoza