FLUX error when loading with low_cpu_mem_usage=False and ignore_mismatched_sizes=True #9343

Open
primecai opened this issue Sep 2, 2024 · 14 comments
Labels
bug Something isn't working

Comments

@primecai

primecai commented Sep 2, 2024

Describe the bug

I'd like to change the input layers of FLUX to train an img2img model, but I get
TypeError: expected str, bytes or os.PathLike object, not NoneType
when loading FluxTransformer2DModel with low_cpu_mem_usage=False and ignore_mismatched_sizes=True.

Reproduction

import torch
from diffusers.models import FluxTransformer2DModel

weight_dtype = torch.bfloat16  # dtype used for training

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    torch_dtype=weight_dtype,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
    revision=None,
    variant=None,
)

Logs

expected str, bytes or os.PathLike object, not NoneType
Traceback (most recent call last):
  File "/home/xxxx/repos/xxxx/.venv/lib/python3.11/site-packages/diffusers/models/model_loading_utils.py", line 104, in load_state_dict
    file_extension = os.path.basename(checkpoint_file).split(".")[-1]
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen posixpath>", line 142, in basename
TypeError: expected str, bytes or os.PathLike object, not NoneType

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxxx/repos/xxxx/train.py", line xxxx, in <module>
    main()
  File "/home/xxxx/repos/xxxx/train.py", line xxx, in main
    transformer = load_flux(args, weight_dtype)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/repos/xxxx/xxxx.py", line xx, in load_flux
    transformer = FluxTransformer2DModel.from_pretrained(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/repos/xxxx/.venv/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/repos/xxxx/.venv/lib/python3.11/site-packages/diffusers/models/modeling_utils.py", line 828, in from_pretrained
    state_dict = load_state_dict(model_file, variant=variant)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/repos/xxxx/.venv/lib/python3.11/site-packages/diffusers/models/model_loading_utils.py", line 116, in load_state_dict
    with open(checkpoint_file) as f:
         ^^^^^^^^^^^^^^^^^^^^^
TypeError: expected str, bytes or os.PathLike object, not NoneType

System Info

  • 🤗 Diffusers version: 0.31.0.dev0
  • Platform: Linux-5.15.0-119-generic-x86_64-with-glibc2.31
  • Running on Google Colab?: No
  • Python version: 3.11.9
  • PyTorch version (GPU?): 2.4.0+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.24.5
  • Transformers version: 4.44.0
  • Accelerate version: 0.33.0
  • PEFT version: 0.12.0
  • Bitsandbytes version: 0.43.3
  • Safetensors version: 0.4.4
  • xFormers version: 0.0.27.post2
  • Accelerator: NVIDIA A100 80GB PCIe, 81920 MiB
    NVIDIA A100 80GB PCIe, 81920 MiB
    NVIDIA A100 80GB PCIe, 81920 MiB
    NVIDIA A100 80GB PCIe, 81920 MiB
    NVIDIA A100 80GB PCIe, 81920 MiB
    NVIDIA A100 80GB PCIe, 81920 MiB
    NVIDIA A100 80GB PCIe, 81920 MiB
    NVIDIA A100 80GB PCIe, 81920 MiB
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: yes

Who can help?

@sayakpaul @DN6

@primecai primecai added the bug Something isn't working label Sep 2, 2024
@sayakpaul
Member

Why would you want to disable low_cpu_mem_usage? If you are looking to change the input channels, you could use something like:
https://github.com/sayakpaul/instructpix2pix-sdxl/blob/be02e48e03c1ce3e15b7687e1cba11d55875e990/scripts/train_instruct_pix2pix_sdxl.py#L590
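For reference, here is a minimal sketch of that approach adapted to FLUX. It assumes the input projection is the x_embedder linear layer of FluxTransformer2DModel and, purely as a hypothetical example, doubles the input channels: the pretrained weights are copied into the first channels and the new channels start at zero, so the expanded model initially behaves like the original.

import torch
from diffusers.models import FluxTransformer2DModel

# Load with the default fast path (low_cpu_mem_usage=True), keeping the original config.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)

# Hypothetical example: double the input channels (e.g. to concatenate an img2img condition).
new_in_channels = transformer.config.in_channels * 2

with torch.no_grad():
    old_proj = transformer.x_embedder  # input projection (nn.Linear) of FluxTransformer2DModel
    new_proj = torch.nn.Linear(
        new_in_channels,
        old_proj.out_features,
        bias=old_proj.bias is not None,
        dtype=old_proj.weight.dtype,
    )
    new_proj.weight.zero_()  # new channels start at zero so the pretrained behaviour is preserved
    new_proj.weight[:, : old_proj.in_features].copy_(old_proj.weight)
    if old_proj.bias is not None:
        new_proj.bias.copy_(old_proj.bias)
    transformer.x_embedder = new_proj

# Keep the config in sync so save_pretrained / from_pretrained round-trips correctly.
transformer.register_to_config(in_channels=new_in_channels)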

@primecai
Author

primecai commented Sep 2, 2024

Thanks for your reply.
If I'm not mistaken, low_cpu_mem_usage=False has to be used when ignore_mismatched_sizes=True is specified?
I want to change quite a bit of the architecture, and it's just easier if I can keep everything in a single class... is not using low_cpu_mem_usage really not an option? If not, I can resort to adding/changing the layers outside the class.

@sayakpaul
Member

Will take a look tomorrow.

@yiyixuxu
Collaborator

yiyixuxu commented Sep 3, 2024

cc @SunMarc
are we able to, and does it make sense to, support sharded checkpoints with low_cpu_mem_usage=False?

@Littleor

I have the same question. Is there any progress on this issue?

@zyx1213271098

A simple workaround is to save_pretrained the FluxTransformer2DModel as one big *.safetensors file.
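If you go this route, something like the following should work (a rough sketch; the local path and the "100GB" shard limit are just placeholders, the point is to set max_shard_size larger than the model so a single file is written):

import torch
from diffusers.models import FluxTransformer2DModel

# Load once with the default fast path (requires accelerate to be installed).
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)

# Re-save with a shard-size limit larger than the model so a single .safetensors file is written.
transformer.save_pretrained("./flux-transformer-unsharded", max_shard_size="100GB")

# The unsharded copy can then be loaded with low_cpu_mem_usage=False / ignore_mismatched_sizes=True.
transformer = FluxTransformer2DModel.from_pretrained(
    "./flux-transformer-unsharded",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
)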

@sayakpaul
Member

Please try to understand the scope of the issue first. Here we're talking about supporting the loading of big checkpoints (which usually should be sharded) with low_cpu_mem_usage=False. I think we should consider it carefully. Supporting it is no big deal, but should we?

A checkpoint is sharded because it is usually big. So, setting low_cpu_mem_usage=False would mean:

  1. We first randomly initialize the underlying model class.
  2. We then populate the sharded checkpoint into the model.

This effectively doubles the model loading time and is also prone to OOMs.
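Roughly, the two paths differ like this (a conceptual sketch, not the actual diffusers internals):

import torch
from accelerate import init_empty_weights
from diffusers.models import FluxTransformer2DModel

config = FluxTransformer2DModel.load_config(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer"
)

# low_cpu_mem_usage=True (default): parameters are created on the "meta" device,
# so nothing is materialized in RAM until the checkpoint weights are assigned.
with init_empty_weights():
    model = FluxTransformer2DModel.from_config(config)

# low_cpu_mem_usage=False: the full model is allocated and randomly initialized in RAM
# first, and the checkpoint is then copied over it -- roughly twice the work and twice
# the peak memory for a ~12B-parameter model.
model = FluxTransformer2DModel.from_config(config)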

@PromeAIpro
Contributor

This can also be an issue when accelerate is not installed: in that case, low_cpu_mem_usage is silently set to False and only a warning is printed, which does not mention the risk of unexpected exceptions when loading sharded models, and that misleads the debugging process. It may be worth printing a more specific warning, something like "Missing accelerate, may cause loading failure", etc. :) @sayakpaul

if low_cpu_mem_usage and not is_accelerate_available():
    low_cpu_mem_usage = False
    logger.warning(
        "Cannot initialize model with low cpu memory usage because `accelerate` was not found in the"
        " environment. Defaulting to `low_cpu_mem_usage=False`. It is strongly recommended to install"
        " `accelerate` for faster and less memory-intense model loading. You can do so with: \n```\npip"
        " install accelerate\n```\n."
    )
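A sketch of what the amended warning could look like; the appended sentence is illustrative wording, not an actual diffusers message:

if low_cpu_mem_usage and not is_accelerate_available():
    low_cpu_mem_usage = False
    logger.warning(
        "Cannot initialize model with low cpu memory usage because `accelerate` was not found in the"
        " environment. Defaulting to `low_cpu_mem_usage=False`. It is strongly recommended to install"
        " `accelerate` for faster and less memory-intense model loading. You can do so with: \n```\npip"
        " install accelerate\n```\n."
        # appended sentence (illustrative):
        " Note that with `low_cpu_mem_usage=False`, loading sharded checkpoints is not supported"
        " and may fail with a misleading `TypeError` instead of a clear error message."
    )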

@zodiacg

zodiacg commented Nov 1, 2024

Same problem here. Fixed after installing accelerate. I second @PromeAIpro's idea about adding a warning with specific details. The error itself is quite misleading.


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Nov 25, 2024
@sayakpaul
Member

@PromeAIpro @zodiacg sorry for the delay. Would you maybe be open to contributing a PR? Cc @SunMarc here too.

#10013 is highly relevant I guess.

@github-actions github-actions bot removed the stale Issues that haven't received updates label Nov 26, 2024
@catwell
Contributor

catwell commented Dec 12, 2024

I had the same issue today, just running examples from here.

The issue was introduced exactly here (cc @Wauplin).

When the checkpoint is sharded, model_file stays None, and hence, if low_cpu_mem_usage is False, it is still None here.
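To make the failure mode concrete, here is a hypothetical guard around the call that currently crashes (illustrative only, not a proposed patch); load_state_dict(model_file, variant=variant) is the call from the traceback above:

# For a sharded checkpoint, model_file is never resolved and stays None, so the call
# below ends up passing None into os.path.basename / open(), producing the TypeError.
if model_file is None:
    raise NotImplementedError(
        "This checkpoint is sharded, which is not supported with `low_cpu_mem_usage=False`. "
        "Install `accelerate` and use the default `low_cpu_mem_usage=True`, or re-save the "
        "model as a single file with `save_pretrained(..., max_shard_size=...)`."
    )
state_dict = load_state_dict(model_file, variant=variant)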


github-actions bot commented Jan 6, 2025

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Jan 6, 2025
@yiyixuxu yiyixuxu removed the stale Issues that haven't received updates label Jan 6, 2025
@sayakpaul
Member

Let's keep a close eye on #10604
