FLUX error when loading with low_cpu_mem_usage=False and ignore_mismatched_sizes=True #9343

Open
primecai opened this issue Sep 2, 2024 · 14 comments
Labels
bug Something isn't working

Comments

@primecai

primecai commented Sep 2, 2024

Describe the bug

I'd like to change the input layers of FLUX to train an img2img model, but I get
TypeError: expected str, bytes or os.PathLike object, not NoneType
when loading FluxTransformer2DModel with low_cpu_mem_usage=False and ignore_mismatched_sizes=True.

Reproduction

import torch
from diffusers.models import FluxTransformer2DModel

weight_dtype = torch.bfloat16  # dtype used for training

transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    torch_dtype=weight_dtype,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
    revision=None,
    variant=None,
)

Logs

expected str, bytes or os.PathLike object, not NoneType
Traceback (most recent call last):
  File "/home/xxxx/repos/xxxx/.venv/lib/python3.11/site-packages/diffusers/models/model_loading_utils.py", line 104, in load_state_dict
    file_extension = os.path.basename(checkpoint_file).split(".")[-1]
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen posixpath>", line 142, in basename
TypeError: expected str, bytes or os.PathLike object, not NoneType

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/xxxx/repos/xxxx/train.py", line xxxx, in <module>
    main()
  File "/home/xxxx/repos/xxxx/train.py", line xxx, in main
    transformer = load_flux(args, weight_dtype)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/repos/xxxx/xxxx.py", line xx, in load_flux
    transformer = FluxTransformer2DModel.from_pretrained(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/repos/xxxx/.venv/lib/python3.11/site-packages/huggingface_hub/utils/_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/repos/xxxx/.venv/lib/python3.11/site-packages/diffusers/models/modeling_utils.py", line 828, in from_pretrained
    state_dict = load_state_dict(model_file, variant=variant)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/xxxx/repos/xxxx/.venv/lib/python3.11/site-packages/diffusers/models/model_loading_utils.py", line 116, in load_state_dict
    with open(checkpoint_file) as f:
         ^^^^^^^^^^^^^^^^^^^^^
TypeError: expected str, bytes or os.PathLike object, not NoneType

System Info

  • 🤗 Diffusers version: 0.31.0.dev0
  • Platform: Linux-5.15.0-119-generic-x86_64-with-glibc2.31
  • Running on Google Colab?: No
  • Python version: 3.11.9
  • PyTorch version (GPU?): 2.4.0+cu124 (True)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Huggingface_hub version: 0.24.5
  • Transformers version: 4.44.0
  • Accelerate version: 0.33.0
  • PEFT version: 0.12.0
  • Bitsandbytes version: 0.43.3
  • Safetensors version: 0.4.4
  • xFormers version: 0.0.27.post2
  • Accelerator: NVIDIA A100 80GB PCIe, 81920 MiB
    NVIDIA A100 80GB PCIe, 81920 MiB
    NVIDIA A100 80GB PCIe, 81920 MiB
    NVIDIA A100 80GB PCIe, 81920 MiB
    NVIDIA A100 80GB PCIe, 81920 MiB
    NVIDIA A100 80GB PCIe, 81920 MiB
    NVIDIA A100 80GB PCIe, 81920 MiB
    NVIDIA A100 80GB PCIe, 81920 MiB
  • Using GPU in script?: yes
  • Using distributed or parallel set-up in script?: yes

Who can help?

@sayakpaul @DN6

@primecai primecai added the bug Something isn't working label Sep 2, 2024
@sayakpaul
Member

Why would you want to disable low_cpu_mem_usage? If you are looking to change the input channels, you could use something like:
https://github.com/sayakpaul/instructpix2pix-sdxl/blob/be02e48e03c1ce3e15b7687e1cba11d55875e990/scripts/train_instruct_pix2pix_sdxl.py#L590
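For reference, here is a minimal sketch of that approach adapted to FLUX. It assumes the input projection is the x_embedder linear layer of FluxTransformer2DModel and, purely as a hypothetical example, doubles the input channels: the pretrained weights are copied into the first channels and the new channels start at zero, so the expanded model initially behaves like the original.

import torch
from diffusers.models import FluxTransformer2DModel

# Load with the default fast path (low_cpu_mem_usage=True), keeping the original config.
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)

# Hypothetical example: double the input channels (e.g. to concatenate an img2img condition).
new_in_channels = transformer.config.in_channels * 2

with torch.no_grad():
    old_proj = transformer.x_embedder  # input projection (nn.Linear) of FluxTransformer2DModel
    new_proj = torch.nn.Linear(
        new_in_channels,
        old_proj.out_features,
        bias=old_proj.bias is not None,
        dtype=old_proj.weight.dtype,
    )
    new_proj.weight.zero_()  # new channels start at zero so the pretrained behaviour is preserved
    new_proj.weight[:, : old_proj.in_features].copy_(old_proj.weight)
    if old_proj.bias is not None:
        new_proj.bias.copy_(old_proj.bias)
    transformer.x_embedder = new_proj

# Keep the config in sync so save_pretrained / from_pretrained round-trips correctly.
transformer.register_to_config(in_channels=new_in_channels)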

@primecai
Author

primecai commented Sep 2, 2024

Thanks for your reply.
If I'm not mistaken, low_cpu_mem_usage=False has to be used when ignore_mismatched_sizes=True is specified?
I want to change quite a bit of the architecture, and it's just easier if I can keep everything in a single class... is not using low_cpu_mem_usage really not an option? If not, I can resort to adding/changing the layers outside the class.

@sayakpaul
Member

Will take a look tomorrow.

@yiyixuxu
Collaborator

yiyixuxu commented Sep 3, 2024

cc @SunMarc
are we able to, and does it make sense to, support sharded checkpoints with low_cpu_mem_usage=False?

@Littleor

I have the same question. Is there any progress on this issue?

@zyx1213271098

A simple workaround is to save_pretrained the FluxTransformer2DModel as one big *.safetensors file.
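If you go this route, something like the following should work (a rough sketch; the local path and the "100GB" shard limit are just placeholders, the point is to set max_shard_size larger than the model so a single file is written):

import torch
from diffusers.models import FluxTransformer2DModel

# Load once with the default fast path (requires accelerate to be installed).
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer", torch_dtype=torch.bfloat16
)

# Re-save with a shard-size limit larger than the model so a single .safetensors file is written.
transformer.save_pretrained("./flux-transformer-unsharded", max_shard_size="100GB")

# The unsharded copy can then be loaded with low_cpu_mem_usage=False / ignore_mismatched_sizes=True.
transformer = FluxTransformer2DModel.from_pretrained(
    "./flux-transformer-unsharded",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
    ignore_mismatched_sizes=True,
)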

@sayakpaul
Member

Please try to understand the scope of the issue first. Here we're talking about supporting the loading of big checkpoints (which usually should be sharded) with low_cpu_mem_usage=False. I think we should consider it carefully. Supporting it is no big deal, but should we?

A checkpoint is sharded because it is usually big. So, setting low_cpu_mem_usage=False would mean:

  1. We first randomly initialize the underlying model class.
  2. We then populate the sharded checkpoint into the model.

This effectively doubles the model loading time and is also prone to OOMs.
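Roughly, the two paths differ like this (a conceptual sketch, not the actual diffusers internals):

import torch
from accelerate import init_empty_weights
from diffusers.models import FluxTransformer2DModel

config = FluxTransformer2DModel.load_config(
    "black-forest-labs/FLUX.1-dev", subfolder="transformer"
)

# low_cpu_mem_usage=True (default): parameters are created on the "meta" device,
# so nothing is materialized in RAM until the checkpoint weights are assigned.
with init_empty_weights():
    model = FluxTransformer2DModel.from_config(config)

# low_cpu_mem_usage=False: the full model is allocated and randomly initialized in RAM
# first, and the checkpoint is then copied over it -- roughly twice the work and twice
# the peak memory for a ~12B-parameter model.
model = FluxTransformer2DModel.from_config(config)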

@PromeAIpro
Contributor

This can also be an issue when accelerate is not installed: in that case, low_cpu_mem_usage is silently set to False and only a warning is printed, which does not mention the risk of unexpected exceptions when loading sharded models, and that misleads the debugging process. It may be worth printing a more specific warning, something like "Missing accelerate, may cause loading failure", etc. :) @sayakpaul

if low_cpu_mem_usage and not is_accelerate_available():
    low_cpu_mem_usage = False
    logger.warning(
        "Cannot initialize model with low cpu memory usage because `accelerate` was not found in the"
        " environment. Defaulting to `low_cpu_mem_usage=False`. It is strongly recommended to install"
        " `accelerate` for faster and less memory-intense model loading. You can do so with: \n```\npip"
        " install accelerate\n```\n."
    )
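A sketch of what the amended warning could look like; the appended sentence is illustrative wording, not an actual diffusers message:

if low_cpu_mem_usage and not is_accelerate_available():
    low_cpu_mem_usage = False
    logger.warning(
        "Cannot initialize model with low cpu memory usage because `accelerate` was not found in the"
        " environment. Defaulting to `low_cpu_mem_usage=False`. It is strongly recommended to install"
        " `accelerate` for faster and less memory-intense model loading. You can do so with: \n```\npip"
        " install accelerate\n```\n."
        # appended sentence (illustrative):
        " Note that with `low_cpu_mem_usage=False`, loading sharded checkpoints is not supported"
        " and may fail with a misleading `TypeError` instead of a clear error message."
    )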

@zodiacg

zodiacg commented Nov 1, 2024

Same problem here. Fixed after installing accelerate. I second @PromeAIpro's idea about adding a warning with specific details. The error itself is quite misleading.


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Nov 25, 2024
@sayakpaul
Member

@PromeAIpro @zodiacg sorry for the delay. Would you maybe be open to contributing a PR? Cc @SunMarc here too.

#10013 is highly relevant I guess.

@github-actions github-actions bot removed the stale Issues that haven't received updates label Nov 26, 2024
@catwell
Contributor

catwell commented Dec 12, 2024

I had the same issue today, just running examples from here.

The issue was introduced exactly here (cc @Wauplin).

When the checkpoint is sharded, model_file stays None, and hence, if low_cpu_mem_usage is False, it is still None here.
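To make the failure mode concrete, here is a hypothetical guard around the call that currently crashes (illustrative only, not a proposed patch); load_state_dict(model_file, variant=variant) is the call from the traceback above:

# For a sharded checkpoint, model_file is never resolved and stays None, so the call
# below ends up passing None into os.path.basename / open(), producing the TypeError.
if model_file is None:
    raise NotImplementedError(
        "This checkpoint is sharded, which is not supported with `low_cpu_mem_usage=False`. "
        "Install `accelerate` and use the default `low_cpu_mem_usage=True`, or re-save the "
        "model as a single file with `save_pretrained(..., max_shard_size=...)`."
    )
state_dict = load_state_dict(model_file, variant=variant)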


github-actions bot commented Jan 6, 2025

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Jan 6, 2025
@yiyixuxu yiyixuxu removed the stale Issues that haven't received updates label Jan 6, 2025
@sayakpaul
Member

Let's keep a close eye on #10604
