
MobileLLM safetensors seem to be missing model.embed_tokens.weight #34759

Open
2 of 4 tasks
avishaiElmakies opened this issue Nov 16, 2024 · 3 comments

@avishaiElmakies
Contributor

avishaiElmakies commented Nov 16, 2024

System Info

  • transformers version: 4.46.2
  • Platform: Linux-6.6.20-aufs-1-x86_64-with-glibc2.36
  • Python version: 3.11.2
  • Huggingface_hub version: 0.26.2
  • Safetensors version: 0.4.5
  • Accelerate version: 1.1.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.5.1+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: no
  • Using GPU in script?: no
  • GPU type: NVIDIA RTX A5000

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

from transformers import AutoModelForCausalLM

mobilellm = AutoModelForCausalLM.from_pretrained("facebook/MobileLLM-125M", trust_remote_code=True)

will output: Some weights of MobileLLMForCausalLM were not initialized from the model checkpoint at facebook/MobileLLM-125M and are newly initialized: ['model.embed_tokens.weight']

and the embedding weights will be random. When loading with use_safetensors=False, everything works as expected.
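The warning above is the standard behavior of PyTorch-style partial loading: keys absent from the checkpoint are left at their random initialization and reported as missing. A minimal, self-contained sketch of this mechanism, using a toy module with hypothetical shapes rather than MobileLLM's real architecture:

```python
import torch
import torch.nn as nn

# Toy stand-in for the model (hypothetical names/shapes, not MobileLLM's real config).
class Tiny(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed_tokens = nn.Embedding(8, 4)
        self.lm_head = nn.Linear(4, 8, bias=False)

model = Tiny()

# A checkpoint that lacks the embedding weight, mimicking the reported safetensors file.
ckpt = {"lm_head.weight": torch.zeros(8, 4)}
result = model.load_state_dict(ckpt, strict=False)

# The embedding stays randomly initialized and is reported as missing.
print(result.missing_keys)  # ['embed_tokens.weight']
```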

Expected behavior

Loading with safetensors should behave the same as loading without them.

@mayankagarwals
Contributor

Hi 👋
I am able to reproduce this; looking into it!

@mayankagarwals
Contributor

mayankagarwals commented Nov 17, 2024

Can you please provide the code snippet where you are not seeing any error (without using safetensors)? @avishaiElmakies

@avishaiElmakies
Contributor Author

avishaiElmakies commented Nov 17, 2024

There should be a single "error" about lm_head.weight, since the model uses weight tying for the embedding and output layers. Both safetensors and normal loading report this.

The problem is that when using safetensors, the embedding layer seems to be missing, which causes problems with both the embedding layer and the output layer.

Maybe I should have been clearer about that in the bug report (sorry about that).
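The tying described above can be sketched in plain PyTorch (a generic illustration with hypothetical sizes, not MobileLLM's actual code): the output head reuses the embedding's parameter tensor, so a single missing embed_tokens weight corrupts both layers.

```python
import torch
import torch.nn as nn

# Hypothetical sizes, for illustration only.
vocab_size, hidden = 8, 4
embed_tokens = nn.Embedding(vocab_size, hidden)
lm_head = nn.Linear(hidden, vocab_size, bias=False)

# Weight tying: the output head shares the embedding's parameter tensor.
lm_head.weight = embed_tokens.weight

# Both layers now point at the same storage, so if embed_tokens.weight is
# missing from a checkpoint, lm_head is broken as well.
print(lm_head.weight.data_ptr() == embed_tokens.weight.data_ptr())  # True
```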
