TypeError: LlamaRotaryEmbedding.forward() got an unexpected keyword argument 'seq_len' when running VILA model inference #126

Open
LanceLeonhart opened this issue Aug 25, 2024 · 3 comments

@LanceLeonhart

Hello,
I am trying to run the VILA model for inference, but I have run into a couple of issues that I need help with.
(1) FlashAttention issue: Initially, I hit a problem related to FlashAttention. After going through the relevant issues on GitHub, I managed to resolve it by modifying the code in lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py.
(2) TypeError: After addressing the FlashAttention issue, I encountered the following error during inference:
TypeError: LlamaRotaryEmbedding.forward() got an unexpected keyword argument 'seq_len'
Could you please advise on how to resolve this? Any help would be greatly appreciated!
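
For anyone triaging the same error, a quick diagnostic sketch (illustrative only, not from the VILA repo) is to print the installed transformers version together with the current LlamaRotaryEmbedding.forward signature, since the seq_len keyword only exists in older releases:

```python
# Diagnostic sketch: check whether the installed transformers release still
# accepts the legacy seq_len keyword on LlamaRotaryEmbedding.forward.
import inspect

import transformers
from transformers.models.llama.modeling_llama import LlamaRotaryEmbedding

print(transformers.__version__)
print(inspect.signature(LlamaRotaryEmbedding.forward))
# Older releases print something like (self, x, seq_len=None); newer ones
# take position_ids instead, which matches the TypeError reported above.
```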

@LanceLeonhart (Author)

Here is the whole error message:
"[2024-08-25 16:28:12,830] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
Loading checkpoint shards: 100%|███████████████████████████████████████████████████████████████████████████████| 4/4 [00:09<00:00, 2.48s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
input: \n Please describe the content in the picture.
[WARNING] the auto inferred conversation mode is llava_v0, while --conv-mode is llama_3, using llama_3
torch.Size([1, 3, 384, 384])
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's attention_mask to obtain reliable results.
Setting pad_token_id to eos_token_id:128001 for open-end generation.
Traceback (most recent call last):
File "/home/ny52/VILA/llava/eval/run_vila.py", line 157, in
eval_model(args)
File "/home/ny52/VILA/llava/eval/run_vila.py", line 119, in eval_model
output_ids = model.generate(
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/ny52/VILA/llava/model/llava_arch.py", line 874, in generate
outputs = self.llm.generate(inputs_embeds=inputs_embeds, attention_mask=attention_mask, **generation_kwargs)
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/transformers/generation/utils.py", line 1525, in generate
return self.sample(
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/transformers/generation/utils.py", line 2622, in sample
outputs = self(
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1018, in forward
outputs = self.model(
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 904, in forward
layer_outputs = decoder_layer(
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 658, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 339, in forward
cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ny52/miniconda3/envs/vila/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
TypeError: LlamaRotaryEmbedding.forward() got an unexpected keyword argument 'seq_len'"

@liuyijiang1994

same

@RomeoV commented Sep 12, 2024

Pretty sure this broke in 54c9706, where the seq_len argument was removed from LlamaRotaryEmbedding.forward.
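
If that is the cause, the cleanest fix is usually to install the transformers release that VILA's requirements pin (check the repo's pyproject.toml or requirements file for the exact version) rather than hand-editing modeling_llama.py. Failing that, here is a hedged monkey-patch sketch that re-accepts the legacy keyword; the name patched_forward is illustrative and not from either repo:

```python
# Compatibility shim (sketch only): let LlamaRotaryEmbedding.forward accept
# the legacy seq_len keyword by translating it into position_ids before
# delegating to the installed forward. This only bridges the keyword
# mismatch; downstream shape handling may still differ between releases,
# so pinning the expected transformers version remains the safer fix.
import torch
from transformers.models.llama import modeling_llama

_installed_forward = modeling_llama.LlamaRotaryEmbedding.forward

def patched_forward(self, x, position_ids=None, seq_len=None):
    if position_ids is None:
        # Fall back to sequential positions of the requested (or inferred) length.
        length = seq_len if seq_len is not None else x.shape[-2]
        position_ids = torch.arange(length, device=x.device).unsqueeze(0)
    return _installed_forward(self, x, position_ids)

modeling_llama.LlamaRotaryEmbedding.forward = patched_forward
```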
