
llama3.2 + cross attn test #220

Merged
merged 7 commits into from
Oct 4, 2024

Conversation

maleksan85

Commands:
server

 VLLM_NO_TUNED_GEMM=1 vllm serve /data/models/Llama-3.2-90B-Vision-Instruct --tensor_parallel_size 2 --enforce-eager --limit-mm-per-prompt "image=2" --max-num-seqs 32 --max_model_len 8192

client (from https://huggingface.co/nltpt/VLLM-llama3.2):

root@banff-cyxtera-s82-5:~/workspace/VLLM-llama3.2# python openai_vision_api_client.py
Chat completion output: The image depicts a serene lake scene with a wooden dock extending into the water, surrounded by lush greenery and majestic mountains in the background. The overall atmosphere of the image exudes tranquility and natural beauty, inviting the viewer to step into its peaceful world.
remove me: testing done, exiting...
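For reference, a minimal sketch of what the `openai_vision_api_client.py` request shape looks like against the server started above. This is an illustrative reconstruction, not the actual script from the linked repo: the helper name `build_vision_messages`, the `localhost:8000` endpoint, and the image filename are assumptions; only the model path and the OpenAI-compatible chat format are taken from the PR description.

```python
# Sketch of an OpenAI-compatible vision request payload for the vLLM server
# launched with `vllm serve /data/models/Llama-3.2-90B-Vision-Instruct ...`.
# `build_vision_messages` is a hypothetical helper, not part of vLLM.
import base64


def build_vision_messages(prompt: str, image_bytes: bytes) -> list:
    """Build an OpenAI chat `messages` list with one text part and one
    base64-encoded image part, as accepted by vLLM's OpenAI server."""
    b64 = base64.b64encode(image_bytes).decode("utf-8")
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/jpeg;base64,{b64}"},
                },
            ],
        }
    ]


# Sending the request would look roughly like this (requires the `openai`
# package and the server above running locally; endpoint is an assumption):
#   from openai import OpenAI
#   client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
#   resp = client.chat.completions.create(
#       model="/data/models/Llama-3.2-90B-Vision-Instruct",
#       messages=build_vision_messages(
#           "What is in this image?", open("lake.jpg", "rb").read()
#       ),
#   )
#   print(resp.choices[0].message.content)
```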

Collaborator

@shajrawi shajrawi left a comment

Great work!!
A few nits + one question.
Also, can you please make the linter happy? :)

tests/kernels/test_encoder_decoder_attn.py Outdated Show resolved Hide resolved
tests/kernels/utils.py Show resolved Hide resolved
vllm/attention/backends/rocm_flash_attn.py Outdated Show resolved Hide resolved
vllm/attention/backends/rocm_flash_attn.py Outdated Show resolved Hide resolved
vllm/model_executor/layers/linear.py Outdated Show resolved Hide resolved
shajrawi previously approved these changes Oct 4, 2024

Collaborator

@shajrawi shajrawi left a comment

Looks good - assuming performance does not regress due to reshape

@maleksan85 maleksan85 merged commit 2550f14 into main Oct 4, 2024
16 of 17 checks passed
@gshtras gshtras deleted the maleksan_llama32_support branch October 24, 2024 18:55