
[Bugfix] handle alignment of arguments in convert_sparse_cross_attention_mask_to_dense #12347

Merged

Conversation

@tjohnson31415 (Contributor) commented Jan 23, 2025

Reproducing the bug requires a batch containing both a text-only request and a request with an image. With the OpenAI server this can happen under load, but it is easier to reproduce in offline mode:

from vllm import LLM, SamplingParams
from vllm.multimodal.utils import fetch_image

image_url = "https://upload.wikimedia.org/wikipedia/commons/d/da/2015_Kaczka_krzy%C5%BCowka_w_wodzie_%28samiec%29.jpg"
image_data = fetch_image(image_url)

model_name = "meta-llama/Llama-3.2-11B-Vision-Instruct"

llm = LLM(
    model=model_name,
    max_model_len=4096,
    max_num_seqs=2,
    enforce_eager=True,
)

sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=128
)

# Batching a text-only prompt together with an image prompt triggers the bug.
outputs = llm.generate(
    prompts=[
        {
            "prompt": "What is the capital of Spain?",
        },
        {
            "prompt": "Analyze this image <|image|>. What do you see?",
            "multi_modal_data": {
                "image": image_data,
            },
        },
    ],
    sampling_params=sampling_params
)

Before #11939 this would have been a crash, but now it results in an AssertionError. The reason is that the num_tiles: List[List[int]] passed in to convert_sparse_cross_attention_mask_to_dense has no entry for a text-only sequence, while sparse_mask has one entry per sequence. The inputs to the function end up like:

sparse_mask = [[], [[5, -1]]]
num_tiles = [[4]]
lengths = [7, 12]

The fix in this PR is to skip [] entries in sparse_mask for text-only requests.
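For illustration, here is a minimal sketch of the alignment logic with the skip applied (a simplified reconstruction for this writeup, not the exact vLLM implementation): empty entries in sparse_mask advance the row offset by the sequence length but consume nothing from num_tiles.

import numpy as np
from typing import List

def convert_sparse_cross_attention_mask_to_dense(
        sparse_mask: List[List[List[int]]],  # per sequence: [start, end] span per image
        num_tiles: List[List[int]],          # tile counts, one entry per sequence WITH images
        lengths: List[int],                  # token length of each sequence in the batch
) -> np.ndarray:
    total_length = sum(lengths)
    total_tiles = sum(sum(t) for t in num_tiles)
    dense_mask = np.zeros((total_length, total_tiles), dtype=np.int64)

    tile_idx = 0    # column offset into the dense mask
    seq_start = 0   # row offset into the dense mask
    tiles_iter = iter(num_tiles)  # advances only for sequences that have images
    for seq_mask, length in zip(sparse_mask, lengths):
        if len(seq_mask) == 0:
            # Text-only sequence: no matching entry in num_tiles, so skip it.
            seq_start += length
            continue
        for (start, end), tiles in zip(seq_mask, next(tiles_iter)):
            if end == -1:
                end = length
            dense_mask[seq_start + start:seq_start + end,
                       tile_idx:tile_idx + tiles] = 1
            tile_idx += tiles
        seq_start += length
    return dense_mask

With the example inputs above, this returns a (19, 4) mask in which only rows 12-18 (tokens 5 onward of the image sequence) are set.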

A better fix may be to have num_tiles created with an entry for each sequence, but I didn't see where to make that change.

Potential fix for #10648

tjohnson31415 and others added 2 commits January 22, 2025 23:29
Co-authored-by: Wallas Santos <[email protected]>
Signed-off-by: Travis Johnson <[email protected]>
…_mask_to_dense

Without the alignment, an AssertionError is raised if a text-only sequence precedes one with an image.

Co-authored-by: Wallas Santos <[email protected]>
Signed-off-by: Travis Johnson <[email protected]>

👋 Hi! Thank you for contributing to the vLLM project.
Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can do one of these:

  • Add ready label to the PR
  • Enable auto-merge.

🚀

@DarkLight1337 (Member) left a comment


Thanks for fixing!

@heheda12345 (Collaborator) commented

LGTM! Thank you for the bug fix.
@DarkLight1337 Does @large_gpu_test(min_gb=48) mean that this new test will be skipped during CI? If so, I think we should implement some tests for get_cross_attention_mask (and also get_cross_attention_states & get_full_text_row_masked_out_mask if possible).

@DarkLight1337 (Member) commented

@DarkLight1337 Does @large_gpu_test(min_gb=48) mean that this new test will be skipped during CI?

Yes, that is correct.

@heheda12345 (Collaborator) commented

@tjohnson31415 Can you change the e2e test to a test for get_cross_attention_mask (and also get_cross_attention_states & get_full_text_row_masked_out_mask if possible)?

@wallashss (Contributor) commented

Hey @heheda12345, I added some tests that should be enough to prevent a regression on this issue. Please see if you agree.

Thanks!

cc @tjohnson31415
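For illustration, a minimal regression test in this spirit might look like the following (a hypothetical sketch, not the exact test added in this PR; it assumes the function is importable from vLLM's mllama module):

from vllm.model_executor.models.mllama import (
    convert_sparse_cross_attention_mask_to_dense)

def test_text_only_sequence_before_image_sequence():
    # The failing batch from the bug report: sequence 0 is text-only,
    # so it has an empty sparse mask and no entry in num_tiles.
    sparse_mask = [[], [[5, -1]]]
    num_tiles = [[4]]
    lengths = [7, 12]

    dense_mask = convert_sparse_cross_attention_mask_to_dense(
        sparse_mask, num_tiles, lengths)

    # One row per token in the batch, one column per image tile.
    assert dense_mask.shape == (sum(lengths), 4)
    # The text-only sequence (rows 0-6) and the tokens before the image
    # span in the second sequence (rows 7-11) attend to no tiles.
    assert not dense_mask[:12].any()
    assert dense_mask[12:].all()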

@heheda12345 (Collaborator) left a comment


LGTM! Thanks for the bug fix and new tests!

@DarkLight1337 DarkLight1337 enabled auto-merge (squash) January 29, 2025 04:17
@github-actions github-actions bot added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jan 29, 2025
@DarkLight1337 DarkLight1337 merged commit 036ca94 into vllm-project:main Jan 29, 2025
61 checks passed
rasmith pushed a commit to rasmith/vllm that referenced this pull request Jan 30, 2025
…ion_mask_to_dense (vllm-project#12347)

Signed-off-by: Travis Johnson <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
Co-authored-by: Wallas Santos <[email protected]>
Isotr0py pushed a commit to Isotr0py/vllm that referenced this pull request Feb 2, 2025
…ion_mask_to_dense (vllm-project#12347)

Signed-off-by: Travis Johnson <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
Co-authored-by: Wallas Santos <[email protected]>
Signed-off-by: Isotr0py <[email protected]>
NickLucche pushed a commit to NickLucche/vllm that referenced this pull request Feb 7, 2025
…ion_mask_to_dense (vllm-project#12347)

Signed-off-by: Travis Johnson <[email protected]>
Signed-off-by: Wallas Santos <[email protected]>
Co-authored-by: Wallas Santos <[email protected]>