
[Model] New model support for Phi-4-multimodal-instruct #14119

Conversation

congcongchen123 (Contributor) commented Mar 3, 2025

New model for https://huggingface.co/microsoft/Phi-4-multimodal-instruct/tree/main

Co-authors: Jacob Platin and Vadim Mazalov (speech encoder), and Yen-Chun Chen (vision encoder).

FIX #13936


Instructions on how to load the model with the LoRA checkpoints:

Run an OpenAI-compatible server:

  • Since the vision-lora and speech-lora co-exist with the base model, we have to manually specify the paths of the LoRA weights:
from huggingface_hub import snapshot_download

# e.g. args.model_path == "microsoft/Phi-4-multimodal-instruct"
model_path = snapshot_download(repo_id=args.model_path)
vision_lora_path = model_path + "/vision-lora"
speech_lora_path = model_path + "/speech-lora"
  • Run the server:
python -m vllm.entrypoints.openai.api_server --model 'microsoft/Phi-4-multimodal-instruct' --dtype auto --trust-remote-code --max-model-len 131072 --max-seq-len-to-capture=131072 --enable-lora --max-lora-rank 320 --lora-extra-vocab-size 0 --limit-mm-per-prompt audio=3,image=3 --max-loras 2 --lora-modules speech=<speech_lora_path> vision=<vision_lora_path>

Note that --max-seq-len-to-capture=131072 is very important: otherwise it defaults to 8k, and vLLM will surprisingly disable cuda_graph for sequences longer than 8k. With cuda_graph disabled unexpectedly, we have seen significant time-between-tokens degradation (twice as slow).

  • Send a curl request:
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "vision",
    "messages": [
      { "role": "user", "content": [
        { "type": "text", "text": "Represent the given image." },
        { "type": "image_url", "image_url": { "url": "https://alinasayre.com/wp-content/uploads/2013/10/d67cd-dsc01646.jpg" } }
      ]}
    ]
  }'

Got result:

{"id":"chatcmpl-fcc36661176a4aa99f20268969eea2be","object":"chat.completion","created":1741166187,"model":"vision","choices":[{"index":0,"message":{"role":"assistant","reasoning_content":null,"content":"Image description: A faded yellow traffic sign with one-lane and bridge warnings.","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":200020}],"usage":{"prompt_tokens":3402,"total_tokens":3419,"completion_tokens":17,"prompt_tokens_details":null},"prompt_logprobs":null}

github-actions bot commented Mar 3, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run the other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

mergify bot added the documentation, ci/build, and frontend labels Mar 3, 2025
DarkLight1337 (Member) commented:

Please update the tests as described here: https://docs.vllm.ai/en/latest/contributing/model/tests.html

@@ -0,0 +1,531 @@
# Implements a simple offline inference script for the Phi 3.5 Speech model.
congcongchen123 (Contributor, Author) commented:

Will delete this test file once the review is done.

@@ -38,3 +38,4 @@ depyf==0.18.0 # required for profiling and debugging with compilation config
cloudpickle # allows pickling lambda functions in model_executor/models/registry.py
watchfiles # required for http server to monitor the updates of TLS files
python-json-logger # Used by logging as per examples/other/logging_configuration.md
scipy # Required for phi-4-multimodal-instruct
A reviewer (Member) commented:

We should try to remove the scipy dependency if possible. It is okay if we do it in a follow-up PR. It seems to be used only for a single function, scipy.signal.resample_poly, in phi4mm.py:

        # Resample to 16000 or 8000 if needed
        if fs > 16000:
            wav = scipy.signal.resample_poly(wav, 1, fs // 16000)
            fs = 16000
        elif 8000 < fs < 16000:
            wav = scipy.signal.resample_poly(wav, 1, fs // 8000)
            fs = 8000
        elif fs < 8000:
            raise RuntimeError(f"Unsupported sample rate {fs}")

        if fs == 8000:
            if self._eightk_method == "resample":
                # Input audio is 8 kHz. Convert to 16 kHz before feature
                # extraction
                wav = scipy.signal.resample_poly(wav, 2, 1)
                fs = 16000
            # Do nothing here for fillzero method
        elif fs != 16000:
            # Input audio is not a supported sample rate.
            raise RuntimeError(
                f"Input data using an unsupported sample rate: {fs}"
            )

congcongchen123 (Contributor, Author) commented:

Sure, will do that in a follow-up PR.
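
For reference, one possible scipy-free direction for that follow-up is sketched below. This is purely an illustration, not the actual change from any later PR: the helper name resample_to_16k is hypothetical, torchaudio is assumed to be an acceptable dependency, and the sketch deliberately simplifies the 8 kHz "fillzero" path by always resampling straight to 16 kHz.

import torch
import torchaudio.functional as AF

def resample_to_16k(wav, fs: int):
    # Illustrative helper: resample a 1-D waveform to 16 kHz without scipy.
    # Unlike the snippet above, this skips the special 8 kHz handling and
    # always targets 16 kHz.
    if fs < 8000:
        raise RuntimeError(f"Unsupported sample rate {fs}")
    wav = torch.as_tensor(wav, dtype=torch.float32)
    if fs != 16000:
        wav = AF.resample(wav, orig_freq=fs, new_freq=16000)
        fs = 16000
    return wav, fs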

DarkLight1337 (Member) left a comment:

While we're here, can you update the supported models page to also include the text-only version of Phi-4?

congcongchen123 (Contributor, Author) commented Mar 4, 2025

Hi @DarkLight1337, I have addressed all the new comments. Could you please approve this PR if it looks good to you? I would like to ship this as soon as possible, since people are waiting on it to try out Phi-4-multimodal-instruct on vLLM.

congcongchen123 force-pushed the congcongchen/phi-4-multimodal-instruct branch from 78f037b to 41d0103 on March 4, 2025 21:13
ywang96 (Member) left a comment:

@congcongchen123 I left two final comments to help us ship this PR. Thanks for the effort!

congcongchen123 (Contributor, Author) commented:

BTW @ywang96, should I merge the latest main into my branch one more time to make it ready to ship?

ywang96 (Member) commented Mar 4, 2025

BTW @ywang96, should I merge the latest main into my branch one more time to make it ready to ship?

Yep that would be great!

congcongchen123 force-pushed the congcongchen/phi-4-multimodal-instruct branch from 41d0103 to 08c845a on March 4, 2025 21:29
ywang96 added the ready label Mar 4, 2025
congcongchen123 (Contributor, Author) commented Mar 4, 2025

Hi @ywang96 the buildkite/ci/pr/basic-models-test test failed because of the following check in vllm/model_executor/models/vision_siglip_navit.py:

self.attn_backend: _Backend = get_vit_attn_backend(support_fa=True)
if self.attn_backend != _Backend.FLASH_ATTN:
    raise RuntimeError(
        "Phi-4-multimodal-instruct model does not support"\
            f" {self.attn_backend} backend now."
    )

Looks like the testing container does not have flash_attn package installed, and flash_attn is required by vllm/model_executor/models/vision_siglip_navit.py. What is the suggested way to fix that? Thanks


To answer myself: as a workaround, maybe I could log an error message instead of throwing a runtime error. We will come back to this in follow-up PRs.

congcongchen123 and others added 2 commits March 4, 2025 15:51
…t does not have flash_attn installed

Signed-off-by: Congcong Chen <[email protected]>
ywang96 (Member) commented Mar 4, 2025

Hi @ywang96 the buildkite/ci/pr/basic-models-test test failed because of the following check in vllm/model_executor/models/vision_siglip_navit.py:

self.attn_backend: _Backend = get_vit_attn_backend(support_fa=True)
if self.attn_backend != _Backend.FLASH_ATTN:
    raise RuntimeError(
        "Phi-4-multimodal-instruct model does not support"\
            f" {self.attn_backend} backend now."
    )

Looks like the testing container does not have flash_attn package installed, and flash_attn is required by vllm/model_executor/models/vision_siglip_navit.py. What is the suggested way to fix that? Thanks

To answer myself: as a workaround, maybe I could log an error message instead of throwing a runtime error. We will come back to this in follow-up PRs.

        # Detect attention implementation.
        self.attn_backend: _Backend = get_vit_attn_backend(support_fa=True)
        if self.attn_backend != _Backend.FLASH_ATTN:
            # Only print out errors for now to make ci/pr/basic-models-test
            # happy since the testing environment does not have flash_attn
            # installed.
            logger.error("Phi-4-multimodal-instruct model does not support "\
                         "%s backend now.", self.attn_backend)

The workaround snippet above was added to the wrong place, and I just corrected it.

congcongchen123 (Contributor, Author) commented Mar 5, 2025

The kernel test failure looks unrelated to this PR; is there a way to bypass it and merge this PR? If not, what steps should I take?

vllm-bot merged commit 0a995d5 into vllm-project:main Mar 5, 2025 (58 of 61 checks passed)
DarkLight1337 (Member) commented:

It is a known failure on main; I have force-merged this PR.

congcongchen123 (Contributor, Author) commented:

Got it! Thanks @DarkLight1337!

Labels: ci/build, documentation, frontend, ready
Successfully merging this pull request may close these issues: [New Model]: Phi-4 Multimodal Instruct
9 participants