
[New Model]: Phi-4 Multimodal Instruct #13936

Closed · lhcavalcanti opened this issue Feb 27, 2025 · 6 comments · Fixed by #14119

Labels: new model (Requests to new models)

Comments

@lhcavalcanti

The model to consider.

New Phi 4 Multimodal: https://huggingface.co/microsoft/Phi-4-multimodal-instruct

The closest model vllm already supports.

https://docs.vllm.ai/en/latest/models/supported_models.html#list-of-multimodal-language-models

What's your difficulty in supporting the model you want?

No response

Before submitting a new issue...

  • Make sure you have already searched for relevant issues and asked the chatbot at the bottom right corner of the documentation page, which can answer many frequently asked questions.
lhcavalcanti added the new model (Requests to new models) label Feb 27, 2025
@lhcavalcanti (Author) commented Feb 27, 2025

Error when running it with vLLM 0.7.3:

INFO 02-27 01:39:00 model_runner.py:1110] Starting to load model /models/Phi-4-multimodal-instruct...
/opt/miniconda/envs/python39/lib/python3.9/site-packages/transformers/models/auto/image_processing_auto.py:594: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use `slow_image_processor_class`, or `fast_image_processor_class` instead
  warnings.warn(
[rank0]: Traceback (most recent call last):
[rank0]:   File "/code/score.py", line 250, in <module>
[rank0]:     engine = AsyncLLMEngine.from_engine_args(engine_args, stat_loggers={"Geneva": GenevaStatsLogger()})
[rank0]:   File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 644, in from_engine_args
[rank0]:     engine = cls(
[rank0]:   File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 594, in __init__
[rank0]:     self.engine = self._engine_class(*args, **kwargs)
[rank0]:   File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/engine/async_llm_engine.py", line 267, in __init__
[rank0]:     super().__init__(*args, **kwargs)
[rank0]:   File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/engine/llm_engine.py", line 273, in __init__
[rank0]:     self.model_executor = executor_class(vllm_config=vllm_config, )
[rank0]:   File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/executor/executor_base.py", line 52, in __init__
[rank0]:     self._init_executor()
[rank0]:   File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/executor/uniproc_executor.py", line 47, in _init_executor
[rank0]:     self.collective_rpc("load_model")
[rank0]:   File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/executor/uniproc_executor.py", line 56, in collective_rpc
[rank0]:     answer = run_method(self.driver_worker, method, args, kwargs)
[rank0]:   File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/utils.py", line 2196, in run_method
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/worker/worker.py", line 183, in load_model
[rank0]:     self.model_runner.load_model()
[rank0]:   File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/worker/model_runner.py", line 1112, in load_model
[rank0]:     self.model = get_model(vllm_config=self.vllm_config)
[rank0]:   File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
[rank0]:     return loader.load_model(vllm_config=vllm_config)
[rank0]:   File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/model_executor/model_loader/loader.py", line 406, in load_model
[rank0]:     model = _initialize_model(vllm_config=vllm_config)
[rank0]:   File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/model_executor/model_loader/loader.py", line 115, in _initialize_model
[rank0]:     model_class, _ = get_model_architecture(model_config)
[rank0]:   File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/model_executor/model_loader/utils.py", line 106, in get_model_architecture
[rank0]:     architectures = resolve_transformers_fallback(model_config,
[rank0]:   File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/model_executor/model_loader/utils.py", line 60, in resolve_transformers_fallback
[rank0]:     auto_modules = {
[rank0]:   File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/vllm/model_executor/model_loader/utils.py", line 61, in <dictcomp>
[rank0]:     name: get_class_from_dynamic_module(module, model_config.model)
[rank0]:   File "/opt/miniconda/envs/python39/lib/python3.9/site-packages/transformers/dynamic_module_utils.py", line 536, in get_class_from_dynamic_module
[rank0]:     module_file, class_name = class_reference.split(".")
[rank0]: ValueError: not enough values to unpack (expected 2, got 1)
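
For context, the crash happens inside Transformers' get_class_from_dynamic_module, which expects a class reference of the form "module_file.ClassName"; when vLLM's Transformers fallback hands it a reference without a module prefix, the single split cannot be unpacked into two names. A minimal sketch of that failure mode (the reference value below is hypothetical, not taken from the actual config):

# Sketch of the failure mode only, not vLLM code; the value is hypothetical.
class_reference = "Phi4MMForCausalLM"  # no "module_file." prefix to split on
module_file, class_name = class_reference.split(".")
# ValueError: not enough values to unpack (expected 2, got 1)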

@DarkLight1337 (Member)

Support is coming very soon!

@lhcavalcanti (Author)

Thanks for the update @DarkLight1337. Do you have any ETA / rough estimate? Thank you!

@DarkLight1337 (Member) commented Mar 3, 2025

We now have an official PR #14119 under review; feel free to check it out!

@congcongchen123 (Contributor)

Feel free to check out the PR description here for steps on (both sketched below):
1. Starting the server with the base model and the vision/speech LoRA weights.
2. Sending requests to the OpenAI-compatible server.
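
For later readers, a minimal client-side sketch. It assumes the server was started roughly as in the PR description, with the vision/speech adapters registered via --lora-modules; the flag values, adapter paths, and the module names "vision"/"speech" are illustrative, not confirmed in this thread:

# Assumed server launch, loosely following the PR description (illustrative):
#   vllm serve microsoft/Phi-4-multimodal-instruct --trust-remote-code \
#     --enable-lora --max-lora-rank 320 \
#     --lora-modules speech=<path>/speech-lora vision=<path>/vision-lora
from openai import OpenAI

# Point the standard OpenAI client at vLLM's OpenAI-compatible endpoint.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Naming a LoRA module as the model routes the request through that adapter.
chat = client.chat.completions.create(
    model="vision",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
        ],
    }],
    max_tokens=128,
)
print(chat.choices[0].message.content)

Audio requests would target the "speech" module analogously.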

@congcongchen123 (Contributor)

We also found that the LoRA (Punica) kernels are quite slow for Phi-4-multimodal-instruct. Here’s the fix: PR #14272. With this fix, we observed up to a 5x improvement in generation speed.
