INFO 02-07 13:57:49 cuda.py:230] Using Flash Attention backend.
INFO 02-07 13:57:50 model_runner.py:1110] Starting to load model /pointer/unsloth/Qwen2.5-VL-72B-Instruct-bnb-4bit...
WARNING 02-07 13:57:51 vision.py:94] Current vllm-flash-attn has a bug inside vision module, so we use xformers backend instead. You can run pip install flash-attn to use flash-attention backend.
ERROR 02-07 13:58:32 engine.py:389] RuntimeError: shape '[3, 16, 80, 1280]' is invalid for input of size 2457600
Process SpawnProcess-1:
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.12/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/local/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 391, in run_mp_engine
    raise e
  File "/usr/local/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 380, in run_mp_engine
    engine = MQLLMEngine.from_engine_args(engine_args=engine_args,
  File "/usr/local/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 123, in from_engine_args
    return cls(ipc_path=ipc_path,
  File "/usr/local/lib/python3.12/site-packages/vllm/engine/multiprocessing/engine.py", line 75, in __init__
    self.engine = LLMEngine(*args, **kwargs)
  File "/usr/local/lib/python3.12/site-packages/vllm/engine/llm_engine.py", line 273, in __init__
    self.model_executor = executor_class(vllm_config=vllm_config, )
  File "/usr/local/lib/python3.12/site-packages/vllm/executor/executor_base.py", line 51, in __init__
    self._init_executor()
  File "/usr/local/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 42, in _init_executor
    self.collective_rpc("load_model")
  File "/usr/local/lib/python3.12/site-packages/vllm/executor/uniproc_executor.py", line 51, in collective_rpc
    answer = run_method(self.driver_worker, method, args, kwargs)
  File "/usr/local/lib/python3.12/site-packages/vllm/utils.py", line 2220, in run_method
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.12/site-packages/vllm/worker/worker.py", line 183, in load_model
    self.model_runner.load_model()
  File "/usr/local/lib/python3.12/site-packages/vllm/worker/model_runner.py", line 1112, in load_model
    self.model = get_model(vllm_config=self.vllm_config)
  File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/__init__.py", line 14, in get_model
    return loader.load_model(vllm_config=vllm_config)
  File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 1225, in load_model
    self._load_weights(model_config, model)
  File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/model_loader/loader.py", line 1135, in _load_weights
    loaded_weights = model.load_weights(qweight_iterator)
  File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 1124, in load_weights
    return loader.load_weights(weights, mapper=self.hf_to_vllm_mapper)
  File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 235, in load_weights
    autoloaded_weights = set(self._load_module("", self.module, weights))
  File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 196, in _load_module
    yield from self._load_module(prefix,
  File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/utils.py", line 173, in _load_module
    loaded_params = module_load_weights(weights)
  File "/usr/local/lib/python3.12/site-packages/vllm/model_executor/models/qwen2_5_vl.py", line 672, in load_weights
    loaded_weight = loaded_weight.view(3, visual_num_heads,
  File "/usr/local/lib/python3.12/site-packages/torch/utils/_device.py", line 106, in __torch_function__
    return func(*args, **kwargs)
RuntimeError: shape '[3, 16, 80, 1280]' is invalid for input of size 2457600
[rank0]:[W207 13:58:33.655823362 ProcessGroupNCCL.cpp:1250] Warning: WARNING: process group has NOT been destroyed before we destruct ProcessGroupNCCL. On normal program exit, the application should call destroy_process_group to ensure that any pending NCCL operations have finished in this process. In rare cases this process can exit before this point and block the progress of another member of the process group. This constraint has always been present, but this warning has only been added since PyTorch 2.4 (function operator())
Traceback (most recent call last):
  File "/usr/local/bin/vllm", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.12/site-packages/vllm/scripts.py", line 204, in main
    args.dispatch_function(args)
  File "/usr/local/lib/python3.12/site-packages/vllm/scripts.py", line 44, in serve
    uvloop.run(run_server(args))
  File "/usr/local/lib/python3.12/site-packages/uvloop/__init__.py", line 109, in run
    return __asyncio.run(
  File "/usr/local/lib/python3.12/asyncio/runners.py", line 194, in run
    return runner.run(main)
  File "/usr/local/lib/python3.12/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
  File "/usr/local/lib/python3.12/site-packages/uvloop/__init__.py", line 61, in wrapper
    return await main
  File "/usr/local/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 875, in run_server
    async with build_async_engine_client(args) as engine_client:
  File "/usr/local/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 136, in build_async_engine_client
    async with build_async_engine_client_from_engine_args(
  File "/usr/local/lib/python3.12/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.12/site-packages/vllm/entrypoints/openai/api_server.py", line 230, in build_async_engine_client_from_engine_args
    raise RuntimeError(
RuntimeError: Engine process failed to start. See stack trace for the root cause.
Your current environment
I am trying to run unsloth/Qwen2.5-VL-72B-Instruct-bnb-4bit with 2× A100-80GB GPUs on Modal.
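For reference, a minimal reproduction sketch. The exact serve flags are not captured in the logs above, so the quantization and load-format arguments here are assumptions based on the bnb-4bit checkpoint name; the offline LLM API exercises the same weight-loading path as vllm serve. tensor_parallel_size is omitted because the traceback goes through the single-process executor.

```python
# Hypothetical reproduction sketch -- not the exact command from this report.
from vllm import LLM

llm = LLM(
    model="unsloth/Qwen2.5-VL-72B-Instruct-bnb-4bit",
    quantization="bitsandbytes",  # assumption: pre-quantized bnb checkpoints use this flag
    load_format="bitsandbytes",   # assumption: paired with the quantization flag
)
```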
🐛 Describe the bug

On startup the server logs the warning "Current vllm-flash-attn has a bug inside vision module, so we use xformers backend instead. You can run pip install flash-attn to use flash-attention backend." and then crashes while loading the vision tower weights with RuntimeError: shape '[3, 16, 80, 1280]' is invalid for input of size 2457600; the full traceback is above.
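The arithmetic in the error is consistent with the vision QKV weight still being bitsandbytes-packed when qwen2_5_vl.py tries to un-fuse it: the view expects 3 × 16 × 80 × 1280 = 4,915,200 elements, and 2,457,600 is exactly half of that, which is what you would get with two 4-bit values packed per uint8 byte. A back-of-the-envelope check in plain PyTorch (my analysis, not vLLM code):

```python
import torch

# Numbers from the error message: 3 fused Q/K/V projections,
# 16 vision heads, head_dim 80, hidden size 1280.
expected = 3 * 16 * 80 * 1280   # 4,915,200 elements in the unquantized weight
packed = expected // 2          # 2,457,600 -- two 4-bit values per byte

w = torch.zeros(packed, dtype=torch.uint8)  # stand-in for the packed bnb tensor
try:
    w.view(3, 16, 80, 1280)  # the reshape attempted in load_weights
except RuntimeError as e:
    print(e)  # shape '[3, 16, 80, 1280]' is invalid for input of size 2457600
```

If that reading is right, the un-fuse reshape is being applied to the still-packed bnb tensor rather than to a dequantized weight.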