Hi,
I would like to run inference with 2 nodes of T4 GPUs (each node has 1x T4) on a dedicated compute cluster in my AML workspace.
Following https://docs.vllm.ai/en/stable/serving/distributed_serving.html, I tried both `tensor_parallel_size=1, pipeline_parallel_size=2` and `tensor_parallel_size=2, pipeline_parallel_size=1`.
I set the compute for the AML pipeline job to allocate 2 instances:

```yaml
compute:
  name: SKU
resources:
  instance_count: 2
```

and verified in the job that I got 2 instances. However, I always get:
```
  File "/azureml-envs/azureml_c0bf6a8220ed691bbb7c96539dc2050f/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 537, in _get_executor_cls
    initialize_ray_cluster(engine_config.parallel_config)
  File "/azureml-envs/azureml_c0bf6a8220ed691bbb7c96539dc2050f/lib/python3.10/site-packages/vllm/executor/ray_utils.py", line 270, in initialize_ray_cluster
    raise ValueError(
ValueError: The number of required GPUs exceeds the total number of available GPUs in the placement group.
```
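From the traceback, my understanding is that this check compares the world size, i.e. `tensor_parallel_size * pipeline_parallel_size`, against the GPUs available in the Ray placement group. A quick sketch of that logic (plain Python for illustration, not the actual vLLM code):

```python
def required_gpus(tensor_parallel_size: int, pipeline_parallel_size: int) -> int:
    # vLLM's distributed world size is tp * pp; the Ray placement
    # group must provide at least this many GPUs.
    return tensor_parallel_size * pipeline_parallel_size

# Either of my settings needs 2 GPUs in total:
print(required_gpus(1, 2))  # 2
print(required_gpus(2, 1))  # 2

# If Ray only sees the local node's single T4, the check fails:
available_gpus = 1
print(required_gpus(1, 2) > available_gpus)  # True -> ValueError
```

So it looks like Ray may only be seeing one of the two instances.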
It seems I need to set up `engine_config.parallel_config`. I use `from vllm import LLM, SamplingParams` and `model.generate()` to predict. How can I set this up? Thanks!
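One thing I am not sure about: I have not explicitly started a Ray cluster spanning both instances. My understanding from the distributed-serving doc is that something like the following should run before the vLLM script (the head node IP here is a placeholder for whatever node 0 reports):

```shell
# On node 0 (head) -- assumed to run before starting vLLM:
ray start --head --port=6379

# On node 1 (worker), pointing at the head node's address:
ray start --address=<head-node-ip>:6379

# Then check that Ray sees both nodes and both GPUs:
ray status
```

Do I need to wire this up myself in the AML pipeline job, or is there a way to have vLLM do it?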