Hi,
I would like to run inference with 2 nodes of T4 GPUs (each node has 1x T4) on a dedicated compute cluster in my AML workspace.
Following https://docs.vllm.ai/en/stable/serving/distributed_serving.html, I tried both `tensor_parallel_size=1, pipeline_parallel_size=2` and `tensor_parallel_size=2, pipeline_parallel_size=1`.
I set the compute for the AML pipeline job to allocate 2 instances:

```yaml
compute:
  name: SKU
resources:
  instance_count: 2
```

and verified in the job that I got 2 instances. However, I always get:
```
  File "/azureml-envs/azureml_c0bf6a8220ed691bbb7c96539dc2050f/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 537, in _get_executor_cls
    initialize_ray_cluster(engine_config.parallel_config)
  File "/azureml-envs/azureml_c0bf6a8220ed691bbb7c96539dc2050f/lib/python3.10/site-packages/vllm/executor/ray_utils.py", line 270, in initialize_ray_cluster
    raise ValueError(
ValueError: The number of required GPUs exceeds the total number of available GPUs in the placement group.
```
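From the traceback, my understanding is that this check compares the world size, i.e. `tensor_parallel_size * pipeline_parallel_size`, against the GPUs available in the Ray placement group. A quick sketch of that logic (plain Python for illustration, not the actual vLLM code):

```python
def required_gpus(tensor_parallel_size: int, pipeline_parallel_size: int) -> int:
    # vLLM's distributed world size is tp * pp; the Ray placement
    # group must provide at least this many GPUs.
    return tensor_parallel_size * pipeline_parallel_size

# Either of my settings needs 2 GPUs in total:
print(required_gpus(1, 2))  # 2
print(required_gpus(2, 1))  # 2

# If Ray only sees the local node's single T4, the check fails:
available_gpus = 1
print(required_gpus(1, 2) > available_gpus)  # True -> ValueError
```

So it looks like Ray may only be seeing one of the two instances.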
It seems I need to set up `engine_config.parallel_config`. I use `from vllm import LLM, SamplingParams` and `model.generate()` to predict. How can I set this up? Thanks!
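One thing I am not sure about: I have not explicitly started a Ray cluster spanning both instances. My understanding from the distributed-serving doc is that something like the following should run before the vLLM script (the head node IP here is a placeholder for whatever node 0 reports):

```shell
# On node 0 (head) -- assumed to run before starting vLLM:
ray start --head --port=6379

# On node 1 (worker), pointing at the head node's address:
ray start --address=<head-node-ip>:6379

# Then check that Ray sees both nodes and both GPUs:
ray status
```

Do I need to wire this up myself in the AML pipeline job, or is there a way to have vLLM do it?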