Can vLLM serve clients using multiple model instances? #239
-
Based on the examples, vLLM launches a server with a single model instance. Can vLLM serve clients using multiple model instances? With multiple model instances, the server could dispatch requests to different instances to reduce overhead.
Replies: 2 comments 3 replies
-
Right now vLLM is a serving engine for a single model. You can start multiple vLLM server replicas and use a custom load balancer (e.g., an nginx load balancer). Also feel free to check out FastChat and other multi-model frontends (e.g., aviary); vLLM can act as a model worker for these libraries to support multi-replica serving.
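
As a minimal sketch of the replica approach, here is one way to round-robin requests across several independently launched vLLM OpenAI-compatible servers from the client side (in practice you would more likely put nginx or another load balancer in front). The ports, model name, and prompt below are placeholders, and the launch command in the comment assumes the OpenAI-compatible entrypoint (`python -m vllm.entrypoints.openai.api_server`):

```python
import itertools
import json
import urllib.request

# Assumed replica addresses: two vLLM OpenAI-compatible servers started
# separately, e.g.
#   python -m vllm.entrypoints.openai.api_server --model <model> --port 8000
#   python -m vllm.entrypoints.openai.api_server --model <model> --port 8001
# The ports here are placeholders.
REPLICAS = ["http://localhost:8000", "http://localhost:8001"]
next_replica = itertools.cycle(REPLICAS)

def complete(prompt: str) -> dict:
    """Send a completion request to the next replica in round-robin order."""
    base_url = next(next_replica)
    payload = json.dumps({
        "model": "facebook/opt-125m",  # placeholder model name
        "prompt": prompt,
        "max_tokens": 64,
    }).encode()
    req = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

if __name__ == "__main__":
    for i in range(4):
        result = complete(f"Request {i}: Hello, my name is")
        print(result["choices"][0]["text"])
```

A dedicated load balancer gives you health checks, retries, and weighted routing on top of this; the client-side cycle is only meant to illustrate that each replica is just a normal HTTP endpoint.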
-
Is this still the case? If so, why does the API support a model parameter if the intent is not to host multiple models?