Can I run inference on Phi-3-vision with batching? #846
2U1 started this conversation in New features / APIs

Thanks for the conversion code for Phi-3-vision.

I'm building an app that serves concurrent requests, so I need continuous batching. Can I run inference on Phi-3-vision with a batch size larger than 1 (I mean with the ONNX model)? There are examples for LLMs but not for VLMs, so I'm not sure how to set this up.

Replies: 2 comments
- Currently, the model does not support batching. The ONNX model used behind the scenes is optimized to work with batch size 1 and will not work with batch sizes greater than 1. Adding batch-size support for such models is on our roadmap, but I do not know when we will be able to prioritize it; that depends heavily on whether we can make the ONNX model support batch sizes greater than 1.
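  If you want to confirm this on your own export, one option (not from this thread, just a sketch using the `onnx` Python package) is to print the graph's input shapes and look at the leading dimension. The file path below is an assumption; point it at whichever of your Phi-3-vision ONNX files you want to inspect.

  ```python
  # Sketch: inspect the ONNX graph inputs to see whether the batch dimension
  # is a fixed 1 or a symbolic dimension. The path below is a placeholder.
  import onnx

  model = onnx.load("phi-3-vision-onnx/model.onnx", load_external_data=False)

  for inp in model.graph.input:
      dims = inp.type.tensor_type.shape.dim
      # Each dim carries either a fixed dim_value or a symbolic dim_param name.
      shape = [d.dim_param if d.dim_param else d.dim_value for d in dims]
      print(inp.name, shape)

  # If the leading dimension prints as a fixed 1 rather than a symbolic name
  # such as "batch_size", the graph was exported for batch size 1 and a larger
  # batch will be rejected at run time.
  ```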
- @2U1 Unless you have a really powerful GPU, batching probably won't speed anything up, IMO.
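  For the concurrent-requests use case specifically, one workaround under the batch-size-1 constraint is to queue incoming requests and run them through a single worker instead of batching them. Below is a minimal asyncio sketch; `generate_one` is a hypothetical placeholder for whatever single-request Phi-3-vision inference code you already have, not an onnxruntime-genai API.

  ```python
  # Sketch: serve concurrent requests with a batch-size-1 model by serializing
  # them through one worker queue. `generate_one` is a placeholder.
  import asyncio

  async def generate_one(prompt: str, image_path: str) -> str:
      # Placeholder for your single-request model call; run blocking inference
      # in a thread so the event loop stays responsive.
      return await asyncio.to_thread(lambda: f"(response for {image_path})")

  async def worker(queue: asyncio.Queue) -> None:
      while True:
          prompt, image_path, fut = await queue.get()
          try:
              fut.set_result(await generate_one(prompt, image_path))
          except Exception as exc:
              fut.set_exception(exc)
          finally:
              queue.task_done()

  async def submit(queue: asyncio.Queue, prompt: str, image_path: str) -> str:
      fut = asyncio.get_running_loop().create_future()
      await queue.put((prompt, image_path, fut))
      return await fut

  async def main() -> None:
      queue: asyncio.Queue = asyncio.Queue()
      asyncio.create_task(worker(queue))
      replies = await asyncio.gather(
          submit(queue, "Describe this image.", "a.png"),
          submit(queue, "Describe this image.", "b.png"),
      )
      print(replies)

  asyncio.run(main())
  ```

  Requests still run one at a time, so this does not raise throughput the way true batching would, but it lets many clients share one loaded model safely.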