Can I run inference on Phi-3-vision with batching? #846
2U1 started this conversation in New features / APIs

Thanks for the conversion code for Phi-3-vision.

I'm building an app that serves concurrent requests, so I need continuous batching. Can I run inference on Phi-3-vision with a batch size larger than 1 (I mean with the ONNX model)? There are examples for LLMs but not for VLMs, so I'm not sure how to set this up.

Replies: 2 comments
- Currently, the model does not support batching. The ONNX model used behind the scenes is optimized to work with batch size 1 and will not work with batch sizes greater than 1. Adding batch-size support for such models is on our roadmap, but I do not know when we will be able to prioritize it; that depends heavily on whether we can make the ONNX model support batch sizes greater than 1.
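  If you want to confirm this on your own export, one option (not from this thread, just a sketch using the `onnx` Python package) is to print the graph's input shapes and look at the leading dimension. The file path below is an assumption; point it at whichever of your Phi-3-vision ONNX files you want to inspect.

  ```python
  # Sketch: inspect the ONNX graph inputs to see whether the batch dimension
  # is a fixed 1 or a symbolic dimension. The path below is a placeholder.
  import onnx

  model = onnx.load("phi-3-vision-onnx/model.onnx", load_external_data=False)

  for inp in model.graph.input:
      dims = inp.type.tensor_type.shape.dim
      # Each dim carries either a fixed dim_value or a symbolic dim_param name.
      shape = [d.dim_param if d.dim_param else d.dim_value for d in dims]
      print(inp.name, shape)

  # If the leading dimension prints as a fixed 1 rather than a symbolic name
  # such as "batch_size", the graph was exported for batch size 1 and a larger
  # batch will be rejected at run time.
  ```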
- @2U1 Unless you have a really powerful GPU, batching probably won't speed anything up, IMO.
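  For the concurrent-requests use case specifically, one workaround under the batch-size-1 constraint is to queue incoming requests and run them through a single worker instead of batching them. Below is a minimal asyncio sketch; `generate_one` is a hypothetical placeholder for whatever single-request Phi-3-vision inference code you already have, not an onnxruntime-genai API.

  ```python
  # Sketch: serve concurrent requests with a batch-size-1 model by serializing
  # them through one worker queue. `generate_one` is a placeholder.
  import asyncio

  async def generate_one(prompt: str, image_path: str) -> str:
      # Placeholder for your single-request model call; run blocking inference
      # in a thread so the event loop stays responsive.
      return await asyncio.to_thread(lambda: f"(response for {image_path})")

  async def worker(queue: asyncio.Queue) -> None:
      while True:
          prompt, image_path, fut = await queue.get()
          try:
              fut.set_result(await generate_one(prompt, image_path))
          except Exception as exc:
              fut.set_exception(exc)
          finally:
              queue.task_done()

  async def submit(queue: asyncio.Queue, prompt: str, image_path: str) -> str:
      fut = asyncio.get_running_loop().create_future()
      await queue.put((prompt, image_path, fut))
      return await fut

  async def main() -> None:
      queue: asyncio.Queue = asyncio.Queue()
      asyncio.create_task(worker(queue))
      replies = await asyncio.gather(
          submit(queue, "Describe this image.", "a.png"),
          submit(queue, "Describe this image.", "b.png"),
      )
      print(replies)

  asyncio.run(main())
  ```

  Requests still run one at a time, so this does not raise throughput the way true batching would, but it lets many clients share one loaded model safely.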