Open
Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
A flag that limits the maximum input image size for vision models, resizing images when necessary, which may help avoid OOM issues.
Motivation
Certain vision models, such as Mistral Small 3.1, support large images, which can apparently balloon token usage significantly in llama-server (beyond the context memory used by the text model; this could be a bug) and cause an OOM crash. If we could limit the maximum image resolution to a specific value, it might be possible to avoid this problem.
It's worth pointing out that kobold.cpp already offers something along these lines. From the program help:
...
--visionmaxres [max px]
Clamp MMProj vision maximum allowed resolution. Allowed values are between 512 to 2048 px (default 1024).
...
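To illustrate the kind of clamping meant here, below is a minimal sketch of the resize arithmetic only. It assumes the limit applies to the longer side and that the aspect ratio should be preserved. `ImageSize` and `clamp_image_size` are hypothetical names for illustration, not existing llama.cpp or kobold.cpp code, and the actual pixel resampling is left out.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical type for illustration; not part of llama.cpp.
struct ImageSize {
    int width;
    int height;
};

// Hypothetical helper: returns the size an image would be resized to so that
// its longer side does not exceed max_px, keeping the aspect ratio.
// A max_px of 0 or less is treated as "no limit" (an assumption of this sketch).
ImageSize clamp_image_size(ImageSize in, int max_px) {
    assert(in.width > 0 && in.height > 0);

    const int longest = std::max(in.width, in.height);
    if (max_px <= 0 || longest <= max_px) {
        return in; // already within the limit, no resize needed
    }

    // Scale both sides by the same factor so the longer side becomes max_px.
    const double scale = static_cast<double>(max_px) / longest;
    ImageSize out;
    out.width  = std::max(1, static_cast<int>(std::lround(in.width  * scale)));
    out.height = std::max(1, static_cast<int>(std::lround(in.height * scale)));
    return out;
}
```

With a limit of 1024, a 4096x2048 image would be reduced to 1024x512, while an 800x600 image would be left unchanged.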
Possible Implementation
No response