
Feature Request: llama-server: a flag for limiting input image size #14216

Open
@BugReporterZ

Description


Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

A flag for limiting the maximum input image size for vision models, resizing images that exceed the limit, which may help avoid OOM issues.
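For illustration, the core of such a limit could be an aspect-ratio-preserving clamp on the image's longest edge. The C++ sketch below is hypothetical (the helper name and the idea of a user-supplied maximum are assumptions, not existing llama.cpp code); the actual resampling would still be done by the existing image preprocessing.

#include <algorithm>
#include <utility>

// Hypothetical helper: given the original image dimensions and a user-supplied
// maximum edge length in pixels, return the target dimensions. The aspect
// ratio is preserved, and images already within the limit are left untouched.
static std::pair<int, int> clamp_image_resolution(int width, int height, int max_px) {
    const int longest = std::max(width, height);
    if (max_px <= 0 || longest <= max_px) {
        return {width, height}; // no resizing needed
    }
    const double scale = static_cast<double>(max_px) / longest;
    // Round to the nearest pixel and keep at least 1 px per side.
    const int new_w = std::max(1, static_cast<int>(width  * scale + 0.5));
    const int new_h = std::max(1, static_cast<int>(height * scale + 0.5));
    return {new_w, new_h};
}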

Motivation

Certain vision models, such as Mistral Small 3.1, support large images, which can apparently balloon token usage significantly in llama-server (beyond the context memory used by the text model; this could be a bug) and cause an OOM crash. If the maximum image resolution could be limited to a specific value, it might be possible to avoid this problem.

It's worth pointing out that kobold.cpp already offers something along these lines. From the program help:

...
--visionmaxres [max px]
                        Clamp MMProj vision maximum allowed resolution. Allowed values are between 512 to 2048 px (default 1024).
...

Possible Implementation

No response
