
Feature Request: Add support for moonshotai/Kimi-VL-A3B-Instruct #14318

Open
@dinerburger

Description


Prerequisites

  • I am running the latest code. Mention the version if possible as well.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new and useful enhancement to share.

Feature Description

Moonshot AI recently released their moonshotai/Kimi-VL-A3B-Instruct model, with the vision tower provided by their in-house moonshotai/MoonViT-SO-400M. The architecture appears to be a combination of LLaVA and DeepSeek V3, both of which are fortunately already supported. Adding support is probably just a matter of teaching the conversion script how to handle these models, plus dealing with the pre-tokenizer, which appears to be TikToken-based.
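
For reference, a quick way to confirm the text/vision split and to fingerprint the tokenizer is sketched below. This is only illustrative: it assumes the repo ships custom config/tokenizer code loadable with `trust_remote_code`, attribute names like `text_config`/`vision_config` are guesses, and the probe string is a stand-in for the fixed check text that `convert_hf_to_gguf.py` actually uses.

```python
# Illustrative sketch only: inspect the HF config and fingerprint the tokenizer.
# Attribute names (text_config / vision_config) are assumptions and may differ.
from hashlib import sha256
from transformers import AutoConfig, AutoTokenizer

repo = "moonshotai/Kimi-VL-A3B-Instruct"

cfg = AutoConfig.from_pretrained(repo, trust_remote_code=True)
print(cfg.model_type)                       # overall architecture tag
print(getattr(cfg, "text_config", None))    # expected to look DeepSeek-V3-like (MoE)
print(getattr(cfg, "vision_config", None))  # expected to describe MoonViT-SO-400M

# convert_hf_to_gguf.py identifies pre-tokenizers by hashing the token IDs of a
# fixed check string; the probe below is only a stand-in for that string.
tok = AutoTokenizer.from_pretrained(repo, trust_remote_code=True)
probe = "Hello, world! 123 héllo ..."
chkhsh = sha256(str(tok.encode(probe)).encode()).hexdigest()
print(chkhsh)  # compare against the known entries to see if a new one is needed
```

If the resulting hash has no match, a new entry in the conversion script would be needed, presumably along with the matching pre-tokenization handling on the C++ side.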

Motivation

At 16B total parameters with only 3B activated, a quantized version of this model has the potential to run as a high-quality VL model on CPU only, enabling agentic use alongside larger models that live on the GPUs.

Possible Implementation

No response

Metadata

    Labels

    enhancement (New feature or request)
