Open
Description
Prerequisites
- I am running the latest code. Mention the version if possible as well.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new and useful enhancement to share.
Feature Description
moonshotai recently released their moonshotai/Kimi-VL-A3B-Instruct model, with ViT provided by their home-cooked moonshotai/MoonViT-SO-400M. The architecture appears to be a combination of llava and DeepSeek V3, both of which are already supported fortunately. Probably just a matter of letting the conversion script know how to interact with the models, and also deal with the pretokenizer, which appears to be TikToken.
Motivation
At 16B with only 3B activated parameters, a quantized version of this model has potential to be run as a high quality VL model on CPU only, allowing agentic use with larger models that live on the GPUs.
Possible Implementation
No response