FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers
1Harbin Institute of Technology, Shenzhen
2Huawei Noah's Ark Lab
†Corresponding author
- [01/2025] arXiv paper released.
This is the GitHub repository for FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers. In this work, we propose FALCON, a model that introduces a novel visual register technique to simultaneously address visual redundancy and fragmentation in the high-resolution visual encoding of multimodal large language models (MLLMs).
[Figure: The framework of the proposed FALCON model.]
If you find this work useful for your research, please cite our paper:
@misc{zhang2025falcon,
  title={FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers},
  author={Renshan Zhang and Rui Shao and Gongwei Chen and Kaiwen Zhou and Weili Guan and Liqiang Nie},
  year={2025},
  eprint={2501.16297},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2501.16297},
}