"Beyond Text: Frozen Large Language Models in Visual Signal Comprehension" arXiv, 2024 Mar 12
V2T-Tokenizer
Authors: Lei Zhu, Fangyun Wei, Yanye Lu
- Task: frozen LLM on images && codebook learning
- Problems
- 🏷️ Label:
Represents images with LLM tokens; finds that the frozen LLM gains low-level restoration ability && needs no fine-tuning; supports multiple downstream tasks (captioning, VQA, denoising); learns a codebook.
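"Representing images with LLM tokens" can be illustrated as vector quantization against a codebook built from the LLM's vocabulary. The sketch below is a generic nearest-neighbor quantizer, not the paper's tokenizer. It assumes the codebook is the frozen vocabulary embedding matrix `vocab_emb` mapped into the visual feature space by a learned linear map `proj`, and that each image patch has already been encoded into a feature vector. Both names are hypothetical.

```python
import numpy as np


def quantize_to_vocab(features: np.ndarray, vocab_emb: np.ndarray, proj: np.ndarray) -> np.ndarray:
    """Map each patch feature (N, D) to the id of its nearest projected vocab embedding.

    Assumption (hypothetical): codebook = vocab_emb (V, E) @ proj (E, D).
    The vocabulary embeddings stay frozen; only `proj` would be trained.
    """
    codebook = vocab_emb @ proj  # (V, D) projected LLM token embeddings
    # Squared Euclidean distance ||f - c||^2 = ||f||^2 - 2 f.c + ||c||^2, shape (N, V)
    d2 = (
        (features ** 2).sum(axis=1, keepdims=True)
        - 2.0 * features @ codebook.T
        + (codebook ** 2).sum(axis=1)[None, :]
    )
    return d2.argmin(axis=1)  # (N,) LLM token ids, one per patch


# Toy usage: 4-token "vocabulary", identity projection, two patch features.
vocab_emb = np.eye(4)
proj = np.eye(4)
feats = np.array([[0.9, 0.1, 0.0, 0.0], [0.0, 0.0, 0.2, 1.1]])
token_ids = quantize_to_vocab(feats, vocab_emb, proj)
```

Because the ids index the LLM's own vocabulary, the resulting sequence can be fed to the frozen LLM like ordinary text tokens, which is what makes fine-tuning unnecessary in this setup.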
Low-level tasks: given a complete face that is only shifted in position or rotated, the restored face output is very poor.
The LLM gains the ability to perform not only visual comprehension but also image denoising and restoration, in an auto-regressive fashion.
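"Restoration in an auto-regressive fashion" can be sketched as a decoding loop: the model sees the corrupted token sequence as context and emits restored tokens one at a time, each step conditioned on the tokens it has already produced. In the sketch below, `predict_next` is a hypothetical stand-in for the frozen LLM. The toy `majority_predictor` is just a 3-token majority vote over the corrupted context, used only to make the loop runnable; it is not how the paper predicts tokens.

```python
from collections import Counter
from typing import Callable, List

# predict_next(corrupted_context, restored_so_far) -> next token id (hypothetical interface)
Predictor = Callable[[List[int], List[int]], int]


def autoregressive_restore(corrupted: List[int], predict_next: Predictor) -> List[int]:
    """Greedily regenerate the sequence token by token, conditioning on previous outputs."""
    restored: List[int] = []
    for _ in range(len(corrupted)):
        restored.append(predict_next(corrupted, restored))
    return restored


def majority_predictor(corrupted: List[int], restored: List[int]) -> int:
    """Toy stand-in for the LLM: majority token in a 3-wide window of the corrupted input."""
    i = len(restored)  # position being generated
    window = corrupted[max(0, i - 1): i + 2]
    return Counter(window).most_common(1)[0][0]


# A single "noisy" token (9) in an otherwise constant sequence gets replaced.
restored = autoregressive_restore([5, 5, 9, 5, 5], majority_predictor)
```

With a real LLM, `predict_next` would score the whole vocabulary given a prompt built from the corrupted tokens, and the restored ids would then be decoded back to pixels by the tokenizer's decoder.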
Ablation study: check which modules are effective and summarize.
What to learn & how to apply it to our task