Beyond Text: Frozen Large Language Models in Visual Signal Comprehension

"Beyond Text: Frozen Large Language Models in Visual Signal Comprehension" Arxiv, 2024 Mar 12, V2T-Tokenizer (paper, code, pdf, note). Authors: Lei Zhu, Fangyun Wei, Yanye Lu

Key-point

  • Task: use a frozen LLM for images && learn a codebook
  • Problems
  • 🏷️ Label:

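Represent images with LLM tokens. This turns out to give the model low-level restoration ability && requires no finetuning. It supports multiple downstream tasks (captioning, VQA, denoising) and learns a codebook.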
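To make "represent images with LLM tokens via a learned codebook" concrete, here is a minimal sketch. It assumes the codebook is the frozen LLM's token-embedding table and that each image patch feature is mapped to the id of its nearest embedding (plain vector quantization). The functions `quantize_to_vocab` / `dequantize` and the toy 3-token vocabulary are hypothetical. The V2T-Tokenizer's actual encoder, decoder and training losses are not described in this note.

```python
import math
from typing import List, Sequence


def quantize_to_vocab(features: Sequence[Sequence[float]],
                      vocab_embeddings: Sequence[Sequence[float]]) -> List[int]:
    """Map each patch feature to the id of the nearest vocabulary embedding.

    Hypothetical sketch: nearest neighbour by squared L2 distance, with the
    (frozen) LLM token-embedding table acting as the codebook.
    """
    token_ids = []
    for feat in features:
        best_id, best_dist = -1, math.inf
        for tok_id, emb in enumerate(vocab_embeddings):
            dist = sum((a - b) ** 2 for a, b in zip(feat, emb))
            if dist < best_dist:
                best_id, best_dist = tok_id, dist
        token_ids.append(best_id)
    return token_ids


def dequantize(token_ids: Sequence[int],
               vocab_embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Look the chosen token ids back up in the codebook (input to a decoder)."""
    return [list(vocab_embeddings[i]) for i in token_ids]


if __name__ == "__main__":
    # Toy 3-token "vocabulary" with 2-D embeddings (hypothetical values).
    vocab = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    patches = [[0.9, 0.1], [0.1, 0.8], [0.0, -0.2]]
    ids = quantize_to_vocab(patches, vocab)
    print(ids)  # [1, 2, 0]
    print(dequantize(ids, vocab))
```

Because the codebook is the LLM's own vocabulary, the resulting ids are ordinary text tokens that a frozen LLM can read. That is the property the note relies on for captioning, VQA and denoising without finetuning.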

Low-level tasks: when given a complete face that is only shifted or rotated, the restored face in the output is very poor.

The LLM gains the ability to perform not only visual comprehension but also image denoising and restoration, in an auto-regressive fashion.
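"Auto-regressive" here means that the restored image tokens are produced one at a time, each conditioned on everything generated so far. A minimal greedy-decoding sketch follows. The frozen LLM is replaced by a hypothetical bigram table (`toy_model`), and the prompt is assumed to be a given prefix of token ids, for example the corrupted image tokens. How the paper actually builds the prompt (in-context examples, token layout) is not covered in this note.

```python
from typing import Callable, List, Sequence


def greedy_decode(prefix: Sequence[int],
                  next_token_logits: Callable[[List[int]], Sequence[float]],
                  num_new_tokens: int) -> List[int]:
    """Generate tokens one by one, always taking the highest-scoring next token."""
    tokens = list(prefix)  # full context seen by the model
    generated = []
    for _ in range(num_new_tokens):
        logits = next_token_logits(tokens)
        next_id = max(range(len(logits)), key=lambda i: logits[i])  # argmax
        tokens.append(next_id)  # feed the prediction back in (auto-regression)
        generated.append(next_id)
    return generated


# Hypothetical stand-in for a frozen LLM: scores depend only on the last token.
BIGRAM = {
    0: [0.1, 2.0, 0.3, 0.0],
    1: [0.0, 0.2, 1.5, 0.1],
    2: [0.0, 0.1, 0.2, 3.0],
    3: [1.0, 0.0, 0.0, 0.5],
}


def toy_model(tokens: List[int]) -> Sequence[float]:
    return BIGRAM[tokens[-1]]


if __name__ == "__main__":
    print(greedy_decode([0], toy_model, 4))  # [1, 2, 3, 0]
```

Greedy argmax is only the simplest choice. Sampling or beam search would plug into the same loop by changing how `next_id` is selected.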

Figure: V2T-Tokenizer overview (V2T-Tokenizer_overview.png)

Contributions

Introduction

Methods

Experiment

Ablation study: check which modules are effective and summarize the findings.

Limitations

Summary 🌟

What to learn from this paper & how to apply it to our task