Beyond Text: Frozen Large Language Models in Visual Signal Comprehension

"Beyond Text: Frozen Large Language Models in Visual Signal Comprehension" Arxiv, 2024 Mar 12, V2T-Tokenizer

Authors: Lei Zhu, Fangyun Wei, Yanye Lu

Key-point

Task: applying an LLM to images, using a codebook
Problems
🏷️ Label:

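Represents images with LLM tokens and finds that the LLM then shows low-level restoration ability without any finetuning; supports multiple downstream tasks (captioning, VQA, denoising); learns a codebook.

On low-level tasks, when given a complete face that has only been shifted or rotated, the restored face output is very poor.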
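To make the "image as LLM tokens via a codebook" idea concrete, here is a minimal sketch of nearest-neighbour codebook lookup. It is only an illustration of generic vector quantization, not the paper's tokenizer: the function names (`squared_distance`, `quantize`) and the toy 2-D vectors are hypothetical, and a real codebook entry would be a high-dimensional token embedding.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(features: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each feature vector to the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)),
            key=lambda k: squared_distance(f, codebook[k]))
        for f in features
    ]


# Hypothetical toy data: three codebook entries standing in for token embeddings,
# and three "patch features" standing in for image encoder outputs.
toy_codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
toy_features = [[0.9, 0.1], [0.1, 0.8], [0.05, 0.0]]

token_ids = quantize(toy_features, toy_codebook)
print(token_ids)  # indices of the nearest codebook entries
```

The resulting index sequence is what would be handed to the LLM as a token sequence; how the features and codebook are actually obtained is left to the paper.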

LLM gains the ability not only for visual comprehension but also for image denoising and restoration in an auto-regressive fashion

Contributions

Introduction

methods

Experiment

Ablation study: check which modules are effective and summarize.

Limitations

Summary 🌟

learn what & how to apply to our task
