diff --git a/README.md b/README.md
index ae779b5..8085391 100644
--- a/README.md
+++ b/README.md
@@ -37,6 +37,9 @@
 # 1. Introduction 📚
 **TL;DR: ChatRex is an MLLM skilled in perception that can respond to questions while simultaneously grounding its answers to the referenced objects.**
+
+[![Video Name](assets/teaser_cover.jpg)](https://github.com/user-attachments/assets/03d7e0af-1808-4ce8-bc67-854cf40a4972)
+
 ChatRex is a Multimodal Large Language Model (MLLM) designed to seamlessly integrate fine-grained object perception and robust language understanding. By adopting a decoupled architecture with a retrieval-based approach for object detection and leveraging high-resolution visual inputs, ChatRex addresses key challenges in perception tasks. It is powered by the Rexverse-2M dataset with diverse image-region-text annotations. ChatRex can be applied to various scenarios requiring fine-grained perception, such as object detection, grounded conversation, grounded image captioning and region understanding.
diff --git a/assets/teaser_cover.jpg b/assets/teaser_cover.jpg
new file mode 100644
index 0000000..f57a0b5
Binary files /dev/null and b/assets/teaser_cover.jpg differ