diff --git a/README.md b/README.md index adda39d..84aea51 100644 --- a/README.md +++ b/README.md @@ -15,22 +15,23 @@ ---- # Contents -- [1. Introduction](#1-introduction-) -- [2. Installation](#2-installation-) - * [2.1 Download Pre-trained Models](#21-download-pre-trained-models) - * [2.2 Verify Installation](#22-verify-installation) -- [3. Usage](#3-usage-) - * [3.1 Use UPN for Object Proposal Generation](#31-use-upn-for-object-proposal-generation) - * [3.2 Usage of ChatRex](#32-usage-of-chatrex) - + [3.2.1 ChatRex for Object Detection & Grounding & Referring](#321-chatrex-for-object-detection---grounding---referring) - + [3.2.2 ChatRex for Region Caption](#322-chatrex-for-region-caption) - + [3.2.3 ChatRex for Grounded Image Captioning](#323-chatrex-for-grounded-image-captioning) - + [3.2.4 ChatRex for Grounded Conversation](#324-chatrex-for-grounded-conversation) -- [4. Gradio Demos](#4-gradio-demos-) - * [4.1 Gradio Demo for UPN](#41-gradio-demo-for-upn) - * [4.2 Gradio Demo for ChatRex](#42-gradio-demo-for-chatrex) +- [Contents](#contents) +- [1. Introduction 📚](#1-introduction-) +- [2. Installation 🛠️](#2-installation-️) + - [2.1 Download Pre-trained Models](#21-download-pre-trained-models) + - [2.2 Verify Installation](#22-verify-installation) +- [3. Usage 🚀](#3-usage-) + - [3.1 Use UPN for Object Proposal Generation](#31-use-upn-for-object-proposal-generation) + - [3.2 Usage of ChatRex](#32-usage-of-chatrex) + - [3.2.1 ChatRex for Object Detection \& Grounding \& Referring](#321-chatrex-for-object-detection--grounding--referring) + - [3.2.2 ChatRex for Region Caption](#322-chatrex-for-region-caption) + - [3.2.3 ChatRex for Grounded Image Captioning](#323-chatrex-for-grounded-image-captioning) + - [3.2.4 ChatRex for Grounded Conversation](#324-chatrex-for-grounded-conversation) +- [4. Gradio Demos 🎨](#4-gradio-demos-) + - [4.1 Gradio Demo for UPN](#41-gradio-demo-for-upn) + - [4.2 Gradio Demo for ChatRex](#42-gradio-demo-for-chatrex) - [5. 
LICENSE](#5-license) -- [BibTeX](#bibtex-) +- [BibTeX 📚](#bibtex-) ---- @@ -604,6 +605,8 @@ The visualization of the output is like: ---- # 4. Gradio Demos 🎨 +Here is a [Workflow Readme](gradio_demos/gradio.md) you can follow to run the gradio demos. + ## 4.1 Gradio Demo for UPN We provide a gradio demo for UPN to visualize the object proposals generated by UPN. You can run the following command to start the gradio demo: ```bash diff --git a/gradio_demos/gradio.md b/gradio_demos/gradio.md index 00b8e86..f1c6d64 100644 --- a/gradio_demos/gradio.md +++ b/gradio_demos/gradio.md @@ -2,19 +2,50 @@ ---- + # ChatRex Demo: Visual Prompt Interaction Guide +
+ +![Static Badge](https://img.shields.io/badge/Chat-Rex-red) [![arXiv preprint](https://img.shields.io/badge/arxiv_2403.14610-blue%3Flog%3Darxiv)](https://arxiv.org/pdf/2403.14610.pdf) [![Hits](https://hits.seeyoufarm.com/api/count/incr/badge.svg?url=https%3A%2F%2Fgithub.com%2FIDEA-Research%2FChatRex&count_bg=%2379C83D&title_bg=%23F4A6A6&icon=waze.svg&icon_color=%23E7E7E7&title=VISITORS&edge_flat=false)](https://hits.seeyoufarm.com) + +
+ +--- +# Contents +- [ChatRex Demo: Visual Prompt Interaction Guide](#chatrex-demo-visual-prompt-interaction-guide) +- [Contents](#contents) +- [1. Introduction 📖](#1-introduction-) + - [1.1. Video Demo for ChatRex](#11-video-demo-for-chatrex) +- [2. Workflow 🚀](#2-workflow-) + - [2.1. Visual Prompt Methods 🎤](#21-visual-prompt-methods-) + - [2.1.1. Interactive Visual Prompt](#211-interactive-visual-prompt) + - [2.1.2. Proposal Visual Prompt](#212-proposal-visual-prompt) + - [2.2. Question Input ❓](#22-question-input-) + - [2.2.1. Raw Question Input](#221-raw-question-input) + - [2.2.2. Pre-defined Question Templates](#222-pre-defined-question-templates) +- [3. Tips and Support 💡](#3-tips-and-support-) + +--- +# 1. Introduction 📖
- +
Welcome to the ChatRex Demo! This tool demonstrates interactive visual prompt methods for AI-powered image understanding and question answering. This document provides detailed instructions on the workflow, interface components, and how to utilize the visual prompts effectively. + +## 1.1. Video Demo for ChatRex +We also provide a gradio demo for ChatRex. Before using it, we highly recommend watching the following video to learn how the demo works: + +[![Video Name](../assets/video_cover.jpg)](https://github.com/user-attachments/assets/945e192f-59e3-4c84-8615-20343378279a) + + + --- -## **Workflow** +# 2. Workflow 🚀 1. **Choose a Visual Prompt Method** - Select either `Interactive Visual Prompt` or `Proposal Visual Prompt` to define your region of interest within the image. @@ -25,11 +56,10 @@ Welcome to the ChatRex Demo! This tool demonstrates interactive visual prompt me 3. **Run the Demo** - Click on the `Run ChatRex` button to process the image and display the results, including answers and visualizations. ---- -## **Visual Prompt Methods** +## 2.1. Visual Prompt Methods 🎤 -### 1. Interactive Visual Prompt +### 2.1.1. Interactive Visual Prompt - **Overview**: This mode allows you to manually annotate regions of interest by either: - Clicking on the image to add a point, or @@ -41,35 +71,35 @@ Welcome to the ChatRex Demo! This tool demonstrates interactive visual prompt me - **Important Notes**: - Ensure that **neither** `Fine Grained Proposal` nor `Coarse Grained Proposal` checkboxes are selected when using this mode. ---- -### 2. Proposal Visual Prompt +### 2.1.2. Proposal Visual Prompt - **Overview**: This mode automatically generates bounding boxes based on the granularity of the proposal: - *Fine Grained Proposal*: Produces a detailed set of bounding boxes for smaller components (e.g., noses, eyes, or body parts). 
- - *Coarse Grained Proposal*: Generates fewer bounding boxes for larger objects or overall entities (e.g., a person, dog, or full figure). + - *Coarse Grained Proposal*: Generates fewer bounding boxes for larger objects or overall entities (e.g., a person, dog, or a whole entity). - **Display Visualization**: Click `Display UPN Proposal` to view the generated bounding boxes. ---- +## 2.2. Question Input ❓ -## **Question Input Options** - -### 1. Raw Question Input +### 2.2.1. Raw Question Input - Enter your question in natural language. For example: - *What objects are present in this image?* - *What is the color of the dog's collar?* + - *Who painted the sculpture?* -### 2. Pre-defined Question Templates +### 2.2.2. Pre-defined Question Templates - Select from a list of predefined templates to simplify the question input process. -- If you need to specify object categories (e.g., *dog* or *cat*), enter their names or IDs in the `` field, following the provided hints. +- To specify object categories, enter their names or IDs in the `` field as a comma-separated list (e.g., *dog* and *cat* become `dog,cat`), following the provided hints. --- -## **Tips and Support** -- If you're unsure how to interact with the application, refer to the tutorial video or browse the documented issues for additional guidance. +# 3. Tips and Support 💡 + +- If you're unsure how to interact with the application, refer to the tutorial video or browse the solved issues for additional guidance. - For any further questions or feedback, feel free to contact us through the [Issues](https://huggingface.co/IDEA-Research/ChatRex-7B/issues) page. --- +Enjoy exploring ChatRex's multimodal capabilities for seamless visual and language interaction!