Welcome to the ChatRex Demo! This tool demonstrates interactive visual prompt methods for AI-powered image understanding and question answering. This document provides detailed instructions on the workflow, interface components, and how to utilize the visual prompts effectively.
+
+## 1.1. Video Demo for ChatRex
+We also provide a Gradio demo for ChatRex. Before using it, we highly recommend watching the following video to learn how the demo works:
+
+[](https://github.com/user-attachments/assets/945e192f-59e3-4c84-8615-20343378279a)
+
+
+
---
-## **Workflow**
+# 2. Workflow 🚀
1. **Choose a Visual Prompt Method**
- Select either `Interactive Visual Prompt` or `Proposal Visual Prompt` to define your region of interest within the image.
@@ -25,11 +56,10 @@ Welcome to the ChatRex Demo! This tool demonstrates interactive visual prompt me
3. **Run the Demo**
- Click on the `Run ChatRex` button to process the image and display the results, including answers and visualizations.
----
-## **Visual Prompt Methods**
+## 2.1. Visual Prompt Methods 🎤
-### 1. Interactive Visual Prompt
+### 2.1.1. Interactive Visual Prompt
- **Overview**:
This mode allows you to manually annotate regions of interest by either:
- Clicking on the image to add a point, or
@@ -41,35 +71,35 @@ Welcome to the ChatRex Demo! This tool demonstrates interactive visual prompt me
- **Important Notes**:
- Ensure that **neither** the `Fine Grained Proposal` nor the `Coarse Grained Proposal` checkbox is selected when using this mode.
----
-### 2. Proposal Visual Prompt
+### 2.1.2. Proposal Visual Prompt
- **Overview**:
This mode automatically generates bounding boxes based on the granularity of the proposal:
- *Fine Grained Proposal*: Produces a detailed set of bounding boxes for smaller components (e.g., noses, eyes, or body parts).
- - *Coarse Grained Proposal*: Generates fewer bounding boxes for larger objects or overall entities (e.g., a person, dog, or full figure).
+ - *Coarse Grained Proposal*: Generates fewer bounding boxes for larger objects or overall entities (e.g., a person, a dog, or a whole entity).
- **Display Visualization**:
Click `Display UPN Proposal` to view the generated bounding boxes.
----
+## 2.2. Question Input ❓
-## **Question Input Options**
-
-### 1. Raw Question Input
+### 2.2.1. Raw Question Input
- Enter your question in natural language. For example:
- *What objects are present in this image?*
- *What is the color of the dog's collar?*
+ - *Who painted the sculpture?*
-### 2. Pre-defined Question Templates
+### 2.2.2. Pre-defined Question Templates
- Select from a list of predefined templates to simplify the question input process.
-- If you need to specify object categories (e.g., *dog* or *cat*), enter their names or IDs in the `