feat: RND-112: Improve GroundingDINO (#592)
Co-authored-by: nik <[email protected]>
Co-authored-by: Sergey Zhuk <[email protected]>
3 people authored Aug 6, 2024
1 parent f2ab2c8 commit 94f5a42
Showing 16 changed files with 1,012 additions and 445 deletions.
3 changes: 1 addition & 2 deletions label_studio_ml/examples/grounding_dino/Dockerfile
@@ -40,10 +40,9 @@ RUN --mount=type=cache,target=${PIP_CACHE_DIR},sharing=locked \
RUN mkdir weights
WORKDIR /GroundingDINO/weights
RUN wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
RUN wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth

WORKDIR /app
RUN wget -q https://github.com/ChaoningZhang/MobileSAM/raw/master/weights/mobile_sam.pt
RUN wget -q https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

# install test requirements if needed
COPY requirements-test.txt .
65 changes: 13 additions & 52 deletions label_studio_ml/examples/grounding_dino/README.md
@@ -12,9 +12,7 @@ categories:
- Computer Vision
- Image Annotation
- Object Detection
- Zero-shot Image Segmentation
- Grounding DINO
- Segment Anything Model
image: "/tutorials/grounding-dino.png"
---
-->
@@ -28,7 +26,6 @@ This integration will allow you to:

* Use text prompts for zero-shot detection of objects in images.
* Specify any object for detection and get state-of-the-art results without any model fine-tuning.
* Get segmentation predictions from SAM with just text prompts.

See [here](https://github.com/IDEA-Research/GroundingDINO) for more details about the pre-trained Grounding DINO model.

@@ -52,21 +49,19 @@ See [here](https://github.com/IDEA-Research/GroundingDINO) for more details abou

```diff
 <View>
-  <Image name="image" value="$image"/>
   <Style>
     .lsf-main-content.lsf-requesting .prompt::before { content: ' loading...'; color: #808080; }
   </Style>
   <View className="prompt">
+    <Header value="Enter a prompt to detect objects in the image:"/>
     <TextArea name="prompt" toName="image" editable="true" rows="2" maxSubmissions="1" showSubmitButton="true"/>
   </View>
+  <Image name="image" value="$image"/>
   <RectangleLabels name="label" toName="image">
     <Label value="cats" background="yellow"/>
     <Label value="house" background="blue"/>
   </RectangleLabels>
-  <BrushLabels name="label2" toName="image">
-    <Label value="cats" background="yellow"/>
-    <Label value="house" background="blue"/>
-  </BrushLabels>
 </View>
```
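The prompt entered in the `TextArea` is delivered to the ML backend as part of the annotation context. A minimal sketch of pulling it out, assuming the common Label Studio result format (field names are an assumption, not the backend's actual code):

```python
def extract_prompt(context):
    """Return the first TextArea prompt found in a Label Studio context dict."""
    for result in (context or {}).get("result", []):
        if result.get("type") == "textarea":
            texts = result.get("value", {}).get("text", [])
            if texts:
                return texts[0]
    return None  # no prompt submitted yet
```

In a backend's `predict(tasks, context=None, ...)` method, this would be applied to the `context` argument.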

Expand All @@ -92,39 +87,5 @@ deploy:
## Using GroundingSAM
Combine the Segment Anything Model with your text input to automatically generate mask predictions! To do this, set `USE_SAM=true` before running.
If you are looking for a GroundingDINO integration with SAM, [check this example](https://github.com/HumanSignal/label-studio-ml-backend/tree/master/label_studio_ml/examples/grounding_sam).

> Warning: Using GroundingSAM without a GPU may result in slow performance and is not recommended. If you must use a CPU-only machine, and experience slow performance or don't see any predictions on the labeling screen, consider one of the following:
> - Increase memory allocated to the Docker container (e.g. `memory: 16G` in `docker-compose.yml`)
> - Increase the prediction timeout on the Label Studio instance with the `ML_TIMEOUT_PREDICT=100` environment variable.
> - Use "MobileSAM" as a lightweight alternative to "SAM".

If you want to use a [more efficient version of SAM](https://github.com/ChaoningZhang/MobileSAM), set `USE_MOBILE_SAM=true`.
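The flags above can be combined. A hedged sketch of how a backend might resolve them into a checkpoint path (the checkpoint filenames are taken from the Dockerfile; the selection logic itself is illustrative, not the example's verbatim code):

```python
import os

def pick_sam_checkpoint(env=None):
    """Choose a SAM checkpoint based on USE_SAM / USE_MOBILE_SAM flags."""
    env = os.environ if env is None else env
    if env.get("USE_MOBILE_SAM", "false").lower() == "true":
        return env.get("MOBILESAM_CHECKPOINT", "mobile_sam.pt")
    if env.get("USE_SAM", "false").lower() == "true":
        return env.get("SAM_CHECKPOINT", "sam_vit_h_4b8939.pth")
    return None  # SAM disabled: bounding boxes only, no masks
```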


## Batching inputs

https://github.com/HumanSignal/label-studio-ml-backend/assets/106922533/79b788e3-9147-47c0-90db-0404066ee43f

> Note: This is an experimental feature.

1. Clone the Label Studio feature branch that includes the experimental batching functionality.

`git clone -b feature/dino-support https://github.com/HumanSignal/label-studio.git`

2. Run this branch with `docker compose up`
3. Do steps 2-5 from the [quickstart section](#quickstart), using the access code and host IP of the newly cloned Label Studio instance. GroundingSAM is also supported here.
4. Go to the Data Manager in your project and select the tasks you would like to annotate.
5. Select **Actions > Add Text Prompt for GroundingDINO**.
6. Enter the prompt you would like to retrieve predictions for and click **Submit**.

> Note: If your prompt differs from the label values in your labeling config, append an underscore and the label value to map prompt outputs to the correct label. For example, if you wanted to select all brown cats but still give them the label value "cats" from your labeling config, your prompt would be "brown cat_cats".
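The underscore convention above can be sketched as a small parser (a hypothetical helper for illustration, not the backend's actual implementation):

```python
def split_prompt(prompt):
    """Split 'brown cat_cats' into the query text and the label value.

    If no underscore is present, the prompt itself serves as the label.
    """
    if "_" in prompt:
        text, label = prompt.rsplit("_", 1)  # split on the last underscore
        return text, label
    return prompt, prompt
```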


## Other environment variables

Adjust the `BOX_THRESHOLD` and `TEXT_THRESHOLD` values in the Dockerfile to a number between 0 and 1 when experimenting. Defaults are set in `dino.py`. For more information about these values, [click here](https://github.com/IDEA-Research/GroundingDINO#star-explanationstips-for-grounding-dino-inputs-and-outputs).
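Roughly, `BOX_THRESHOLD` gates overall box confidence while `TEXT_THRESHOLD` gates how well a box matches the prompt tokens. A hedged sketch of the filtering step (score names and defaults are illustrative, not Grounding DINO's internals):

```python
def filter_detections(detections, box_threshold=0.3, text_threshold=0.25):
    """Keep detections whose box and text-match scores clear both thresholds."""
    return [
        d for d in detections
        if d["box_score"] >= box_threshold and d["text_score"] >= text_threshold
    ]
```

Raising either value yields fewer, higher-confidence boxes; lowering it recalls more objects at the cost of noise.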

If you want to use SAM models saved in other directories, set the `MOBILESAM_CHECKPOINT` and `SAM_CHECKPOINT` variables as shown in the Dockerfile.
10 changes: 5 additions & 5 deletions label_studio_ml/examples/grounding_dino/_wsgi.py
@@ -29,7 +29,7 @@
})

from label_studio_ml.api import init_app
-from dino import DINOBackend
+from dino import GroundingDINO


_DEFAULT_CONFIG_PATH = os.path.join(os.path.dirname(__file__), 'config.json')
@@ -102,13 +102,13 @@ def parse_kwargs():
kwargs.update(parse_kwargs())

     if args.check:
-        print('Check "' + DINOBackend.__name__ + '" instance creation..')
-        model = DINOBackend(**kwargs)
+        print('Check "' + GroundingDINO.__name__ + '" instance creation..')
+        model = GroundingDINO(**kwargs)

-    app = init_app(model_class=DINOBackend)
+    app = init_app(model_class=GroundingDINO)

     app.run(host=args.host, port=args.port, debug=args.debug)

 else:
     # for uWSGI use
-    app = init_app(model_class=DINOBackend)
+    app = init_app(model_class=GroundingDINO)
