feat: RND-112: Improve GroundingDINO (#592)
Co-authored-by: nik <[email protected]>
Co-authored-by: Sergey Zhuk <[email protected]>
3 people authored Aug 6, 2024
1 parent f2ab2c8 commit 94f5a42
Showing 16 changed files with 1,012 additions and 445 deletions.
3 changes: 1 addition & 2 deletions label_studio_ml/examples/grounding_dino/Dockerfile
@@ -40,10 +40,9 @@ RUN --mount=type=cache,target=${PIP_CACHE_DIR},sharing=locked \
RUN mkdir weights
WORKDIR /GroundingDINO/weights
RUN wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
RUN wget -q https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha2/groundingdino_swinb_cogcoor.pth

WORKDIR /app
RUN wget -q https://github.com/ChaoningZhang/MobileSAM/raw/master/weights/mobile_sam.pt
RUN wget -q https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

# install test requirements if needed
COPY requirements-test.txt .
65 changes: 13 additions & 52 deletions label_studio_ml/examples/grounding_dino/README.md
@@ -12,9 +12,7 @@ categories:
- Computer Vision
- Image Annotation
- Object Detection
- Zero-shot Image Segmentation
- Grounding DINO
- Segment Anything Model
image: "/tutorials/grounding-dino.png"
---
-->
@@ -28,7 +26,6 @@ This integration will allow you to:

* Use text prompts for zero-shot detection of objects in images.
* Specify any object for detection and get state-of-the-art results without any model fine-tuning.
* Get segmentation predictions from SAM with just text prompts.

See [here](https://github.com/IDEA-Research/GroundingDINO) for more details about the pre-trained Grounding DINO model.

@@ -52,21 +49,19 @@ See [here](https://github.com/IDEA-Research/GroundingDINO) for more details abou

```diff
 <View>
-  <Image name="image" value="$image"/>
   <Style>
     .lsf-main-content.lsf-requesting .prompt::before { content: ' loading...'; color: #808080; }
   </Style>
   <View className="prompt">
+    <Header value="Enter a prompt to detect objects in the image:"/>
     <TextArea name="prompt" toName="image" editable="true" rows="2" maxSubmissions="1" showSubmitButton="true"/>
   </View>
+  <Image name="image" value="$image"/>
   <RectangleLabels name="label" toName="image">
     <Label value="cats" background="yellow"/>
     <Label value="house" background="blue"/>
   </RectangleLabels>
-  <BrushLabels name="label2" toName="image">
-    <Label value="cats" background="yellow"/>
-    <Label value="house" background="blue"/>
-  </BrushLabels>
 </View>
```
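The prompt entered in the `TextArea` is delivered to the ML backend as part of the annotation context. A minimal sketch of pulling it out, assuming the common Label Studio result format (field names are an assumption, not the backend's actual code):

```python
def extract_prompt(context):
    """Return the first TextArea prompt found in a Label Studio context dict."""
    for result in (context or {}).get("result", []):
        if result.get("type") == "textarea":
            texts = result.get("value", {}).get("text", [])
            if texts:
                return texts[0]
    return None  # no prompt submitted yet
```

In a backend's `predict(tasks, context=None, ...)` method, this would be applied to the `context` argument.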

Expand All @@ -92,39 +87,5 @@ deploy:
## Using GroundingSAM
Combine the Segment Anything Model with your text input to automatically generate mask predictions! To do this, set `USE_SAM=true` before running.
If you are looking for a GroundingDINO integration with SAM, [check this example](https://github.com/HumanSignal/label-studio-ml-backend/tree/master/label_studio_ml/examples/grounding_sam).

> Warning: Using GroundingSAM without a GPU may result in slow performance and is not recommended. If you must use a CPU-only machine, and experience slow performance or don't see any predictions on the labeling screen, consider one of the following:
> - Increase memory allocated to the Docker container (e.g. `memory: 16G` in `docker-compose.yml`)
> - Increase the prediction timeout on the Label Studio instance with the `ML_TIMEOUT_PREDICT=100` environment variable.
> - Use "MobileSAM" as a lightweight alternative to "SAM".

If you want to use a [more efficient version of SAM](https://github.com/ChaoningZhang/MobileSAM), set `USE_MOBILE_SAM=true`.
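The flags above can be combined. A hedged sketch of how a backend might resolve them into a checkpoint path (the checkpoint filenames are taken from the Dockerfile; the selection logic itself is illustrative, not the example's verbatim code):

```python
import os

def pick_sam_checkpoint(env=None):
    """Choose a SAM checkpoint based on USE_SAM / USE_MOBILE_SAM flags."""
    env = os.environ if env is None else env
    if env.get("USE_MOBILE_SAM", "false").lower() == "true":
        return env.get("MOBILESAM_CHECKPOINT", "mobile_sam.pt")
    if env.get("USE_SAM", "false").lower() == "true":
        return env.get("SAM_CHECKPOINT", "sam_vit_h_4b8939.pth")
    return None  # SAM disabled: bounding boxes only, no masks
```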


## Batching inputs

https://github.com/HumanSignal/label-studio-ml-backend/assets/106922533/79b788e3-9147-47c0-90db-0404066ee43f

> Note: This is an experimental feature.

1. Clone the Label Studio feature branch that includes the experimental batching functionality.

`git clone -b feature/dino-support https://github.com/HumanSignal/label-studio.git`

2. Run this branch with `docker compose up`
3. Do steps 2-5 from the [quickstart section](#quickstart), using the access code and host IP of the newly cloned Label Studio instance. GroundingSAM is also supported here.
4. Go to the Data Manager in your project and select the tasks you would like to annotate.
5. Select **Actions > Add Text Prompt for GroundingDINO**.
6. Enter the prompt you would like to retrieve predictions for and click **Submit**.

> Note: If your prompt differs from the label values in your labeling config, append an underscore and the label value to map prompt outputs to the correct label. For example, if you wanted to select all brown cats but still give them the label value "cats" from your labeling config, your prompt would be "brown cat_cats".
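The underscore convention above can be sketched as a small parser (a hypothetical helper for illustration, not the backend's actual implementation):

```python
def split_prompt(prompt):
    """Split 'brown cat_cats' into the query text and the label value.

    If no underscore is present, the prompt itself serves as the label.
    """
    if "_" in prompt:
        text, label = prompt.rsplit("_", 1)  # split on the last underscore
        return text, label
    return prompt, prompt
```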


## Other environment variables

Adjust the `BOX_THRESHOLD` and `TEXT_THRESHOLD` values in the Dockerfile to a number between 0 and 1 when experimenting. Defaults are set in `dino.py`. For more information about these values, [click here](https://github.com/IDEA-Research/GroundingDINO#star-explanationstips-for-grounding-dino-inputs-and-outputs).
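Roughly, `BOX_THRESHOLD` gates overall box confidence while `TEXT_THRESHOLD` gates how well a box matches the prompt tokens. A hedged sketch of the filtering step (score names and defaults are illustrative, not Grounding DINO's internals):

```python
def filter_detections(detections, box_threshold=0.3, text_threshold=0.25):
    """Keep detections whose box and text-match scores clear both thresholds."""
    return [
        d for d in detections
        if d["box_score"] >= box_threshold and d["text_score"] >= text_threshold
    ]
```

Raising either value yields fewer, higher-confidence boxes; lowering it recalls more objects at the cost of noise.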

If you want to use SAM models saved in other directories, set the `MOBILESAM_CHECKPOINT` and `SAM_CHECKPOINT` variables as shown in the Dockerfile.
10 changes: 5 additions & 5 deletions label_studio_ml/examples/grounding_dino/_wsgi.py
@@ -29,7 +29,7 @@
})

from label_studio_ml.api import init_app
-from dino import DINOBackend
+from dino import GroundingDINO


_DEFAULT_CONFIG_PATH = os.path.join(os.path.dirname(__file__), 'config.json')
@@ -102,13 +102,13 @@ def parse_kwargs():
kwargs.update(parse_kwargs())

     if args.check:
-        print('Check "' + DINOBackend.__name__ + '" instance creation..')
-        model = DINOBackend(**kwargs)
+        print('Check "' + GroundingDINO.__name__ + '" instance creation..')
+        model = GroundingDINO(**kwargs)

-    app = init_app(model_class=DINOBackend)
+    app = init_app(model_class=GroundingDINO)

     app.run(host=args.host, port=args.port, debug=args.debug)

 else:
     # for uWSGI use
-    app = init_app(model_class=DINOBackend)
+    app = init_app(model_class=GroundingDINO)
