We're going to combine Grounding-DINO with efficient SAM variants for faster annotation.
- Install Grounded-SAM

- Install Fast-SAM
Here's the list of Efficient SAM variants:
| Title | Description | Links |
|---|---|---|
| FastSAM | The Fast Segment Anything Model (FastSAM) is a CNN-based Segment Anything Model trained on only 2% of the SA-1B dataset published by the SAM authors. FastSAM achieves performance comparable to SAM at 50× higher run-time speed. | [Github] [Demo] |
| MobileSAM | MobileSAM performs on par with the original SAM (at least visually) and keeps exactly the same pipeline as the original SAM, except that it replaces the heavyweight ViT-H image encoder (632M parameters) with a much smaller Tiny-ViT (5M parameters). On a single GPU, MobileSAM runs at around 12 ms per image: 8 ms on the image encoder and 4 ms on the mask decoder. | [Github] |
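
Both demos share the same first stage: Grounding-DINO turns the text prompt into boxes, which are then handed to the chosen SAM variant as box prompts. A minimal sketch of that detection stage, following the Grounding-DINO inference API; the config/checkpoint paths and output filename here are illustrative, not fixed by this repo:

```python
import cv2
from groundingdino.util.inference import load_model, load_image, predict, annotate

# Config/checkpoint paths follow the Grounding-DINO repo's defaults; adjust to your layout.
model = load_model("groundingdino/config/GroundingDINO_SwinT_OGC.py",
                   "./groundingdino_swint_ogc.pth")

image_source, image = load_image("assets/demo4.jpg")

# Text prompt -> boxes; the thresholds are the repo's suggested defaults.
boxes, logits, phrases = predict(
    model=model,
    image=image,
    caption="the black dog.",
    box_threshold=0.35,
    text_threshold=0.25,
)

# `boxes` come back as normalized cxcywh; annotate() converts them
# to pixel xyxy before drawing them on the image.
annotated = annotate(image_source=image_source, boxes=boxes,
                     logits=logits, phrases=phrases)
cv2.imwrite("dino_boxes.jpg", annotated)
```

The downstream SAM variants expect pixel-space xyxy boxes, so the normalized cxcywh output needs the same conversion `annotate()` performs internally before it is used as a box prompt.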
- Firstly, download the pretrained Fast-SAM weights here

- Run the demo with the following script:

```bash
cd Grounded-Segment-Anything
python EfficientSAM/grounded_fast_sam.py --model_path "./FastSAM-x.pt" --img_path "assets/demo4.jpg" --text "the black dog." --output "./output/"
```
- And the results will be saved in `./output/`.
Note: due to FastSAM's post-processing, only one box can be annotated at a time. If there are multiple box prompts, we simply save one annotated image per box to `./output` for now; this will be changed in a future release.
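
For reference, a minimal sketch of what the demo wraps: FastSAM runs its "segment everything" pass once, and each Grounding-DINO box then selects a mask via a box prompt, producing one annotated image per box, which is the behaviour the note above describes. This follows the FastSAM repo's prompt API, but argument names such as `bbox` and `output_path` have varied between FastSAM releases, and the box coordinates below are hypothetical:

```python
from fastsam import FastSAM, FastSAMPrompt

IMAGE_PATH = "assets/demo4.jpg"
DEVICE = "cuda"

model = FastSAM("./FastSAM-x.pt")
# One "segment everything" pass; prompts then pick masks out of its results.
everything_results = model(IMAGE_PATH, device=DEVICE, retina_masks=True,
                           imgsz=1024, conf=0.4, iou=0.9)
prompt_process = FastSAMPrompt(IMAGE_PATH, everything_results, device=DEVICE)

# In the demo these boxes come from Grounding-DINO (pixel xyxy);
# this one is a hypothetical stand-in for "the black dog".
boxes = [[200, 150, 620, 480]]

# FastSAM's post-processing takes one box prompt at a time, hence one
# annotated image saved per box.
for i, box in enumerate(boxes):
    ann = prompt_process.box_prompt(bbox=box)
    prompt_process.plot(annotations=ann, output_path=f"./output/box_{i}.jpg")
```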
- Firstly, download the pretrained MobileSAM weights here

- Run the demo with the following script:

```bash
cd Grounded-Segment-Anything
python EfficientSAM/grounded_mobile_sam.py
```
- And the result will be saved as `./grounded_mobile_sam_annotated_image.jpg`.
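
Since MobileSAM keeps SAM's pipeline and only swaps the image encoder, Grounding-DINO boxes can be fed to the standard `SamPredictor` unchanged. A minimal sketch following the `mobile_sam` package's SAM-compatible API; the checkpoint path and box coordinates are illustrative:

```python
import cv2
import numpy as np
from mobile_sam import sam_model_registry, SamPredictor

# "vit_t" selects the Tiny-ViT encoder MobileSAM swaps in for SAM's ViT-H;
# everything after the encoder is the stock SAM pipeline.
sam = sam_model_registry["vit_t"](checkpoint="./mobile_sam.pt")
sam.to(device="cuda")
sam.eval()
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("assets/demo4.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A Grounding-DINO box in pixel xyxy works directly as a SAM box prompt;
# these coordinates are hypothetical.
box = np.array([200, 150, 620, 480])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
```

Because the interface is identical to the original `segment_anything` package, swapping between SAM and MobileSAM is a one-line change to the registry key and checkpoint.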