Exploring image search techniques using transformer models.
The demo was built with Streamlit.
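As a rough illustration, the demo flow looks like this. This is a minimal sketch, not the repo's actual code: `embed_image()` and `search_index()` are hypothetical helpers standing in for the embedding and similarity-search steps shown further below.

```python
# Minimal sketch of the Streamlit demo flow: upload a query image,
# embed it, and display the closest matches from the dataset.
import streamlit as st
from PIL import Image

st.title("Image search with transformers")

uploaded = st.file_uploader("Query image", type=["jpg", "jpeg", "png"])
if uploaded is not None:
    query = Image.open(uploaded).convert("RGB")
    st.image(query, caption="Query")
    # embed_image() and search_index() are hypothetical helpers; see the
    # embedding and similarity-ranking sketches later in this README.
    # results = search_index(embed_image(query), top_k=5)
    # for path, score in results:
    #     st.image(path, caption=f"similarity: {score:.2f}")
```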
Models:
- CLIP: `openai/clip-vit-base-patch32`
- ViT: `google/vit-base-patch16-224`
- Swin-V2: `microsoft/swinv2-base-patch4-window16-256`
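For reference, here is one way to get an image embedding from each of these checkpoints with the `transformers` library. This is a sketch: CLIP exposes a dedicated image-feature method, while for ViT and Swin-V2 a pooling choice has to be made, and mean-pooling the final hidden states is assumed here (one common choice, not necessarily the repo's exact pooling).

```python
# Extract image embeddings from the three checkpoints used in this repo.
import torch
from PIL import Image
from transformers import (AutoImageProcessor, AutoModel,
                          CLIPModel, CLIPProcessor)

image = Image.open("query.jpg").convert("RGB")  # hypothetical path

# CLIP: dedicated image-feature head (512-d for this checkpoint).
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
with torch.no_grad():
    clip_emb = clip_model.get_image_features(
        **clip_proc(images=image, return_tensors="pt"))  # shape (1, 512)

# ViT / Swin-V2: mean-pool the final hidden states.
def embed(checkpoint: str, image: Image.Image) -> torch.Tensor:
    proc = AutoImageProcessor.from_pretrained(checkpoint)
    model = AutoModel.from_pretrained(checkpoint)
    with torch.no_grad():
        out = model(**proc(images=image, return_tensors="pt"))
    return out.last_hidden_state.mean(dim=1)  # shape (1, hidden_size)

vit_emb = embed("google/vit-base-patch16-224", image)
swin_emb = embed("microsoft/swinv2-base-patch4-window16-256", image)
```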
The test images are available here.
Two images were cropped, and the cropped versions were used as search queries against the full-image dataset.
(Example images: each full image is shown alongside its cropped test image.)
| Test image | Model | Hit position | Similarity |
|---|---|---|---|
| savanna scene | Swin-V2 | 1st | 0.21 |
| savanna scene | CLIP | 1st | 0.68 |
| savanna scene | ViT | 3rd | 0.15 |
| home model | Swin-V2 | 5th | 0.35 |
| home model | CLIP | 96th | 0.44 |
| home model | ViT | 7th | 0.39 |
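The hit positions above come from ranking the dataset by cosine similarity to the query. A sketch of that ranking, assuming `query_emb` (shape `(1, D)`, from the cropped image) and `dataset_embs` (shape `(N, D)`, one row per full image) were produced as in the embedding sketch, and `dataset_paths` is a hypothetical list of the corresponding file paths:

```python
# Rank every full-image embedding against the cropped query's embedding.
import torch.nn.functional as F

scores = F.cosine_similarity(query_emb, dataset_embs)  # (N,) one score per image
ranking = scores.argsort(descending=True)              # best hit first
for pos, idx in enumerate(ranking.tolist(), start=1):
    print(f"{pos}. {dataset_paths[idx]}  similarity={scores[idx].item():.2f}")
```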
- Content-Based Image Retrieval (CBIR)
- Image Recognition
- Feature Extraction
- Semantic Search using transformers
- OpenCV
- Scikit-Image
- Pillow/PIL
- TensorFlow and PyTorch
- Elasticsearch and other search engines such as Typesense (see the sketch below)
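As an illustration of the last point, embeddings can be served from Elasticsearch 8.x via its `dense_vector` kNN search. This is a sketch under assumed names (index `images`, field `embedding`, 512 dims matching CLIP's image-embedding width, and `clip_emb`/`query_emb` from the earlier sketches), not the repo's actual setup:

```python
# Index image embeddings in Elasticsearch and run a k-NN query.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Index name, field names, and dims are assumptions for this sketch.
es.indices.create(index="images", mappings={"properties": {
    "path": {"type": "keyword"},
    "embedding": {"type": "dense_vector", "dims": 512,
                  "index": True, "similarity": "cosine"},
}})

# One document per image; the embedding is stored as a list of floats.
es.index(index="images", document={"path": "img_001.jpg",
                                   "embedding": clip_emb[0].tolist()})
es.indices.refresh(index="images")

# Top 5 nearest neighbours of the cropped query's embedding.
hits = es.search(index="images", knn={
    "field": "embedding", "query_vector": query_emb[0].tolist(),
    "k": 5, "num_candidates": 50,
})["hits"]["hits"]
for h in hits:
    print(h["_source"]["path"], h["_score"])
```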
- CLIP: Contrastive Language-Image Pre-training
The `transformers` library requires PyTorch to be installed:

```
pip3 install torch torchvision torchaudio
```
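The `transformers` and `streamlit` packages are also needed. A quick sanity check that PyTorch imports correctly:

```python
# Confirm PyTorch is installed and report whether a GPU is visible.
import torch
print(torch.__version__, torch.cuda.is_available())
```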