This is the codebase for the 14-dataset benchmark for zero-shot classification proposed in "Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions":
Oindrila Saha, Grant Van Horn, Subhransu Maji
CVPR'24
Create a conda environment with the provided specifications and activate it:
conda env create -f environment.yml
conda activate adaptclipzs
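As a quick sanity check (assuming environment.yml installs PyTorch and the OpenAI `clip` package, which the scripts below rely on), you can try loading a CLIP backbone:

```python
# Minimal sanity check -- assumes torch and the OpenAI `clip` package come from environment.yml
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)
print("Loaded CLIP ViT-B/16 on", device)
```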
Follow DATASETS.md of VDT-Adapter to download the datasets and JSON files. Additionally, download iNaturalist21, NABirds, CUB, and Flowers102 from the specified links. Extract all CUB images into a single folder by running:
cd <path to cub data>/images/
mkdir -p ../images_extracted
for folder in *; do mv "$folder"/* ../images_extracted/; done
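If you prefer a shell-free route, a minimal Python sketch that performs the same flattening (assuming the standard CUB layout of one sub-folder per class under images/, with images_extracted/ created next to it) is:

```python
# Flatten CUB images into a single folder -- illustrative sketch, not part of the released scripts
import shutil
from pathlib import Path

cub_root = Path("<path to cub data>")   # replace with your CUB location
src = cub_root / "images"
dst = cub_root / "images_extracted"
dst.mkdir(exist_ok=True)

for class_dir in src.iterdir():
    if class_dir.is_dir():
        for img in class_dir.iterdir():
            shutil.move(str(img), str(dst / img.name))  # move each image into the flat folder
```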
We provide our generated attributes for all datasets in the "gpt_descriptions" folder. It contains one subfolder per dataset, named in the format <gpt_version>_<Dataset Name>. Each dataset folder contains one text file per class, named after the class name. You can also reproduce this process by running:
python generate_gpt.py --api_key <your_api_key> --dataset StanfordCars --location --im_dir <path to directory containing images of StanfordCars> --json_file <path to json file of StanfordCars from VDT-Adapter> --gpt_version gpt4_0613_api
The above command generates attributes for the StanfordCars dataset. The same command can be used to generate descriptions for all 14 datasets by changing the dataset, im_dir, and json_file arguments. You do not need to provide json_file for the CUB, NABirds, and iNaturalist datasets. The location argument indicates whether you also want to generate attributes describing where a given category is found; in the paper we use this for the natural domains, i.e., CUB, NABirds, iNaturalist21, and Flowers102.
This will save the attributes in a folder named <gpt_version>_<dataset> inside AdaptCLIPZS.
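For reference, the attribute files can be read with a few lines of Python. The sketch below assumes the layout described above (one folder per dataset, one text file per class) and that each file stores one attribute per line; adjust the path and parsing to your setup.

```python
# Read per-class GPT attributes -- illustrative sketch; file-format assumptions noted above
from pathlib import Path

desc_dir = Path("gpt_descriptions/gpt4_0613_api_StanfordCars")

class_attributes = {}
for f in sorted(desc_dir.iterdir()):
    if f.is_file():
        # the file name is the class name; each non-empty line is treated as one attribute
        class_attributes[f.stem] = [line for line in f.read_text().splitlines() if line.strip()]

print(f"Loaded attributes for {len(class_attributes)} classes")
```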
For non-natural domains, run:
python finetune_clip.py --dataset StanfordCars --im_dir <path to directory containing StanfordCars> --json_file <path to json file of StanfordCars from VDT-Adapter> --fewshot --arch ViT-B/16 --save_dir ./ft_clip_cars --text_dir ./gpt4_0613_api_StanfordCars
For natural domains, i.e., CUB, iNaturalist, Flowers102, and NABirds, run:
python finetune_clip_nat.py --dataset CUB --im_dir <path to directory containing CUB> --fewshot --arch ViT-B/16 --save_dir ./ft_clip_cub --text_dir_viz ./gpt4_0613_api_CUB --text_dir_loc ./gpt4_0613_api_CUB_location
The fewshot argument indicates whether to train on 16 images per class or on the whole dataset. You can also specify hyperparameters, including main_lr, main_wd, proj_lr, proj_wd, and tau.
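For example, a run with explicit hyperparameters could look like the following (the flag values here are purely illustrative, not the settings used in the paper):

python finetune_clip.py --dataset StanfordCars --im_dir <path to directory containing StanfordCars> --json_file <path to json file of StanfordCars from VDT-Adapter> --fewshot --arch ViT-B/16 --save_dir ./ft_clip_cars --text_dir ./gpt4_0613_api_StanfordCars --main_lr 1e-6 --proj_lr 1e-4 --tau 0.07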
The following command performs evaluation for the CLIPFT+A setup:
python test_AdaptZS.py --dataset StanfordCars --im_dir <path to directory containing StanfordCars> --json_file <path to json file of StanfordCars from VDT-Adapter> --arch ViT-B/16 --ckpt_path <path to fine-tuned checkpoints> --text_dir ./gpt4_0613_api_StanfordCars --attributes
To test vanilla CLIP, add the --vanillaCLIP argument; to test without GPT attributes, omit --attributes. For natural domains, also provide the path to the location attributes via the text_dir_loc argument.
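Conceptually, attribute-based zero-shot evaluation builds one text embedding per class by encoding its attribute sentences and averaging them, then assigns each image to the class with the highest cosine similarity. The sketch below illustrates this with the OpenAI `clip` package; it is a simplification, not the exact pipeline of test_AdaptZS.py, and the prompt format is an assumption.

```python
# Conceptual sketch of attribute-based zero-shot classification -- not the exact logic of test_AdaptZS.py
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/16", device=device)

def build_classifier(class_attributes):
    """class_attributes: dict mapping class name -> list of attribute sentences."""
    weights = []
    with torch.no_grad():
        for name, attrs in class_attributes.items():
            prompts = [f"a photo of a {name}, {a}" for a in attrs]   # assumed prompt format
            emb = model.encode_text(clip.tokenize(prompts, truncate=True).to(device))
            emb = emb / emb.norm(dim=-1, keepdim=True)
            weights.append(emb.mean(dim=0))                          # average attribute embeddings per class
    w = torch.stack(weights)
    return w / w.norm(dim=-1, keepdim=True), list(class_attributes.keys())

def classify(image_path, weights, classnames):
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        feat = model.encode_image(image)
        feat = feat / feat.norm(dim=-1, keepdim=True)
    return classnames[(feat @ weights.T).argmax(dim=-1).item()]
```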
We provide pre-trained checkpoints for the iNaturalist21, NABirds, and CUB datasets for both the ViT-B/16 and ViT-B/32 architectures, which can be downloaded here.
You can run the following command with a pre-trained checkpoint to reproduce performance when testing on the CUB dataset.
python test_AdaptZS.py --im_dir <path to directory containing CUB> --ckpt_path ./INaturalist21_b16.pth --text_dir ./gpt_descriptions/gpt4_0613_api_CUB/ --text_dir_loc ./gpt_descriptions/gpt4_0613_api_CUB_location/ --arch ViT-B/16 --attributes
You can replace --ckpt_path with any of the other checkpoints, making sure to provide the corresponding architecture via --arch. The following table shows the accuracy obtained with each checkpoint.
Checkpoint | Accuracy (%) |
---|---|
INaturalist21_b32.pth | 54.54 |
INaturalist21_b16.pth | 56.76 |
NABirds_b32.pth | 55.46 |
NABirds_b16.pth | 56.59 |
CUB_b32.pth | 54.23 |
CUB_b16.pth | 56.01 |
If you find our work useful, please consider citing:
@inproceedings{saha2024improved,
title={Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions},
author={Saha, Oindrila and Van Horn, Grant and Maji, Subhransu},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages={17542--17552},
year={2024}
}
Thanks to CoOp and VDT-Adapter for releasing the codebases that our code is built upon.