AdaptCLIPZS

This is the codebase for the 14-dataset benchmark for zero-shot classification proposed in

Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions

Oindrila Saha, Grant Van Horn, Subhransu Maji

CVPR'24

(Figure: overview of the method)

Preparation

Create a conda environment with the provided specifications:

conda env create -f environment.yml
conda activate adaptclipzs
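
To quickly verify the environment, you can check that PyTorch (which the fine-tuning scripts rely on) is available; this is a sanity check only, assuming environment.yml installs PyTorch:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"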

Follow DATASETS.md of VDT-Adapter to download the datasets and json files. Additionally, download iNaturalist21, NABirds, CUB, and Flowers102 from the specified links. Extract all images of CUB into a single folder by running:

cd <path to cub data>/images/
mkdir -p ../images_extracted
for folder in *; do mv "$folder"/* ../images_extracted/; done
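
You can sanity-check the extraction by counting the files; CUB-200-2011 contains 11,788 images:

ls ../images_extracted | wc -l   # should print 11788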

Generate attributes from OpenAI GPT (optional)

We provide our generated attributes for all datasets in the "gpt_descriptions" folder. It contains a subfolder for every dataset, named in the format <gpt_version>_<Dataset Name>. Each dataset folder contains one text file per class, named after the class name. You can also reproduce the process by running

python generate_gpt.py --api_key <your_api_key> --dataset StanfordCars --location --im_dir <path to directory containing images of StanfordCars> --json_file <path to json file of StanfordCars from VDT-Adapter> --gpt_version gpt4_0613_api

The above command will generate attributes for the StanfordCars dataset. The same command can be used to generate descriptions for all 14 datasets by changing the dataset, im_dir, and json_file arguments. You do not need to provide json_file for the CUB, NABirds, and iNaturalist datasets. The location argument indicates whether you also want to generate attributes describing where a certain category is found; in the paper we use this for the natural domains, i.e., CUB, NABirds, iNaturalist21, and Flowers102.

This will save the attributes in a folder named <gpt_version>_<dataset> inside AdaptCLIPZS.
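
For a natural-domain dataset, a run might look like the following sketch (based on the arguments above; no json_file is needed for CUB, and --location adds the location attributes):

python generate_gpt.py --api_key <your_api_key> --dataset CUB --location --im_dir <path to directory containing images of CUB> --gpt_version gpt4_0613_api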

Fine-tuning CLIP

For non-natural domains, run:

python finetune_clip.py --dataset StanfordCars --im_dir <path to directory containing StanfordCars> --json_file <path to json file of StanfordCars from VDT-Adapter> --fewshot --arch ViT-B/16 --save_dir ./ft_clip_cars --text_dir ./gpt4_0613_api_StanfordCars

For natural domains, i.e., CUB, iNaturalist, Flowers102, and NABirds, run:

python finetune_clip_nat.py --dataset CUB --im_dir <path to directory containing CUB> --fewshot --arch ViT-B/16 --save_dir ./ft_clip_cub --text_dir_viz ./gpt4_0613_api_CUB --text_dir_loc ./gpt4_0613_api_CUB_location

The fewshot argument indicates whether to train with 16 images per class or with the whole dataset. You can also specify hyperparameters, including main_lr, main_wd, proj_lr, proj_wd, and tau.
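
For example, a few-shot run that overrides these hyperparameters might look like the following; the values shown are illustrative placeholders, not the defaults used in the paper:

python finetune_clip.py --dataset StanfordCars --im_dir <path to directory containing StanfordCars> --json_file <path to json file of StanfordCars from VDT-Adapter> --fewshot --arch ViT-B/16 --save_dir ./ft_clip_cars --text_dir ./gpt4_0613_api_StanfordCars --main_lr 1e-6 --main_wd 1e-4 --proj_lr 1e-3 --proj_wd 1e-2 --tau 0.07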

Testing

The following command performs evaluation for the CLIPFT+A setup:

python test_AdaptZS.py --dataset StanfordCars --im_dir <path to directory containing StanfordCars> --json_file <path to json file of StanfordCars from VDT-Adapter> --arch ViT-B/16 --ckpt_path <path to fine-tuned checkpoints> --text_dir ./gpt4_0613_api_StanfordCars --attributes

To test vanilla CLIP, add the --vanillaCLIP argument; to test without GPT attributes, omit --attributes. For natural domains, also provide the path to the location attributes via the text_dir_loc argument.
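
For instance, a vanilla-CLIP evaluation might look like this sketch, mirroring the command above (presumably no fine-tuned checkpoint is needed in this case):

python test_AdaptZS.py --dataset StanfordCars --im_dir <path to directory containing StanfordCars> --json_file <path to json file of StanfordCars from VDT-Adapter> --arch ViT-B/16 --text_dir ./gpt4_0613_api_StanfordCars --attributes --vanillaCLIP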

Pre-trained Checkpoints

We provide pre-trained checkpoints for the iNaturalist21, NABirds, and CUB datasets for both ViT-B/16 and ViT-B/32 architectures, which can be downloaded here.

You can run the following command with a pre-trained checkpoint to reproduce the reported performance when testing on the CUB dataset:

python test_AdaptZS.py --im_dir <path to directory containing CUB> --ckpt_path ./INaturalist21_b16.pth --text_dir ./gpt_descriptions/gpt4_0613_api_CUB/ --text_dir_loc ./gpt_descriptions/gpt4_0613_api_CUB_location/ --arch ViT-B/16 --attributes

You can set --ckpt_path to any of the other checkpoints, making sure to provide the corresponding architecture in --arch (see the example after the table). The following table shows the accuracies for the various checkpoints.

| Model | Accuracy (%) |
| --- | --- |
| INaturalist21_b32.pth | 54.54 |
| INaturalist21_b16.pth | 56.76 |
| NABirds_b32.pth | 55.46 |
| NABirds_b16.pth | 56.59 |
| CUB_b32.pth | 54.23 |
| CUB_b16.pth | 56.01 |
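
For example, to evaluate the CUB_b32.pth checkpoint, a run might look like the following sketch (following the CUB command above, with the matching ViT-B/32 architecture):

python test_AdaptZS.py --im_dir <path to directory containing CUB> --ckpt_path ./CUB_b32.pth --text_dir ./gpt_descriptions/gpt4_0613_api_CUB/ --text_dir_loc ./gpt_descriptions/gpt4_0613_api_CUB_location/ --arch ViT-B/32 --attributes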

Citation

If you find our work useful, please consider citing:

@inproceedings{saha2024improved,
  title={Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions},
  author={Saha, Oindrila and Van Horn, Grant and Maji, Subhransu},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={17542--17552},
  year={2024}
}

Thanks to CoOp and VDT-Adapter for releasing the code bases our code is built upon.