This is an official implementation of PROMPT-CAM: Making Vision Transformers Interpretable for Fine-Grained Analysis (CVPR'25).
Introducing Prompt-CAM, a simple approach that makes pre-trained Vision Transformers interpretable for fine-grained analysis.

Prompt-CAM lets us explore:
- What does the model think is important for each class?
- Which traits are shared between two bird species?
- How do different classes "see" the same image differently?

Ever wondered what traits stand out when a model looks at an image of one class but searches with another class in mind? Witness the important traits of different classes through the lens of Prompt-CAM with our interactive demos!
Try our demo without installing anything in Google Colab:
- Set up the environment
- Download the pre-trained model from the link below
- Run the demo
You can extend this code base to include new datasets and new backbones.
conda create -n prompt_cam python=3.7
conda activate prompt_cam
source env_setup.sh
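After sourcing the setup script, a quick sanity check (a minimal, generic snippet; not part of the repo) is to confirm PyTorch is importable and CUDA is visible:

```python
# Generic environment sanity check: confirm PyTorch is installed
# and a GPU is visible (the training/visualization commands below
# set CUDA_VISIBLE_DEVICES and assume one is available).
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```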
You can put all the data in a folder and pass the path to the --data_path argument. The structure of data/images/ should be organized as follows:
cub/
├── train/
│   ├── 001.Black_footed_Albatross/
│   │   ├── image_1.jpg
│   │   ├── image_2.jpg
│   │   └── ...
│   ├── 002.Laysan_Albatross/
│   │   ├── image_1.jpg
│   │   ├── image_2.jpg
│   │   └── ...
│   └── ...
└── val/
    ├── 001.Black_footed_Albatross/
    │   ├── image_1.jpg
    │   ├── image_2.jpg
    │   └── ...
    ├── 002.Laysan_Albatross/
    │   ├── image_1.jpg
    │   └── ...
    └── ...
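As a quick check that your data matches the layout above, here is a minimal sketch (assuming the data/images/cub path shown in the tree) that counts images per class folder:

```python
# Minimal check of the data/images/cub layout shown above:
# print the number of images found in each class folder per split.
import os

root = "data/images/cub"  # layout from the tree above
for split in ("train", "val"):
    split_dir = os.path.join(root, split)
    for cls in sorted(os.listdir(split_dir)):
        cls_dir = os.path.join(split_dir, cls)
        images = [f for f in os.listdir(cls_dir)
                  if f.lower().endswith((".jpg", ".jpeg", ".png"))]
        print(f"{split}/{cls}: {len(images)} images")
```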
Prepare CUB dataset
- Download the prepared dataset.

Or

- Prepare the dataset yourself:
  - Download the CUB dataset from the original website and put it in the data/images/ folder.
  - Use the dataset's provided train/val split to create the train/val splits, keeping the class numbers as the prefix of the respective image folder names (starting from 1); see the sketch after this list.
  - The code will automatically create train and val annotation files in the data/annotations/ folder for each dataset if not provided.
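For the do-it-yourself route, the official CUB_200_2011 archive already names class folders with the numeric prefix (001., 002., ...) and ships images.txt plus train_test_split.txt. Below is a minimal split sketch, assuming the archive was extracted to CUB_200_2011/ (the paths and file formats are the standard archive's, not this repo's code):

```python
# Sketch: build data/images/cub/{train,val}/ from the official
# CUB_200_2011 archive. images.txt maps id -> relative path;
# train_test_split.txt maps id -> 1 (train) / 0 (test).
import os
import shutil

src = "CUB_200_2011"      # extracted official archive (assumption)
dst = "data/images/cub"   # layout expected by this repo

with open(os.path.join(src, "images.txt")) as f:
    id_to_path = dict(line.split() for line in f if line.strip())
with open(os.path.join(src, "train_test_split.txt")) as f:
    id_to_train = dict(line.split() for line in f if line.strip())

for img_id, rel_path in id_to_path.items():
    split = "train" if id_to_train[img_id] == "1" else "val"
    out_dir = os.path.join(dst, split, os.path.dirname(rel_path))
    os.makedirs(out_dir, exist_ok=True)
    shutil.copy(os.path.join(src, "images", rel_path), out_dir)
```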
To add a new dataset, see Extensions.
- Download from the links below and put it in the checkpoints/{model}/{dataset}/ folder.
| Backbone | Dataset | Prompt-CAM (top-1 Acc, %) | Checkpoint Link |
|---|---|---|---|
| dino | cub (CUB) | 73.2 | url |
| dino | car (Stanford Cars) | 83.2 | url |
| dino | dog (Stanford Dogs) | 81.1 | url |
| dino | pet (Oxford Pet) | 91.3 | url |
| dino | birds_525 (Birds-525) | 98.8 | url |
| Backbone | Dataset | Prompt-CAM (top-1 Acc, %) | Checkpoint Link |
|---|---|---|---|
| dinov2 | cub (CUB) | 74.1 | url |
| dinov2 | dog (Stanford Dogs) | 81.3 | url |
| dinov2 | pet (Oxford Pet) | 92.7 | url |
- Download the checkpoint from the url in the table above and put it in the checkpoints/{model}/{dataset}/ folder.
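Before running visualization, a minimal sketch to confirm the checkpoint loads (the path follows the example layout above; what the file actually contains depends on the released checkpoint):

```python
# Load a downloaded checkpoint on CPU and peek at its contents.
import torch

state = torch.load("checkpoints/dino/cub/model.pt", map_location="cpu")
print(type(state))
if isinstance(state, dict):
    print(list(state.keys())[:10])  # first few top-level keys
```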
For example, to visualize the attention map of the DINO model on the class 024.Red_faced_Cormorant of the CUB dataset, put the checkpoint in the checkpoints/dino/cub/ folder and run the following command:
CUDA_VISIBLE_DEVICES=0 python visualize.py --config ./experiment/config/prompt_cam/dino/cub/args.yaml --checkpoint ./checkpoints/dino/cub/model.pt --vis_cls 23
- The output will be saved in the visualization/dino/cub/class_23/ folder.
- Inside each individual image folder, the top top_traits heatmaps for the target class are concatenated if the prediction is correct; otherwise, all the traits are concatenated. (The prediction for the respective image can be found in concatenated_prediction_{predicted_class}.jpg.)
Visualization Configuration Meaning:
- config: path to the config file.
- checkpoint: path to the checkpoint file.
- vis_cls: class number to visualize. (default: 23)
- vis_attn: set to True to visualize the attention map. (default: True)
- top_traits: number of traits to visualize. (default: 4)
- nmbr_samples: number of images from the vis_cls class to visualize. (default: 10)
- vis_outdir: output directory. (default: visualization/)
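To visualize several classes in one run, one option is to loop over class numbers from Python; this is a sketch assuming the options above are exposed as command-line flags, as --vis_cls is in the example command:

```python
# Sketch: run visualize.py for a few classes back to back.
import subprocess

for cls in [23, 35, 100]:  # example class numbers
    subprocess.run(
        ["python", "visualize.py",
         "--config", "./experiment/config/prompt_cam/dino/cub/args.yaml",
         "--checkpoint", "./checkpoints/dino/cub/model.pt",
         "--vis_cls", str(cls)],
        check=True,
    )
```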
Download the pretrained weights from the following links and put them in the pretrained_weights folder.
- ViT-B-DINO: rename it as dino_vitbase16_pretrain.pth
- ViT-B-DINOv2: rename it as dinov2_vitb14_pretrain.pth
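A minimal download sketch; the URLs below are assumptions based on the official DINO/DINOv2 release locations, so prefer the links above if they differ:

```python
# Sketch: fetch both backbone weights and store them under the
# filenames this repo expects. URLs are assumptions (official
# DINO / DINOv2 releases); verify against the links in this README.
import os
import urllib.request

weights = {
    "pretrained_weights/dino_vitbase16_pretrain.pth":
        "https://dl.fbaipublicfiles.com/dino/dino_vitbase16_pretrain/dino_vitbase16_pretrain.pth",
    "pretrained_weights/dinov2_vitb14_pretrain.pth":
        "https://dl.fbaipublicfiles.com/dinov2/dinov2_vitb14/dinov2_vitb14_pretrain.pth",
}
os.makedirs("pretrained_weights", exist_ok=True)
for path, url in weights.items():
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
```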
See Data Preparation above.
To train the model on the CUB dataset using the DINO model, run the following command:
CUDA_VISIBLE_DEVICES=0 python main.py --config ./experiment/config/prompt_cam/dino/cub/args.yaml
The checkpoint will be saved in the output/vit_base_patch16_dino/cub/ folder. Copy the checkpoint model.pt to the checkpoints/dino/cub/ folder.
To train the model on the Oxford Pet dataset using the DINO model, run the following command:
CUDA_VISIBLE_DEVICES=0 python main.py --config ./experiment/config/prompt_cam/dino/pet/args.yaml
The checkpoint will be saved in the output/vit_base_patch16_dino/pet/ folder. Copy the checkpoint model.pt to the checkpoints/dino/pet/ folder.
To train the model on the Oxford Pet dataset using the DINOv2 model, run the following command:
CUDA_VISIBLE_DEVICES=0 python main.py --config ./experiment/config/prompt_cam/dinov2/pet/args.yaml
The checkpoint will be saved in the output/vit_base_patch14_dinov2/pet/ folder. Copy the checkpoint model.pt to the checkpoints/dinov2/pet/ folder.
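The copy step is the same for every backbone/dataset pair; here is a small sketch (paths follow the examples above, with the patch size depending on the backbone):

```python
# Sketch: copy a freshly trained checkpoint to where visualize.py
# expects it. DINO uses patch 16; DINOv2 uses patch 14 (see above).
import os
import shutil

model, dataset, patch = "dino", "cub", 16
src = f"output/vit_base_patch{patch}_{model}/{dataset}/model.pt"
dst_dir = f"checkpoints/{model}/{dataset}"
os.makedirs(dst_dir, exist_ok=True)
shutil.copy(src, dst_dir)
```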
See Visualization above.
To add a new dataset:
- Prepare the dataset using the above instructions.
- Add a new dataset file in /data/dataset. Look at the existing dataset files for reference; a rough skeleton is sketched after this list.
- Modify build_loader.py to include the new dataset.
- Create a new config file in experiment/config/prompt_cam/{model}/{dataset}/args.yaml. See experiment/config/prompt_cam/dino/cub/args.yaml for reference and what to modify.
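The exact base class and constructor signature should be taken from the existing files in /data/dataset; as a rough, hypothetical skeleton of an ImageFolder-style dataset (all names here are illustrative, not the repo's actual interface):

```python
# Hypothetical skeleton for a new dataset file under /data/dataset.
# Mirror the existing dataset files' interface; this only sketches
# the usual structure of a folder-per-class image dataset.
import os
from PIL import Image
from torch.utils.data import Dataset

class MyNewDataset(Dataset):
    def __init__(self, root, split="train", transform=None):
        self.transform = transform
        self.samples = []
        split_dir = os.path.join(root, split)
        for label, cls in enumerate(sorted(os.listdir(split_dir))):
            cls_dir = os.path.join(split_dir, cls)
            for name in sorted(os.listdir(cls_dir)):
                self.samples.append((os.path.join(cls_dir, name), label))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        img = Image.open(path).convert("RGB")
        if self.transform:
            img = self.transform(img)
        return img, label
```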
To add a new backbone:
- Modify get_base_model() in build_model.py.
- Register the new backbone in vision_transformer.py by creating a new function; a hypothetical sketch follows this list.
- Add another option to --pretrained_weights and --model in the setup_parser() function of main.py to include the new backbone.
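As a hypothetical illustration of the registration step (the repo's vision_transformer.py defines its own model class; here timm's VisionTransformer stands in so the sketch is self-contained):

```python
# Hypothetical sketch of a new backbone factory function, modeled on
# timm-style registration; mirror the existing entries in
# vision_transformer.py for the real constructor arguments.
from timm.models.vision_transformer import VisionTransformer

def vit_base_patch16_my_backbone(**kwargs):
    # Standard ViT-B/16 shape; the matching weights would be wired in
    # through the --pretrained_weights option added to setup_parser().
    return VisionTransformer(
        patch_size=16, embed_dim=768, depth=12, num_heads=12, **kwargs
    )
```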
If you find this repository useful, please consider citing our work and giving it a star:
@article{chowdhury2025prompt,
title={Prompt-CAM: A Simpler Interpretable Transformer for Fine-Grained Analysis},
author={Chowdhury, Arpita and Paul, Dipanjyoti and Mai, Zheda and Gu, Jianyang and Zhang, Ziheng and Mehrab, Kazi Sajeed and Campolongo, Elizabeth G and Rubenstein, Daniel and Stewart, Charles V and Karpatne, Anuj and others},
journal={arXiv preprint arXiv:2501.09333},
year={2025}
}
- VPT: https://github.com/KMnP/vpt
- PETL_VISION: https://github.com/OSU-MLB/PETL_Vision
Thanks for their wonderful work.
Please create an issue for any contributions.