This repository contains the code we used to evaluate open_clip CLIP models, the official OpenAI CLIP models, CyCLIP, FLAVA and ALBEF on compositional reasoning in our paper CREPE: Can Vision-Language Foundation Models Reason Compositionally?
In crepe_eval_utils.py, you can find common evaluation utility functions; you will need to replace vg_image_paths with the path to the Visual Genome images on your machine. The VG images can be downloaded here.
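As a rough sketch, assuming vg_image_paths is a list of the directories holding the downloaded images and that your download uses the standard VG_100K and VG_100K_2 folders, the edit might look like:

```python
# crepe_eval_utils.py (sketch only; adapt to however vg_image_paths is defined
# in the released code): point it at your local Visual Genome image folders.
vg_image_paths = [
    "/path/to/visual_genome/VG_100K",    # replace with your path
    "/path/to/visual_genome/VG_100K_2",  # replace with your path
]
```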
We evaluated all models on an NVIDIA TITAN X GPU with a CUDA version of 11.4.
You will need to install the packages required to use open_clip here.
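One common route, assuming the pip distribution of open_clip is sufficient for your setup (the open_clip repository's own install instructions take precedence), is:
pip install open_clip_torch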
You can download the pretrained CLIP models and replace --model-dir with your own model checkpoint directory path in crepe_compo_eval_open_clip.py. (You can also modify the code to use open_clip's pretrained model interface.)
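For reference, a minimal sketch of the latter option, using a recent open_clip version (the architecture and pretrained tag below are illustrative; choose whichever combination matches the model you want to evaluate):

```python
import open_clip

# Load an RN50 (QuickGELU) model pretrained on CC12M from open_clip's
# pretrained registry instead of a local checkpoint directory.
model, _, preprocess = open_clip.create_model_and_transforms(
    "RN50-quickgelu", pretrained="cc12m"
)
tokenizer = open_clip.get_tokenizer("RN50-quickgelu")
```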
To evaluate all models reported in our paper, simply run:
python -m crepe_compo_eval_open_clip --compo-type <compositionality_type> --hard-neg-types <negative_type_1> <negative_type_2> --input-dir <path_to_crepe/crepe/syst_hard_negatives> --output-dir <log_directory>
where the valid compositionality types are systematicity and productivity. The valid negative types are atom, comp and combined (atom+comp) for systematicity, and atom, swap and negate for productivity.
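For example, to run the systematicity evaluation with both atom and comp negatives (the log directory name is arbitrary):
python -m crepe_compo_eval_open_clip --compo-type systematicity --hard-neg-types atom comp --input-dir <path_to_crepe/crepe/syst_hard_negatives> --output-dir logs
For productivity, set --compo-type productivity, pick one of the productivity negative types, and point --input-dir at the productivity hard negatives instead.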
To evaluate other pretrained models, simply modify the --train-dataset argument and/or the DATA2MODEL variable in crepe_compo_eval_open_clip.py.
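As a purely hypothetical sketch (the actual structure and entries in the released script may differ), DATA2MODEL is where each --train-dataset value is associated with the open_clip architectures and checkpoints to evaluate, e.g.:

```python
# Hypothetical illustration only: map each --train-dataset value to the
# open_clip architecture(s) and checkpoint filename(s) under --model-dir.
DATA2MODEL = {
    "cc12m": {"RN50-quickgelu": "rn50-cc12m.pt"},
    "yfcc": {"RN50-quickgelu": "rn50-yfcc15m.pt", "RN101-quickgelu": "rn101-yfcc15m.pt"},
    "laion": {"ViT-B-32": "vit_b_32-laion400m.pt"},
}
```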
Note that the systematicity eval set should only be used to evaluate models pretrained on CC12M, YFCC15M or LAION400M.
For each model, you will need to clone the model's official repository, set up an environment according to its instructions, and place the files crepe_prod_eval_<model>.py and crepe_eval_utils.py in their relevant locations. In crepe_params.py, you will need to replace --input-dir with your own directory path to CREPE's productivity hard negatives test set.
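Assuming crepe_params.py defines the shared flags with argparse (a sketch; the actual flag definitions may differ), this amounts to pointing the default of --input-dir at your local copy of the test set:

```python
import argparse

parser = argparse.ArgumentParser()  # in crepe_params.py the parser already exists
# Point --input-dir at your local copy of CREPE's productivity hard negatives.
parser.add_argument(
    "--input-dir",
    type=str,
    default="<path_to_crepe_productivity_hard_negatives>",  # replace with your path
    help="Directory containing the productivity hard negatives test set.",
)
```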
Clone the CLIP repository here and place crepe_prod_eval_clip.py and crepe_eval_utils.py at the top level of the repository. To evaluate models, simply run:
python -m crepe_prod_eval_clip --model-name <model_name> --hard-neg-types <negative_type> --output-dir <log_directory>
where the valid negative types are atom, swap and negate, and the valid model names are RN50, RN101, ViT-B/32, ViT-B/16 and ViT-L/14.
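For example, to evaluate ViT-B/32 on swap negatives, logging to a local logs directory:
python -m crepe_prod_eval_clip --model-name ViT-B/32 --hard-neg-types swap --output-dir logs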
Clone the CyCLIP repository here, place crepe_prod_eval_cyclip.py and crepe_eval_utils.py at the top level of the repository, and download the model checkpoint into a folder named cyclip.pt (the checkpoint is linked at the bottom of the repository's README). To evaluate models, simply run:
python -m crepe_prod_eval_cyclip --hard-neg-types <negative_type> --output-dir <log_directory>
Clone the FLAVA repository here and copy crepe_prod_eval_flava.py and crepe_eval_utils.py into the folder examples/flava/. To evaluate models, simply run:
python -m crepe_prod_eval_flava --hard-neg-types <negative_type> --output-dir <log_directory>
Clone the ALBEF repository here, copy crepe_prod_eval_albef.py and crepe_eval_utils.py to the top level of the repository, and download the pretrained checkpoint marked '14M' from the repository. To evaluate models, simply run:
python -m crepe_prod_eval_albef --hard-neg-types <negative_type> --output-dir <log_directory>
If you find our work helpful, please cite us:
@article{ma2023crepe,
  title={CREPE: Can Vision-Language Foundation Models Reason Compositionally?},
  author={Zixian Ma and Jerry Hong and Mustafa Omer Gul and Mona Gandhi and Irena Gao and Ranjay Krishna},
  journal={arXiv preprint arXiv:2212.07796},
  year={2023}
}