SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data

Ziyan Yang, Kushal Kafle, Zhe Lin, Scott Cohen, Zhihong Ding, Vicente Ordonez

If you have any questions, please email [email protected].

Abstract

We propose Subject-Conditional Relation Detection (SCoRD), where, conditioned on an input subject, the goal is to predict all its relations to other objects in a scene along with their locations. Based on the Open Images dataset, we propose a challenging OIv6-SCoRD benchmark in which the training and testing splits have a distribution shift in terms of the occurrence statistics of <subject, relation, object> triplets. To solve this problem, we propose an auto-regressive model that, given a subject, predicts its relations, objects, and object locations by casting this output as a sequence of tokens. First, we show that previous scene-graph prediction methods fail to produce as exhaustive an enumeration of relation-object pairs on this benchmark when conditioned on a subject. In particular, we obtain a recall@3 of 83.8% for our relation-object predictions compared to the 49.75% obtained by a recent scene graph detector. Then, we show improved generalization on both relation-object and object-box predictions by leveraging, during training, relation-object pairs obtained automatically from textual captions, for which no object-box annotations are available. In particular, for <subject, relation, object> triplets for which no object locations are available during training, we obtain a recall@3 of 33.80% for relation-object pairs and 26.75% for their box locations.
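To make the output format concrete, below is a minimal, hypothetical sketch of how a single relation-object-box prediction could be flattened into a token sequence. The special location tokens, the 100-bin coordinate quantization, and the ordering are illustrative assumptions only, not the exact scheme used by the SCoRD codebase.

```python
# Hypothetical illustration of casting one <relation, object, object box> prediction,
# conditioned on a given subject, into a flat token sequence. The location-token
# vocabulary and the 100-bin quantization are assumptions for illustration only.

def quantize_box(box, num_bins=100):
    """Map normalized [x1, y1, x2, y2] coordinates to discrete location tokens."""
    return [f"<loc_{int(round(c * (num_bins - 1)))}>" for c in box]

def serialize_prediction(relation, obj, obj_box):
    """Serialize one relation-object-box prediction as a list of tokens."""
    return [relation, obj] + quantize_box(obj_box)

# Toy example: subject "person" -> ("holds", "cup") with a normalized object box.
tokens = serialize_prediction("holds", "cup", [0.42, 0.31, 0.55, 0.48])
print(tokens)  # ['holds', 'cup', '<loc_42>', '<loc_31>', '<loc_54>', '<loc_48>']
```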

Install

Please follow the instructions in ALBEF to install the required packages.

Data

Download the training and testing splits here. The images come from the Open Images (OIv6) dataset.

Checkpoint

Download the checkpoint for the 50%-removal experiment here.

Evaluation

First, run this command to generate <relation, object, object location> triplets:

# --start and --end give the index range of your target checkpoint(s) in the checkpoint folder.
# If the folder contains only one checkpoint, set --start 0 and --end 1.
# --chunk_size indicates how many batches of evaluation samples are processed per chunk.
CUDA_VISIBLE_DEVICES=0 python results_generation.py --root your_checkpoint_folder --start 0 --end 1 --chunk 0 --num_seq 3 --num_beams 5 --chunk_size 100 --round 2
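If your evaluation split is processed in several chunks, a small driver script can launch one chunk per GPU. The sketch below simply reuses the flags from the command above; the number of chunks and the GPU assignment are assumptions you should adapt to your setup.

```python
# Sketch: run results_generation.py over several evaluation chunks, one GPU each.
# Flag names come from the command above; NUM_CHUNKS and the GPU list are assumptions.
import os
import subprocess

CKPT_FOLDER = "your_checkpoint_folder"  # same value as --root above
NUM_CHUNKS = 4                          # assumption: adjust to your split size
GPUS = [0, 1, 2, 3]

procs = []
for chunk, gpu in zip(range(NUM_CHUNKS), GPUS):
    cmd = [
        "python", "results_generation.py",
        "--root", CKPT_FOLDER,
        "--start", "0", "--end", "1",
        "--chunk", str(chunk),
        "--num_seq", "3", "--num_beams", "5",
        "--chunk_size", "100", "--round", "2",
    ]
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
    procs.append(subprocess.Popen(cmd, env=env))

for p in procs:
    p.wait()
```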

Then, run this command to get evaluation results:

python evaluate_results.py --results_folder your_checkpoint_folder/oidv6_results/  --report_unseen True --topk 3
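For intuition about the reported metric, here is a minimal sketch of recall@k over relation-object pairs. The data layout (ranked predictions and ground-truth sets keyed by subject) is an assumption for illustration only; evaluate_results.py remains the authoritative implementation.

```python
# Minimal sketch of recall@k for relation-object pairs. The actual metric is computed
# by evaluate_results.py; the data layout below is assumed for illustration.

def recall_at_k(predictions, ground_truth, k=3):
    """
    predictions: dict mapping a subject id to a ranked list of (relation, object) pairs
    ground_truth: dict mapping the same subject id to a set of annotated pairs
    Returns the fraction of annotated pairs recovered within the top-k predictions.
    """
    hit, total = 0, 0
    for subject_id, gt_pairs in ground_truth.items():
        topk = set(predictions.get(subject_id, [])[:k])
        hit += sum(1 for pair in gt_pairs if pair in topk)
        total += len(gt_pairs)
    return hit / total if total else 0.0

# Toy usage with one subject and k=3.
preds = {"s1": [("holds", "cup"), ("wears", "hat"), ("sits on", "chair")]}
gts = {"s1": {("holds", "cup"), ("wears", "hat")}}
print(recall_at_k(preds, gts, k=3))  # 1.0
```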

Training

First, download the pre-trained checkpoint from PEVL. Then run:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -m torch.distributed.launch --nproc_per_node=8 --master_port=12888 --use_env run_relation_train.py --config configs/relation_grounding.yaml --output_dir your_checkpoint_folder --checkpoint pevl_pretrain.pth

Acknowledgement

We would like to thank the authors of ALBEF and PEVL; their released codebases helped a lot in this project.

Citing

If you find this work interesting, please consider citing it:

@inproceedings{yang2024scord,
  title={SCoRD: Subject-Conditional Relation Detection with Text-Augmented Data},
  author={Yang, Ziyan and Kafle, Kushal and Lin, Zhe and Cohen, Scott and Ding, Zhihong and Ordonez, Vicente},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={5731--5741},
  year={2024}
}