
Detection Transformers with Assignment

By Jeffrey Ouyang-Zhang, Jang Hyun Cho, Xingyi Zhou, Philipp Krähenbühl

This repository is an official implementation of the paper NMS Strikes Back.

TL; DR. Detection Transformers with Assignment (DETA) re-introduces IoU assignment and NMS for transformer-based detectors. DETA trains and tests comparably fast to Deformable-DETR and converges much faster (50.2 mAP in 12 epochs on COCO).

Figure: DETR's one-to-one bipartite matching vs. our many-to-one IoU-based assignment.
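The core change is replacing DETR's one-to-one Hungarian matching with an overlap-based assignment in which several queries can be matched to the same ground-truth box. Below is a minimal, illustrative sketch of such a many-to-one IoU assignment; it is not the repository's implementation, and the function name and thresholds (0.7/0.3) are assumptions for illustration.

```python
import torch
from torchvision.ops import box_iou

def iou_assign(proposals, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Many-to-one assignment: each proposal is matched to its best-overlapping
    ground-truth box if the IoU is high enough, so one gt box can collect
    several positives (unlike DETR's one-to-one bipartite matching).

    proposals: (N, 4) boxes in (x1, y1, x2, y2) format
    gt_boxes:  (M, 4) boxes in the same format
    Returns a (N,) tensor: gt index for positives, -1 for background, -2 for ignored.
    """
    labels = torch.full((proposals.size(0),), -1, dtype=torch.long)
    if gt_boxes.numel() == 0:
        return labels  # no objects: everything is background

    iou = box_iou(proposals, gt_boxes)            # (N, M) pairwise IoU
    max_iou, gt_idx = iou.max(dim=1)              # best gt per proposal

    labels[max_iou >= pos_thresh] = gt_idx[max_iou >= pos_thresh]   # positives
    ignore = (max_iou >= neg_thresh) & (max_iou < pos_thresh)
    labels[ignore] = -2                           # neither positive nor background

    # Make sure every gt box gets at least one positive: its best proposal.
    best_per_gt = iou.argmax(dim=0)               # (M,)
    labels[best_per_gt] = torch.arange(gt_boxes.size(0))
    return labels
```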

Main Results

| Method | Epochs | COCO val AP | Total train time (8-GPU hours) | Batch infer speed (FPS) | URL |
| --- | --- | --- | --- | --- | --- |
| Two-stage Deformable DETR | 50 | 46.9 | 42.5 | - | see DeformDETR |
| Improved Deformable DETR | 50 | 49.6 | 66.6 | 13.4 | config / log / model |
| DETA | 12 | 50.1 | 16.3 | 12.7 | config / log / model |
| DETA | 24 | 51.1 | 32.5 | 12.7 | config / log / model |
| DETA (Swin-L) | 24 | 62.9 | 100 | 4.2 | config-O365 / model-O365 / config / model |

Note:

  1. Unless otherwise specified, models use a ResNet-50 backbone, and ResNet-50 training is done on 8 Nvidia Quadro RTX 6000 GPUs.
  2. Inference speed is measured on an Nvidia Tesla V100 GPU.
  3. "Batch Infer Speed" refers to inference with batch size = 4 to maximize GPU utilization.
  4. Improved Deformable DETR implements two-stage Deformable DETR with improved hyperparameters (e.g., more queries, more feature levels; see the full list here).
  5. DETA with a Swin-L backbone is pre-trained on Object-365 and fine-tuned on COCO. This model attains 63.5 AP on COCO test-dev. Times refer to fine-tuning (Object-365 pre-training takes 14000 GPU hours). We additionally provide the pre-trained Object-365 config and model prior to fine-tuning.

Installation

Please follow the instructions from Deformable-DETR for installation, data preparation, and additional usage examples. Tested with torch 1.6.0 + CUDA 9.2, torch 1.8.0 + CUDA 10.1, and torch 1.11.0 + CUDA 11.3.

Usage

Evaluation

You can evaluate our pretrained DETA models from the above table on the COCO 2017 validation set:

./configs/deta.sh --eval --coco_path ./data/coco --resume <path_to_model>

You can also run distributed evaluation:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/deta.sh \
    --eval --coco_path ./data/coco --resume <path_to_model>

You can also run distributed evaluation on our Swin-L model:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/deta_swin_ft.sh \
    --eval --coco_path ./data/coco --resume <path_to_model>
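Unlike vanilla DETR, DETA keeps overlapping predictions and removes duplicates with standard NMS at test time. The snippet below is a minimal, illustrative sketch of such class-aware NMS post-processing, not the repository's post-processor; the 0.5 IoU threshold and the 100-detection cap are assumed values.

```python
from torchvision.ops import batched_nms

def nms_postprocess(boxes, scores, labels, iou_thresh=0.5, max_dets=100):
    """Class-aware NMS over one image's predictions.

    boxes:  (N, 4) in (x1, y1, x2, y2) format
    scores: (N,) confidence scores
    labels: (N,) predicted class ids
    """
    keep = batched_nms(boxes, scores, labels, iou_thresh)  # suppress duplicates per class
    keep = keep[:max_dets]                                  # keep the top-scoring detections
    return boxes[keep], scores[keep], labels[keep]
```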

Training

Training on single node

Training DETA on 8 GPUs:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./configs/deta.sh --coco_path ./data/coco

Training on slurm cluster

If you are using a Slurm cluster, you can simply run the following command to train on 1 node with 8 GPUs:

GPUS_PER_NODE=8 ./tools/run_dist_slurm.sh <partition> deta 8 configs/deta.sh \
    --coco_path ./data/coco

Fine-tune DETA with Swin-L on 2 nodes, each with 8 GPUs:

GPUS_PER_NODE=8 ./tools/run_dist_slurm.sh <partition> deta 16 configs/deta_swin_ft.sh \
    --coco_path ./data/coco --finetune <path_to_o365_model>

License

This project builds heavily on Deformable-DETR and Detectron2. Please refer to their original licenses for more details. If you are using the Swin-L backbone, please see the original Swin license.

Citing DETA

If you find DETA useful in your research, please consider citing:

@article{ouyangzhang2022nms,
  title={NMS Strikes Back},
  author={Ouyang-Zhang, Jeffrey and Cho, Jang Hyun and Zhou, Xingyi and Kr{\"a}henb{\"u}hl, Philipp},
  journal={arXiv preprint arXiv:2212.06137},
  year={2022}
}
