The train_net.py script reproduces the object detection experiments on Pascal VOC and COCO.
- Install detectron2.

      $ git clone https://github.com/facebookresearch/detectron2.git
      $ cd detectron2
      $ git checkout 3e71a2711bec
      $ python -m pip install -e .

  This setup requires CUDA 10.2.
- Convert a pre-trained model to detectron2's format:

      python3 convert-pretrain-to-detectron2.py input.pth.tar output.pkl
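To give a sense of what this conversion involves, here is a minimal sketch of the parameter renaming such a script typically performs, modeled on the conversion script shipped with the MoCo repository. The function name `rename_key` is hypothetical, and the sketch assumes any wrapper prefix has already been stripped from the key; the actual convert-pretrain-to-detectron2.py in this repository may handle prefixes or additional keys differently.

```python
def rename_key(key: str) -> str:
    """Map a torchvision-style ResNet parameter name to a detectron2-style name.

    Assumption: wrapper prefixes from the checkpoint were removed beforehand.
    """
    # Parameters outside the residual stages belong to the stem.
    if "layer" not in key:
        key = "stem." + key
    # torchvision layer1..layer4 correspond to detectron2 res2..res5.
    for t in (1, 2, 3, 4):
        key = key.replace(f"layer{t}", f"res{t + 1}")
    # detectron2 nests each norm layer under the conv it follows.
    for t in (1, 2, 3):
        key = key.replace(f"bn{t}", f"conv{t}.norm")
    # The projection shortcut: conv first, then its norm.
    key = key.replace("downsample.0", "shortcut")
    key = key.replace("downsample.1", "shortcut.norm")
    return key


if __name__ == "__main__":
    for name in ("conv1.weight", "layer1.0.bn1.weight", "layer2.0.downsample.1.bias"):
        print(name, "->", rename_key(name))
```

A full converter would apply this mapping to every backbone tensor in the checkpoint and pickle the result; that part depends on the checkpoint layout and is left out here.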
- Put the dataset under the "./datasets" directory, following the directory structure required by detectron2.

      $ mkdir -p datasets && cd datasets
      $ ln -s VOC2007 .
      $ ln -s VOC2012 .
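Before launching a long run, it can help to confirm that the dataset layout is in place. The following is a minimal sketch, assuming the Pascal VOC layout described in detectron2's dataset documentation (`Annotations`, `ImageSets/Main`, and `JPEGImages` under each `VOC20xx` folder). The helper name `missing_voc_paths` is hypothetical and not part of detectron2.

```python
from pathlib import Path

# Sub-directories detectron2's built-in Pascal VOC loader reads from
# (assumed from the detectron2 dataset documentation).
REQUIRED_SUBDIRS = ("Annotations", "ImageSets/Main", "JPEGImages")


def missing_voc_paths(root="datasets", years=("2007", "2012")):
    """Return the expected VOC directories that do not exist under root."""
    root = Path(root)
    missing = []
    for year in years:
        for sub in REQUIRED_SUBDIRS:
            path = root / f"VOC{year}" / sub
            if not path.is_dir():
                missing.append(path)
    return missing


if __name__ == "__main__":
    problems = missing_voc_paths()
    if problems:
        print("Missing directories:")
        for p in problems:
            print("  ", p)
    else:
        print("Pascal VOC layout looks complete.")
```

Symbolic links are followed by `is_dir()`, so a broken link shows up as a missing directory.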
- Run training:

      # R50
      python train_net.py --config-file configs/pascal_voc_R_50_C4_24k_moco.yaml \
          --num-gpus 8 MODEL.WEIGHTS ./output.pkl
      # R101
      python train_net.py --config-file configs/pascal_voc_R_101_C4_24k_moco.yaml \
          --num-gpus 8 MODEL.WEIGHTS ./output.pkl
Alternatively, see dist_train.sh for the training scripts.
Below are the results on the Pascal VOC 2007 test set, fine-tuned on 2007+2012 trainval for 24k iterations using Faster R-CNN with an R50-C4 or R101-C4 backbone:
pretrain | AP50 | AP | AP75 |
---|---|---|---|
ImageNet-1M, R50, supervised | 81.3 | 53.5 | 58.8 |
ImageNet-1M, R50, MoCo v1, 200ep | 81.5 | 55.9 | 62.6 |
ImageNet-1M, R50, MoCo v2, 200ep | 82.4 | 57.0 | 63.6 |
ImageNet-1M, R50, MoCo v2, 800ep | 82.5 | 57.4 | 64.0 |
ImageNet-1M, R50, DenseCL, 200ep | 82.7 | 58.5 | 65.6 |
ImageNet-1M, R101, DenseCL, 200ep | 83.57 | 61.02 | 68.20 |
ImageNet-1M, R50, RegionCL-D, 200ep | 83.32 | 58.72 | 65.57 |
ImageNet-1M, R101, RegionCL-D, 200ep | 84.30 | 61.59 | 68.17 |
Note: These results are means of 5 trials. Variation on Pascal VOC is large: the standard deviations of AP50, AP, and AP75 are expected to be about 0.2, 0.2, and 0.4, respectively, in most cases. We recommend running 5 trials and computing the means.
Results of individual runs (AP50/AP/AP75):
DenseCL, R50:
82.64/58.32/64.60
82.64/58.41/64.89
DenseCL, R101:
83.57/61.02/68.20
83.52/60.89/67.32
RegionCL-D, R50:
83.24/58.84/65.98
83.40/58.60/65.16
RegionCL-D, R101:
84.22/61.48/68.14
84.39/61.70/68.21
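
The averaging recommended in the note can be sketched as follows, using the two RegionCL-D runs listed above as input. The helper name `summarize` is hypothetical, and the use of the sample standard deviation is an assumption, since the note does not say which estimator is meant.

```python
from statistics import mean, stdev

# Per-run (AP50, AP, AP75) values copied from the list above.
runs = {
    "RegionCL-D, R50": [(83.24, 58.84, 65.98), (83.40, 58.60, 65.16)],
    "RegionCL-D, R101": [(84.22, 61.48, 68.14), (84.39, 61.70, 68.21)],
}


def summarize(trials):
    """Return per-metric means and sample standard deviations over trials."""
    columns = list(zip(*trials))  # one tuple per metric
    means = [mean(col) for col in columns]
    # stdev needs at least two data points; report 0.0 for a single trial.
    stds = [stdev(col) if len(col) > 1 else 0.0 for col in columns]
    return means, stds


if __name__ == "__main__":
    for name, trials in runs.items():
        means, stds = summarize(trials)
        row = " | ".join(f"{m:.2f} (std {s:.2f})" for m, s in zip(means, stds))
        print(f"{name}: {row}")
```

With more trials, append further tuples to each list; the function does not depend on the number of runs.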