
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows, arxiv

PaddlePaddle training/validation code and pretrained models for Swin Detection.

The official PyTorch implementation is here.

This implementation is developed by PaddleViT.

Figure: Swin Model Overview

Update

Update (2021-09-15): Code is released and Mask R-CNN ported weights are uploaded.

Models Zoo

| Model | Backbone | Lr Schd | box mAP | Weights |
|---|---|---|---|---|
| Mask R-CNN | Swin-T | 1x | 43.7 | google/baidu(qev7) |
| Mask R-CNN | Swin-T | 3x | 46.0 | google/baidu(m8fg) |
| Mask R-CNN | Swin-S | 3x | 48.4 | google/baidu(hdw5) |

The results are evaluated on the COCO validation set.
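The box mAP column is COCO's primary detection metric: average precision averaged over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below covers only the IoU part, for boxes in (x1, y1, x2, y2) corner format; matching, precision/recall and averaging are normally left to the COCO evaluation API (pycocotools `COCOeval`). The function and constant names are illustrative and not taken from this repository.

```python
def box_iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# COCO box mAP averages AP over these ten thresholds: 0.50, 0.55, ..., 0.95.
COCO_IOU_THRESHOLDS = [0.5 + 0.05 * i for i in range(10)]
```

A detection counts as a true positive at a given threshold only if its IoU with an unmatched ground-truth box of the same class reaches that threshold, which is why higher thresholds reward tighter boxes.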

1x/3x denotes the training schedule, listed as 'Lr Schd' in the official repo.
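As a rough guide, and assuming the official Swin detection configs follow the MMDetection convention (1x = 12 epochs with the learning rate divided by 10 at epochs 8 and 11; 3x = 36 epochs with decays at 27 and 33), the schedules can be sketched as a step decay. Warmup is omitted, and we have not checked that this port's configs use exactly these milestones; `step_lr` and `SCHEDULES` are hypothetical names, and the base learning rate below is only an example value.

```python
# Assumed MMDetection-style schedules: name -> (total_epochs, decay milestones).
SCHEDULES = {
    "1x": (12, [8, 11]),
    "3x": (36, [27, 33]),
}


def step_lr(base_lr, epoch, milestones, gamma=0.1):
    """Learning rate for a 0-indexed epoch under a step schedule (no warmup)."""
    # Count how many milestones have already been passed.
    decays = sum(1 for m in milestones if epoch >= m)
    return base_lr * (gamma ** decays)


if __name__ == "__main__":
    total, milestones = SCHEDULES["1x"]
    for epoch in range(total):
        print(epoch, step_lr(1e-4, epoch, milestones))
```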

Backbone model weights can be found in the Swin Transformer classification project here.

Notebooks

We provide a few notebooks on AI Studio to help you get started:

*(coming soon)*

Requirements

Python>=3.6
yaml>=0.2.5
PaddlePaddle>=2.1.0
yacs>=0.1.8

Data

The COCO2017 dataset is expected in the following folder structure:

COCO dataset folder
├── annotations
│   ├── captions_train2017.json
│   ├── captions_val2017.json
│   ├── instances_train2017.json
│   ├── instances_val2017.json
│   ├── person_keypoints_train2017.json
│   └── person_keypoints_val2017.json
├── train2017
│   ├── 000000000009.jpg
│   ├── 000000000025.jpg
│   ├── 000000000030.jpg
│   ├── 000000000034.jpg
│   ...
└── val2017
    ├── 000000000139.jpg
    ├── 000000000285.jpg
    ├── 000000000632.jpg
    ├── 000000000724.jpg
    ...
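Before launching a long run, it can help to check that the dataset root matches the layout above. The sketch below assumes box detection only needs the `instances_*.json` annotation files plus the two image folders; the captions and keypoints files are not checked. `missing_coco_entries` is a hypothetical helper, not part of this repository.

```python
from pathlib import Path

# Entries checked by this sketch, taken from the folder layout above.
REQUIRED = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "train2017",
    "val2017",
]


def missing_coco_entries(root):
    """Return the required COCO2017 entries that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in REQUIRED if not (root / rel).exists()]
```

For example, `missing_coco_entries("/path/to/dataset/coco")` returns an empty list when everything is in place.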

More details about the COCO dataset can be found here and on the official COCO dataset website.
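One detail of the COCO annotation format that often matters for detection code: each `bbox` in `instances_*.json` is stored as `[x, y, width, height]`, with `(x, y)` the top-left corner, while many box operations (such as IoU) work on corner format `[x1, y1, x2, y2]`. A minimal conversion sketch; the function names are hypothetical.

```python
def coco_xywh_to_xyxy(bbox):
    """Convert a COCO [x, y, width, height] box to [x1, y1, x2, y2]."""
    x, y, w, h = bbox
    return [x, y, x + w, y + h]


def xyxy_to_coco_xywh(box):
    """Convert a corner-format [x1, y1, x2, y2] box back to COCO [x, y, w, h]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]
```

Predictions written out for COCO-style evaluation are typically converted back to `[x, y, width, height]` with the second function.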

Usage

To use the model with pretrained weights, download the .pdparams weight file and change the related file paths in the following Python code. The model config files are located in ./configs/.

For example, assuming the downloaded weight file is stored at ./mask_rcnn_swin_tiny_patch4_window7.pdparams, the swin_t_maskrcnn model can be used in Python as follows:

```python
import paddle
from config import get_config
from swin_det import build_swin_det

# config files are located in ./configs/
config = get_config('./configs/swin_t_maskrcnn.yaml')
# build the detection model
model = build_swin_det(config)
# load pretrained weights (the full file name, including .pdparams)
model_state_dict = paddle.load('./mask_rcnn_swin_tiny_patch4_window7.pdparams')
model.set_dict(model_state_dict)
```
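`paddle.load` returns a dict mapping parameter names to tensors, and `set_dict` copies the entries whose names match the model. To spot a mismatched checkpoint quickly, the names can be compared before loading. `diff_state_dict_keys` is a hypothetical pure-Python helper, not part of this repository or of Paddle.

```python
def diff_state_dict_keys(model_keys, loaded_keys):
    """Compare parameter names expected by a model with names in a checkpoint.

    Returns (missing, unexpected), both sorted:
    missing    -> names the model expects but the checkpoint lacks
    unexpected -> names in the checkpoint that the model does not have
    """
    model_keys, loaded_keys = set(model_keys), set(loaded_keys)
    missing = sorted(model_keys - loaded_keys)
    unexpected = sorted(loaded_keys - model_keys)
    return missing, unexpected


# Usage with the snippet above, before calling model.set_dict(...):
# missing, unexpected = diff_state_dict_keys(model.state_dict().keys(),
#                                            model_state_dict.keys())
```

Two empty lists mean every parameter name lines up; a long `missing` list usually points to the wrong config or weight file.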

Evaluation

To evaluate the Swin detection model on COCO2017 with a single GPU, run the following script from the command line:

```shell
sh run_eval.sh
```

or

```shell
CUDA_VISIBLE_DEVICES=0 \
python main_single_gpu.py \
    -cfg=./configs/swin_t_maskrcnn.yaml \
    -dataset=coco \
    -batch_size=4 \
    -data_path=/path/to/dataset/coco/val \
    -eval \
    -pretrained=/path/to/pretrained/model/mask_rcnn_swin_tiny_patch4_window7  # .pdparams is NOT needed
```
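Note the difference between the two entry points: `paddle.load` in the Python example takes the full file name, while the `-pretrained` flag takes the path without the `.pdparams` extension, as the `# .pdparams is NOT needed` comment indicates. Assuming the scripts append `.pdparams` themselves, a hypothetical helper that accepts either form looks like this:

```python
def split_pdparams(path):
    """Return (prefix, weight_file) for a path given with or without '.pdparams'.

    `prefix` is the form the -pretrained flag expects (assumed); `weight_file`
    is the form paddle.load expects.
    """
    suffix = ".pdparams"
    prefix = path[: -len(suffix)] if path.endswith(suffix) else path
    return prefix, prefix + suffix
```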

Run evaluation using multiple GPUs:

```shell
sh run_eval_multi.sh
```

or

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python main_multi_gpu.py \
    -cfg=./configs/swin_t_maskrcnn.yaml \
    -dataset=coco \
    -batch_size=4 \
    -data_path=/path/to/dataset/coco/val \
    -eval \
    -pretrained=/path/to/pretrained/model/mask_rcnn_swin_tiny_patch4_window7  # .pdparams is NOT needed
```

Training

To train the Swin detection model on COCO2017 with a single GPU, run the following script from the command line:

```shell
sh run_train.sh
```

or

```shell
CUDA_VISIBLE_DEVICES=1 \
python main_single_gpu.py \
    -cfg=./configs/swin_t_maskrcnn.yaml \
    -dataset=coco \
    -batch_size=2 \
    -data_path=/path/to/dataset/coco/train \
    -pretrained=/path/to/pretrained/model/swin_tiny_patch4_window7_224  # .pdparams is NOT needed
```

The -pretrained argument sets the pretrained backbone weights, which can be found in Swin classification here.
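Classification checkpoints contain a classifier head that a detector does not use, and the detector may register its backbone parameters under a different name prefix. One common way to bridge that gap is to drop the head and re-prefix the remaining keys. The prefix `"backbone."` and the dropped prefix `"head."` below are assumptions for illustration; we have not checked how `swin_det.py` actually loads these weights, and `remap_backbone_weights` is a hypothetical helper.

```python
def remap_backbone_weights(cls_state, prefix="backbone.", drop_prefixes=("head.",)):
    """Hypothetical: turn a classification state dict into backbone weights.

    Keys starting with any of `drop_prefixes` (assumed to be the classifier
    head) are discarded; every remaining key gets `prefix` prepended.
    """
    drop = tuple(drop_prefixes)
    remapped = {}
    for key, value in cls_state.items():
        if key.startswith(drop):
            continue
        remapped[prefix + key] = value
    return remapped
```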

Run training using multiple GPUs:

```shell
sh run_train_multi.sh
```

or

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 \
python main_multi_gpu.py \
    -cfg=./configs/swin_t_maskrcnn.yaml \
    -dataset=coco \
    -batch_size=2 \
    -data_path=/path/to/dataset/coco/train \
    -pretrained=/path/to/pretrained/model/swin_tiny_patch4_window7_224  # .pdparams is NOT needed
```

The -pretrained argument sets the pretrained backbone weights, which can be found in Swin classification here.
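When switching between the single-GPU and multi-GPU commands, it helps to know the total number of images per step. Assuming `-batch_size` is a per-GPU value (an assumption we have not confirmed against `main_multi_gpu.py`), the global batch is the per-GPU batch times the number of devices listed in `CUDA_VISIBLE_DEVICES`. `global_batch_size` is a hypothetical helper.

```python
def global_batch_size(per_gpu_batch, visible_devices):
    """Total images per step, assuming -batch_size is a per-GPU value.

    `visible_devices` is the CUDA_VISIBLE_DEVICES string, e.g. "0,1,2,3".
    """
    gpus = [d for d in visible_devices.split(",") if d.strip()]
    if not gpus:
        raise ValueError("no GPUs listed in visible_devices")
    return per_gpu_batch * len(gpus)
```

Under that assumption, the multi-GPU training command above (`-batch_size=2` on devices `0,1,2,3`) processes 8 images per step.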

Visualization

coming soon

Reference

@article{liu2021swin,
  title={Swin transformer: Hierarchical vision transformer using shifted windows},
  author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
  journal={arXiv preprint arXiv:2103.14030},
  year={2021}
}
