InternImage

This repository is the official implementation of InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions.

Paper | Blog in Chinese

News

Feb 28, 2023: InternImage is accepted to CVPR 2023!
Nov 18, 2022: 🚀 InternImage-XL, integrated into BEVFormer v2, achieves state-of-the-art performance of 63.4 NDS on the nuScenes camera-only benchmark.
Nov 10, 2022: 🚀🚀 InternImage-H sets a new record of 65.4 mAP on COCO detection test-dev and 62.9 mIoU on ADE20K, outperforming previous models by a large margin.

Coming soon

InternImage-H(1B)/G(3B)
Other downstream tasks.
TensorRT inference.
Classification code of the InternImage series.
InternImage-T/S/B/L/XL ImageNet-1k pretrained model.
InternImage-L/XL ImageNet-22k pretrained model.
InternImage-T/S/B/L/XL detection and instance segmentation model.
InternImage-T/S/B/L/XL semantic segmentation model.

Introduction

InternImage, initially described in an arXiv preprint (arXiv:2211.05778), can serve as a general backbone for computer vision. It takes deformable convolution as the core operator to obtain large effective receptive fields, and introduces adaptive spatial aggregation to reduce the strict inductive bias. Our model makes it possible to learn stronger and more robust models with large-scale parameters from massive data.
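
To make the idea of deformable sampling with adaptive aggregation more concrete, here is a minimal pure-Python sketch for a single output location on a single-channel feature map. It is an illustration only, not the operator shipped in this repository: the function names (`bilinear_sample`, `deformable_aggregate`), the 3x3 grid, and the softmax normalization of the aggregation weights are assumptions made for this example. In a real model the offsets and weight logits would be predicted from the input features rather than passed in by hand.

```python
import math


def bilinear_sample(feat, y, x):
    """Sample a 2D feature map at a fractional (y, x); out-of-bounds pixels count as zero."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                # Each of the four neighbors is weighted by its closeness to (y, x).
                weight = (1.0 - abs(y - yi)) * (1.0 - abs(x - xi))
                value += weight * feat[yi][xi]
    return value


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Regular 3x3 sampling grid around the center location (hypothetical kernel size).
GRID_3X3 = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def deformable_aggregate(feat, center, offsets, weight_logits):
    """Aggregate nine samples around `center`.

    Each grid point is shifted by its own (dy, dx) offset, which lets the
    receptive field deform, and is scaled by an input-dependent weight,
    which stands in for adaptive spatial aggregation.
    """
    cy, cx = center
    weights = softmax(weight_logits)
    out = 0.0
    for (dy, dx), (oy, ox), wgt in zip(GRID_3X3, offsets, weights):
        out += wgt * bilinear_sample(feat, cy + dy + oy, cx + dx + ox)
    return out


# Toy 5x5 feature map whose value grows linearly with position.
feat = [[float(5 * i + j) for j in range(5)] for i in range(5)]
zero_offsets = [(0.0, 0.0)] * 9
uniform_logits = [0.0] * 9
print(deformable_aggregate(feat, (2, 2), zero_offsets, uniform_logits))
```

With zero offsets and uniform weights the sketch reduces to a plain 3x3 average around the center; non-zero offsets move the sampling points off the regular grid, and non-uniform logits let some points dominate the result.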

Main Results on ImageNet with Pretrained Models

ImageNet-1K and ImageNet-22K Pretrained InternImage Models

name	pretrain	resolution	acc@1	#param	FLOPs	22K model	1K model
InternImage-T	ImageNet-1K	224x224	83.5	30M	5G	-	ckpt | cfg
InternImage-S	ImageNet-1K	224x224	84.2	50M	8G	-	ckpt | cfg
InternImage-B	ImageNet-1K	224x224	84.9	97M	16G	-	ckpt | cfg
InternImage-L	ImageNet-22K	384x384	87.7	223M	108G	ckpt	ckpt | cfg
InternImage-XL	ImageNet-22K	384x384	88.0	335M	163G	ckpt	ckpt | cfg

Main Results on Downstream Tasks

COCO Object Detection

backbone	method	schd	box mAP	mask mAP	#param	FLOPs	Download
InternImage-T	Mask R-CNN	1x	47.2	42.5	49M	270G	ckpt | cfg
InternImage-T	Mask R-CNN	3x	49.1	43.7	49M	270G	ckpt | cfg
InternImage-S	Mask R-CNN	1x	47.8	43.3	69M	340G	ckpt | cfg
InternImage-S	Mask R-CNN	3x	49.7	44.5	69M	340G	ckpt | cfg
InternImage-B	Mask R-CNN	1x	48.8	44.0	115M	501G	ckpt | cfg
InternImage-B	Mask R-CNN	3x	50.3	44.8	115M	501G	ckpt | cfg
InternImage-L	Cascade	1x	54.9	47.7	277M	1399G	ckpt | cfg
InternImage-L	Cascade	3x	56.1	48.5	277M	1399G	ckpt | cfg
InternImage-XL	Cascade	1x	55.3	48.1	387M	1782G	ckpt | cfg
InternImage-XL	Cascade	3x	56.2	48.8	387M	1782G	ckpt | cfg

ADE20K Semantic Segmentation

backbone	resolution	single scale	multi scale	#param	FLOPs	Download
InternImage-T	512x512	47.9	48.1	59M	944G	ckpt | cfg
InternImage-S	512x512	50.1	50.9	80M	1017G	ckpt | cfg
InternImage-B	512x512	50.8	51.3	128M	1185G	ckpt | cfg
InternImage-L	640x640	53.9	54.1	256M	2526G	ckpt | cfg
InternImage-XL	640x640	55.0	55.3	368M	3142G	ckpt | cfg

Main Results of FPS

name	resolution	#param	FLOPs	batch-1 FPS (TensorRT)
InternImage-T	224x224	30M	5G	156
InternImage-S	224x224	50M	8G	129
InternImage-B	224x224	97M	16G	116
InternImage-L	384x384	223M	108G	56
InternImage-XL	384x384	335M	163G	47
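
For readers who think in latency rather than throughput, the batch-1 FPS figures above can be converted to approximate per-image latency with simple arithmetic (latency in ms = 1000 / FPS). The short sketch below does this; the dictionary and the helper name `latency_ms` are introduced here only for illustration, and the FPS values are copied from the table.

```python
# Batch-1 TensorRT throughput in frames per second, copied from the table above.
FPS_BATCH1 = {
    "InternImage-T": 156,
    "InternImage-S": 129,
    "InternImage-B": 116,
    "InternImage-L": 56,
    "InternImage-XL": 47,
}


def latency_ms(fps):
    """Per-image latency in milliseconds implied by a batch-1 FPS figure."""
    return 1000.0 / fps


for name, fps in FPS_BATCH1.items():
    print(f"{name}: {latency_ms(fps):.1f} ms per image")
```

Note that this conversion only holds at batch size 1; larger batches trade higher per-batch latency for higher throughput.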

Citation

If this work is helpful for your research, please consider citing the following BibTeX entry.

@article{wang2022internimage,
  title={InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions},
  author={Wang, Wenhai and Dai, Jifeng and Chen, Zhe and Huang, Zhenhang and Li, Zhiqi and Zhu, Xizhou and Hu, Xiaowei and Lu, Tong and Lu, Lewei and Li, Hongsheng and others},
  journal={arXiv preprint arXiv:2211.05778},
  year={2022}
}
