code release

xtk8532704 · Mar 9, 2023 · 8ea1ddd
1 parent ff41783
commit 8ea1ddd

Showing 53 changed files with 8,742 additions and 11 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,144 @@
+# Byte-compiled / optimized / DLL files
+__pycache__/
+*.py[cod]
+*$py.class
+*.ipynb
+
+# C extensions
+*.so
+
+# Distribution / packaging
+.Python
+tmp/
+build/
+develop-eggs/
+dist/
+downloads/
+eggs/
+.eggs/
+lib/
+lib64/
+parts/
+sdist/
+var/
+wheels/
+*.egg-info/
+.installed.cfg
+*.egg
+MANIFEST
+
+mmdetection3d/
+mmdetection3d
+mmdet3d
+
+# PyInstaller
+#  Usually these files are written by a python script from a template
+#  before PyInstaller builds the exe, so as to inject date/other infos into it.
+*.manifest
+*.spec
+
+# Installer logs
+pip-log.txt
+pip-delete-this-directory.txt
+
+# Unit test / coverage reports
+htmlcov/
+.tox/
+.coverage
+.coverage.*
+.cache
+hostfile.txt
+nosetests.xml
+coverage.xml
+*.cover
+.hypothesis/
+.pytest_cache/
+
+# Translations
+*.mo
+*.pot
+
+# Django stuff:
+*.log
+local_settings.py
+db.sqlite3
+
+# Flask stuff:
+instance/
+.webassets-cache
+
+# Scrapy stuff:
+.scrapy
+
+# Sphinx documentation
+docs/_build/
+
+# PyBuilder
+target/
+
+# Jupyter Notebook
+.ipynb_checkpoints
+
+# pyenv
+.python-version
+
+# celery beat schedule file
+celerybeat-schedule
+
+# SageMath parsed files
+*.sage.py
+
+# Environments
+.env
+.venv
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
+
+# Spyder project settings
+.spyderproject
+.spyproject
+
+# Rope project settings
+.ropeproject
+
+# mkdocs documentation
+/site
+
+# mypy
+.mypy_cache/
+
+# cython generated cpp
+data
+ckpts
+.vscode
+.idea
+
+# custom
+nuscenes_gt_database
+nuscenes_unified_gt_database
+work_dirs
+*.pkl
+*.pkl.json
+*.log.json
+work_dirs/
+exps/
+*~
+mmdet3d/.mim
+
+# Pytorch
+*.pth
+
+# demo
+# *.jpg
+# *.png
+data/s3dis/Stanford3dDataset_v1.2_Aligned_Version/
+data/scannet/scans/
+data/sunrgbd/OFFICIAL_SUNRGBD/
+*.obj
+*.ply
+
+# Waymo evaluation
+mmdet3d/core/evaluation/waymo_utils/compute_detection_metrics_main
diff --git a/README.md b/README.md
@@ -1,5 +1,5 @@
 
-# Cross Modal Transformer via Coordinates Encoding for 3D Object Dectection
+# Cross Modal Transformer: Towards Fast and Robust 3D Object Detection
 [![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/pdf/2301.01283.pdf)
 ![visitors](https://visitor-badge.glitch.me/badge?page_id=junjie18/CMT)
 <!-- ## Introduction -->
@@ -9,31 +9,36 @@ https://user-images.githubusercontent.com/18145538/210828888-a944817a-858f-45ef-
 This repository is an official implementation of [CMT](https://arxiv.org/pdf/2301.01283.pdf).
 
 <div align="center">
-  <img src="figs/overview.png"/>
+  <img src="figs/cmt_fps.png"/>
 </div><br/>
 
-CMT is a robust 3D detector for end-to-end 3D multi-modal detection. A DETR-like framework is designed for multi-modal detection(CMT) and lidar-only detection(CMT-L), which obtains **73.5%** and **70.1%** NDS separately on nuScenes benchmark.
+CMT is a robust 3D detector for end-to-end multi-modal 3D detection. Its DETR-like framework supports both multi-modal detection (CMT) and lidar-only detection (CMT-L), which obtain **74.1%** (SoTA among all single models) and **70.1%** NDS, respectively, on the nuScenes benchmark.
 Without explicit view transformation, CMT takes the image and point clouds tokens as inputs and directly outputs accurate 3D bounding boxes. CMT can be a strong baseline for further research.
 
 
 ## Preparation
 
 * Environments  
-Python == 3.8, CUDA == 11.1, pytorch == 1.9.0, mmdet3d == 1.0.0rc5   
+Python == 3.8 \
+CUDA == 11.1 \
+pytorch == 1.9.0 \
+mmdet3d == 1.0.0rc5 \
+spconv-cu111 == 2.1.21 \
+[flash-attn](https://github.com/HazyResearch/flash-attention)
 
 * Data   
-Follow the mmdet3d to process the nuScenes dataset (https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/data_preparation.md).
+Follow the [mmdet3d data preparation guide](https://github.com/open-mmlab/mmdetection3d/blob/master/docs/en/data_preparation.md) to process the nuScenes dataset.
 
 
 ## Main Results
-We provide some results on nuScenes **val set**. The default batch size is 2 on each GPU.
+Results on the nuScenes **val set**. The default batch size is 2 per GPU. FPS is measured on a single Tesla A100 GPU.
 
-| config            | mAP      | NDS     | GPU | schedule| time    | 
+| Config            |Modality| mAP      | NDS     | Schedule|Inference FPS|
 |:--------:|:----------:|:---------:|:--------:|:--------:|:--------:|
-| CMT-pillar0200-r50-704x256 | 53.8%     | 58.5%    | 8 x 2080ti | 20 epoch| 13 hours  |  
-| CMT-voxel0100-r50-800x320 | 60.1%     | 63.4%    | 8 x 2080ti | 20 epoch| 14 hours   |    
-| CMT-voxel0075-vov-1600x640  | 69.4%     | 71.9%    | 8 x A100 | 15e+5e(with cbgs) | 45 hours  |    
-
+| [vov_1600x640](./projects/configs/camera/cmt_camera_vov_1600x640_cbgs.py) |C| 40.6% | 46.0%  | 20e| 
+| [voxel0075](./projects/configs/lidar/cmt_lidar_voxel0075_cbgs.py) |L| 62.14% | 68.6%    | 15e+5e |
+| [voxel0100_r50_800x320](./projects/configs/fusion/cmt_voxel0100_r50_800x320_cbgs.py)  |C+L| 67.9%     | 70.8%    | 15e+5e |
+| [voxel0075_vov_1600x640](./projects/configs/fusion/cmt_voxel0075_vov_1600x640_cbgs.py)  |C+L| 70.3% | 72.9%    | 15e+5e |
 ## Citation
 If you find CMT helpful in your research, please consider citing: 
 ```bibtex   

diff --git a/figs/cmt_fps.png b/figs/cmt_fps.png
diff --git a/figs/cmt_robust.png b/figs/cmt_robust.png
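
The updated README pins several packages in its "Environments" list. As a rough illustration of how those pins could be verified before training, here is a minimal sketch that compares installed versions against them. It is not part of this commit. The distribution names (`torch` for pytorch, `flash-attn` for the linked flash-attention project) are assumptions about how the packages are published, and the helper names `installed_version` and `check_pins` are hypothetical. The Python and CUDA versions are not checked.

```python
from importlib import metadata

# Pins copied from the README's "Environments" list. The distribution names
# are assumptions; flash-attn is listed without a version, so None means
# "any installed version is accepted".
PINS = {
    "torch": "1.9.0",
    "mmdet3d": "1.0.0rc5",
    "spconv-cu111": "2.1.21",
    "flash-attn": None,
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return a list of readable problems; an empty list means all pins match."""
    problems = []
    for name, wanted in pins.items():
        found = lookup(name)
        if found is None:
            problems.append(f"{name}: not installed")
            continue
        # Drop local build tags such as "+cu111" before comparing.
        base = found.split("+", 1)[0]
        if wanted is not None and base != wanted:
            problems.append(f"{name}: found {found}, README pins {wanted}")
    return problems


if __name__ == "__main__":
    for line in check_pins(PINS) or ["all pinned packages match"]:
        print(line)
```

The `lookup` parameter exists only so the comparison can be exercised without the real packages installed, for example by passing the `get` method of a plain dict.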