README中文|README_EN
This project reproduces VoxelNet, a voxel-based 3D object detection algorithm, with the PaddlePaddle framework, and runs experiments on the KITTI dataset.
The project provides pre-trained models and an AiStudio online NoteBook experience.
Paper:
- [1] Yin Zhou, Oncel Tuzel. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
Project Reference:
- https://github.com/qianguih/voxelnet (this repo's reported performance: easy: 53.43, moderate: 48.78, hard: 48.06)
Since the paper does not provide open-source code and no public project reproduces the metrics reported in the paper, this project is based on the reference project (voxelnet-tensorflow) and on SECOND, the follow-up improved version of the paper's algorithm.
The results on the KITTI val dataset (50/50 split, as in the paper) are shown in the tables below.
1. With the network structure, loss function, and most of the data processing and training configuration kept the same as the original paper, the differences are the cls/loc loss weighting (1:1 in the paper; 1:2 proved better in our experiments), the batch size, and the learning rate. The achieved results are as follows:
Network | epochs | opt | lr | batch_size | dataset | config |
---|---|---|---|---|---|---|
VoxelNet | 160 | SGD | 0.0015 | 2 * 1 (card) | KITTI | config |
```
Car AP@0.70, 0.70, 0.70:
bbox AP:90.26, 86.24, 79.26
bev  AP:89.92, 86.04, 79.14
3d   AP:77.00, 66.40, 63.24
aos  AP:38.34, 37.30, 33.19
Car AP@0.70, 0.50, 0.50:
bbox AP:90.26, 86.24, 79.26
bev  AP:90.80, 89.84, 88.88
3d   AP:90.75, 89.32, 87.84
aos  AP:38.34, 37.30, 33.19
Car coco AP@0.50:0.05:0.95:
bbox AP:67.72, 63.70, 61.10
bev  AP:67.13, 63.44, 61.15
3d   AP:53.45, 48.92, 46.34
aos  AP:28.82, 27.54, 25.55
```
Pre-trained weights and training log: Baidu Cloud | AiStudio
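The loss-weighting change in item 1 only rescales the two terms of the total loss before they are summed; a minimal sketch (variable names and the exact coefficients are illustrative, see the config for the actual values):

```python
# Minimal sketch of the weighted total loss. The paper weights the
# classification and localization losses 1:1; this project found a 1:2
# ratio to work better (check the config for the real coefficients).
cls_weight, loc_weight = 1.0, 2.0  # illustrative values only
total_loss = cls_weight * cls_loss + loc_weight * loc_loss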
2. The results improve when the cross-entropy classification loss is replaced with focal loss and a direction classification loss (for AOS) is added:
Network | epochs | opt | lr | batch_size | dataset | config |
---|---|---|---|---|---|---|
VoxelNet | 160 | SGD | 0.005 | 2 * 4 (cards) | KITTI | configFix |
```
Car AP@0.70, 0.70, 0.70:
bbox AP:90.19, 85.78, 79.38
bev  AP:89.79, 85.26, 78.93
3d   AP:81.78, 66.88, 63.51
aos  AP:89.81, 84.55, 77.71
Car AP@0.70, 0.50, 0.50:
bbox AP:90.19, 85.78, 79.38
bev  AP:96.51, 89.53, 88.59
3d   AP:90.65, 89.08, 87.52
aos  AP:89.81, 84.55, 77.71
Car coco AP@0.50:0.05:0.95:
bbox AP:67.15, 63.05, 60.58
bev  AP:68.90, 63.78, 61.08
3d   AP:54.88, 49.42, 46.82
aos  AP:66.89, 62.19, 59.23
```
Pre-trained weights and training log: Baidu Cloud | AiStudio
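For reference, a minimal sketch of the focal-loss replacement described in item 2, written against the Paddle API (the alpha/gamma values below are the common defaults, not necessarily this project's settings):

```python
import paddle
import paddle.nn.functional as F

def sigmoid_focal_loss(logits, labels, alpha=0.25, gamma=2.0):
    """Focal loss for one-hot float targets; down-weights easy examples
    relative to plain cross-entropy."""
    ce = F.binary_cross_entropy_with_logits(logits, labels, reduction='none')
    p = F.sigmoid(logits)
    p_t = p * labels + (1 - p) * (1 - labels)              # prob. of the true class
    alpha_t = alpha * labels + (1 - alpha) * (1 - labels)  # class-balance factor
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```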
**In addition, for details not mentioned in the paper, this project refers to the implementation of the SECOND project.**
Clone the repository:

```bash
git clone https://github.com/CuberrChen/VoxelNet.git
```
Project structure:

```
VoxelNet/
├── images
├── log
├── paddleplus
│   ├── nn
│   ├── ops
│   ├── train
│   ├── __init__.py
│   ├── metrics.py
│   └── tools.py
├── README_EN.md
├── README.md
├── requirements.txt
└── voxelnet
    ├── builder
    ├── configs
    ├── core
    ├── data
    ├── kittiviewer
    ├── output
    ├── pypaddle
    ├── utils
    ├── __init__.py
    └── create_data.py
```
Recommended environment configuration:
- Python version: 3.7.4
- PaddlePaddle version: 2.2.1
- CUDA version: NVIDIA-SMI 450.51.06, Driver Version: 450.51.06, CUDA Version: 11.0, cuDNN: 7.6
Note: due to a PaddlePaddle/cuDNN bug, with CUDA 10.1 and batch size > 2 the following error occurs:

```
OSError: (External) CUDNN error(7), CUDNN_STATUS_MAPPING_ERROR.
[Hint: 'CUDNN_STATUS_MAPPING_ERROR'. An access to GPU memory space failed, which is usually caused by a failure to bind a texture. To correct, prior to the function call, unbind any previously bound textures. Otherwise, this may indicate an internal error/bug in the library. ] (at /paddle/paddle/fluid/operators/conv_cudnn_op.cu:758)
```

Therefore, if your single-card environment is not CUDA 11.0 or above, set the batch size in the config file to 2 and enable gradient accumulation through the accum_step training parameter to emulate a larger effective batch size: accum_step=8 with batch size 2 corresponds to bs=16. Remember to adjust the initial learning rate in the corresponding config file accordingly.
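For clarity, a minimal sketch of what gradient accumulation does, assuming generic `model`, `loss_fn`, `loader`, and `opt` objects (the project wires this up through the accum_step parameter rather than this exact loop):

```python
accum_step = 8  # with batch_size=2 this emulates an effective batch size of 16

for step, (inputs, targets) in enumerate(loader):
    # scale so the summed gradient matches one large batch
    loss = loss_fn(model(inputs), targets) / accum_step
    loss.backward()  # gradients accumulate until cleared
    if (step + 1) % accum_step == 0:
        opt.step()        # one optimizer update per accum_step micro-batches
        opt.clear_grad()  # reset the accumulated gradients
```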
- Dependency package installation:

```bash
cd VoxelNet/
pip install -r requirements.txt
```
You need to add the following environment variables for numba.cuda; you can add them to ~/.bashrc:

```bash
export NUMBAPRO_CUDA_DRIVER=/usr/lib/x86_64-linux-gnu/libcuda.so
export NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so
export NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice
```

Then add the project to your PYTHONPATH:

```bash
cd ..
export PYTHONPATH=$PYTHONPATH:`pwd`:`pwd`/VoxelNet
```
- Dataset preparation
First, download the official KITTI 3D object detection dataset or the AiStudio kitti_detection dataset and create the following folder structure:

```
└── KITTI_DATASET_ROOT
    ├── training    <-- 7481 train data
    |   ├── image_2 <-- for visualization
    |   ├── calib
    |   ├── label_2
    |   ├── velodyne
    |   └── velodyne_reduced <-- empty directory
    └── testing     <-- 7518 test data
        ├── image_2 <-- for visualization
        ├── calib
        ├── velodyne
        └── velodyne_reduced <-- empty directory
```
- Create KITTI infos:

```bash
cd ./VoxelNet/voxelnet
python create_data.py create_kitti_info_file --data_path=KITTI_DATASET_ROOT
```

- Create reduced point clouds:

```bash
python create_data.py create_reduced_point_cloud --data_path=KITTI_DATASET_ROOT
```

- Create groundtruth-database infos:

```bash
python create_data.py create_groundtruth_database --data_path=KITTI_DATASET_ROOT
```
- Modify the config file (configs/config.py)
Some paths need to be configured in the config file:

```python
train_input_reader: {
    ...
    database_sampler {
        database_info_path: "/path/to/kitti_dbinfos_train.pkl"  # e.g. /home/aistudio/data/kitti/kitti_dbinfos_train.pkl
        ...
    }
    kitti_info_path: "/path/to/kitti_infos_train.pkl"
    kitti_root_path: "KITTI_DATASET_ROOT"
    # e.g. kitti_info_path: "/home/aistudio/data/kitti/kitti_infos_train.pkl"
    # e.g. kitti_root_path: "/home/aistudio/data/kitti"
}
...
eval_input_reader: {
    ...
    kitti_info_path: "/path/to/kitti_infos_val.pkl"
    kitti_root_path: "KITTI_DATASET_ROOT"
}
```
Configuration notes:
1. If the gradient accumulation option is enabled for training:
   - Set decay_steps of the learning rate according to the total number of optimizer steps for the effective (accumulated) batch size (see the example below this list).
   - Set train_config.steps according to the total number of steps for the initial batch size, i.e. as if no gradient accumulation were applied.
2. The configuration file should be placed in voxelnet/configs/***.py
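For instance (illustrative numbers, assuming the standard 3712-sample KITTI train split; check your own split files): with batch_size=2 and accum_step=8 the effective batch size is 16, so one epoch corresponds to 3712 / 16 = 232 optimizer steps and decay_steps should be scaled to these, while train_config.steps is still counted at 3712 / 2 = 1856 forward steps per epoch.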
Train with a single GPU:

```bash
python ./pypaddle/train.py train --config_path=./configs/config.py --model_dir=./output
```

Train with multiple GPUs:

```bash
python -m paddle.distributed.launch ./pypaddle/train_mgpu.py --config_path=./configs/config.py --model_dir=./output
```
Note:
- Training takes about 11 GB of GPU memory at batch size 2. You can save memory by narrowing the Z range of post_center_limit_range and lowering max_number_of_voxels, as sketched below.
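A hedged sketch of the relevant config fields (the field names follow this repo's configs; the values below are illustrative, not tuned recommendations):

```python
# Narrowing the Z limits (3rd and 6th entries) and lowering the voxel cap
# both reduce training memory, at some cost in detection range/recall.
post_center_limit_range = [0, -40.0, -5.0, 70.4, 40.0, 5.0]  # shrink the Z span, e.g. -5.0/5.0 -> -2.5/2.5
max_number_of_voxels = 20000  # lower this cap to save memory
```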
Evaluate:

```bash
python ./pypaddle/train.py evaluate --config_path=./configs/config.py --model_dir=./output
```
- The detection results are saved as a result.pkl file under model_dir/eval_results/step_xxx, or in the official KITTI label format if you specify --pickle_result=False.
- You can specify the pre-trained model to evaluate with --ckpt_path=path/***.ckpt; if it is not specified, the latest checkpoint is located by default through the JSON files generated during training in the model_dir folder.
For example, to use the pre-trained model provided above (Baidu Cloud | AiStudio):
place the downloaded model parameters in voxelnet/output, place pipeline.py in voxelnet/configs, and run:

```bash
python ./pypaddle/train.py evaluate --config_path=./configs/pipeline.py --model_dir=./output --ckpt_path=./output/voxelnet-73601.ckpt
```
Details are in ./pypaddle/sample_infer.py:

```bash
python ./pypaddle/sample_infer.py --config_path=./configs/config.py --checkpoint_path=./output/**.ckpt --index 564
```

You can run inference on a point cloud and visualize its BEV result.
Result picture:
Note: points that are projected outside the image boundaries are removed (Paper Section 3.1), so there are no detection boxes behind the vehicle body.
To use the KITTI viewer web:
1. Run `python ./kittiviewer/backend.py main --port=xxxx` on your server/local machine.
2. Run `cd ./kittiviewer/frontend && python -m http.server` to launch a local web server.
3. Open your browser and enter your frontend url (e.g. http://127.0.0.1:8000, the default).
4. Input the backend url (e.g. http://127.0.0.1:16666).
5. Input the root path, info path and det path (optional).
6. Click load and loadDet (optional), input an image index at the center bottom of the screen, and press Enter.

Inference steps:
1. First, the load button must be clicked and the data loaded successfully.
2. Input checkpointPath and configPath.
3. Click buildNet.
4. Click inference.
Before training, you should use the KITTI viewer based on pyqt and pyqtgraph to check the data:

```bash
python ./kittiviewer/viewer.py
```

Check the following picture for how to use the viewer:
- KITTI lidar box
A KITTI lidar box consists of 7 elements: [x, y, z, w, l, h, rz]; see the figure.
All training and inference code uses the KITTI box format, so other formats need to be converted to the KITTI format before training.
- KITTI camera box
A KITTI camera box consists of 7 elements: [x, y, z, l, h, w, ry].
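As an illustration of the two layouts, a hedged numpy sketch of converting a camera box to a lidar box, following the convention used in SECOND-style code (`camera_to_lidar_box` is a hypothetical helper, and the calibration matrices are assumed to be extended to 4x4 homogeneous form):

```python
import numpy as np

def camera_to_lidar_box(box_cam, r0_rect, tr_velo_to_cam):
    """box_cam: [x, y, z, l, h, w, ry] in the rectified camera frame.
    r0_rect, tr_velo_to_cam: 4x4 homogeneous KITTI calibration matrices.
    Returns a lidar box [x, y, z, w, l, h, rz]."""
    x, y, z, l, h, w, ry = box_cam
    # Map the box center: rectified camera frame -> raw camera -> lidar frame.
    center = np.linalg.inv(tr_velo_to_cam) @ np.linalg.inv(r0_rect) @ np.array([x, y, z, 1.0])
    # Yaw about camera y (pointing down) becomes yaw about lidar z (pointing up).
    rz = -ry - np.pi / 2
    return np.array([center[0], center[1], center[2], w, l, h, rz])
```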
Related Information:
Information | Description |
---|---|
Author | xbchen |
Date | 2021.1 |
Framework | PaddlePaddle>=2.2.1 |
Scenarios | 3D object detection |
Hardware | GPU |
Online | Notebook |
Multi-card | Shell |
- Thanks to Yan Yan's second.pytorch project.
```bibtex
@inproceedings{Yin2018voxelnet,
  author    = {Yin Zhou and Oncel Tuzel},
  title     = {VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2018}
}
```