PETRv1 and PETRv2 are designed for the multi-view 3D object detection task. PETR stands for "position embedding transformation"; it encodes 3D position information into image features and thus produces position-aware features. For more details, please refer to:
- PETRv1: [ECCV2022] PETR: Position Embedding Transformation for Multi-View 3D Object Detection (arXiv)
- PETRv2: [ICCV2023] PETRv2: A Unified Framework for 3D Perception from Multi-Camera Images (arXiv)
In this demo, we will use PETR-vov-p4-800x320 for PETRv1 and PETRv2-vov-p4-800x320 for PETRv2 as our deployment targets.
| Method | Backbone precision | Head precision | Framework | mAP | Latency (ms) |
|---|---|---|---|---|---|
| PETR-vov-p4-800x320 | fp16 | fp32 | PyTorch | 0.3778 | - |
| PETR-vov-p4-800x320 | fp16 | fp16 | TensorRT | 0.3774 | 52.37 (on Orin) |
| PETRv2-vov-p4-800x320 | fp16 | fp32 | PyTorch | 0.4106 | - |
| PETRv2-vov-p4-800x320 | fp16 | fp16 | TensorRT | 0.4101 | 55.77 (on Orin) |
```bash
cd /workspace
git clone https://github.com/NVIDIA/DL4AGX.git
cd DL4AGX
git submodule update --init --recursive
```

```bash
cd /workspace
git clone https://github.com/megvii-research/PETR
cd PETR
git apply /workspace/DL4AGX/AV-Solutions/petr-trt/patch.diff
git clone https://github.com/open-mmlab/mmdetection3d.git -b v0.17.1
```
Please follow the instructions in the official repo (install.md, prepare-dataset.md) to set up the environment for PyTorch inference first. Since there are some API changes in newer mmcv/mmdet, we adjusted the original configs; you can find those minor changes in patch.diff.
Then download the PETRv1 checkpoint from https://drive.google.com/file/d/1-afU8MhAf92dneOIbhoVxl_b72IAWOEJ/view?usp=sharing and the PETRv2 checkpoint from https://drive.google.com/file/d/1tv_D8Ahp9tz5n4pFp4a64k-IrUZPu5Im/view?usp=sharing into the folder /workspace/PETR/ckpts. Note: both files are originally named epoch_24.pth, so don't forget to rename them after downloading (the commands below expect PETR-vov-p4-800x320_e24.pth and PETRv2-vov-p4-800x320_e24.pth).
After the setup, your PETR folder should look like:
```
├── data/
│   └── nuscenes/
│       ├── v1.0-trainval/
│       ├── samples/
│       ├── sweeps/
│       ├── nuscenes_infos_train.pkl
│       ├── nuscenes_infos_val.pkl
│       ├── mmdet3d_nuscenes_30f_infos_train.pkl
│       └── mmdet3d_nuscenes_30f_infos_val.pkl
├── mmdetection3d/
├── projects/
├── tools/
├── install.md
├── requirements.txt
├── LICENSE
└── README.md
```
You may verify your installation with:
```bash
cd /workspace/PETR
CUDA_VISIBLE_DEVICES=0 python tools/test.py /workspace/PETR/projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py /workspace/PETR/ckpts/PETR-vov-p4-800x320_e24.pth --eval bbox
```
This command is expected to output the benchmark results. The environment for PyTorch inference and benchmarking will be referred to as the torch container.
NOTE
- For the best user experience, we highly recommend using torch >= 1.14. You may also build the docker image with the provided ./dockerfile. Here is an example command line; adjust the volume-mapping arguments according to your setup.
```bash
cd /workspace/DL4AGX/AV-Solutions/petr-trt
docker build --network=host -f dockerfile . -t petr-trt
docker run --name=petr-trt -d -it --rm --shm-size=4096m --privileged --gpus all --network=host \
    -v /workspace:/workspace -v <path to nuscenes>:/data \
    petr-trt /bin/bash
```
To set up the deployment environment, you may run the following commands. Please note that we will export the ONNX files inside the petr-trt container.
```bash
cd /workspace/DL4AGX/AV-Solutions/petr-trt/export_eval
ln -s /workspace/PETR/data data                     # create a soft-link to the data folder
ln -s /workspace/PETR/mmdetection3d mmdetection3d   # create a soft-link to the mmdetection3d folder
export PYTHONPATH=.:/workspace/PETR/:/workspace/DL4AGX/AV-Solutions/petr-trt/export_eval/
```
To export the ONNX files of PETRv1:
```bash
cd /workspace/DL4AGX/AV-Solutions/petr-trt/export_eval
python v1/v1_export_to_onnx.py /workspace/PETR/projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py /workspace/PETR/ckpts/PETR-vov-p4-800x320_e24.pth --eval bbox
```
This script will create `PETRv1.extract_feat.onnx` and `PETRv1.pts_bbox_head.forward.onnx` inside `onnx_files`.
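If you want to sanity-check the exported files before moving on, a minimal sketch like the one below (assuming the `onnx` Python package is available in the torch container) verifies each graph and prints its input/output names:
```python
# Hypothetical sanity check of the exported graphs; file names follow the export step above.
import onnx

for path in ["onnx_files/PETRv1.extract_feat.onnx",
             "onnx_files/PETRv1.pts_bbox_head.forward.onnx"]:
    model = onnx.load(path)
    onnx.checker.check_model(model)  # raises if the graph is malformed
    inputs = [i.name for i in model.graph.input]
    outputs = [o.name for o in model.graph.output]
    print(f"{path}: inputs={inputs}, outputs={outputs}")
```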
As PETRv2 is a temporal model, its inference behavior is slightly different from PETRv1. Originally the backbone extracts features from two input frames, i.e. the current and the previous frame. However, the feature extracted from the previous frame can be reused to improve efficiency. So when we export the model, we modify the behavior of the function `extract_feat` so that it takes the cached feature map as input instead of recomputing it, as illustrated by the sketch below.
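The sketch below only illustrates the caching idea; the class, tensor layout, and concatenation dimension are hypothetical placeholders, not the actual patched `extract_feat`:
```python
# Illustrative sketch of reusing the previous frame's feature map for PETRv2.
# `backbone` and the concatenation dimension are placeholders, not the patched code.
import torch

class TemporalFeatureCache:
    def __init__(self):
        self.prev_feat = None  # feature map cached from the previous frame

    def extract_feat(self, backbone, cur_img):
        cur_feat = backbone(cur_img)  # run the backbone on the current frame only
        # on the very first frame nothing is cached yet, so reuse the current feature
        prev_feat = self.prev_feat if self.prev_feat is not None else cur_feat
        self.prev_feat = cur_feat     # cache for the next call
        # PETRv2's head consumes features from both frames; stack them here
        return torch.cat([cur_feat, prev_feat], dim=1)
```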
To export the ONNX files of PETRv2:
```bash
cd /workspace/DL4AGX/AV-Solutions/petr-trt/export_eval
python v2/v2_export_to_onnx.py /workspace/PETR/projects/configs/petrv2/petrv2_vovnet_gridmask_p4_800x320.py /workspace/PETR/ckpts/PETRv2-vov-p4-800x320_e24.pth --eval bbox
```
This script will create `PETRv2.extract_feat.onnx` and `PETRv2.pts_bbox_head.forward.onnx` inside `onnx_files`.
NOTE: As `coords_position_embeding` solely depends on `lidar2img` and `img_shape` from `img_metas`, we move this part outside of the ONNX. The same `coords_position_embeding` tensor can be reused as long as `lidar2img` and `img_shape` remain unchanged.
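In other words, the host code can compute the tensor once and feed it to the engine as a plain input, recomputing only when the calibration changes. A rough sketch of that caching, with `compute_coords_pe` standing in for the original PyTorch routine (it is not a function from this repo), looks like:
```python
# Sketch: keep coords_position_embeding outside the ONNX graph and reuse it
# while lidar2img and img_shape stay the same.
import numpy as np

_cached_pe = None
_cached_meta = None

def get_coords_pe(lidar2img, img_shape, compute_coords_pe):
    """compute_coords_pe is a placeholder for the original PyTorch code."""
    global _cached_pe, _cached_meta
    meta = (np.asarray(lidar2img).tobytes(), tuple(img_shape))
    if meta != _cached_meta:  # recompute only when calibration or image shape changes
        _cached_pe = compute_coords_pe(lidar2img, img_shape)
        _cached_meta = meta
    return _cached_pe
```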
We provide `v1/v1_evaluate_trt.py` and `v2/v2_evaluate_trt.py` to run the benchmark with TensorRT. They are expected to produce results similar to the original PyTorch benchmark.
- Prepare dependencies for the benchmark:
```bash
pip install <TensorRT Root>/python/tensorrt-<version>-cp38-none-linux_aarch64.whl
```
- Build TensorRT engines

We provide a script that loads the four simplified ONNX files and creates the corresponding engine files.
```bash
export TRT_ROOT=<path to your tensorrt dir>
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TRT_ROOT/lib
cd /workspace/DL4AGX/AV-Solutions/petr-trt/export_eval
bash onnx2trt.sh
```
The above script builds TensorRT engines in FP16 precision as an example.
- Run benchmark with TensorRT
```bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TRT_ROOT/lib
cd /workspace/DL4AGX/AV-Solutions/petr-trt/export_eval
# benchmark PETRv1
python v1/v1_evaluate_trt.py /workspace/PETR/projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py /workspace/PETR/ckpts/PETR-vov-p4-800x320_e24.pth --eval bbox
# benchmark PETRv2
python v2/v2_evaluate_trt.py /workspace/PETR/projects/configs/petrv2/petrv2_vovnet_gridmask_p4_800x320.py /workspace/PETR/ckpts/PETRv2-vov-p4-800x320_e24.pth --eval bbox
```
Since we only replace the backend from PyTorch to TensorRT while keeping other parts such as data loading and evaluation unchanged, you should see outputs similar to the PyTorch benchmark.
This model is to be deployed on NVIDIA DRIVE Orin with TensorRT 10.8.0.32. We will use the NVIDIA DRIVE docker image drive-agx-orin-linux-aarch64-pdk-build-x86:6.5.1.0-latest as the cross-compile environment; this container will be referred to as the build container.
To launch the docker on the host x86 machine, you may run:
```bash
docker run --gpus all -it --network=host --rm \
    -v /workspace:/workspace \
    nvcr.io/drive/driveos-sdk/drive-agx-orin-linux-aarch64-pdk-build-x86:6.5.1.0-latest
```
To gain access to the docker image and the corresponding TensorRT, please join the DRIVE AGX SDK Developer Program. You can find more details on NVIDIA DRIVE site.
Similar to what we did when building plugins, you may run the following commands inside the build container.
```bash
# inside the cross-compile environment
cd /workspace/DL4AGX/AV-Solutions/petr-trt/app
bash setup_dep.sh  # download dependencies (stb, cuOSD)
mkdir -p build+orin && cd build+orin
cmake -DTARGET=aarch64 -DTRT_ROOT=<path to your aarch64 tensorrt dir> .. && make
```
After the build, you should see `petr_v1` and `petr_v2` under `petr-trt/app/build+orin/`. In this demo run, we will set up everything under the folder `petr-trt/app/demo/`.
- Copy the cross-compiled applications to the demo folder
```bash
cd /workspace/DL4AGX/AV-Solutions/petr-trt/
cp app/build+orin/petr* app/demo/
```
- Prepare input data for inference

In the torch container environment on x86, run:
```bash
cd /workspace/DL4AGX/AV-Solutions/petr-trt/export_eval
python v1/v1_save_data.py /workspace/PETR/projects/configs/petr/petr_vovnet_gridmask_p4_800x320.py /workspace/PETR/ckpts/PETR-vov-p4-800x320_e24.pth --eval bbox
python v2/v2_save_data.py /workspace/PETR/projects/configs/petrv2/petrv2_vovnet_gridmask_p4_800x320.py /workspace/PETR/ckpts/PETRv2-vov-p4-800x320_e24.pth --eval bbox
```
This will dump the necessary data files to `petr-trt/export_eval/demo/data/`. Please be aware that `v2_save_data.py` only generates additional data files on top of `v1_save_data.py`, so make sure you run `v1/v1_save_data.py` first and then `v2/v2_save_data.py`.
We can then move them with:
```bash
cd /workspace/DL4AGX/AV-Solutions/petr-trt/
cp -r export_eval/demo/data/ app/demo/
cp -r export_eval/onnx_files/*.onnx app/demo/onnx_files/
```
Now the `petr-trt/app/demo` folder should be organized as:
```
├── data/
│   ├── cams/
│   ├── imgs/
│   ├── lidar2imgs/
│   ├── v1_coords_pe.bin
│   ├── v2_coords_pe.bin
│   └── v2_mean_time_stamp.bin
├── engines/
├── onnx_files/
│   ├── PETRv1.extract_feat.onnx
│   ├── PETRv1.pts_bbox_head.forward.onnx
│   ├── PETRv2.extract_feat.onnx
│   └── PETRv2.pts_bbox_head.forward.onnx
├── viz_v1/
├── viz_v2/
├── onnx2trt.sh
├── simhei.ttf
├── v1_config.json
└── v2_config.json
```
Now you may copy or mount all the data under `DL4AGX/AV-Solutions/petr-trt/app/demo` to the `/demo` folder on NVIDIA DRIVE Orin.
You may utilize `trtexec` to build the engines from the ONNX files on NVIDIA DRIVE Orin; we provide a bash script that wraps the `trtexec` commands.
```bash
export TRT_ROOT=<path to tensorrt on NVIDIA DRIVE Orin>
cd /demo
bash onnx2trt.sh
```
This script will load all four ONNX files under `/demo/onnx_files` and generate the corresponding engine files under `/demo/engines/`. You may explore the script and modify options such as precision according to your needs.
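If you prefer to build an engine programmatically instead of through `trtexec`, a minimal sketch with the TensorRT Python API could look like the following. This assumes the TensorRT Python wheel is installed; the file names are examples only and `onnx2trt.sh` remains the reference flow.
```python
# Sketch: build an FP16 TensorRT engine from one of the exported ONNX files.
import tensorrt as trt

logger = trt.Logger(trt.Logger.INFO)
builder = trt.Builder(logger)
network = builder.create_network(0)  # explicit batch (the default in TensorRT 10)
parser = trt.OnnxParser(network, logger)

with open("onnx_files/PETRv1.extract_feat.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("failed to parse the ONNX file")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # FP16, matching the example precision above

engine_bytes = builder.build_serialized_network(network, config)
with open("engines/PETRv1.extract_feat.engine", "wb") as f:
    f.write(engine_bytes)
```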
To run the demo apps, simply call:
```bash
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TRT_ROOT/lib
cd /demo
# to run PETRv1
./petr_v1 ./v1_config.json
# to run PETRv2
./petr_v2 ./v2_config.json
```
Then you may find the visualization results under `/demo/viz_v1` and `/demo/viz_v2` in jpg format.
Example (Left PETRv1, Right PETRv2):
- PETRv1 & PETRv2 and their related code are licensed under Apache-2.0
- cuOSD and its related code are licensed under MIT