This is a PyTorch implementation for MonoPCC: Photometric-invariant Cycle Constraint for Monocular Depth Estimation of Endoscopic Images.
Our experiments are conducted in a conda environment (Python 3.7 is recommended). You can use the commands below to install the necessary dependencies:
pip3 install torch==1.9.0+cu111 torchvision==0.10.0+cu111 torchaudio==0.9.0
pip install dominate==2.4.0 Pillow==6.1.0 visdom==0.1.8
pip install tensorboardX==1.4 opencv-python matplotlib scikit-image
pip3 install mmcv-full==1.3.0 mmsegmentation==0.11.0
pip install timm einops IPython
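To quickly verify that the environment is set up correctly, you can run a small check like the following (optional; only the versions noted in the comments are expected to match the installation commands above):

```python
# Optional sanity check of the environment; versions should roughly match
# the packages installed above.
import torch
import torchvision
import cv2
import mmcv
import timm

print("torch:", torch.__version__)              # expected 1.9.0+cu111
print("torchvision:", torchvision.__version__)  # expected 0.10.0+cu111
print("opencv:", cv2.__version__)
print("mmcv:", mmcv.__version__)                # expected 1.3.0
print("timm:", timm.__version__)
print("CUDA available:", torch.cuda.is_available())
```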
The datasets used in our experiments are SCARED (an additional application to [email protected] is required), SimCol3D and SERV-CT.
SCARED split
The training/test split for SCARED used in our work is defined in splits/endovis,
and further data preprocessing is available in AF-SfMLearner.
To prepare the ground truth depth maps, please follow AF-SfMLearner. Moreover, we provide the model files here to reproduce the reported results.
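For reference, the split files are plain text lists of frames. A minimal sketch of reading them is shown below; the file name and the per-line format follow the common Monodepth2-style convention and are assumptions here, so please check the files under splits/endovis for the actual layout:

```python
# Minimal sketch: read a Monodepth2-style split file, where each line
# identifies one frame (e.g., "<sequence_folder> <frame_index> <side>").
# The file name and per-line format are assumptions; see splits/endovis.
import os

def read_split(split_dir, filename="test_files.txt"):
    with open(os.path.join(split_dir, filename)) as f:
        return [line.strip() for line in f if line.strip()]

if __name__ == "__main__":
    frames = read_split("splits/endovis")
    print(f"{len(frames)} test frames, e.g.: {frames[0]}")
```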
To evaluate model performance on SCARED, you need to run the following command:
CUDA_VISIBLE_DEVICES=0 python evaluate_depth.py --data_path <your_data_path> --load_weights_folder <your_weight_path> \
--eval_split endovis --dataset endovis --max_depth 150 --png --eval_mono
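For reference, the metrics reported below (Abs Rel, Sq Rel, RMSE, RMSE log, δ < 1.25) follow the standard monocular depth evaluation protocol used by Monodepth2-style codebases; a minimal sketch of how they are typically computed from valid ground-truth/prediction pairs is given here (the exact implementation in evaluate_depth.py may differ in details such as masking and median scaling):

```python
# Minimal sketch of the standard depth metrics; evaluate_depth.py may
# differ in details such as depth capping and median scaling.
import numpy as np

def compute_errors(gt, pred):
    # gt, pred: 1-D arrays of valid depth values in the same units
    thresh = np.maximum(gt / pred, pred / gt)
    a1 = (thresh < 1.25).mean()                     # delta < 1.25 accuracy

    abs_rel = np.mean(np.abs(gt - pred) / gt)
    sq_rel = np.mean(((gt - pred) ** 2) / gt)
    rmse = np.sqrt(np.mean((gt - pred) ** 2))
    rmse_log = np.sqrt(np.mean((np.log(gt) - np.log(pred)) ** 2))
    return abs_rel, sq_rel, rmse, rmse_log, a1
```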
After running the evaluation, you should obtain the results below (the model files and prediction results of the SOTA methods are also provided here for statistical analysis, e.g., a paired t-test):
| Methods | Abs Rel | Sq Rel | RMSE | RMSE log | δ < 1.25 |
|---|---|---|---|---|---|
| Monodepth2 | 0.060 | 0.432 | 4.885 | 0.082 | 0.972 |
| FeatDepth | 0.055 | 0.392 | 4.702 | 0.077 | 0.976 |
| HR-Depth | 0.058 | 0.439 | 4.886 | 0.081 | 0.969 |
| DIFFNet | 0.057 | 0.423 | 4.812 | 0.079 | 0.975 |
| Endo-SfMLearner | 0.057 | 0.414 | 4.756 | 0.078 | 0.976 |
| AF-SfMLearner | 0.055 | 0.384 | 4.585 | 0.075 | 0.979 |
| MonoViT | 0.057 | 0.416 | 4.919 | 0.079 | 0.977 |
| Lite-Mono | 0.056 | 0.398 | 4.614 | 0.077 | 0.974 |
| MonoPCC (Ours) | 0.051 | 0.349 | 4.488 | 0.072 | 0.983 |
📌📌 Note that, since our training split differs slightly from that of AF-SfMLearner, we additionally provide comparison results under their training setting here.
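Since per-frame prediction results of the compared methods are provided, a paired statistical test (e.g., a paired t-test on per-frame Abs Rel) can be run with a short script like the one below; the .npy file names are placeholders for the provided error arrays, and scipy is used:

```python
# Minimal sketch of a paired t-test between per-frame errors of two
# methods; the file names are placeholders for the provided results.
import numpy as np
from scipy import stats

errors_a = np.load("abs_rel_monopcc.npy")   # per-frame Abs Rel of method A
errors_b = np.load("abs_rel_baseline.npy")  # per-frame Abs Rel of method B

t_stat, p_value = stats.ttest_rel(errors_a, errors_b)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```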
As a plug-and-play design, PCC can theoretically be embedded into any backbone network. In addition to the previously used MonoViT, we extend the PCC strategy to more recent methods, e.g., EndoDAC, achieving the following results (a conceptual sketch of this plug-and-play usage is given after the table):
| Methods | Abs Rel | Sq Rel | RMSE | RMSE log | δ < 1.25 |
|---|---|---|---|---|---|
| EndoDAC | 0.051 | 0.341 | 4.347 | 0.072 | 0.981 |
| EndoDAC+PCC | 0.049 | 0.334 | 4.322 | 0.070 | 0.981 |
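As a purely conceptual illustration of the plug-and-play idea above (this is not the released MonoPCC or EndoDAC code, and all names below are hypothetical), embedding an extra constraint amounts to keeping the backbone unchanged and adding one weighted term to its original training loss:

```python
# Conceptual sketch only (not the actual MonoPCC/EndoDAC training code):
# the backbone can be swapped freely because the extra constraint is an
# additional loss term computed from its outputs. All names are hypothetical.
import torch.nn as nn

class DepthWithExtraConstraint(nn.Module):
    def __init__(self, backbone, constraint_weight=1.0):
        super().__init__()
        self.backbone = backbone            # any depth network: image -> disparity
        self.constraint_weight = constraint_weight

    def forward(self, image):
        return self.backbone(image)

    def training_loss(self, base_loss, constraint_loss):
        # base_loss: the backbone's original self-supervised loss
        # constraint_loss: the additional plug-and-play term
        return base_loss + self.constraint_weight * constraint_loss
```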
We also evaluate the performance of pose estimation. Before that, please follow AF-SfMLearner to prepare the ground truth.
Using the PoseNet models, you can run the command below to obtain the pose estimation results on the two trajectories:
CUDA_VISIBLE_DEVICES=0 python evaluate_pose.py --data_path <your_data_path> --load_weights_folder <your_weight_path>
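For reference, the pose evaluation reports the absolute trajectory error (ATE) over short snippets, as in Monodepth2-style codebases; a minimal sketch of the per-snippet ATE is shown below (evaluate_pose.py may differ in how snippets are built and aligned):

```python
# Minimal sketch of the absolute trajectory error (ATE) for one snippet of
# camera positions (N x 3 arrays); the real script may differ in details.
import numpy as np

def compute_ate(gt_xyz, pred_xyz):
    # align the first frame and resolve the scale ambiguity of monocular pose
    offset = gt_xyz[0] - pred_xyz[0]
    pred_xyz = pred_xyz + offset[None, :]
    scale = np.sum(gt_xyz * pred_xyz) / np.sum(pred_xyz ** 2)
    alignment_error = pred_xyz * scale - gt_xyz
    return np.sqrt(np.sum(alignment_error ** 2)) / gt_xyz.shape[0]
```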
Here we use MonoPCC to estimate depth maps for a video sequence in SCARED, and then perform 3D reconstruction with ElasticFusion using the RGB frames and the pseudo depth maps.
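ElasticFusion consumes the RGB-D frames directly; the sketch below merely illustrates how a predicted depth map relates to 3D geometry via pinhole back-projection. The intrinsics fx, fy, cx, cy are placeholders and should come from the calibration of the actual SCARED sequence:

```python
# Minimal sketch: back-project a predicted depth map into a colored point
# cloud using pinhole intrinsics (fx, fy, cx, cy are placeholders).
import numpy as np

def depth_to_pointcloud(depth, rgb, fx, fy, cx, cy):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))   # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)
    return points, colors
```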
To further demonstrate the effectiveness of MonoPCC in more general scenarios, we conduct comparison experiments on the KITTI dataset. Here we list the SOTA methods mentioned in our paper and provide the depth models to validate the experimental results below.
| Methods | Abs Rel | Sq Rel | RMSE | RMSE log | δ < 1.25 |
|---|---|---|---|---|---|
| Monodepth2 | 0.115 | 0.903 | 4.863 | 0.193 | 0.877 |
| FeatDepth | 0.104 | 0.729 | 4.481 | 0.179 | 0.893 |
| HR-Depth | 0.109 | 0.792 | 4.632 | 0.185 | 0.884 |
| DIFFNet | 0.102 | 0.749 | 4.445 | 0.179 | 0.897 |
| MonoViT | 0.099 | 0.708 | 4.372 | 0.175 | 0.900 |
| Lite-Mono | 0.101 | 0.729 | 4.454 | 0.178 | 0.897 |
| MonoPCC (Ours) | 0.098 | 0.677 | 4.318 | 0.173 | 0.900 |
Specifically, please run the following command:
CUDA_VISIBLE_DEVICES=0 python evaluate_depth_kitti.py --data_path <kitti_data_path> --load_weights_folder <your_weight_path> \
--eval_split eigen --dataset kitti --eval_mono
Currently, we have released the evaluation code and model weight files of MonoPCC, which can reproduce the results in our work. The complete training code will be released in the near future.
Many thanks to the authors of the following works: