This repo can be used for training and testing of
- RGB-based depth prediction
- sparse-depth-based depth prediction
- RGBd-based (i.e., both RGB and sparse depth) depth prediction
Our work builds on the following two papers: https://arxiv.org/abs/1709.07492 and https://arxiv.org/abs/1807.00275.
This branch contains our work on the model with VGG + Random Sampling + Nearest Neighbour Interpolation. Check out the adhyay2000 branch for our work on training the same model with self-supervised learning.
- Train on a small subset of KITTI. (We have trained on the whole KITTI odometry dataset.)
- Replace the ResNet-18 feature extractor with VGGNet for KITTI.
- Use nearest-neighbour upsampling instead of bilinear interpolation (see the sketch after this list).
- Use uniform random sampling, with the number of depth points capped at 20,000 (also sketched below).
- Compare results with the paper.
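For illustration, the sampling and upsampling changes above can be sketched as follows. This is a minimal sketch assuming NumPy/PyTorch inputs; `uniform_random_mask` and the stand-in tensors are hypothetical and do not necessarily match the repo's actual sparsifier API.

```python
import numpy as np
import torch
import torch.nn.functional as F

def uniform_random_mask(depth, max_samples=20000):
    """Uniform random sampling: keep at most `max_samples` valid depth points."""
    valid = np.flatnonzero(depth > 0)  # indices of valid (non-zero) depth pixels
    keep = np.random.choice(valid, size=min(max_samples, valid.size), replace=False)
    mask = np.zeros(depth.size, dtype=bool)
    mask[keep] = True
    return mask.reshape(depth.shape)

dense_depth = np.random.rand(228, 912).astype(np.float32)  # stand-in depth map
sparse_depth = dense_depth * uniform_random_mask(dense_depth)

# Nearest-neighbour upsampling in the decoder, in place of bilinear interpolation:
feat = torch.randn(1, 64, 14, 57)                         # stand-in feature map
up = F.interpolate(feat, scale_factor=2, mode="nearest")  # vs. mode="bilinear"
```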
We have made improvements over the proposed model: a self-supervised framework was used to obtain better accuracy, and a plug-and-play module (https://arxiv.org/pdf/1812.08350.pdf) was used to generate better results during evaluation of the model.
- Install PyTorch on a machine with a CUDA GPU.
- Install HDF5 and the other dependencies (the files in our pre-processed datasets are in HDF5 format).
Note: please install only the OpenCV versions pinned below; our code may not work with the latest version of OpenCV.
sudo apt-get update
sudo apt-get install -y libhdf5-serial-dev hdf5-tools
pip3 install h5py matplotlib imageio scikit-image
pip3 install opencv-python==3.4.2.16
pip3 install opencv-contrib-python==3.4.2.16
Note: our code uses CUDA and will not run on a machine without a GPU.
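As an optional sanity check before training, the snippet below confirms the pinned OpenCV build and a usable CUDA GPU. It is a sketch, not part of the repo, and assumes the 3.4.2.16 builds report their version as "3.4.2":

```python
import cv2
import h5py   # imported only to confirm the HDF5 bindings are installed
import torch

# Assumption: opencv-python==3.4.2.16 reports itself as "3.4.2".
assert cv2.__version__.startswith("3.4.2"), f"unexpected OpenCV {cv2.__version__}"
# The code uses CUDA and will not run on a CPU-only machine.
assert torch.cuda.is_available(), "no CUDA GPU detected"
print("OK:", torch.cuda.get_device_name(0))
```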
The training scripts come with several options, which can be listed with the --help
flag.
python3 main.py --help
For instance, run the following command to train a network with ResNet-18 as the encoder, up-projection as the decoder, both RGB and 100 random sparse depth samples as the input, and uniform random sampling as the sparsifier. (To use the random-sampling sparsifier required by the problem statement, pass --sparsifier ran instead.)
python3 main.py -a resnet18 -d upproj -m rgbd -s 100 --data kitti --sparsifier uar
We have trained using the following options:
python3 main.py -a vgg16 -d upproj -m rgbd -s 100 --data kitti --sparsifier ran
python3 main.py -a vgg16 -d upproj -m rgb -s 100 --data kitti --sparsifier ran
python3 main.py -a resnet18 -d upproj -m rgbd -s 100 --data kitti --sparsifier uar
python3 main.py -a resnet18 -d upproj -m rgb -s 100 --data kitti --sparsifier uar
Training results are saved in a folder under the results folder. The folder's name contains the command-line arguments used (arguments not specified on the command line take their default values). A checkpoint is saved after every epoch; to resume a previous training run, use
python3 main.py --resume [path_to_previous_model]
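Under the hood, resuming typically reloads the checkpoint dictionary saved after each epoch. The sketch below follows the common PyTorch convention; the path and the keys `epoch`, `model`, and `optimizer` are assumptions and are not guaranteed to match this repo's checkpoint format.

```python
import torch

# Hypothetical path; the real one sits under the results folder described above.
checkpoint = torch.load("results/path/to/checkpoint.pth.tar")
start_epoch = checkpoint["epoch"] + 1  # assumed key: epoch saved with the model
model = checkpoint["model"]            # assumed key: the serialized model itself
optimizer = checkpoint["optimizer"]    # assumed key: optimizer state (momentum etc.)
```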
To test the performance of a trained model without training, simply run main.py with the --evaluate
option. For instance,
python3 main.py --evaluate [path_to_trained_model]
To test the performance using the plug-and-play module:
python3 main.py --evaluate [path_to_trained_model] --pnp yes
The evaluation results are written to eval.csv in the model's corresponding folder under the results folder.
Also, note that the plug-and-play module can be used only on models trained with rgbd input; it will not work when only rgb is given, since the algorithm is designed to propagate the sparse depth measurements.
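Conceptually, the plug-and-play module refines an intermediate representation at test time by taking gradient steps on the error at the known sparse points, which is why it needs the sparse depth channel. A rough, illustrative sketch only; `decode`, `z`, `alpha`, and `iters` are all hypothetical names, not the module's actual interface:

```python
import torch

def pnp_refine(decode, z, sparse_depth, alpha=0.01, iters=5):
    """Test-time refinement of an intermediate representation z (illustrative)."""
    mask = sparse_depth > 0                  # pixels where LiDAR depth is known
    for _ in range(iters):
        z = z.detach().requires_grad_(True)
        pred = decode(z)                     # hypothetical decoder half of the net
        loss = ((pred - sparse_depth)[mask] ** 2).mean()
        loss.backward()
        z = z - alpha * z.grad               # nudge z toward the sparse observations
    return decode(z)
```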
Our trained models are available here.
The model names are self-explanatory. The folder also contains graphs (rmse.png, absrel.png, delta1.png, and delta2.png) that visualize the performance of our models, as well as one picture per model showing its predictions on sample images: the leftmost column shows the RGB images, the middle column the sparse depth maps, and the rightmost column the predicted dense depth maps.
- Error metrics on the KITTI dataset (a sketch of the metric definitions follows this results list):
| Model             | RMSE (mm) | AbsRel | Delta1 (%) | Delta2 (%) |
|-------------------|-----------|--------|------------|------------|
| VGG_RGB           | 4780.1    | 0.118  | 84.9       | 95.38      |
| VGG_RGBD          | 3729.031  | 0.0712 | 93.00      | 97.34      |
| RESNET_RGB        | 4858.7    | 0.1205 | 84.51      | 95.22      |
| RESNET_RGBD       | 3798.221  | 0.0712 | 92.79      | 97.18      |
| SELF_VGG_RGBD     | 2486.115  | 0.058  | 96.15      | 98.13      |
| SELF_VGG_RGBD_PNP | 2434.896  | 0.056  | 96.74      | 98.25      |
| VGG_RGBD_PNP      | 3724.902  | 0.0697 | 93.01      | 97.31      |
- Results plotted against the number of samples.
- The top row shows results on RGBd input for ResNet (left) and VGG (right); the bottom row shows results on RGB input for ResNet (left) and VGG (right).
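For reference, the metrics in the table above follow the standard depth-prediction definitions: RMSE in millimetres, AbsRel as the mean absolute relative error, and Delta_i as the percentage of pixels whose prediction/ground-truth ratio is within 1.25^i. A minimal NumPy sketch:

```python
import numpy as np

def depth_metrics(pred, gt):
    """pred, gt: depth maps in millimetres; gt > 0 marks valid pixels."""
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]
    rmse = np.sqrt(np.mean((pred - gt) ** 2))    # RMSE, in mm
    absrel = np.mean(np.abs(pred - gt) / gt)     # mean absolute relative error
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = 100.0 * np.mean(ratio < 1.25)       # % of pixels within 1.25x
    delta2 = 100.0 * np.mean(ratio < 1.25 ** 2)  # % of pixels within 1.5625x
    return rmse, absrel, delta1, delta2
```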
We used the following sources for this project and acknowledge the use of code from them:
Ma, Fangchang, et al. "Sparse-to-Dense: Depth Prediction from Sparse Depth Samples and a Single Image." arXiv:1709.07492 (2017).
Ma, Fangchang, et al. "Self-supervised Sparse-to-Dense: Self-supervised Depth Completion from LiDAR and Monocular Camera." arXiv:1807.00275 (2018).
Wang, Tsun-Hsuan, et al. "Plug-and-Play: Improve Depth Estimation via Sparse Data Propagation." arXiv:1812.08350 (2018).