🚨 This repository contains download links to our dataset, code snippets, and the trained deep models for our work "Learning Depth Estimation for Transparent and Mirror Surfaces", ICCV 2023
by Alex Costanzino*, Pierluigi Zama Ramirez*, Matteo Poggi*, Fabio Tosi, Stefano Mattoccia, and Luigi Di Stefano. * Equal Contribution
University of Bologna
Inferring the depth of transparent or mirror (ToM) surfaces represents a hard challenge for sensors, algorithms, and deep networks alike. We propose a simple pipeline for learning to estimate depth properly for such surfaces with neural networks, without requiring any ground-truth annotation. We unveil how to obtain reliable pseudo labels by in-painting ToM objects in images and processing them with a monocular depth estimation model. These labels can be used to fine-tune existing monocular or stereo networks, to let them learn how to deal with ToM surfaces. Experimental results on the Booster dataset show the dramatic improvements enabled by our remarkably simple proposal.
🖋️ If you find this code useful in your research, please cite:
@inproceedings{costanzino2023iccv,
    title = {Learning Depth Estimation for Transparent and Mirror Surfaces},
    author = {Costanzino, Alex and Zama Ramirez, Pierluigi and Poggi, Matteo and Tosi, Fabio and Mattoccia, Stefano and Di Stefano, Luigi},
    booktitle = {The IEEE International Conference on Computer Vision},
    note = {ICCV},
    year = {2023},
}
In our experiments, we employed two datasets featuring transparent or mirror objects: Trans10K and MSD. With our in-painting technique, we obtain virtual depth maps to fine-tune monocular networks. For the sake of reproducibility, we make Trans10K and MSD available together with the proxy labels used to fine-tune our models.
Trans10K and MSD with Virtual Depths. [Download]
We also employed the Booster Dataset in our experiments. [Download]
Here, you can download the weights of the MiDaS and DPT architectures employed for the results of Table 2 and Table 3 of our paper. If you just need the best model, use `Table 2/Ft. Virtual Depth/dpt_large_final.pt`.
To use these weights, please follow these steps:
- Create a folder named `weights` in the project directory.
- Download the weights [Download].
- Copy the downloaded weights into the `weights` folder.
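A minimal shell sketch of these steps, assuming the weights have already been downloaded locally (the source path below is a placeholder):

```bash
# Create the weights folder in the project directory
mkdir -p weights

# Copy the downloaded weights into it (source path is a placeholder)
cp -r /path/to/downloaded/weights/* weights/
```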
Warning:
- Please be aware that we will not be releasing the training code for deep stereo models. We provide only our algorithm to obtain proxy depth labels by merging monocular and stereo predictions.
- The code utilizes `wandb` during training to log results. Please make sure you have a wandb account. Otherwise, if you prefer not to use `wandb`, comment out the wandb logging lines in `finetune.py`.
Dependencies: Ensure that you have installed all the necessary dependencies. The list of dependencies can be found in the `./requirements.txt` file.
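For instance, assuming a working Python environment, the dependencies can be installed with pip:

```bash
# Install all dependencies listed in requirements.txt
pip install -r requirements.txt
```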
The `run.py` script tests monocular networks. It can be used to predict monocular depth maps from pretrained networks, or to apply our in-painting strategy to Base networks to obtain Virtual Depths (an example invocation is shown after the option list below).
You can specify the following options:
- `--input_path`: Path to the root directory of the dataset. E.g., `Booster/balanced/train` if you want to test the model on the training set of Booster.
- `--dataset_txt`: The list of the dataset samples. Each line contains the path of an image relative to `input_path`. You can find some examples in the folder `datasets/`. E.g., to run on the training set of Booster use `datasets/booster/train_stereo.txt`.
- `--mask_path`: Optional path to the folder containing masks. Each mask should have the same relative path as the corresponding image. When this path is specified, masks are applied to colorize ToM objects.
- `--cls2mask`: IDs referring to ToM objects in masks.
- `--it`: Number of inferences for each image. Used when in-painting with several random colors.
- `--output_path`: Output directory.
- `--output_list`: Save the prediction paths in a txt file.
- `--save_full_res`: Save the predictions at the input resolution. If not specified, save the predictions at the model output resolution.
- `--model_weights`: Path to the trained weights of the model. If not specified, load the Base network weights from default paths.
- `--model_type`: Model type. Either `dpt_large` or `midas_v21`.
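As an illustrative sketch, a plain monocular inference on the Booster training set could be launched as follows; the output directory is arbitrary, and the weights path assumes the best model from the Pretrained Models section has been copied into the `weights` folder:

```bash
# Predict monocular depth maps on the Booster training set with DPT-Large
python run.py \
    --input_path Booster/balanced/train \
    --dataset_txt datasets/booster/train_stereo.txt \
    --model_type dpt_large \
    --model_weights "weights/Table 2/Ft. Virtual Depth/dpt_large_final.pt" \
    --output_path output/booster_train \
    --save_full_res
```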
You can reproduce the results of Table 2 and Table 3 of the paper by running `scripts/table2.sh` and `scripts/table3.sh`.
If you haven't downloaded the pretrained models yet, you can find the download links in the Pretrained Models section above.
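For instance, assuming the datasets and pretrained weights are already in place:

```bash
# Reproduce the results of Table 2 and Table 3
bash scripts/table2.sh
bash scripts/table3.sh
```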
To fine-tune networks, refer to the example in `scripts/finetune.sh`.
To generate virtual depths from depth networks using our in-painting strategy, refer to the example in `scripts/generate_virtual_depth.sh`.
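Both example scripts can be launched from the project root, again assuming datasets and weights are already set up:

```bash
# Fine-tune a monocular network (see scripts/finetune.sh for the exact options)
bash scripts/finetune.sh

# Generate virtual depths with the in-painting strategy
bash scripts/generate_virtual_depth.sh
```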
To generate proxy depth maps with our merging strategy to fine-tune stereo networks, you can use `create_proxy_stereo.py`.
As explained above, we will not release the code for fine-tuning stereo networks. However, our implementation was based on the official code of RAFT-Stereo and CREStereo.
In this section, we present illustrative examples that demonstrate the effectiveness of our proposal.
For questions, please send an email to [email protected], [email protected], [email protected], or [email protected]
We would like to extend our sincere appreciation to the authors of the following projects for making their code available, which has been instrumental in our experiments:
- MiDaS
- RAFT-Stereo
- CREStereo
We deeply appreciate the authors of the competing research papers for their helpful responses and provision of model weights, which greatly aided accurate comparisons.