🚨 This repository contains download links to our dataset, code snippets, and the trained deep models for our work "Learning Depth Estimation for Transparent and Mirror Surfaces", ICCV 2023
by Alex Costanzino*, Pierluigi Zama Ramirez*, Matteo Poggi*, Fabio Tosi, Stefano Mattoccia, and Luigi Di Stefano. * Equal Contribution
University of Bologna
Inferring the depth of transparent or mirror (ToM) surfaces represents a hard challenge for sensors, algorithms, and deep networks alike. We propose a simple pipeline for learning to estimate depth properly for such surfaces with neural networks, without requiring any ground-truth annotation. We unveil how to obtain reliable pseudo labels by in-painting ToM objects in images and processing them with a monocular depth estimation model. These labels can be used to fine-tune existing monocular or stereo networks, to let them learn how to deal with ToM surfaces. Experimental results on the Booster dataset show the dramatic improvements enabled by our remarkably simple proposal.
🖋️ If you find this code useful in your research, please cite:
@inproceedings{costanzino2023iccv,
    title = {Learning Depth Estimation for Transparent and Mirror Surfaces},
    author = {Costanzino, Alex and Zama Ramirez, Pierluigi and Poggi, Matteo and Tosi, Fabio and Mattoccia, Stefano and Di Stefano, Luigi},
    booktitle = {The IEEE International Conference on Computer Vision},
    note = {ICCV},
    year = {2023},
}
In our experiments, we employed two datasets featuring transparent or mirror objects: Trans10K and MSD. With our in-painting technique, we obtain virtual depth maps to fine-tune monocular networks. For the sake of reproducibility, we make Trans10K and MSD available together with the proxy labels used to fine-tune our models.
Trans10K and MSD with Virtual Depths. [Download]
We also employed the Booster Dataset in our experiments. [Download]
Here, you can download the weights of the MiDaS and DPT architectures employed for the results of Table 2 and Table 3 of our paper. If you just need the best model, use `Table 2/Ft. Virtual Depth/dpt_large_final.pt`.
To use these weights, please follow these steps:
- Create a folder named `weights` in the project directory.
- Download the weights [Download].
- Copy the downloaded weights into the `weights` folder.
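A minimal shell sketch of these steps, assuming the weights have already been downloaded locally (the source path below is a placeholder):

```bash
# Create the weights folder in the project directory
mkdir -p weights

# Copy the downloaded weights into it (source path is a placeholder)
cp -r /path/to/downloaded/weights/* weights/
```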
Warning:
- Please be aware that we will not be releasing the training code for deep stereo models. We provide only our algorithm to obtain proxy depth labels by merging monocular and stereo predictions.
- The code utilizes `wandb` during training to log results. Please make sure you have a wandb account. Otherwise, if you prefer not to use `wandb`, comment out the wandb logging lines in `finetune.py`.
Dependencies: Ensure that you have installed all the necessary dependencies. The list of dependencies can be found in the `./requirements.txt` file.
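For instance, assuming a working Python environment, the dependencies can be installed with pip:

```bash
# Install all dependencies listed in requirements.txt
pip install -r requirements.txt
```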
The `run.py` script tests monocular networks. It can be used to predict monocular depth maps from pretrained networks, or to apply our in-painting strategy to Base networks to obtain Virtual Depths (an example invocation is shown after the option list below).
You can specify the following options:
- `--input_path`: Path to the root directory of the dataset. E.g., `Booster/balanced/train` if you want to test the model on the training set of Booster.
- `--dataset_txt`: The list of the dataset samples. Each line contains the path of an image relative to `input_path`. You can find some examples in the folder `datasets/`. E.g., to run on the training set of Booster use `datasets/booster/train_stereo.txt`.
- `--mask_path`: Optional path to the folder containing masks. Each mask should have the same relative path as the corresponding image. When this path is specified, masks are applied to colorize ToM objects.
- `--cls2mask`: IDs referring to ToM objects in masks.
- `--it`: Number of inferences for each image. Used when in-painting with several random colors.
- `--output_path`: Output directory.
- `--output_list`: Save the prediction paths in a txt file.
- `--save_full_res`: Save the predictions at the input resolution. If not specified, save the predictions at the model output resolution.
- `--model_weights`: Path to the trained weights of the model. If not specified, load the Base network weights from default paths.
- `--model_type`: Model type. Either `dpt_large` or `midas_v21`.
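As an illustrative sketch, a plain monocular inference on the Booster training set could be launched as follows; the output directory is arbitrary, and the weights path assumes the best model from the Pretrained Models section has been copied into the `weights` folder:

```bash
# Predict monocular depth maps on the Booster training set with DPT-Large
python run.py \
    --input_path Booster/balanced/train \
    --dataset_txt datasets/booster/train_stereo.txt \
    --model_type dpt_large \
    --model_weights "weights/Table 2/Ft. Virtual Depth/dpt_large_final.pt" \
    --output_path output/booster_train \
    --save_full_res
```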
You can reproduce the results of Table 2 and Table 3 of the paper by running `scripts/table2.sh` and `scripts/table3.sh`.
If you haven't downloaded the pretrained models yet, you can find the download links in the Pretrained Models section above.
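For instance, assuming the datasets and pretrained weights are already in place:

```bash
# Reproduce the results of Table 2 and Table 3
bash scripts/table2.sh
bash scripts/table3.sh
```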
To fine-tune networks, refer to the example in `scripts/finetune.sh`.
To generate virtual depths from depth networks using our in-painting strategy, refer to the example in `scripts/generate_virtual_depth.sh`.
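Both example scripts can be launched from the project root, again assuming datasets and weights are already set up:

```bash
# Fine-tune a monocular network (see scripts/finetune.sh for the exact options)
bash scripts/finetune.sh

# Generate virtual depths with the in-painting strategy
bash scripts/generate_virtual_depth.sh
```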
To generate proxy depth maps with our merging strategy to fine-tune stereo networks, you can use `create_proxy_stereo.py`.
As explained above, we will not release the code for fine-tuning stereo networks. However, our implementation was based on the official code of RAFT-Stereo and CREStereo.
In this section, we present illustrative examples that demonstrate the effectiveness of our proposal.
For questions, please send an email to [email protected], [email protected], [email protected], or [email protected]
We would like to extend our sincere appreciation to the authors of the following projects for making their code available, which has been instrumental in our experiments:
- MiDaS
- RAFT-Stereo
- CREStereo
We deeply appreciate the authors of the competing research papers for their helpful responses and provision of model weights, which greatly aided accurate comparisons.