Dolly Zoom

Introduction

We aim to create a dolly zoom effect from a single shot that comes with no depth information.

In a dolly zoom, the foreground subject keeps the same size in the frame while the background appears to move, so we need to estimate depth and matting information for the shot.
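
The geometry behind the effect can be illustrated with a simple pinhole camera model. This is a minimal sketch for intuition only and is not taken from this repo's code: the function names and the numbers (a 35 mm lens, a subject 2 m away, a background 8 m behind it) are illustrative assumptions. Under this model, projected size is proportional to focal length divided by distance, so scaling the focal length with the subject distance keeps the subject fixed while the background changes size.

```python
def focal_length_for_dolly(f_start, subject_dist_start, subject_dist_now):
    """Focal length that keeps the subject the same size on screen (pinhole model)."""
    if subject_dist_start <= 0 or subject_dist_now <= 0:
        raise ValueError("distances must be positive")
    return f_start * subject_dist_now / subject_dist_start


def projected_size(real_size, distance, focal_length):
    """Projected size of an object under the pinhole model."""
    return focal_length * real_size / distance


# Illustrative values (assumptions, not parameters of this repo).
f0, d0 = 35.0, 2.0          # starting focal length and subject distance
background_offset = 8.0     # background sits this far behind the subject

for d in (2.0, 3.0, 4.0):   # camera dollies away from the subject
    f = focal_length_for_dolly(f0, d0, d)
    subject = projected_size(1.0, d, f)
    background = projected_size(1.0, d + background_offset, f)
    print(f"distance={d:.1f} focal={f:.1f} subject={subject:.2f} background={background:.2f}")
```

The subject column stays constant while the background column grows, which is the behaviour the pipeline has to reproduce from a single shot by using estimated depth.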

We use the 3D Ken Burns effect as our baseline and replace its depth estimation block with "Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging".

Here is our pipeline:

Given a sequence of video frames, we generate depth and trimap estimates. We then fuse the trimap and the depth map to produce a refined depth map. Finally, we feed the refined depth map and the input source frames to our view synthesis network to generate the final result.
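
To make the fusion step more concrete, here is one plausible way a trimap could be used to refine a depth map. This is a hypothetical sketch, not the method implemented in this repo: the function name, the trimap label value (255 for definite foreground), and the choice of assigning the median foreground depth to every foreground pixel are all assumptions made for illustration. Plain nested lists are used to keep the example dependency-free; real code would operate on image arrays.

```python
from statistics import median

FOREGROUND = 255  # assumed trimap label for definite foreground


def fuse_depth_with_trimap(depth, trimap):
    """Return a copy of depth in which all foreground pixels share one depth value.

    depth and trimap are equally sized 2D lists. Foreground pixels receive the
    median foreground depth so the subject behaves like a single rigid layer.
    """
    fg_values = [
        d
        for d_row, t_row in zip(depth, trimap)
        for d, t in zip(d_row, t_row)
        if t == FOREGROUND
    ]
    if not fg_values:
        # No foreground found: leave the depth map unchanged.
        return [row[:] for row in depth]
    fg_depth = median(fg_values)
    return [
        [fg_depth if t == FOREGROUND else d for d, t in zip(d_row, t_row)]
        for d_row, t_row in zip(depth, trimap)
    ]


# Tiny illustrative input: 2x3 depth map and matching trimap.
depth = [[1.0, 1.2, 5.0], [0.8, 1.0, 6.0]]
trimap = [[255, 255, 0], [255, 128, 0]]
refined = fuse_depth_with_trimap(depth, trimap)
print(refined)
```

Flattening the subject's depth is only one design option; it trades fine depth detail inside the subject for a foreground that does not distort when the virtual camera moves.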

Setup

Tested with Python 3.6 and PyTorch 1.6.

Several functions are implemented in CUDA using CuPy, which is why CuPy is a required dependency. It can be installed using pip install cupy or alternatively using one of the provided binary packages as outlined in the CuPy repository. Please also make sure to have the CUDA_HOME environment variable configured.
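
A quick way to confirm the environment variable before running anything is a small check like the one below. It is a convenience sketch that is not part of this repo; the helper name cuda_home_status is hypothetical, and it only reports whether CUDA_HOME is set and points to an existing directory.

```python
import os


def cuda_home_status(env=os.environ):
    """Describe whether CUDA_HOME is set and points to an existing directory."""
    path = env.get("CUDA_HOME")
    if not path:
        return "CUDA_HOME is not set"
    if not os.path.isdir(path):
        return f"CUDA_HOME points to a missing directory: {path}"
    return f"CUDA_HOME = {path}"


print(cuda_home_status())
```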

To generate the video results, please also install moviepy using pip install moviepy.

Three different repositories are used in this repo.

Download the MiDaS model weights from https://github.com/intel-isl/MiDaS. Put the weights at the following path:

midas/model-f46da743.pt

Download the depthmerge model weights from https://github.com/ouranonymouscvpr/cvpr2021_ouranonymouscvpr. Put the weights at the following path:

depthmerge/checkpoints/scaled_04_1024/latest_net_G.pth
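
Before running the pipeline, it can help to verify that both weight files are in place. The snippet below is a small convenience sketch, not part of this repo: the helper name missing_weights is hypothetical, and it assumes the two paths above are relative to the repository root.

```python
from pathlib import Path

# Weight locations as listed above, relative to the repository root (assumed).
WEIGHT_PATHS = [
    "midas/model-f46da743.pt",
    "depthmerge/checkpoints/scaled_04_1024/latest_net_G.pth",
]


def missing_weights(repo_root="."):
    """Return the expected weight files that are not present under repo_root."""
    root = Path(repo_root)
    return [p for p in WEIGHT_PATHS if not (root / p).is_file()]


missing = missing_weights()
if missing:
    print("Missing weight files:", ", ".join(missing))
else:
    print("All weight files found.")
```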

Usage

To run it on a video and generate the Vertigo effect (Dolly Zoom) fully automatically, follow the steps below.

First, edit the following three lines of dollyzoom.py:

    arguments_strIn = ['./images/input.mp4']
    arguments_strOut = './output'
    starter_zoom = 2

Then run

python dollyzoom.py

Results

The original video is on the left, and the final result with the dolly zoom effect is on the right.

arezou.mp4

mahsa.mp4

separez.mp4

Acknowledgement

We borrowed parts of the following papers and their implementations for our project:

Boosting Monocular Depth Estimation

https://github.com/compphoto/BoostingMonocularDepth

@INPROCEEDINGS{Miangoleh2021Boosting,
author={S. Mahdi H. Miangoleh and Sebastian Dille and Long Mai and Sylvain Paris and Ya\u{g}{\i}z Aksoy},
title={Boosting Monocular Depth Estimation Models to High-Resolution via Content-Adaptive Multi-Resolution Merging},
journal={Proc. CVPR},
year={2021},
}

MiDaS

https://github.com/intel-isl/MiDaS

@article{Ranftl2020,
	author    = {Ren\'{e} Ranftl and Katrin Lasinger and David Hafner and Konrad Schindler and Vladlen Koltun},
	title     = {Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer},
	journal   = {IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI)},
	year      = {2020},
}

3D Ken Burns Effect from a Single Image

https://github.com/sniklaus/3d-ken-burns

@article{Niklaus_TOG_2019,
         author = {Simon Niklaus and Long Mai and Jimei Yang and Feng Liu},
         title = {3D Ken Burns Effect from a Single Image},
         journal = {ACM Transactions on Graphics},
         volume = {38},
         number = {6},
         pages = {184:1--184:15},
         year = {2019}
     }

Pix2Pix

https://github.com/junyanz/pytorch-CycleGAN-and-pix2pix

@inproceedings{CycleGAN2017,
  title={Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks},
  author={Zhu, Jun-Yan and Park, Taesung and Isola, Phillip and Efros, Alexei A},
  booktitle={Computer Vision (ICCV), 2017 IEEE International Conference on},
  year={2017}
}


@inproceedings{isola2017image,
  title={Image-to-Image Translation with Conditional Adversarial Networks},
  author={Isola, Phillip and Zhu, Jun-Yan and Zhou, Tinghui and Efros, Alexei A},
  booktitle={Computer Vision and Pattern Recognition (CVPR), 2017 IEEE Conference on},
  year={2017}
}

MODNet: Is a Green Screen Really Necessary for Real-Time Portrait Matting?

https://github.com/ZHKKKe/MODNet/blob/master/README.md

@article{MODNet,
  author = {Zhanghan Ke and Kaican Li and Yurou Zhou and Qiuhua Wu and Xiangyu Mao and Qiong Yan and Rynson W.H. Lau},
  title = {Is a Green Screen Really Necessary for Real-Time Portrait Matting?},
  journal={ArXiv},
  volume={abs/2011.11961},
  year = {2020},
}
