If you like our project, please give us a star ⭐ on GitHub for the latest updates.

webpage arXiv License: MIT

😮 Highlights

Free-Editor allows you to edit your 3D scenes by editing only a single view of that scene. The editing is training-free and takes about 3 minutes, instead of roughly 70 minutes for SOTA methods!

💡 Training-free, View-Consistent, High-Quality, and Fast

  • Stable Diffusion (SD) for image generation --> high-quality editing
  • Single-view editing --> a higher chance of view-consistent editing, since it is hard to obtain consistent editing effects across multiple views with SD
  • A training-free editing process, since we use a generalized NeRF model --> fast, high-quality 3D content reconstruction.

🚩 Updates

Welcome to watch 👀 this repository for the latest updates.

[2023.12.21] : We have released our paper, Free-Editor, on arXiv.

[2023.12.18] : Released the project page.

  • Code release.

🛠️ Methodology

Overview of our proposed method. We train a generalized NeRF, G(.), that takes a single edited starting view and M source views and renders a novel target view. Here, the "Edited Target View" is not an input to the model; rather, it is rendered and serves as the ground truth for the prediction of G(.). Inside G(.), we employ a special Edit Transformer that uses cross-attention to produce style-informed source feature maps, which are then aggregated through an Epipolar Transformer. At inference, we can synthesize novel edited views in a zero-shot manner. To edit a scene, we take only a single image as the starting view and edit it with a Text-to-Image (T2I) diffusion model. Based on this edited starting view, we can render novel edited target views.
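
Below is a rough conceptual sketch (in PyTorch) of the two attention stages described above. It is not the released implementation; the module names, feature dimensions, and tensor shapes are assumptions for illustration only.

      # Conceptual sketch only -- NOT the released implementation.
      # (1) cross-attention injects the style of the edited starting view into each
      #     source-view feature map; (2) a second attention stage aggregates the M
      #     styled source views per target token (stand-in for the Epipolar Transformer).
      import torch
      import torch.nn as nn

      class EditTransformerSketch(nn.Module):
          def __init__(self, dim: int = 256, heads: int = 8):
              super().__init__()
              self.style_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
              self.view_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

          def forward(self, edited_feats, source_feats):
              # edited_feats: [B, T, C]    tokens of the single edited starting view
              # source_feats: [B, M, T, C] tokens of the M unedited source views
              B, M, T, C = source_feats.shape
              src = source_feats.reshape(B * M, T, C)
              sty = edited_feats.unsqueeze(1).expand(B, M, T, C).reshape(B * M, T, C)
              styled, _ = self.style_attn(query=src, key=sty, value=sty)  # style-informed source features
              per_token = styled.reshape(B, M, T, C).permute(0, 2, 1, 3).reshape(B * T, M, C)
              fused, _ = self.view_attn(per_token, per_token, per_token)  # aggregate across the M views
              return fused.mean(dim=1).reshape(B, T, C)                   # features for the NeRF decoder

      out = EditTransformerSketch()(torch.randn(2, 64, 256), torch.randn(2, 4, 64, 256))
      print(out.shape)  # torch.Size([2, 64, 256])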

Implementation

Create environment

Do the following:

      conda create --name nerfstudio -y python=3.9
      conda activate nerfstudio
      python -m pip install --upgrade pip

Install dependencies

  • If you have an existing installation, first make sure to uninstall it using this command:
      pip uninstall torch torchvision functorch tinycudann
  • Then install CUDA 11.8 with this command:
      conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
  • Then install PyTorch 2.1.2 using this command:
      pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
  • After installing PyTorch and ninja, install the torch bindings for tiny-cuda-nn:
      pip install ninja git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
  • Installing NerfStudio: sometimes you may face issues with configobj or other packages; install them manually from source. For example:
      git clone https://github.com/DiffSK/configobj.git
      cd configobj
      python setup.py install
  • From pip (this may not work on a cluster):
      pip install nerfstudio
  • If you want to build from source and want the latest development version, use these commands:
      git clone https://github.com/nerfstudio-project/nerfstudio.git
      cd nerfstudio
      pip install --upgrade pip setuptools
      pip install -e .
  • Download some test data:
      ns-download-data nerfstudio --capture-name=poster           
  • Train a model:
      ns-train nerfacto --data data/nerfstudio/poster

If you see in your Linux terminal that training has started, then everything is good to go!
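
You can also run a quick Python sanity check of the GPU stack (a minimal sketch; it only assumes the packages installed above):

      # Quick sanity check of the environment installed above.
      import torch
      import tinycudann as tcnn  # torch bindings built in the tiny-cuda-nn step

      print("torch:", torch.__version__, "| CUDA runtime:", torch.version.cuda)
      print("CUDA available:", torch.cuda.is_available())
      print("tiny-cuda-nn bindings imported:", tcnn.__name__)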

  • Install ImageMagick, as it is needed for some datasets.
      cd ~
      wget https://download.imagemagick.org/ImageMagick/download/ImageMagick.tar.gz
      tar -xvzf ImageMagick.tar.gz
      cd ImageMagick-*
      ./configure --prefix=$HOME/imagemagick
      make
      make install

There may be additional dependencies you have to install as well.

Download Datasets and Pre-trained Models

To download other datasets, please visit this link: https://huggingface.co/datasets/yangtaointernship/RealEstate10K-subset/tree/main

  • Here, "synthetic_scenes.zip" is the DeepVoxels data.

  • "nerf_synthetic" and blender dataset possibly the same dataset.

  • "frames.zip" is the extracted frames for 200 scenes of RealEstate10K dataset. "train.zip" is the camera files.

  • For the Shiny dataset, go to https://nex-mpi.github.io/

  • For the Spaces dataset:

      git clone https://github.com/augmentedperception/spaces_dataset
  • If you want to use "nerfbaselines",
      conda deactivate 
      conda create --name nerfbase 
      conda activate nerfbase
      pip install nerfbaselines
  • Download sample datasets. For example:

  • Downloads the garden scene to the cache folder.

      mkdir data
      cd data
      mkdir nerf_dataset
      cd nerf_dataset

      nerfbaselines download-dataset external://mipnerf360/garden
  • Downloads all nerfstudio scenes to the cache
      nerfbaselines download-dataset external://nerfstudio
  • Downloads the kitchen scene to the folder kitchen
      nerfbaselines download-dataset external://mipnerf360/kitchen -o kitchen
  • Caption Generation Model.
      git clone https://huggingface.co/Salesforce/blip2-opt-2.7b

If you want to use a smaller version, use this

      from transformers import BlipProcessor, BlipForConditionalGeneration
      
      processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
      model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
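A caption can then be generated roughly as follows (a minimal usage sketch; the image path is a placeholder):

      # Minimal captioning sketch; the image path is a placeholder.
      from PIL import Image
      from transformers import BlipProcessor, BlipForConditionalGeneration

      processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
      model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

      image = Image.open("path/to/starting_view.png").convert("RGB")  # placeholder path
      inputs = processor(image, return_tensors="pt")
      out = model.generate(**inputs, max_new_tokens=30)
      print(processor.decode(out[0], skip_special_tokens=True))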
  • Stable Diffusion 3 Medium (fast and accurate). However, it does not yet support text-guided image editing, so we use v1.5, which can be downloaded automatically (see the next step). A hedged sketch of this editing step is shown after this list.
      git clone https://huggingface.co/stabilityai/stable-diffusion-3-medium
  • If you don't want to download the pre-trained model yourself, generate an access token on Hugging Face (go to your account settings) and log in to your account:
      huggingface-cli login
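
Since a single starting view is edited with a text-guided diffusion model, below is a minimal, non-authoritative sketch of that step using the diffusers library. The InstructPix2Pix checkpoint (timbrooks/instruct-pix2pix, built on SD v1.5), the prompt, and the paths are illustrative assumptions and may differ from what this repository actually uses.

      # Illustrative sketch of editing the single starting view with a text-guided
      # diffusion model. Checkpoint, prompt, and paths are assumptions for demonstration.
      import torch
      from PIL import Image
      from diffusers import StableDiffusionInstructPix2PixPipeline

      pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
          "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
      ).to("cuda")

      starting_view = Image.open("path/to/starting_view.png").convert("RGB")  # placeholder
      edited_view = pipe(
          "turn the scene into a snowy winter day",  # example text instruction
          image=starting_view,
          num_inference_steps=20,
          image_guidance_scale=1.5,
      ).images[0]
      edited_view.save("edited_starting_view.png")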

Free-Editor Dataset Generation

      python src/fedit/dataset_creation.py 

Free-Editor Training

      python train.py 

🚀 3D-Editing Results

Qualitative comparison

Quantitative comparison

Quantitative evaluation of scene edits in terms of Edit PSNR, CLIP Text-Image Directional Similarity (CTDS) and CLIP directional consistency (CDS).
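
For reference, CLIP Text-Image Directional Similarity compares the CLIP-space direction from the original image to the edited image against the direction from the source caption to the target caption. Below is a minimal sketch of that metric (the model name, image paths, and captions are placeholders; this is not the evaluation code used for the table above).

      # Sketch of CLIP Text-Image Directional Similarity: cosine similarity between
      # the image-space edit direction and the text-space edit direction in CLIP space.
      # Model name, paths, and captions below are placeholders.
      import torch.nn.functional as F
      from PIL import Image
      from transformers import CLIPModel, CLIPProcessor

      model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
      processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

      def embed_image(path):
          inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
          return F.normalize(model.get_image_features(**inputs), dim=-1)

      def embed_text(text):
          inputs = processor(text=[text], return_tensors="pt", padding=True)
          return F.normalize(model.get_text_features(**inputs), dim=-1)

      img_dir = embed_image("edited_view.png") - embed_image("original_view.png")
      txt_dir = embed_text("a snowy scene") - embed_text("a photo of a scene")
      print("CTDS:", F.cosine_similarity(img_dir, txt_dir).item())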

👍 Acknowledgement

This work is built on many amazing research works and open-source projects, thanks a lot to all the authors for sharing!

✏️ Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and a citation 📝.

@misc{karim2023freeeditor,
      title={Free-Editor: Zero-shot Text-driven 3D Scene Editing}, 
      author={Nazmul Karim and Umar Khalid and Hasan Iqbal and Jing Hua and Chen Chen},
      year={2023},
      eprint={2312.13663},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}