If you like our project, please give us a star ⭐ on GitHub for the latest updates.

webpage arXiv License: MIT

😮 Highlights

Free-Editor allows you to edit your 3D scenes by editing only a single view of that scene. The editing is training-free and takes about 3 minutes, instead of roughly 70 minutes for SOTA methods!

💡 Training-free, View-Consistent, High-Quality, and Fast

  • Stable Diffusion (SD) for image generation --> high-quality editing
  • Single-view editing --> a higher chance of view-consistent editing, since it is hard to obtain consistent editing effects across multiple views with SD
  • A training-free editing process, since we use a generalized NeRF model --> fast, high-quality 3D content reconstruction.

🚩 Updates

Welcome to watch 👀 this repository for the latest updates.

[2023.12.21] : We have released our paper, Free-Editor, on arXiv.

[2023.12.18] : Released the project page.

  • Code release.

🛠️ Methodology

Overview of our proposed method. We train a generalized NeRF, G(.), that takes a single edited starting view and M source views and renders a novel target view. Here, the "Edited Target View" is not an input to the model; rather, it is rendered and serves as the ground truth for the prediction of G(.). Inside G(.), we employ a special Edit Transformer that uses cross-attention to produce style-informed source feature maps, which are then aggregated through an Epipolar Transformer. At inference, we can synthesize novel edited views in a zero-shot manner. To edit a scene, we take only a single image as the starting view and edit it with a Text-to-Image (T2I) diffusion model. Based on this edited starting view, we can render novel edited target views.
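
Below is a rough conceptual sketch (in PyTorch) of the two attention stages described above. It is not the released implementation; the module names, feature dimensions, and tensor shapes are assumptions for illustration only.

      # Conceptual sketch only -- NOT the released implementation.
      # (1) cross-attention injects the style of the edited starting view into each
      #     source-view feature map; (2) a second attention stage aggregates the M
      #     styled source views per target token (stand-in for the Epipolar Transformer).
      import torch
      import torch.nn as nn

      class EditTransformerSketch(nn.Module):
          def __init__(self, dim: int = 256, heads: int = 8):
              super().__init__()
              self.style_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
              self.view_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

          def forward(self, edited_feats, source_feats):
              # edited_feats: [B, T, C]    tokens of the single edited starting view
              # source_feats: [B, M, T, C] tokens of the M unedited source views
              B, M, T, C = source_feats.shape
              src = source_feats.reshape(B * M, T, C)
              sty = edited_feats.unsqueeze(1).expand(B, M, T, C).reshape(B * M, T, C)
              styled, _ = self.style_attn(query=src, key=sty, value=sty)  # style-informed source features
              per_token = styled.reshape(B, M, T, C).permute(0, 2, 1, 3).reshape(B * T, M, C)
              fused, _ = self.view_attn(per_token, per_token, per_token)  # aggregate across the M views
              return fused.mean(dim=1).reshape(B, T, C)                   # features for the NeRF decoder

      out = EditTransformerSketch()(torch.randn(2, 64, 256), torch.randn(2, 4, 64, 256))
      print(out.shape)  # torch.Size([2, 64, 256])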

Implementation

Create environment

Do the following:

      conda create --name nerfstudio -y python=3.9
      conda activate nerfstudio
      python -m pip install --upgrade pip

Install dependencies

  • If you have an existing installation, first make sure to uninstall it using this command:
      pip uninstall torch torchvision functorch tinycudann
  • Then install CUDA 11.8 with this command:
      conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
  • Then install PyTorch 2.1.2 using this command:
      pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
  • After installing PyTorch and ninja, install the torch bindings for tiny-cuda-nn:
      pip install ninja git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
  • Installing NerfStudio: sometimes you may face issues with configobj or other packages; install them manually from source. For example:
      git clone https://github.com/DiffSK/configobj.git
      cd configobj
      python setup.py install
  • From pip (this may not work on a cluster):
      pip install nerfstudio
  • If you want to build from source and want the latest development version, use these commands:
      git clone https://github.com/nerfstudio-project/nerfstudio.git
      cd nerfstudio
      pip install --upgrade pip setuptools
      pip install -e .
  • Download some test data:
      ns-download-data nerfstudio --capture-name=poster           
  • Train a model:
      ns-train nerfacto --data data/nerfstudio/poster

If you see in your Linux terminal that training has started, then everything is good to go!
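
You can also run a quick Python sanity check of the GPU stack (a minimal sketch; it only assumes the packages installed above):

      # Quick sanity check of the environment installed above.
      import torch
      import tinycudann as tcnn  # torch bindings built in the tiny-cuda-nn step

      print("torch:", torch.__version__, "| CUDA runtime:", torch.version.cuda)
      print("CUDA available:", torch.cuda.is_available())
      print("tiny-cuda-nn bindings imported:", tcnn.__name__)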

  • Install ImageMagick, as it is needed for some datasets.
      cd ~
      wget https://download.imagemagick.org/ImageMagick/download/ImageMagick.tar.gz
      tar -xvzf ImageMagick.tar.gz
      cd ImageMagick-*
      ./configure --prefix=$HOME/imagemagick
      make
      make install

There may be additional dependencies you have to install as well.

Download Datasets and Pre-trained Models

To download other datasets, please visit this link: https://huggingface.co/datasets/yangtaointernship/RealEstate10K-subset/tree/main

  • Here, "synthetic_scenes.zip" is the DeepVoxels data.

  • "nerf_synthetic" and blender dataset possibly the same dataset.

  • "frames.zip" is the extracted frames for 200 scenes of RealEstate10K dataset. "train.zip" is the camera files.

  • For the Shiny dataset, go to https://nex-mpi.github.io/

  • For the Spaces dataset:

      git clone https://github.com/augmentedperception/spaces_dataset
  • If you want to use "nerfbaselines",
      conda deactivate 
      conda create --name nerfbase 
      conda activate nerfbase
      pip install nerfbaselines
  • Download sample datasets. For example:

  • Downloads the garden scene to the cache folder.

      mkdir data
      cd data
      mkdir nerf_dataset
      cd nerf_dataset

      nerfbaselines download-dataset external://mipnerf360/garden
  • Downloads all nerfstudio scenes to the cache
      nerfbaselines download-dataset external://nerfstudio
  • Downloads the kitchen scene to the folder kitchen
      nerfbaselines download-dataset external://mipnerf360/kitchen -o kitchen
  • Caption Generation Model.
      git clone https://huggingface.co/Salesforce/blip2-opt-2.7b

If you want to use a smaller version, use this

      from transformers import BlipProcessor, BlipForConditionalGeneration
      
      processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
      model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")
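A caption can then be generated roughly as follows (a minimal usage sketch; the image path is a placeholder):

      # Minimal captioning sketch; the image path is a placeholder.
      from PIL import Image
      from transformers import BlipProcessor, BlipForConditionalGeneration

      processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
      model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

      image = Image.open("path/to/starting_view.png").convert("RGB")  # placeholder path
      inputs = processor(image, return_tensors="pt")
      out = model.generate(**inputs, max_new_tokens=30)
      print(processor.decode(out[0], skip_special_tokens=True))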
  • Stable Diffusion 3 Medium (fast and accurate). However, it does not yet support text-guided image editing, so we use v1.5, which can be downloaded automatically (see the next step). A hedged sketch of this editing step is shown after this list.
      git clone https://huggingface.co/stabilityai/stable-diffusion-3-medium
  • If you don't want to download the pre-trained model yourself, generate an access token on Hugging Face (go to your account settings) and log in to your account:
      huggingface-cli login
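
Since a single starting view is edited with a text-guided diffusion model, below is a minimal, non-authoritative sketch of that step using the diffusers library. The InstructPix2Pix checkpoint (timbrooks/instruct-pix2pix, built on SD v1.5), the prompt, and the paths are illustrative assumptions and may differ from what this repository actually uses.

      # Illustrative sketch of editing the single starting view with a text-guided
      # diffusion model. Checkpoint, prompt, and paths are assumptions for demonstration.
      import torch
      from PIL import Image
      from diffusers import StableDiffusionInstructPix2PixPipeline

      pipe = StableDiffusionInstructPix2PixPipeline.from_pretrained(
          "timbrooks/instruct-pix2pix", torch_dtype=torch.float16
      ).to("cuda")

      starting_view = Image.open("path/to/starting_view.png").convert("RGB")  # placeholder
      edited_view = pipe(
          "turn the scene into a snowy winter day",  # example text instruction
          image=starting_view,
          num_inference_steps=20,
          image_guidance_scale=1.5,
      ).images[0]
      edited_view.save("edited_starting_view.png")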

Free-Editor Dataset Generation

      python src/fedit/dataset_creation.py 

Free-Editor Training

      python train.py 

🚀 3D-Editing Results

Qualitative comparison

Quantitative comparison

Quantitative evaluation of scene edits in terms of Edit PSNR, CLIP Text-Image Directional Similarity (CTDS) and CLIP directional consistency (CDS).
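
For reference, CLIP Text-Image Directional Similarity compares the CLIP-space direction from the original image to the edited image against the direction from the source caption to the target caption. Below is a minimal sketch of that metric (the model name, image paths, and captions are placeholders; this is not the evaluation code used for the table above).

      # Sketch of CLIP Text-Image Directional Similarity: cosine similarity between
      # the image-space edit direction and the text-space edit direction in CLIP space.
      # Model name, paths, and captions below are placeholders.
      import torch.nn.functional as F
      from PIL import Image
      from transformers import CLIPModel, CLIPProcessor

      model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
      processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

      def embed_image(path):
          inputs = processor(images=Image.open(path).convert("RGB"), return_tensors="pt")
          return F.normalize(model.get_image_features(**inputs), dim=-1)

      def embed_text(text):
          inputs = processor(text=[text], return_tensors="pt", padding=True)
          return F.normalize(model.get_text_features(**inputs), dim=-1)

      img_dir = embed_image("edited_view.png") - embed_image("original_view.png")
      txt_dir = embed_text("a snowy scene") - embed_text("a photo of a scene")
      print("CTDS:", F.cosine_similarity(img_dir, txt_dir).item())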

👍 Acknowledgement

This work is built on many amazing research works and open-source projects, thanks a lot to all the authors for sharing!

✏️ Citation

If you find our paper and code useful in your research, please consider giving a star ⭐ and a citation 📝.

@misc{karim2023freeeditor,
      title={Free-Editor: Zero-shot Text-driven 3D Scene Editing}, 
      author={Nazmul Karim and Umar Khalid and Hasan Iqbal and Jing Hua and Chen Chen},
      year={2023},
      eprint={2312.13663},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}