This repository contains code, models, and demos for a Cross-modal Shape Reconstruction Model called DisCo. Key features:
- Utilizes Triplane Diffusion Transformers (Triplane-DiT) for memory-efficient 3D reconstruction
- Robustly processes multi-view images, adeptly handling real-world challenges such as occlusion and motion blur
- Seamlessly integrates point cloud and posed image data, achieving metric-scale 3D reconstructions
- Trained on high-quality 3D datasets (LASA, ABO, 3DFRONT, ShapeNet)
## Hardware

We train our model on 8x A100 GPUs with a batch size of 22 per GPU (an effective batch size of 176).

## Setup environment
The following steps have been tested on Ubuntu 20.04.

- You must have an NVIDIA graphics card with at least 12GB VRAM and have [CUDA](https://developer.nvidia.com/cuda-downloads) installed.
- Install `Python >= 3.8`.
- Install `PyTorch==2.3.0` and `torchvision==0.18.0`:

  ```sh
  pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu118
  pip install torch-scatter -f https://data.pyg.org/whl/torch-2.3.0+cu118.html
  ```

- Install dependencies:

  ```sh
  pip install -r requirements.txt
  ```

- Install DisCo:

  ```sh
  pip install -e .
  ```
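Before moving on, a minimal sanity check that the installation is usable (it only assumes the packages installed above):

```python
# Minimal sanity check for the environment installed above.
import torch
import torch_scatter  # noqa: F401 -- verifies the torch-scatter wheel imports cleanly

print(torch.__version__)          # expect 2.3.0
print(torch.cuda.is_available())  # expect True on a machine with CUDA set up
```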
## Prepare Pretrained Weights

- Download the pretrained weights from BaiduYun or SharePoint.
- Put the `ae`, `dm`, and `finetune_diffusion` folders under `DisCo/output`. Only `ae` and `finetune_dm` are needed for final evaluation:
  - The `ae` folder stores the VAE weights, the `dm` folder stores the diffusion model trained on synthetic data, and the `finetune_dm` folder stores the diffusion model finetuned on the LASA dataset.
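A hypothetical quick check that the folders landed where the code expects them (assumes you run it from the `DisCo` repository root):

```python
# Hypothetical check: confirm the pretrained weight folders sit under DisCo/output.
from pathlib import Path

for folder in ["ae", "dm", "finetune_diffusion"]:
    p = Path("output") / folder
    print(f"{p}: {'found' if p.is_dir() else 'MISSING'}")
```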
## Data Preparation and Training

Follow these steps to prepare the data, then train and evaluate the model:
1. **Obtain Training Data**
   - Follow the instructions in DATA.md to obtain the necessary training data.
2. **Download CLIP Model Weights**
   - Download the `open_clip_pytorch_model.bin` file from SharePoint and place it in the `DisCo/data` directory. Ensure you have the necessary permissions to access the SharePoint link.
   - This weight file is used to extract image features with the CLIP ViT model (a loading sketch follows below).
   - If you encounter any issues during the data preparation process, please refer to the project's issue tracker or contact the maintainers.
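   As a quick check that the checkpoint is usable, you can try loading it with `open_clip`. This is a sketch, not the project's own loading code: the `ViT-L-14` architecture name is an assumption; use whichever variant the repository's config specifies.

   ```python
   # Sketch: load the downloaded CLIP checkpoint with open_clip.
   # NOTE: "ViT-L-14" is an assumed architecture name, not confirmed by this repo.
   import open_clip

   model, _, preprocess = open_clip.create_model_and_transforms(
       "ViT-L-14", pretrained="data/open_clip_pytorch_model.bin")
   model.eval()  # inference only; features are extracted without gradients
   ```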
3. **Train the Triplane-VAE Model**

   ```sh
   python launch.py --mode train_vae --gpus 0,1,2,3,4,5,6,7 --category chair
   ```
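   Before launching, it can help to confirm that the ids you pass to `--gpus` actually exist on your machine (a generic check, not part of `launch.py`):

   ```python
   # Confirm the machine exposes enough GPUs for the --gpus list above.
   import torch

   print(torch.cuda.device_count())  # should be >= the number of ids passed to --gpus
   ```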
4. **Cache Image and Triplane Features**

   ```sh
   python launch.py --mode cache_image_features --gpus 0,1,2,3,4,5,6,7 --category chair
   python launch.py --mode cache_triplane_features --gpus 0,1,2,3,4,5,6,7 --category chair
   ```
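   Conceptually, the image-feature cache encodes each view once with the CLIP ViT and stores the result for reuse during diffusion training. The sketch below illustrates the idea only; the actual storage format and file layout are determined by `launch.py`, and the model name is the same assumption as above.

   ```python
   # Illustrative only: encode one image with the CLIP ViT and cache the feature.
   import torch
   import open_clip
   from PIL import Image

   model, _, preprocess = open_clip.create_model_and_transforms(
       "ViT-L-14", pretrained="data/open_clip_pytorch_model.bin")  # assumed variant
   model.eval()

   @torch.no_grad()
   def cache_image_feature(image_path: str, out_path: str) -> None:
       image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
       feature = model.encode_image(image)  # shape: (1, feature_dim)
       torch.save(feature.cpu(), out_path)
   ```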
5. **Train the Triplane-Diffusion Model on the Synthetic Dataset**

   ```sh
   python launch.py --mode train_diffusion --gpus 0,1,2,3,4,5,6,7 --category chair
   ```
6. **Finetune the Triplane-Diffusion Model on the LASA Dataset**

   ```sh
   python launch.py --mode finetune_diffusion --gpus 0,1,2,3,4,5,6,7 --category chair
   ```
7. **Evaluate the Triplane-Diffusion Model**

   ```sh
   python launch.py --mode evaluate --gpus 0,1,2,3,4,5,6,7 --category chair
   ```

   Results will be saved under `./results/`.
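   To inspect what the evaluation produced (the exact file layout and formats under `./results/` depend on `launch.py`):

   ```python
   # List everything the evaluation step wrote under ./results/.
   from pathlib import Path

   for f in sorted(Path("results").rglob("*")):
       if f.is_file():
           print(f, f.stat().st_size, "bytes")
   ```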
8. **Put Inference Results into the Scene**

   ```sh
   python launch.py --mode put_resutls_to_scene
   ```