This repository contains code, models, and demos for a Cross-modal Shape Reconstruction Model called DisCo. Key features:
- Utilizes Triplane Diffusion Transformers (Triplane-DiT) for memory-efficient 3D reconstruction
- Robustly processes multi-view images, adeptly handling real-world challenges such as occlusion and motion blur
- Seamlessly integrates point cloud and posed image data, achieving metric-scale 3D reconstructions
- Trained on high-quality 3D datasets (LASA, ABO, 3DFRONT, ShapeNet)
## Hardware

We train our model on 8x A100 GPUs with a batch size of 22 per GPU (an effective batch size of 176).

## Setup environment
The following steps have been tested on Ubuntu 20.04.

- You must have an NVIDIA graphics card with at least 12GB VRAM and have [CUDA](https://developer.nvidia.com/cuda-downloads) installed.
- Install `Python >= 3.8`.
- Install `PyTorch==2.3.0` and `torchvision==0.18.0`:

  ```sh
  pip install torch==2.3.0 torchvision==0.18.0 --index-url https://download.pytorch.org/whl/cu118
  pip install torch-scatter -f https://data.pyg.org/whl/torch-2.3.0+cu118.html
  ```

- Install dependencies:

  ```sh
  pip install -r requirements.txt
  ```

- Install DisCo:

  ```sh
  pip install -e .
  ```
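Before moving on, a minimal sanity check that the installation is usable (it only assumes the packages installed above):

```python
# Minimal sanity check for the environment installed above.
import torch
import torch_scatter  # noqa: F401 -- verifies the torch-scatter wheel imports cleanly

print(torch.__version__)          # expect 2.3.0
print(torch.cuda.is_available())  # expect True on a machine with CUDA set up
```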
## Prepare Pretrained Weights

- Download the pretrained weights from BaiduYun or SharePoint.
- Put the `ae`, `dm`, and `finetune_diffusion` folders under `DisCo/output`. Only `ae` and `finetune_dm` are needed for final evaluation:
  - The `ae` folder stores the VAE weights, the `dm` folder stores the diffusion model trained on synthetic data, and the `finetune_dm` folder stores the diffusion model finetuned on the LASA dataset.
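A hypothetical quick check that the folders landed where the code expects them (assumes you run it from the `DisCo` repository root):

```python
# Hypothetical check: confirm the pretrained weight folders sit under DisCo/output.
from pathlib import Path

for folder in ["ae", "dm", "finetune_diffusion"]:
    p = Path("output") / folder
    print(f"{p}: {'found' if p.is_dir() else 'MISSING'}")
```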
## Data Preparation and Training

Follow these steps to prepare the data, then train and evaluate the model:
1. **Obtain Training Data**
   - Follow the instructions in DATA.md to obtain the necessary training data.
2. **Download CLIP Model Weights**
   - Download the `open_clip_pytorch_model.bin` file from SharePoint and place it in the `DisCo/data` directory. Ensure you have the necessary permissions to access the SharePoint link.
   - This weight file is used to extract image features with the CLIP ViT model (a loading sketch follows below).
   - If you encounter any issues during the data preparation process, please refer to the project's issue tracker or contact the maintainers.
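   As a quick check that the checkpoint is usable, you can try loading it with `open_clip`. This is a sketch, not the project's own loading code: the `ViT-L-14` architecture name is an assumption; use whichever variant the repository's config specifies.

   ```python
   # Sketch: load the downloaded CLIP checkpoint with open_clip.
   # NOTE: "ViT-L-14" is an assumed architecture name, not confirmed by this repo.
   import open_clip

   model, _, preprocess = open_clip.create_model_and_transforms(
       "ViT-L-14", pretrained="data/open_clip_pytorch_model.bin")
   model.eval()  # inference only; features are extracted without gradients
   ```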
3. **Train the Triplane-VAE Model**

   ```sh
   python launch.py --mode train_vae --gpus 0,1,2,3,4,5,6,7 --category chair
   ```
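   Before launching, it can help to confirm that the ids you pass to `--gpus` actually exist on your machine (a generic check, not part of `launch.py`):

   ```python
   # Confirm the machine exposes enough GPUs for the --gpus list above.
   import torch

   print(torch.cuda.device_count())  # should be >= the number of ids passed to --gpus
   ```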
4. **Cache Image and Triplane Features**

   ```sh
   python launch.py --mode cache_image_features --gpus 0,1,2,3,4,5,6,7 --category chair
   python launch.py --mode cache_triplane_features --gpus 0,1,2,3,4,5,6,7 --category chair
   ```
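   Conceptually, the image-feature cache encodes each view once with the CLIP ViT and stores the result for reuse during diffusion training. The sketch below illustrates the idea only; the actual storage format and file layout are determined by `launch.py`, and the model name is the same assumption as above.

   ```python
   # Illustrative only: encode one image with the CLIP ViT and cache the feature.
   import torch
   import open_clip
   from PIL import Image

   model, _, preprocess = open_clip.create_model_and_transforms(
       "ViT-L-14", pretrained="data/open_clip_pytorch_model.bin")  # assumed variant
   model.eval()

   @torch.no_grad()
   def cache_image_feature(image_path: str, out_path: str) -> None:
       image = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
       feature = model.encode_image(image)  # shape: (1, feature_dim)
       torch.save(feature.cpu(), out_path)
   ```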
5. **Train the Triplane-Diffusion Model on the Synthetic Dataset**

   ```sh
   python launch.py --mode train_diffusion --gpus 0,1,2,3,4,5,6,7 --category chair
   ```
6. **Finetune the Triplane-Diffusion Model on the LASA Dataset**

   ```sh
   python launch.py --mode finetune_diffusion --gpus 0,1,2,3,4,5,6,7 --category chair
   ```
7. **Evaluate the Triplane-Diffusion Model**

   ```sh
   python launch.py --mode evaluate --gpus 0,1,2,3,4,5,6,7 --category chair
   ```

   Results will be saved under `./results/`.
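   To inspect what the evaluation produced (the exact file layout and formats under `./results/` depend on `launch.py`):

   ```python
   # List everything the evaluation step wrote under ./results/.
   from pathlib import Path

   for f in sorted(Path("results").rglob("*")):
       if f.is_file():
           print(f, f.stat().st_size, "bytes")
   ```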
8. **Put Inference Results into the Scene**

   ```sh
   python launch.py --mode put_resutls_to_scene
   ```