Skip to content

[CVPR 2024] The official implementation of paper "Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training"

Notifications You must be signed in to change notification settings

UCSC-VLAA/MixCon3D

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training

This repository contains the official implementation of "MixCon3D" in our paper.

PWC

PWC

PWC

Overview of the MixCon3D. We integrate the image and point cloud modality information, formulating a holistic 3D instance-level representation for cross-modal alignment.

News

[2023.11.2] We release the training code of the MixCon3D.

Installation

Please refer to this instruction for step-by-step installation guidance. Both the necessary packages and some helpful debug experience are provided.

Data Downloading

First, modify the path in the download_data.py. Then, execute the following command to download data from Hugging Face:

python3 download_data.py

The datasets used for experiments are the same as OpenShape. Please refer to OpenShape for more details of the data.

Training

To train the PointBERT model using the MixCon3D, please change the [data_path] to your specified local data path. Then, running the following command:

  1. On an 8 GPU server, training with batchsize 2048
torchrun --nproc_per_node=8 src/main.py --ngpu 8 dataset.folder=data_path dataset.train_batch_size=256 model.name=PointBERT model.scaling=3 model.use_dense=True --trial_name MixCon3D --config src/configs/train.yaml
  1. If the GPU memory is not enough, you can lower the train_batch_size and run with a specific accumulated iter num as follows (The equivalent batchsize = train_batch_size * accum_freq):
torchrun --nproc_per_node=8 src/main.py --ngpu 8 dataset.folder=data_path dataset.train_batch_size=128 dataset.accum_freq=2 model.name=PointBERT model.scaling=3 model.use_dense=True --trial_name MixCon3D --config src/configs/train.yaml

To train on different datasets, use the following command:

(Ensemble_no_LVIS)

torchrun --nproc_per_node=8 src/main.py --ngpu 8 dataset.folder=data_path dataset.train_split=meta_data/split/train_no_lvis.json dataset.train_batch_size=128 dataset.accum_freq=2 model.name=PointBERT model.scaling=3 model.use_dense=True --trial_name MixCon3D --config src/configs/train.yaml

(ShapeNet_only)

torchrun --nproc_per_node=8 src/main.py --ngpu 8 dataset.folder=data_path dataset.train_split=meta_data/split/ablation/train_shapenet_only.json dataset.train_batch_size=128 dataset.accum_freq=1 model.name=PointBERT model.scaling=3 model.use_dense=True --trial_name MixCon3D --config src/configs/train.yaml

We use the wandb for logging.

Acknowledgment

This codebase is based on OpenShape, timm and PointBERT. Thanks to the authors for their awesome contributions! This work is partially supported by TPU Research Cloud (TRC) program and Google Cloud Research Credits program.

Citation

@inproceedings{gao2023mixcon3d,
  title     = {Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training},
  author    = {Yipeng Gao and Zeyu Wang and Wei-Shi Zheng and Cihang Xie and Yuyin Zhou},
  booktitle = {CVPR},
  year      = {2024},
}

Contact

Questions and discussions are welcome via [email protected] or open an issue here.

About

[CVPR 2024] The official implementation of paper "Sculpting Holistic 3D Representation in Contrastive Language-Image-3D Pre-training"

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages