
MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation



1Peking University    2Beihang University    3Beijing Digital Native Digital City Research Center
4Theta Labs, Inc.    5Technical University of Munich
AAAI 2025


MMGDreamer is a dual-branch diffusion model for scene generation that incorporates a novel Mixed-Modality Graph, visual enhancement module, and relation predictor.
Feel free to contact Zhifei Yang (email at [email protected]) or open an issue if you have any questions or suggestions.

📢 News

  • 2024-12-22: The source code is released. 🎉🎉
  • 2024-12-10: MMGDreamer is accepted to AAAI 2025. 👏👏

📋 TODO

  • Release the model training and evaluation code
  • Release the pre-trained weights of VQ-VAE
  • Release the training scripts
  • Release the evaluation scripts
  • Release the VEM and RP modules code

🔧 Installation

conda create -n mmgdreamer python=3.8
conda activate mmgdreamer

We have tested it on Ubuntu 20.04 with PyTorch 1.11.0, CUDA 11.3, and PyTorch3D.

pip install -r requirements.txt 
pip install einops omegaconf tensorboardx open3d

(Note: if one encounters a problem with PyYAML, please refer to this link.)

Install mmcv-det3d (optional):

pip install openmim
mim install mmengine
mim install mmcv
mim install mmdet
mim install mmdet3d

Install CLIP:

pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
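
To sanity-check the environment, you can try importing the key packages (a minimal check that only assumes the packages installed above):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import pytorch3d, clip, open3d; print('imports ok')"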

📊 Dataset

I. Download 3D-FUTURE-SDF. We processed this ourselves from the 3D-FUTURE meshes using tools from SDFusion.

II. Follow this page to download the SG-FRONT dataset and access more information.

III. Optional

  1. Download the 3D-FRONT dataset from their official site.

  2. Preprocess the dataset following ATISS.

IV. Create a folder named FRONT and copy all files into it.

The structure should look like this:

FRONT
|--3D-FUTURE-SDF
|--All SG-FRONT files (.json and .txt)
|--3D-FRONT (optional)
|--3D-FRONT-texture (optional)
|--3D-FUTURE-model (optional)
|--3D-FUTURE-scene (optional)
|--3D-FRONT_preprocessed (optional, by ATISS)
|--threed_front.pkl (optional, by ATISS)
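
As a sketch, assembling the folder from the downloads above might look like this (the /path/to/... locations are placeholders for wherever you saved the files):

mkdir -p FRONT
cp -r /path/to/3D-FUTURE-SDF FRONT/
cp /path/to/SG-FRONT/*.json /path/to/SG-FRONT/*.txt FRONT/
# optional extras from the list above
cp -r /path/to/3D-FRONT /path/to/3D-FUTURE-model FRONT/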

Models

Essential: Download the pretrained VQ-VAE model from here into the checkpoint folder. Thanks to Guangyao Zhai for providing the pretrained weights.
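
For example (a sketch; the file name below is a placeholder for whatever archive you downloaded):

mkdir -p checkpoint
# place or extract the downloaded VQ-VAE weights here, e.g.:
unzip /path/to/vqvae_checkpoint.zip -d checkpoint/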

🛩 Training

To train the models, run:

bash scripts_sh/train_all_mask.sh

--exp: the path where trained models and logs will be stored.

--room_type: rooms to train, e.g., 'livingroom', 'diningroom', 'bedroom', and 'all'. We train all rooms together in the implementation.

--network_type: the network to be trained. We use mmgdreamer.

--with_SDF: set to True.

--large: defaults to False; True means more concrete categories.

--with_CLIP: set to True to encode with CLIP.

--with_image: set to True so that nodes contain images.
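
Putting the flags together, the underlying call looks roughly like the following (a hypothetical sketch; the actual entry script and arguments are defined in scripts_sh/train_all_mask.sh):

# hypothetical invocation; check scripts_sh/train_all_mask.sh for the real command
python train.py \
  --exp ./experiments/mmgdreamer_all \
  --room_type all \
  --network_type mmgdreamer \
  --with_SDF True \
  --large False \
  --with_CLIP True \
  --with_image True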

📈 Evaluation

To evaluate the models, run:

bash scripts_sh/eval_all_mask.sh

--exp: where the models are stored. If one wants to load our provided models, the path should be aligned with

--gen_shape: set to True to enable the shape branch.

--room_type: rooms to evaluate, e.g., 'livingroom', 'diningroom', 'bedroom', and 'all'. Please use 'all' to evaluate the results.

--render_type: set to mmgscene to use our model.

--with_image: set to True if you want to use images in the nodes.

--mask_type: set to three for the I+R setting.
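
Analogously, a full evaluation call might look like this (a hypothetical sketch; see scripts_sh/eval_all_mask.sh for the actual command):

# hypothetical invocation; check scripts_sh/eval_all_mask.sh for the real command
python eval.py \
  --exp ./experiments/mmgdreamer_all \
  --gen_shape True \
  --room_type all \
  --render_type mmgscene \
  --with_image True \
  --mask_type three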

FID/KID

This metric aims to evaluate scene-level fidelity. To evaluate FID/KID, you need to collect ground truth top-down renderings by modifying and running collect_gt_sdf_images.py.

Make sure you have downloaded all the files and preprocessed 3D-FRONT. The renderings of generated scenes can be obtained via bash scripts_sh/eval_all_mask.sh.

After obtaining both ground truth images and generated scenes renderings, run:

bash scripts_sh/compute_fid_scores.sh

Attention: FID/KID excludes lamps from the calculation; make sure neither the ground-truth images nor the generated-scene renderings include lamps.
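
The end-to-end FID/KID workflow is roughly the following (a sketch; edit the paths inside the scripts first):

# 1. render ground-truth top-down views
python collect_gt_sdf_images.py
# 2. render the generated scenes
bash scripts_sh/eval_all_mask.sh
# 3. compute FID/KID between the two sets of renderings
bash scripts_sh/compute_fid_scores.sh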

MMD/COV/1-NN

This metric aims to evaluate object-level fidelity. To evaluate it, you first need to obtain ground-truth object meshes from here (~5G). Thanks to Guangyao Zhai for providing the dataset.

Second, store each generated object from the generated scenes, which can be done via bash scripts_sh/eval_all_mask.sh. After obtaining the object meshes, modify the path in compute_mmd_cov_1nn.py and run:

bash scripts_sh/mmd_cov_1nn.sh

We use Chamfer Distance (CD) to compute these metrics.
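
A sketch of the object-level workflow (the paths inside compute_mmd_cov_1nn.py must be edited to point at the ground-truth and generated meshes):

# 1. export per-object meshes for the generated scenes
bash scripts_sh/eval_all_mask.sh
# 2. compute MMD/COV/1-NN with Chamfer Distance
bash scripts_sh/mmd_cov_1nn.sh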

😁 Acknowledgements

Relevant work: CommonScenes, EchoScene.

Disclaimer: This is a code repository for reference only; in case of any discrepancies, the paper shall prevail.

We sincerely thank EchoScene's author Guangyao Zhai for providing the baseline code and for helpful discussions.

📚 Citation

If you find our work useful in your research, please consider citing it:

@misc{ }
