
MMGDreamer: Mixed-Modality Graph for Geometry-Controllable 3D Indoor Scene Generation



1Peking University    2Beihang University    3Beijing Digital Native Digital City Research Center
4Theta Labs, Inc.    5Technical University of Munich
AAAI 2025


MMGDreamer is a dual-branch diffusion model for scene generation that incorporates a novel Mixed-Modality Graph, visual enhancement module, and relation predictor.
Feel free to contact Zhifei Yang (email at [email protected]) or open an issue if you have any questions or suggestions.

📢 News

  • 2024-12-22: The source code is released. 🎉🎉
  • 2024-12-10: MMGDreamer is accepted to AAAI 2025. 👏👏

📋 TODO

  • Release the model training and evaluation code
  • Release the pre-trained weights of VQ-VAE
  • Release the training scripts
  • Release the evaluation scripts
  • Release the VEM and RP modules code

🔧 Installation

conda create -n mmgdreamer python=3.8
conda activate mmgdreamer

We have tested it on Ubuntu 20.04 with PyTorch 1.11.0, CUDA 11.3, and PyTorch3D.

pip install -r requirements.txt 
pip install einops omegaconf tensorboardx open3d

(Note: if one encounters a problem with PyYAML, please refer to this link.)

Install mmcv-det3d (optional):

pip install openmim
mim install mmengine
mim install mmcv
mim install mmdet
mim install mmdet3d

Install CLIP:

pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
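
To sanity-check the environment, you can try importing the key packages (a minimal check that only assumes the packages installed above):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import pytorch3d, clip, open3d; print('imports ok')"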

📊 Dataset

I. Download 3D-FUTURE-SDF. We processed this ourselves from the 3D-FUTURE meshes using tools from SDFusion.

II. Follow this page to download the SG-FRONT dataset and access more information.

III. Optional

  1. Download the 3D-FRONT dataset from their official site.

  2. Preprocess the dataset following ATISS.

IV. Create a folder named FRONT and copy all files into it.

The structure should look like this:

FRONT
|--3D-FUTURE-SDF
|--All SG-FRONT files (.json and .txt)
|--3D-FRONT (optional)
|--3D-FRONT-texture (optional)
|--3D-FUTURE-model (optional)
|--3D-FUTURE-scene (optional)
|--3D-FRONT_preprocessed (optional, by ATISS)
|--threed_front.pkl (optional, by ATISS)
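
As a sketch, assembling the folder from the downloads above might look like this (the /path/to/... locations are placeholders for wherever you saved the files):

mkdir -p FRONT
cp -r /path/to/3D-FUTURE-SDF FRONT/
cp /path/to/SG-FRONT/*.json /path/to/SG-FRONT/*.txt FRONT/
# optional extras from the list above
cp -r /path/to/3D-FRONT /path/to/3D-FUTURE-model FRONT/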

Models

Essential: Download the pretrained VQ-VAE model from here into the checkpoint folder. Thanks to Guangyao Zhai for providing the pretrained weights.
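
For example (a sketch; the file name below is a placeholder for whatever archive you downloaded):

mkdir -p checkpoint
# place or extract the downloaded VQ-VAE weights here, e.g.:
unzip /path/to/vqvae_checkpoint.zip -d checkpoint/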

🛩 Training

To train the models, run:

bash scripts_sh/train_all_mask.sh

--exp: the path where trained models and logs will be stored.

--room_type: rooms to train, e.g., 'livingroom', 'diningroom', 'bedroom', and 'all'. We train all rooms together in the implementation.

--network_type: the network to be trained. We use mmgdreamer.

--with_SDF: set to True.

--large: defaults to False; True means more concrete categories.

--with_CLIP: set to True to encode with CLIP.

--with_image: set to True so that nodes contain images.
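
Putting the flags together, the underlying call looks roughly like the following (a hypothetical sketch; the actual entry script and arguments are defined in scripts_sh/train_all_mask.sh):

# hypothetical invocation; check scripts_sh/train_all_mask.sh for the real command
python train.py \
  --exp ./experiments/mmgdreamer_all \
  --room_type all \
  --network_type mmgdreamer \
  --with_SDF True \
  --large False \
  --with_CLIP True \
  --with_image True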

📈 Evaluation

To evaluate the models, run:

bash scripts_sh/eval_all_mask.sh

--exp: where the models are stored. If one wants to load our provided models, the path should be aligned with

--gen_shape: set to True to enable the shape branch.

--room_type: rooms to evaluate, e.g., 'livingroom', 'diningroom', 'bedroom', and 'all'. Please use 'all' to evaluate the results.

--render_type: set to mmgscene to use our model.

--with_image: set to True if you want to use images in the nodes.

--mask_type: set to three for the I+R setting.
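
Analogously, a full evaluation call might look like this (a hypothetical sketch; see scripts_sh/eval_all_mask.sh for the actual command):

# hypothetical invocation; check scripts_sh/eval_all_mask.sh for the real command
python eval.py \
  --exp ./experiments/mmgdreamer_all \
  --gen_shape True \
  --room_type all \
  --render_type mmgscene \
  --with_image True \
  --mask_type three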

FID/KID

This metric aims to evaluate scene-level fidelity. To evaluate FID/KID, you need to collect ground truth top-down renderings by modifying and running collect_gt_sdf_images.py.

Make sure you have downloaded all the files and preprocessed 3D-FRONT. The renderings of generated scenes can be obtained via bash scripts_sh/eval_all_mask.sh.

After obtaining both ground truth images and generated scenes renderings, run:

bash scripts_sh/compute_fid_scores.sh

Attention: FID/KID excludes lamps from the calculation; make sure neither the ground-truth images nor the generated-scene renderings include lamps.
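
The end-to-end FID/KID workflow is roughly the following (a sketch; edit the paths inside the scripts first):

# 1. render ground-truth top-down views
python collect_gt_sdf_images.py
# 2. render the generated scenes
bash scripts_sh/eval_all_mask.sh
# 3. compute FID/KID between the two sets of renderings
bash scripts_sh/compute_fid_scores.sh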

MMD/COV/1-NN

This metric aims to evaluate object-level fidelity. To evaluate it, you first need to obtain ground-truth object meshes from here (~5G). Thanks to Guangyao Zhai for providing the dataset.

Second, store each generated object from the generated scenes, which can be done via bash scripts_sh/eval_all_mask.sh. After obtaining the object meshes, modify the path in compute_mmd_cov_1nn.py and run:

bash scripts_sh/mmd_cov_1nn.sh

We use Chamfer Distance (CD) to compute these metrics.
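
A sketch of the object-level workflow (the paths inside compute_mmd_cov_1nn.py must be edited to point at the ground-truth and generated meshes):

# 1. export per-object meshes for the generated scenes
bash scripts_sh/eval_all_mask.sh
# 2. compute MMD/COV/1-NN with Chamfer Distance
bash scripts_sh/mmd_cov_1nn.sh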

😁 Acknowledgements

Relevant work: CommonScenes, EchoScene.

Disclaimer: This is a code repository for reference only; in case of any discrepancies, the paper shall prevail.

We sincerely thank EchoScene's author Guangyao Zhai for providing the baseline code and for helpful discussions.

📚 Citation

If you find our work useful in your research, please consider citing it:

@misc{ }
