4 Theta Labs, Inc. 5 Technical University of Munich
AAAI 2025
✉Indicates Corresponding Author
MMGDreamer is a dual-branch diffusion model for scene generation that incorporates a novel Mixed-Modality Graph, a visual enhancement module, and a relation predictor.
Feel free to contact Zhifei Yang (email at [email protected]) or open an issue if you have any questions or suggestions.
- 2024-12-22: The source code is released. 🎉🎉
- 2024-12-10: MMGDreamer is accepted to AAAI 2025. 👏👏
- Release the training and evaluation model code
- Release the pre-trained weights of VQ-VAE
- Release the training scripts
- Release the evaluation scripts
- Release the VEM and RP modules code
conda create -n mmgdreamer python=3.8
conda activate mmgdreamer
We have tested it on Ubuntu 20.04 with PyTorch 1.11.0, CUDA 11.3, and PyTorch3D.
pip install -r requirements.txt
pip install einops omegaconf tensorboardx open3d
(Note: if one encounters a problem with PyYAML, please refer to this link.)
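Optionally, you can run a quick sanity check that the core dependencies import cleanly. This is just a convenience check, not part of the official setup:

```bash
# Optional sanity check: confirm PyTorch, CUDA, and PyTorch3D are visible in the environment
python -c "import torch; print('torch', torch.__version__, 'cuda', torch.version.cuda, 'available:', torch.cuda.is_available())"
python -c "import pytorch3d; print('pytorch3d', pytorch3d.__version__)"
```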
Install mmcv-det3d (optional):
pip install openmim
mim install mmengine
mim install mmcv
mim install mmdet
mim install mmdet3d
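If you installed the optional mmcv/mmdetection3d stack, a quick import check (only relevant if you ran the commands above):

```bash
# Optional: confirm the mmdet3d stack is importable
python -c "import mmdet3d; print(mmdet3d.__version__)"
```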
Install CLIP:
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
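To confirm the CLIP install, you can list its released model variants. This is only a convenience check; the training code handles CLIP loading itself:

```bash
# Optional: verify CLIP is importable and can see its released models
python -c "import clip; print(clip.available_models())"
```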
I. Download 3D-FUTURE-SDF. We processed this ourselves from the 3D-FUTURE meshes using the tools in SDFusion.
II. Follow this page to download the SG-FRONT dataset and access more information.
III. Optional:
- Download the 3D-FRONT dataset from their official site.
- Preprocess the dataset following ATISS.
IV. Create a folder named FRONT and copy all files into it.
The structure should look like this:
FRONT
|--3D-FUTURE-SDF
|--All SG-FRONT files (.json and .txt)
|--3D-FRONT (optional)
|--3D-FRONT-texture (optional)
|--3D-FUTURE-model (optional)
|--3D-FUTURE-scene (optional)
|--3D-FRONT_preprocessed (optional, by ATISS)
|--threed_front.pkl (optional, by ATISS)
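A quick way to confirm the layout before training is to list the essential entries (a convenience check; adjust the path if your FRONT folder lives elsewhere, and note the optional entries may be absent):

```bash
# Verify the essential parts of the FRONT folder
ls FRONT/3D-FUTURE-SDF | head      # processed SDFs
ls FRONT/*.json FRONT/*.txt        # SG-FRONT annotation files
```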
Essential: Download the pretrained VQ-VAE model from here into the folder checkpoint. Thanks to Guangyao Zhai for providing the pretrained weights.
To train the models, run:
bash scripts_sh/train_all_mask.sh
- --exp: the path where trained models and logs will be stored.
- --room_type: rooms to train, e.g., 'livingroom', 'diningroom', 'bedroom', and 'all'. We train all rooms together in our implementation.
- --network_type: the network to be trained. We use mmgdreamer.
- --with_SDF: set to True.
- --large: default is False; True means more concrete categories.
- --with_CLIP: set to True. Encoding uses CLIP.
- --with_image: set to True. True means nodes contain images.
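For reference, the flags above combine roughly as follows. This is only a sketch: the entry-point name train.py and the --exp path are placeholders, and scripts_sh/train_all_mask.sh remains the supported way to launch training.

```bash
# Sketch only: how the documented training flags fit together.
# "train.py" and the --exp path are placeholders; use scripts_sh/train_all_mask.sh in practice.
python train.py \
  --exp ./experiments/mmgdreamer_all \
  --room_type all \
  --network_type mmgdreamer \
  --with_SDF True \
  --large False \
  --with_CLIP True \
  --with_image True
```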
To evaluate the models, run:
bash scripts_sh/eval_all_mask.sh
- --exp: where the models are stored. If one wants to load our provided models, the path should be aligned with where they are placed.
- --gen_shape: set to True to make the shape branch work.
- --room_type: rooms to evaluate, e.g., 'livingroom', 'diningroom', 'bedroom', and 'all'. Please use 'all' to evaluate the results.
- --render_type: set to mmgscene to make the model work.
- --with_image: set to True if you want to use images in the nodes.
- --mask_type: set to three for I+R.
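Analogously, the evaluation flags combine roughly like this. Again a sketch: eval.py and the --exp path are placeholders, and scripts_sh/eval_all_mask.sh is the supported entry point.

```bash
# Sketch only: how the documented evaluation flags fit together.
# "eval.py" and the --exp path are placeholders; use scripts_sh/eval_all_mask.sh in practice.
python eval.py \
  --exp ./experiments/mmgdreamer_all \
  --gen_shape True \
  --room_type all \
  --render_type mmgscene \
  --with_image True \
  --mask_type three
```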
This metric aims to evaluate scene-level fidelity. To evaluate FID/KID, you need to collect ground-truth top-down renderings by modifying and running collect_gt_sdf_images.py.
Make sure you have downloaded all the files and preprocessed 3D-FRONT. The renderings of generated scenes can be obtained via bash scripts_sh/eval_all_mask.sh.
After obtaining both the ground-truth images and the generated scene renderings, run:
bash scripts_sh/compute_fid_scores.sh
Attention: the FID/KID calculation does not include lamps; make sure the ground-truth images and the generated scene renderings both exclude lamps.
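Putting the FID/KID steps together (script names taken from this README; the paths inside collect_gt_sdf_images.py still need to be edited by hand):

```bash
# 1. Collect ground-truth top-down renderings (edit the paths in the script first)
python collect_gt_sdf_images.py
# 2. Render the generated scenes
bash scripts_sh/eval_all_mask.sh
# 3. Compute FID/KID between the two sets of renderings (lamps excluded on both sides)
bash scripts_sh/compute_fid_scores.sh
```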
This metric aims to evaluate object-level fidelity. To evaluate it, you first need to obtain the ground-truth object meshes from here (~5G). Thanks to Guangyao Zhai for providing the dataset.
Second, store each generated object from the generated scenes, which can be done via bash scripts_sh/eval_all_mask.sh.
After obtaining the object meshes, modify the path in compute_mmd_cov_1nn.py and run:
bash scripts_sh/mmd_cov_1nn.sh
We use the Chamfer Distance (CD) for the calculation.
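The object-level evaluation boils down to the following sequence (script names from this README; the mesh paths in compute_mmd_cov_1nn.py must be edited to point at the ground-truth and generated meshes):

```bash
# 1. Store per-object meshes of the generated scenes
bash scripts_sh/eval_all_mask.sh
# 2. Edit the mesh paths in compute_mmd_cov_1nn.py, then compute MMD/COV/1-NN (Chamfer Distance)
bash scripts_sh/mmd_cov_1nn.sh
```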
Relevant work: CommonScenes, EchoScene.
Disclaimer: This is a code repository for reference only; in case of any discrepancies, the paper shall prevail.
We sincerely thank EchoScene's author Guangyao Zhai for providing the baseline code and for helpful discussions.
If you find our work useful in your research, please consider citing it: @misc{ }