MMDiT implementation and text-to-image training with rectified flows #155

Merged 27 commits on Jul 26, 2024

coryMosaicML
Collaborator

This PR contains an implementation of the MMDiT model from the SD3 paper, along with a model class for using it to train text-to-image models. To support this, a generic model inference class is also included.

Major additions:
diffusion/inference/inference_model.py has a ModelInference class for inference with arbitrary models from models.py
diffusion/models/models.py includes a text_to_image_transformer model for SD3-style MMDiT
diffusion/models/t2i_transformer.py has the ComposerModel class for the MMDiT text-to-image model
diffusion/models/transformer.py has the layers/blocks for the MMDiT model
diffusion/train.py includes a new function to configure the optimizer for the new text to image model
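
For readers unfamiliar with the rectified flow objective named in the title, here is a minimal, dependency-free sketch of the general idea. It is not the code from this PR: the function names, the plain-list representation of latents, and the uniform timestep sampling are all assumptions made for illustration. The only thing taken from the rectified flow formulation is the straight-line interpolation between data and noise and the velocity regression target.

```python
import random


def rectified_flow_pair(x0, noise, t):
    """Return (x_t, target_velocity) for one sample (hypothetical helper).

    x_t lies on the straight line from the data x0 (t=0) to the noise (t=1).
    The regression target is the constant velocity along that line.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, noise)]
    velocity = [b - a for a, b in zip(x0, noise)]
    return x_t, velocity


def mse(pred, target):
    """Mean squared error between two equal-length lists."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def rectified_flow_loss(model, x0, rng):
    """One training-loss evaluation for a callable model(x_t, t) -> velocity.

    Assumption: t is drawn uniformly from [0, 1); other sampling schemes
    are possible and are not shown here.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    t = rng.random()
    x_t, velocity = rectified_flow_pair(x0, noise, t)
    return mse(model(x_t, t), velocity)


if __name__ == "__main__":
    rng = random.Random(0)
    # Stand-in "model" that always predicts zero velocity.
    zero_model = lambda x_t, t: [0.0 for _ in x_t]
    print(rectified_flow_loss(zero_model, [1.0, 2.0, 3.0], rng))
```

In the real model the lists would be batched latent tensors and `model` would be the MMDiT conditioned on text embeddings; the sketch only shows the shape of the objective.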

Contributor

@A-Jacobson A-Jacobson left a comment

Left a few comments about structure. Basically, I'd like to see more of the transformer logic confined to the transformer, with the ComposerModel getting as close to "create the 3 models and call them" as possible. Overall, it's awesome, and I conditionally approve since it's non-breaking and has successful test runs.

Resolved review threads: diffusion/models/t2i_transformer.py (2)
@coryMosaicML coryMosaicML merged commit ef74f2b into mosaicml:main Jul 26, 2024
5 checks passed
3 participants