Image Editing In Diffusion

Editing

*[ICLR2022; Stanford & CMU] SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations [PDF, Page]

*[arxiv 22.08; Google] Prompt-to-Prompt Image Editing with Cross Attention Control [PDF]

[arxiv 22.08; Scale AI] Direct Inversion: Optimization-Free Text-Driven Real Image Editing with Diffusion Models [PDF]

[arxiv 22.11; UC Berkeley] InstructPix2Pix: Learning to Follow Image Editing Instructions [PDF, Page]

[arxiv 2022; Nvidia] eDiffi: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers [PDF, Code]

[arxiv 2022; Google] Imagic: Text-Based Real Image Editing with Diffusion Models [PDF, Code]

[arxiv 2022] DiffEdit: Diffusion-based semantic image editing with mask guidance [Paper]

[arxiv 2022] DiffIT: Diffusion-based Image Translation Using Disentangled Style and Content Representation [Paper]

[arxiv 2022] Dual Diffusion Implicit Bridges for Image-to-image Translation [Paper]

*[ICLR 23, Google] Classifier-Free Diffusion Guidance [Paper]

[arxiv 2022] EDICT: Exact Diffusion Inversion via Coupled Transformations [PDF]

[arxiv 22.11] Paint by Example: Exemplar-based Image Editing with Diffusion Models [PDF]

[arxiv 2022.10; ByteDance]MagicMix: Semantic Mixing with Diffusion Models [PDF]

[arxiv 2022.12; Microsoft]X-Paste: Revisit Copy-Paste at Scale with CLIP and StableDiffusion[PDF]

[arxiv 2022.12]SINE: SINgle Image Editing with Text-to-Image Diffusion Models [PDF]

[arxiv 2022.12]Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models[PDF]

[arxiv 2022.12]Optimizing Prompts for Text-to-Image Generation [PDF]

[arxiv 2023.01]Guiding Text-to-Image Diffusion Model Towards Grounded Generation [PDF, Page]

[arxiv 2023.02, Adobe]Controlled and Conditional Text to Image Generation with Diffusion Prior [PDF]

[arxiv 2023.02]Learning Input-agnostic Manipulation Directions in StyleGAN with Text Guidance [PDF]

[arxiv 2023.02]Towards Enhanced Controllability of Diffusion Models[PDF]

[arxiv 2023.03]X&Fuse: Fusing Visual Information in Text-to-Image Generation [PDF]

[arxiv 2023.03]Lformer: Text-to-Image Generation with L-shape Block Parallel Decoding [PDF]

[arxiv 2023.03]CoralStyleCLIP: Co-optimized Region and Layer Selection for Image Editing [PDF]

[arxiv 2023.03]Erasing Concepts from Diffusion Models [PDF, Code]

[arxiv 2023.03]Editing Implicit Assumptions in Text-to-Image Diffusion Models [PDF, Page]

[arxiv 2023.03]Localizing Object-level Shape Variations with Text-to-Image Diffusion Models [PDF, Page]

[arxiv 2023.03]SVDiff: Compact Parameter Space for Diffusion Fine-Tuning[PDF]

[arxiv 2023.03]Ablating Concepts in Text-to-Image Diffusion Models [PDF, Page]

[arxiv 2023.03]ReVersion: Diffusion-Based Relation Inversion from Images [PDF, Page]

[arxiv 2023.03]MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models [PDF]

[arxiv 2023.04]One-shot Unsupervised Domain Adaptation with Personalized Diffusion Models [PDF]

[arxiv 2023.04]3D-aware Image Generation using 2D Diffusion Models [PDF]

[arxiv 2023.04]Inst-Inpaint: Instructing to Remove Objects with Diffusion Models[PDF]

[arxiv 2023.04]Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis [PDF]

->[arxiv 2023.04]Expressive Text-to-Image Generation with Rich Text [PDF, Page]

[arxiv 2023.04]DiffusionRig: Learning Personalized Priors for Facial Appearance Editing [PDF]

[arxiv 2023.04]An Edit Friendly DDPM Noise Space: Inversion and Manipulations [PDF]

[arxiv 2023.04]Gradient-Free Textual Inversion [PDF]

[arxiv 2023.04]Improving Diffusion Models for Scene Text Editing with Dual Encoders [PDF]

[arxiv 2023.04]Delta Denoising Score [PDF, Page]

[arxiv 2023.04]MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing [PDF, Page]

[arxiv 2023.04]Edit Everything: A Text-Guided Generative System for Images Editing [PDF]

[arxiv 2023.05]In-Context Learning Unlocked for Diffusion Models [PDF]

[arxiv 2023.05]ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation [PDF]

[arxiv 2023.05]RAPHAEL: Text-to-Image Generation via Large Mixture of Diffusion Paths [PDF]

[arxiv 2023.05]Controllable Text-to-Image Generation with GPT-4 [PDF]

[arxiv 2023.06]Diffusion Self-Guidance for Controllable Image Generation [PDF, Page]

[arxiv 2023.06]SyncDiffusion: Coherent Montage via Synchronized Joint Diffusions [PDF, Page]

[arxiv 2023.06]MagicBrush: A Manually Annotated Dataset for Instruction-Guided Image Editing[PDF, Page]

[arxiv 2023.06]Diffusion in Diffusion: Cyclic One-Way Diffusion for Text-Vision-Conditioned Generation [PDF]

->[arxiv 2023.06]Paste, Inpaint and Harmonize via Denoising: Subject-Driven Image Editing with Pre-Trained Diffusion Model [PDF]

[arxiv 2023.06]Controlling Text-to-Image Diffusion by Orthogonal Finetuning [PDF]

[arxiv 2023.06]Localized Text-to-Image Generation for Free via Cross Attention Control[PDF]

[arxiv 2023.06]Filtered-Guided Diffusion: Fast Filter Guidance for Black-Box Diffusion Models [PDF]

[arxiv 2023.06]PFB-Diff: Progressive Feature Blending Diffusion for Text-driven Image Editing [PDF]

[arxiv 2023.06]DragDiffusion: Harnessing Diffusion Models for Interactive Point-based Image Editing[PDF]

[arxiv 2023.07]Counting Guidance for High Fidelity Text-to-Image Synthesis [PDF]

[arxiv 2023.07]LEDITS: Real Image Editing with DDPM Inversion and Semantic Guidance [PDF]

[arxiv 2023.07]DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models [PDF]

[arxiv 2023.07]Taming Encoder for Zero Fine-tuning Image Customization with Text-to-Image Diffusion Models [PDF]

[arxiv 2023.07]Not All Steps are Created Equal: Selective Diffusion Distillation for Image Manipulation [PDF]

[arxiv 2023.07]FABRIC: Personalizing Diffusion Models with Iterative Feedback [PDF]

[arxiv 2023.07]Understanding the Latent Space of Diffusion Models through the Lens of Riemannian Geometry [PDF]

[arxiv 2023.07]Interpolating between Images with Diffusion Models [PDF]

[arxiv 2023.07]TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition [PDF]

[arxiv 2023.08]ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation [PDF]

[arxiv 2023.09]Iterative Multi-granular Image Editing using Diffusion Models [PDF]

[arxiv 2023.09]InstructDiffusion: A Generalist Modeling Interface for Vision Tasks [PDF]

[arxiv 2023.09]InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation [PDF,Page]

[arxiv 2023.09]ITI-GEN: Inclusive Text-to-Image Generation [PDF, Page]

[arxiv 2023.09]MaskDiffusion: Boosting Text-to-Image Consistency with Conditional Mask [PDF]

[arxiv 2023.09]FreeU: Free Lunch in Diffusion U-Net [PDF,Page]

[arxiv 2023.09]Dream the Impossible: Outlier Imagination with Diffusion Models [PDF]

[arxiv 2023.09]Dynamic Prompt Learning: Addressing Cross-Attention Leakage for Text-Based Image Editing [PDF, Page]

[arxiv 2023.09]RealFill: Reference-Driven Generation for Authentic Image Completion [PDF, Page]

[arxiv 2023.10]Aligning Text-to-Image Diffusion Models with Reward Backpropagation [PDF,Page]

[arxiv 2023.10]InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists [PDF]

[arxiv 2023.10]Steered Diffusion: A Generalized Framework for Plug-and-Play Conditional Image Synthesis [PDF]

[arxiv 2023.10]Guiding Instruction-based Image Editing via Multimodal Large Language Models [PDF,Page]

[arxiv 2023.10]Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion [PDF]

[arxiv 2023.10]JointNet: Extending Text-to-Image Diffusion for Dense Distribution Modeling [PDF]

[arxiv 2023.10]Uni-paint: A Unified Framework for Multimodal Image Inpainting with Pretrained Diffusion Model [PDF]

[arxiv 2023.10]Unsupervised Discovery of Interpretable Directions in h-space of Pre-trained Diffusion Models [PDF]

[arxiv 2023.10]Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation [PDF,Page]

[arxiv 2023.10]SingleInsert: Inserting New Concepts from a Single Image into Text-to-Image Models for Flexible Editing [PDF,Page]

[arxiv 2023.10]CycleNet: Rethinking Cycle Consistency in Text-Guided Diffusion for Image Manipulation [PDF]

[arxiv 2023.10]CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models[PDF,Page]

[arxiv 2023.11]LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing [PDF,Page]

[arxiv 2023.11]The Blessing of Randomness: SDE Beats ODE in General Diffusion-based Image Editing[PDF]

[arxiv 2023.11]FaceComposer: A Unified Model for Versatile Facial Content Creation [PDF]

[arxiv 2023.11]Fine-grained Appearance Transfer with Diffusion Models [PDF, Page]

[arxiv 2023.11]Text-Driven Image Editing via Learnable Regions [PDF, Page]

[arxiv 2023.12]Contrastive Denoising Score for Text-guided Latent Diffusion Image Editing [PDF,Page]

[arxiv 2023.12]Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models [PDF,Page]

[arxiv 2023.12]ControlNet-XS: Designing an Efficient and Effective Architecture for Controlling Text-to-Image Diffusion Models [PDF]

[arxiv 2023.12]Emu Edit: Precise Image Editing via Recognition and Generation Tasks [PDF,Page]

[arxiv 2023.12]DiffMorpher: Unleashing the Capability of Diffusion Models for Image Morphing [PDF]

[arxiv 2023.12]AdapEdit: Spatio-Temporal Guided Adaptive Editing Algorithm for Text-Based Continuity-Sensitive Image Editing [PDF]

[arxiv 2023.12]LIME: Localized Image Editing via Attention Regularization in Diffusion Models [PDF]

[arxiv 2023.12]Diffusion Cocktail: Fused Generation from Diffusion Models [PDF]

[arxiv 2023.12]Prompting Hard or Hardly Prompting: Prompt Inversion for Text-to-Image Diffusion Models [PDF]

[arxiv 2023.12]Fixed-point Inversion for Text-to-image diffusion models [PDF]

[arxiv 2023.12]StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation [PDF]

[arxiv 2023.12]MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance [PDF,Page]

[arxiv 2023.12]Tuning-Free Inversion-Enhanced Control for Consistent Image Editing [PDF]

[arxiv 2023.12]High-Fidelity Diffusion-based Image Editing [PDF]

[arxiv 2023.12]ZONE: Zero-Shot Instruction-Guided Local Editing [PDF]

[arxiv 2024.1]PIXART-δ: Fast and Controllable Image Generation with Latent Consistency Models [PDF]

[arxiv 2024.1]Wavelet-Guided Acceleration of Text Inversion in Diffusion-Based Image Editing [PDF]

[arxiv 2024.1]Edit One for All: Interactive Batch Image Editing [PDF,Page]

[arxiv 2024.01]UNIMO-G: Unified Image Generation through Multimodal Conditional Diffusion [PDF]

[arxiv 2024.01]Text Image Inpainting via Global Structure-Guided Diffusion Models [PDF]

[arxiv 2024.01]Motion Guidance: Diffusion-Based Image Editing with Differentiable Motion Estimators [PDF]

[arxiv 2024.02]Latent Inversion with Timestep-aware Sampling for Training-free Non-rigid Editing [PDF]

[arxiv 2024.02]DiLightNet: Fine-grained Lighting Control for Diffusion-based Image Generation[PDF]

[arxiv 2024.02]CustomSketching: Sketch Concept Extraction for Sketch-based Image Synthesis and Editing [PDF]

[arxiv 2024.03]Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks [PDF]

[arxiv 2024.03]LoMOE: Localized Multi-Object Editing via Multi-Diffusion [PDF]

[arxiv 2024.03]Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing [PDF]

[arxiv 2024.03]StableDrag: Stable Dragging for Point-based Image Editing[PDF]

[arxiv 2024.03]InstructGIE: Towards Generalizable Image Editing [PDF]

[arxiv 2024.03]An Item is Worth a Prompt: Versatile Image Editing with Disentangled Control [PDF]

[arxiv 2024.03]Holo-Relighting: Controllable Volumetric Portrait Relighting from a Single Image [PDF](https://arxiv.org/abs/2403.09632)

[arxiv 2024.03]Editing Massive Concepts in Text-to-Image Diffusion Models [PDF,Page]

[arxiv 2024.03]Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing [PDF]

[arxiv 2024.03]Magic Fixup: Streamlining Photo Editing by Watching Dynamic Videos [PDF,Page]

[arxiv 2024.03]LASPA: Latent Spatial Alignment for Fast Training-free Single Image Editing [PDF]

[arxiv 2024.03]ReNoise: Real Image Inversion Through Iterative Noising[PDF,Page]

[arxiv 2024.03]AID: Attention Interpolation of Text-to-Image Diffusion [PDF,Page]

[arxiv 2024.03]InstructBrush: Learning Attention-based Instruction Optimization for Image Editing [PDF]

[arxiv 2024.03]TextCraftor: Your Text Encoder Can be Image Quality Controller [PDF]

[arxiv 2024.04]Drag Your Noise: Interactive Point-based Editing via Diffusion Semantic Propagation [PDF,Page]

[arxiv 2024.04]Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models [PDF]

[arxiv 2024.04]SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing [PDF]

[arxiv 2024.04]Responsible Visual Editing [PDF]

[arxiv 2024.04]ByteEdit: Boost, Comply and Accelerate Generative Image Editing [PDF]

[arxiv 2024.04]ShoeModel: Learning to Wear on the User-specified Shoes via Diffusion Model [PDF]

[arxiv 2024.04]GoodDrag: Towards Good Practices for Drag Editing with Diffusion Models [PDF,Page]

[arxiv 2024.04]HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing [PDF]

[arxiv 2024.04]MaxFusion: Plug&Play Multi-Modal Generation in Text-to-Image Diffusion Models [PDF]

[arxiv 2024.04]Magic Clothing: Controllable Garment-Driven Image Synthesis [PDF]

[arxiv 2024.04]Factorized Diffusion: Perceptual Illusions by Noise Decomposition [PDF,Page]

[arxiv 2024.04]TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing [PDF]

[arxiv 2024.04]Lazy Diffusion Transformer for Interactive Image Editing [PDF]

[arxiv 2024.04]FreeDiff: Progressive Frequency Truncation for Image Editing with Diffusion Models [PDF]

[arxiv 2024.04]GeoDiffuser: Geometry-Based Image Editing with Diffusion Models [PDF]

[arxiv 2024.04]LocInv: Localization-aware Inversion for Text-Guided Image Editing [PDF]

[arxiv 2024.05]SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models[PDF]

[arxiv 2024.05]MMTryon: Multi-Modal Multi-Reference Control for High-Quality Fashion Generation [PDF](https://arxiv.org/abs/2405.00448)

[arxiv 2024.05]Streamlining Image Editing with Layered Diffusion Brushes [PDF]

[arxiv 2024.05]SOEDiff: Efficient Distillation for Small Object Editing [PDF]

[arxiv 2024.05]Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model [PDF,Page]

[arxiv 2024.05]Face Adapter for Pre-Trained Diffusion Models with Fine-Grained ID and Attribute Control [PDF,Page]

[arxiv 2024.05] EmoEdit: Evoking Emotions through Image Manipulation [PDF]

[arxiv 2024.05] ReasonPix2Pix: Instruction Reasoning Dataset for Advanced Image Editing [PDF]

[arxiv 2024.05] EditWorld: Simulating World Dynamics for Instruction-Following Image Editing [PDF,Page]

[arxiv 2024.05]InstaDrag: Lightning Fast and Accurate Drag-based Image Editing Emerging from Videos [PDF,Page]

[arxiv 2024.05] FastDrag: Manipulate Anything in One Step [PDF]

[arxiv 2024.05] Enhancing Text-to-Image Editing via Hybrid Mask-Informed Fusion [PDF]

[arxiv 2024.06] DiffUHaul: A Training-Free Method for Object Dragging in Images [PDF,Page]

[arxiv 2024.06] MultiEdits: Simultaneous Multi-Aspect Editing with Text-to-Image Diffusion Models [PDF,Page]

[arxiv 2024.06] Dreamguider: Improved Training free Diffusion-based Conditional Generation [PDF,Page]

[arxiv 2024.06]Zero-shot Image Editing with Reference Imitation [PDF,Page]

[arxiv 2024.07] Image Inpainting Models are Effective Tools for Instruction-guided Image Editing[PDF]

[arxiv 2024.07]Text2Place: Affordance-aware Text Guided Human Placement [PDF,Page]

[arxiv 2024.07] RegionDrag: Fast Region-Based Image Editing with Diffusion Models[PDF,Page]

[arxiv 2024.07] FlexiEdit: Frequency-Aware Latent Refinement for Enhanced Non-Rigid Editing [PDF]

[arxiv 2024.07] DragText: Rethinking Text Embedding in Point-based Image Editing [PDF,Page]

[arxiv 2024.08] MagicFace: Training-free Universal-Style Human Image Customized Synthesis [PDF,Page]

[arxiv 2024.08] TurboEdit: Instant text-based image editing[PDF,Page]

[arxiv 2024.08] FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing [PDF,Page]

[arxiv 2024.08] CODE: Confident Ordinary Differential Editing [PDF,Page]

[arxiv 2024.08] Prompt-Softbox-Prompt: A free-text Embedding Control for Image Editing [PDF]

[arxiv 2024.08] Task-Oriented Diffusion Inversion for High-Fidelity Text-based Editing [PDF]

[arxiv 2024.08] DiffAge3D: Diffusion-based 3D-aware Face Aging [PDF]

[arxiv 2024.09] Guide-and-Rescale: Self-Guidance Mechanism for Effective Tuning-Free Real Image Editing [PDF,Page]

[arxiv 2024.09] InstantDrag: Improving Interactivity in Drag-based Image Editing [PDF,Page]

[arxiv 2024.09] SimInversion: A Simple Framework for Inversion-Based Text-to-Image Editing [PDF]

[arxiv 2024.09]FreeEdit: Mask-free Reference-based Image Editing with Multi-modal Instruction [PDF,Page]

[arxiv 2024.09] GroupDiff: Diffusion-based Group Portrait Editing [PDF]

[arxiv 2024.10] Combining Text-based and Drag-based Editing for Precise and Flexible Image Editing [PDF]

[arxiv 2024.10] PostEdit: Posterior Sampling for Efficient Zero-Shot Image Editing [PDF,Page]

[arxiv 2024.10] Context-Aware Full Body Anonymization using Text-to-Image Diffusion Models [PDF]

[arxiv 2024.10] BELM: Bidirectional Explicit Linear Multi-step Sampler for Exact Inversion in Diffusion Models [PDF]

[arxiv 2024.10] Vision-guided and Mask-enhanced Adaptive Denoising for Prompt-based Image Editing [PDF]

[arxiv 2024.10]MagicEraser: Erasing Any Objects via Semantics-Aware Control[PDF]

[arxiv 2024.10] SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing [PDF,Page]

[arxiv 2024.10] AdaptiveDrag: Semantic-Driven Dragging on Diffusion-Based Image Editing [PDF,Page]

[arxiv 2024.10] MambaPainter: Neural Stroke-Based Rendering in a Single Step[PDF]

[arxiv 2024.10] ERDDCI: Exact Reversible Diffusion via Dual-Chain Inversion for High-Quality Image Editing [PDF]

[arxiv 2024.10] Schedule Your Edit: A Simple yet Effective Diffusion Noise Schedule for Image Editing [PDF]
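
Most of the editors above, from Prompt-to-Prompt to the DDPM-inversion methods, apply classifier-free guidance (the *[ICLR 23, Google] entry) at every denoising step. A minimal numpy sketch of the guidance rule; the noise values here are made up and stand in for real denoiser outputs:

```python
import numpy as np

def cfg_noise(eps_uncond, eps_cond, w):
    """Classifier-free guidance: extrapolate from the unconditional
    noise prediction toward the conditional one with scale w.
    w=0 is unconditional, w=1 is conditional, w>1 amplifies the prompt."""
    eps_uncond, eps_cond = np.asarray(eps_uncond), np.asarray(eps_cond)
    return eps_uncond + w * (eps_cond - eps_uncond)

# Toy per-step predictions (a real U-Net would produce these):
eps_u = np.array([0.1, -0.2])
eps_c = np.array([0.3, 0.0])
guided = cfg_noise(eps_u, eps_c, 7.5)  # each entry moves 7.5x the gap
```

The guided prediction then replaces the raw conditional one in whatever sampler the paper uses.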

Unified Generation

[arxiv 2024.10] A Simple Approach to Unifying Diffusion-based Conditional Generation [PDF,Page]

Architecture

[arxiv 2024.03]Switch Diffusion Transformer: Synergizing Denoising Tasks with Sparse Mixture-of-Experts [PDF,Page]

[arxiv 2024.05] TerDiT: Ternary Diffusion Models with Transformers [PDF,Page]

[arxiv 2024.05] DiG: Scalable and Efficient Diffusion Models with Gated Linear Attention [PDF,Page]

[arxiv 2024.05] ViG: Linear-complexity Visual Sequence Learning with Gated Linear Attention [PDF,Page]

[arxiv 2024.06] Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models[PDF,Page]

[arxiv 2024.07] Add-SD: Rational Generation without Manual Reference [PDF,Page]

[arxiv 2024.07] Specify and Edit: Overcoming Ambiguity in Text-Based Image Editing [PDF,Page]

[arxiv 2024.08] FastEdit: Fast Text-Guided Single-Image Editing via Semantic-Aware Diffusion Fine-Tuning [PDF,Page]

[arxiv 2024.08] EasyInv: Toward Fast and Better DDIM Inversion [PDF]

[arxiv 2024.08] AnyDesign: Versatile Area Fashion Editing via Mask-Free Diffusion [PDF]

[arxiv 2024.10] On Inductive Biases That Enable Generalization of Diffusion Transformers [PDF,Page]
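
Several inversion papers in this list (EDICT, EasyInv above; BELM, ERDDCI in the Editing section) build on the deterministic DDIM update. A toy numpy sketch, with a made-up schedule and noise values: for a *fixed* noise estimate the update is affine and exactly invertible, and it is the re-estimation of the noise at the new timestep that breaks exactness, which is the gap these papers target:

```python
import numpy as np

alpha_bar = np.linspace(0.999, 0.01, 50)  # toy cumulative noise schedule

def ddim_step(x, eps, t_from, t_to):
    """Deterministic DDIM move x_{t_from} -> x_{t_to}, given a noise
    estimate eps (a trained denoiser would supply this)."""
    a_f, a_t = alpha_bar[t_from], alpha_bar[t_to]
    x0 = (x - np.sqrt(1 - a_f) * eps) / np.sqrt(a_f)  # predicted clean image
    return np.sqrt(a_t) * x0 + np.sqrt(1 - a_t) * eps

x_t = np.array([0.5, -1.0])   # pretend latent
eps = np.array([0.2, 0.1])    # pretend denoiser output
x_prev = ddim_step(x_t, eps, 30, 29)          # denoise one step
x_back = ddim_step(x_prev, eps, 29, 30)       # same formula, timesteps swapped
# With eps held fixed, x_back recovers x_t up to float error.
```

In practice inversion must re-run the denoiser on `x_prev` to get a new eps, so the round trip is only approximate; EDICT's coupled chains and BELM's multi-step linear sampler are ways of making it exact.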

Distribution

[arxiv 2024.10] Rectified Diffusion: Straightness Is Not Your Need [PDF,Page]

[arxiv 2024.10] Simple ReFlow: Improved Techniques for Fast Flow Models [PDF,Page]

[arxiv 2024.10] Consistency Diffusion Bridge Models [PDF]
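
The reflow papers above start from rectified flow: a velocity field trained along straight noise-to-data paths, sampled by Euler integration, where a straighter learned flow needs fewer steps. A toy numpy sketch with made-up points, showing the path and the one-step limit of a perfectly straight flow:

```python
import numpy as np

def interp(x0, x1, t):
    # Rectified-flow path: linear interpolation between noise x0 and data x1.
    return (1 - t) * x0 + t * x1

x0 = np.array([1.0, -2.0])  # noise sample
x1 = np.array([0.5, 0.5])   # data sample

# Along this path the ideal velocity is constant: v = x1 - x0.
# For a perfectly straight (fully rectified) flow, a single Euler
# step from t=0 to t=1 already lands on the data point:
v = x1 - x0
x = x0 + 1.0 * v
```

Real flows are only approximately straight, which is why "Rectified Diffusion: Straightness Is Not Your Need" and "Simple ReFlow" revisit what the training objective should actually enforce.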

LLMs for editing

[arxiv 2024.07] GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing [PDF,Page]

[arxiv 2024.07] UltraEdit: Instruction-based Fine-Grained Image Editing at Scale [PDF,Page]

Improve T2I base modules

[arxiv 2023]LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts [PDF,Page]

[arxiv 2023.11]Self-correcting LLM-controlled Diffusion Models [PDF]

[arxiv 2023.11]Enhancing Diffusion Models with Text-Encoder Reinforcement Learning [PDF]

[arxiv 2023.11]Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following [PDF]

[arxiv 2023.12]Unlocking Spatial Comprehension in Text-to-Image Diffusion Models [PDF]

[arxiv 2023.12]Fair Text-to-Image Diffusion via Fair Mapping [PDF]

[arxiv 2023.12]CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models[PDF]

[arxiv 2023.12]DreamDistribution: Prompt Distribution Learning for Text-to-Image Diffusion Models [PDF,Page]

[arxiv 2023.12]Prompt Expansion for Adaptive Text-to-Image Generation [PDF]

[arxiv 2023.12]Diffusion Model with Perceptual Loss [PDF]

[arxiv 2024.01]EmoGen: Emotional Image Content Generation with Text-to-Image Diffusion Models [PDF]

[arxiv 2024.01]DiffusionGPT: LLM-Driven Text-to-Image Generation System [PDF]

[arxiv 2024.01]Divide and Conquer: Language Models can Plan and Self-Correct for Compositional Text-to-Image Generation[PDF]

[arxiv 2024.02]MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis [PDF,Page]

[arxiv 2024.02]Get What You Want, Not What You Don't: Image Content Suppression for Text-to-Image Diffusion Models [PDF,Page]

[arxiv 2024.02]InstanceDiffusion: Instance-level Control for Image Generation [PDF,Page]

[arxiv 2024.02]Learning Continuous 3D Words for Text-to-Image Generation[PDF,Page]

[arxiv 2024.02]Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation[PDF]

[arxiv 2024.02]RealCompo: Dynamic Equilibrium between Realism and Compositionality Improves Text-to-Image Diffusion Models [PDF,Page]

[arxiv 2024.02]A User-Friendly Framework for Generating Model-Preferred Prompts in Text-to-Image Synthesis [PDF]

[arxiv 2024.02]Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Models [PDF]

[arxiv 2024.02]Structure-Guided Adversarial Training of Diffusion Models[PDF]

[arxiv 2024.03]SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data [PDF,Page]

[arxiv 2024.03]ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment [PDF,Page]

[arxiv 2024.03]Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation [PDF,Page]

[arxiv 2024.03]Optimizing Negative Prompts for Enhanced Aesthetics and Fidelity in Text-To-Image Generation [PDF]

[arxiv 2024.03]FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis [PDF]

[arxiv 2024.04]Getting it Right: Improving Spatial Consistency in Text-to-Image Models [PDF,Page]

[arxiv 2024.04]Dynamic Prompt Optimizing for Text-to-Image Generation [PDF]

[arxiv 2024.04]Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching [PDF,Page]

[arxiv 2024.04]Align Your Steps: Optimizing Sampling Schedules in Diffusion Models [PDF,Page]

[arxiv 2024.04]Stylus: Automatic Adapter Selection for Diffusion Models [PDF,Page]

[arxiv 2024.05]Deep Reward Supervisions for Tuning Text-to-Image Diffusion Models [PDF]

[arxiv 2024.05]Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model [PDF]

[arxiv 2024.05]Attention-Driven Training-Free Efficiency Enhancement of Diffusion Models [PDF]

[arxiv 2024.05]An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation [PDF]

[arxiv 2024.05] Learning Multi-dimensional Human Preference for Text-to-Image Generation [PDF]

[arxiv 2024.05] Class-Conditional self-reward mechanism for improved Text-to-Image models [PDF]

[arxiv 2024.05] LiteVAE: Lightweight and Efficient Variational Autoencoders for Latent Diffusion Models [PDF]

[arxiv 2024.05] SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance [PDF]

[arxiv 2024.05] Training-free Editioning of Text-to-Image Models [PDF]

[arxiv 2024.05] PromptFix: You Prompt and We Fix the Photo [PDF,Page]

[arxiv 2024.06] Kaleido Diffusion: Improving Conditional Diffusion Models with Autoregressive Latent Modeling [PDF]

[arxiv 2024.06]Improving GFlowNets for Text-to-Image Diffusion Alignment [PDF]

[arxiv 2024.06] Diffusion Soup: Model Merging for Text-to-Image Diffusion Models [PDF,Page]

[arxiv 2024.06] CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models[PDF,Page]

[arxiv 2024.06]Understanding and Mitigating Compositional Issues in Text-to-Image Generative Models [PDF,Page]

[arxiv 2024.06] Make It Count: Text-to-Image Generation with an Accurate Number of Objects [PDF,Page]

[arxiv 2024.06] AITTI: Learning Adaptive Inclusive Token for Text-to-Image Generation [PDF,Page]

[arxiv 2024.06] Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models [PDF]

[arxiv 2024.06] Neural Residual Diffusion Models for Deep Scalable Vision Generation [PDF]

[arxiv 2024.06] ARTIST: Improving the Generation of Text-rich Images by Disentanglement [PDF,Page]

[arxiv 2024.06] Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models [PDF]

[arxiv 2024.06]Fine-tuning Diffusion Models for Enhancing Face Quality in Text-to-image Generation[PDF]

[arxiv 2024.07]PopAlign: Population-Level Alignment for Fair Text-to-Image Generation [PDF]

[arxiv 2024.07] LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation [PDF,Page]

[arxiv 2024.07] Prompt Refinement with Image Pivot for Text-to-Image Generation [PDF,Page]

[arxiv 2024.07] Improved Noise Schedule for Diffusion Training [PDF]

[arxiv 2024.07] No Training, No Problem: Rethinking Classifier-Free Guidance for Diffusion Models [PDF]

[arxiv 2024.07] Not All Noises Are Created Equally: Diffusion Noise Selection and Optimization [PDF]

[arxiv 2024.07] GeoGuide: Geometric guidance of diffusion models [PDF]

[arxiv 2024.08] Understanding the Local Geometry of Generative Model Manifolds [PDF]

[arxiv 2024.08] Iterative Object Count Optimization for Text-to-image Diffusion Models[PDF]

[arxiv 2024.08]FRAP: Faithful and Realistic Text-to-Image Generation with Adaptive Prompt Weighting[PDF]

[arxiv 2024.08] Compress Guidance in Conditional Diffusion Sampling [PDF]

[arxiv 2024.09] Elucidating Optimal Reward-Diversity Tradeoffs in Text-to-Image Diffusion Models [PDF]

[arxiv 2024.09] Generalizing Alignment Paradigm of Text-to-Image Generation with Preferences through f-divergence Minimization [PDF]

[arxiv 2024.09] Pixel-Space Post-Training of Latent Diffusion Models [PDF,Page]

[arxiv 2024.09] Improvements to SDXL in NovelAI Diffusion V3 [PDF]

[arxiv 2024.10] Removing Distributional Discrepancies in Captions Improves Image-Text Alignment [PDF,Page]

[arxiv 2024.10] ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation [PDF,Page]

[arxiv 2024.10] Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think [PDF]

[arxiv 2024.10] Decouple-Then-Merge: Towards Better Training for Diffusion Models [PDF]

[arxiv 2024.10] Sparse Repellency for Shielded Generation in Text-to-image Diffusion Models [PDF,Page]

[arxiv 2024.10] Training-free Diffusion Model Alignment with Sampling Demons [PDF]

[arxiv 2024.10] Diffusion Models Need Visual Priors for Image Generation [PDF]

[arxiv 2024.10] Improving Long-Text Alignment for Text-to-Image Diffusion Models [PDF,Page]

[arxiv 2024.10] Dynamic Negative Guidance of Diffusion Models[PDF]

[arxiv 2024.10] GraspDiffusion: Synthesizing Realistic Whole-body Hand-Object Interaction [PDF,Page]

[arxiv 2024.10] Progressive Compositionality In Text-to-Image Generative Models [PDF,Page]

VAE

[arxiv 2024.06] Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99% [PDF]

Autoregressive

[arxiv 2024.10] ControlAR: Controllable Image Generation with Autoregressive Models [PDF,Page]

[arxiv 2024.10] LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding [PDF]

[arxiv 2024.10] CAR: Controllable Autoregressive Modeling for Visual Generation [PDF,Page]

[arxiv 2024.10] Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective [PDF,Page]

[arxiv 2024.10] LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior [PDF,Page]

Distill Diffusion Model

[arxiv 2024.05]Distilling Diffusion Models into Conditional GANs [PDF,Page]

[arxiv 2024.06] Plug-and-Play Diffusion Distillation [PDF]

[arxiv 2024.10] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models [PDF]

[arxiv 2024.10] DDIL: Improved Diffusion Distillation With Imitation Learning[PDF]

Try-on

[arxiv 2024.03]Time-Efficient and Identity-Consistent Virtual Try-On Using A Variant of Altered Diffusion Models [PDF]

[arxiv 2024.03]Wear-Any-Way: Manipulable Virtual Try-on via Sparse Correspondence Alignment [PDF,Page]

[arxiv 2024.04]Texture-Preserving Diffusion Models for High-Fidelity Virtual Try-On [PDF]

[arxiv 2024.04]TryOn-Adapter: Efficient Fine-Grained Clothing Identity Adaptation for High-Fidelity Virtual Try-On [PDF,Page]

[arxiv 2024.04]FLDM-VTON: Faithful Latent Diffusion Model for Virtual Try-on [PDF]

[arxiv 2024.03]Improving Diffusion Models for Authentic Virtual Try-on in the Wild [PDF,Page]

[arxiv 2024.04]MV-VTON: Multi-View Virtual Try-On with Diffusion Models [PDF]

[arxiv 2024.05]AnyFit: Controllable Virtual Try-on for Any Combination of Attire Across Any Scenario [PDF,Page]

[arxiv 2024.06] GraVITON: Graph based garment warping with attention guided inversion for Virtual-tryon [PDF]

[arxiv 2024.06]M&M VTO: Multi-Garment Virtual Try-On and Editing[PDF,Page]

[arxiv 2024.06]Self-Supervised Vision Transformer for Enhanced Virtual Clothes Try-On [PDF]

[arxiv 2024.06] MaX4Zero: Masked Extended Attention for Zero-Shot Virtual Try-On In The Wild [PDF,Page]

[arxiv 2024.07] D4-VTON: Dynamic Semantics Disentangling for Differential Diffusion based Virtual Try-On [PDF,Page]

[arxiv 2024.07] DreamVTON: Customizing 3D Virtual Try-on with Personalized Diffusion Models [PDF]

[arxiv 2024.07]OutfitAnyone: Ultra-high Quality Virtual Try-On for Any Clothing and Any Person[PDF,Page]

[arxiv 2024.07] CatVTON: Concatenation Is All You Need for Virtual Try-On with Diffusion Models [PDF,Page]

[arxiv 2024.08] BooW-VTON: Boosting In-the-Wild Virtual Try-On via Mask-Free Pseudo Data Training [PDF,Page]

[arxiv 2024.09] Improving Virtual Try-On with Garment-focused Diffusion Models [PDF,Page]

[arxiv 2024.09] AnyLogo: Symbiotic Subject-Driven Diffusion System with Gemini Status [PDF]

[arxiv 2024.10] GS-VTON: Controllable 3D Virtual Try-on with Gaussian Splatting[PDF,Page]

Model Adaptation/Merge

[arxiv 2023.12]X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model [PDF,Page]

[arxiv 2024.10] Model merging with SVD to tie the Knots [PDF,Page]

Text

[arxiv 2023.12]UDiffText: A Unified Framework for High-quality Text Synthesis in Arbitrary Images via Character-aware Diffusion Models [PDF]

[arxiv 2023.12]Brush Your Text: Synthesize Any Scene Text on Images via Diffusion Model [PDF]

[arxiv 2024.04]Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering [PDF,Page]

[arxiv 2024.05] CustomText: Customized Textual Image Generation using Diffusion Models [PDF]

[arxiv 2024.06] SceneTextGen: Layout-Agnostic Scene Text Image Synthesis with Diffusion Models [PDF]

[arxiv 2024.06] FontStudio: Shape-Adaptive Diffusion Model for Coherent and Consistent Font Effect Generation [PDF,Page]

[arxiv 2024.09] DiffusionPen: Towards Controlling the Style of Handwritten Text Generation [PDF,Page]

[arxiv 2024.10] TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control [PDF,Page]

[arxiv 2024.10] TextMaster: Universal Controllable Text Edit [PDF,Page]

Caption

[arxiv 2024.10] CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning [PDF]

[arxiv 2024.10] Altogether: Image Captioning via Re-aligning Alt-text [PDF,Page]

Face Swapping

[arxiv 2024.03]Infinite-ID: Identity-preserved Personalization via ID-semantics Decoupling Paradigm [PDF,Page]

[github] Reactor

Concept / personalization

*[Arxiv.2208; NVIDIA] An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion [PDF, Page, Code ]

[NIPS 22; google] DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation [PDF, Page, Code]

[arxiv 2022.12; UT] Multiresolution Textual Inversion [PDF]

*[arxiv 2022.12]Multi-Concept Customization of Text-to-Image Diffusion [PDF, Page, code]

[arxiv 2023.02]ELITE: Encoding Visual Concepts into Textual Embeddings for Customized Text-to-Image Generation [PDF]

[arxiv 2023.02, tel]Designing an Encoder for Fast Personalization of Text-to-Image Models [PDF, Page]

[arxiv 2023.03]Cones: Concept Neurons in Diffusion Models for Customized Generation [PDF]

[arxiv 2023.03]P+: Extended Textual Conditioning in Text-to-Image Generation [PDF]

[arxiv 2023.03]Highly Personalized Text Embedding for Image Manipulation by Stable Diffusion [PDF]

->[arxiv 2023.04]Continual Diffusion: Continual Customization of Text-to-Image Diffusion with C-LoRA[PDF, Page]

[arxiv 2023.04]Controllable Textual Inversion for Personalized Text-to-Image Generation [PDF]

*[arxiv 2023.04]InstantBooth: Personalized Text-to-Image Generation without Test-Time Finetuning [PDF]

[arxiv 2023.05]Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models [PDF,Page]

[arxiv 2023.05]Custom-Edit: Text-Guided Image Editing with Customized Diffusion Models [PDF]

[arxiv 2023.05]DisenBooth: Disentangled Parameter-Efficient Tuning for Subject-Driven Text-to-Image Generation [PDF]

[arxiv 2023.05]PHOTOSWAP:Personalized Subject Swapping in Images [PDF]

[Siggraph 2023.05]Key-Locked Rank One Editing for Text-to-Image Personalization [PDF, Page]

[arxiv 2023.05]A Neural Space-Time Representation for Text-to-Image Personalization [PDF,Page]

->[arxiv 2023.05]BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing [PDF, Page]

[arxiv 2023.05]Concept Decomposition for Visual Exploration and Inspiration[PDF,Page]

[arxiv 2023.05]FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention[PDF,Page]

[arxiv 2023.06]Cones 2: Customizable Image Synthesis with Multiple Subjects [PDF]

[arxiv 2023.06]Inserting Anybody in Diffusion Models via Celeb Basis [PDF, Page]

->[arxiv 2023.06]A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis [PDF]

[arxiv 2023.06]Generate Anything Anywhere in Any Scene [PDF,Page]

[arxiv 2023.07]HyperDreamBooth: HyperNetworks for Fast Personalization of Text-to-Image Models [PDF,Page]

[arxiv 2023.07]Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models [PDF, Page]

[arxiv 2023.07]ReVersion: Diffusion-Based Relation Inversion from Images [PDF,Page]

[arxiv 2023.07]AnyDoor: Zero-shot Object-level Image Customization [PDF,Page]

[arxiv 2023.07]Subject-Diffusion: Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning [PDF, Page]

[arxiv 2023.08]ConceptLab: Creative Generation using Diffusion Prior Constraints [PDF,Page]

[arxiv 2023.08]Unified Concept Editing in Diffusion Models [PDF, Page]

[arxiv 2023.09]Create Your World: Lifelong Text-to-Image Diffusion[PDF]

[arxiv 2023.09]MagiCapture: High-Resolution Multi-Concept Portrait Customization [PDF]

[arxiv 2023.10]Multi-Concept T2I-Zero: Tweaking Only The Text Embeddings and Nothing Else [PDF]

[arxiv 2023.11]A Data Perspective on Enhanced Identity Preservation for Diffusion Personalization [PDF]

[arxiv 2023.11]The Chosen One: Consistent Characters in Text-to-Image Diffusion Models [PDF, Page]

[arxiv 2023.11]High-fidelity Person-centric Subject-to-Image Synthesis[PDF]

[arxiv 2023.11]An Image is Worth Multiple Words: Multi-attribute Inversion for Constrained Text-to-Image Synthesis [PDF]

[arxiv 2023.11]CatVersion: Concatenating Embeddings for Diffusion-Based Text-to-Image Personalization [PDF,Page]

[arxiv 2023.12]PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding [PDF,Page]

[arxiv 2023.12]Context Diffusion: In-Context Aware Image Generation [PDF]

[arxiv 2023.12]Customization Assistant for Text-to-image Generation [PDF]

[arxiv 2023.12]InstructBooth: Instruction-following Personalized Text-to-Image Generation [PDF]

[arxiv 2023.12]FaceStudio: Put Your Face Everywhere in Seconds [PDF,Page]

[arxiv 2023.12]Orthogonal Adaptation for Modular Customization of Diffusion Models [PDF,Page]

[arxiv 2023.12]Separate-and-Enhance: Compositional Finetuning for Text2Image Diffusion Models [PDF, Page]

[arxiv 2023.12]Compositional Inversion for Stable Diffusion Models [PDF,Page]

[arxiv 2023.12]SimAC: A Simple Anti-Customization Method against Text-to-Image Synthesis of Diffusion Models [PDF]

[arxiv 2023.12]InstantID : Zero-shot Identity-Preserving Generation in Seconds [PDF,Page]

[arxiv 2023.12]All but One: Surgical Concept Erasing with Model Preservation in Text-to-Image Diffusion Models [PDF]

[arxiv 2023.12]Cross Initialization for Personalized Text-to-Image Generation [PDF]

[arxiv 2023.12]PALP: Prompt Aligned Personalization of Text-to-Image Models[PDF, Page]

[arxiv 2024.02]Pick-and-Draw: Training-free Semantic Guidance for Text-to-Image Personalization [PDF]

[arxiv 2024.02]Separable Multi-Concept Erasure from Diffusion Models[PDF]

[arxiv 2024.02]λ-ECLIPSE: Multi-Concept Personalized Text-to-Image Diffusion Models by Leveraging CLIP Latent Space[PDF,Page]

[arxiv 2024.02]Training-Free Consistent Text-to-Image Generation [PDF,Page]

[arxiv 2024.02]Textual Localization: Decomposing Multi-concept Images for Subject-Driven Text-to-Image Generation [PDF,Page]

[arxiv 2024.02]DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization [PDF, Page]

[arxiv 2024.02]Direct Consistency Optimization for Compositional Text-to-Image Personalization [PDF,Page]

[arxiv 2024.02]ComFusion: Personalized Subject Generation in Multiple Specific Scenes From Single Image [PDF]

[arxiv 2024.02]Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition[PDF, Page]

[arxiv 2024.02]DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Model [PDF,Page]

[arxiv 2024.03]RealCustom: Narrowing Real Text Word for Real-Time Open-Domain Text-to-Image Customization [PDF,Page]

[arxiv 2024.03]Face2Diffusion for Fast and Editable Face Personalization [PDF,Page]

[arxiv 2024.03]FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation [PDF,Page]

[arxiv 2024.03]Block-wise LoRA: Revisiting Fine-grained LoRA for Effective Personalization and Stylization in Text-to-Image Generation [PDF]

[arxiv 2024.03]LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models [PDF,Page]

[arxiv 2024.03]OSTAF: A One-Shot Tuning Method for Improved Attribute-Focused T2I Personalization [PDF]

[arxiv 2024.03]OMG: Occlusion-friendly Personalized Multi-concept Generation in Diffusion Models [PDF, Page]

[arxiv 2024.03]IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models [PDF]

[arxiv 2024.03]Tuning-Free Image Customization with Image and Text Guidance [PDF]

[arxiv 2024.03]Harmonizing Visual and Textual Embeddings for Zero-Shot Text-to-Image Customization [PDF,Page]

[arxiv 2024.03]FlashFace: Human Image Personalization with High-fidelity Identity Preservation [PDF,Page]

[arxiv 2024.03]Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation [PDF,Page]

[arxiv 2024.03]Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance [PDF]

[arxiv 2024.03]Improving Text-to-Image Consistency via Automatic Prompt Optimization [PDF]

[arxiv 2024.03]Attention Calibration for Disentangled Text-to-Image Personalization [PDF,Page]

[arxiv 2024.04]CLoRA: A Contrastive Approach to Compose Multiple LoRA Models [PDF,Page]

[arxiv 2024.04]MuDI: Identity Decoupling for Multi-Subject Personalization of Text-to-Image Models [PDF,Page]

[arxiv 2024.04]Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models [PDF]

[arxiv 2024.04]LCM-Lookahead for Encoder-based Text-to-Image Personalization [PDF]

[arxiv 2024.04]MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation [PDF,Page]

[arxiv 2024.04]MC2: Multi-concept Guidance for Customized Multi-concept Generation [PDF]

[arxiv 2024.04]Strictly-ID-Preserved and Controllable Accessory Advertising Image Generation [PDF]

[arxiv 2024.04]OneActor: Consistent Character Generation via Cluster-Conditioned Guidance [PDF]

[arxiv 2024.04] MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation [PDF,Page]

[arxiv 2024.04]MultiBooth: Towards Generating All Your Concepts in an Image from Text[PDF,Page]

[arxiv 2024.04]Infusion: Preventing Customized Text-to-Image Diffusion from Overfitting [PDF]

[arxiv 2024.04]UVMap-ID: A Controllable and Personalized UV Map Generative Model [PDF]

[arxiv 2024.04]ConsistentID: Portrait Generation with Multimodal Fine-Grained Identity Preserving [PDF,Page]

[arxiv 2024.04]PuLID: Pure and Lightning ID Customization via Contrastive Alignment [PDF, Page]

[arxiv 2024.04]CharacterFactory: Sampling Consistent Characters with GANs for Diffusion Models [PDF, Page]

[arxiv 2024.04]TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation [PDF,Page]

[arxiv 2024.05]Customizing Text-to-Image Models with a Single Image Pair[PDF,Page]

[arxiv 2024.05]InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation [PDF]

[arxiv 2024.05]MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation [PDF,Page]

[arxiv 2024.05]Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation [PDF]

[arxiv 2024.05]Non-confusing Generation of Customized Concepts in Diffusion Models [PDF,Page]

[arxiv 2024.05]Personalized Residuals for Concept-Driven Text-to-Image Generation [PDF,Page]

[arxiv 2024.05] FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition [PDF,Page]

[arxiv 2024.05]AttenCraft: Attention-guided Disentanglement of Multiple Concepts for Text-to-Image Customization [PDF,Page]

[arxiv 2024.05]RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance [PDF,Page]

[arxiv 2024.06]AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation [PDF,Page]

[arxiv 2024.06] Inv-Adapter: ID Customization Generation via Image Inversion and Lightweight Adapter[PDF]

[arxiv 2024.06] AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation [PDF,Page]

[arxiv 2024.06]Tuning-Free Visual Customization via View Iterative Self-Attention Control[PDF]

[arxiv 2024.06]PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction[PDF]

[arxiv 2024.06]MS-Diffusion: Multi-subject Zero-shot Image Personalization with Layout Guidance [PDF, Page]

[arxiv 2024.06] Interpreting the Weight Space of Customized Diffusion Models[PDF, Page]

[arxiv 2024.06]DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation [PDF, Page]

[arxiv 2024.06]Character-Adapter: Prompt-Guided Region Control for High-Fidelity Character Customization[PDF]

[arxiv 2024.06]LIPE: Learning Personalized Identity Prior for Non-rigid Image Editing [PDF]

[arxiv 2024.06] AlignIT: Enhancing Prompt Alignment in Customization of Text-to-Image Models [PDF]

[arxiv 2024.07] JeDi: Joint-Image Diffusion Models for Finetuning-Free Personalized Text-to-Image Generation[PDF, Page]

[arxiv 2024.07]LogoSticker: Inserting Logos into Diffusion Models for Customized Generation [PDF, Page]

[arxiv 2024.08]Concept Conductor: Orchestrating Multiple Personalized Concepts in Text-to-Image Synthesis [PDF, Page]

[arxiv 2024.08]PreciseControl: Enhancing Text-To-Image Diffusion Models with Fine-Grained Attribute Control [PDF, Page]

[arxiv 2024.08] DiffLoRA: Generating Personalized Low-Rank Adaptation Weights with Diffusion[PDF]

[arxiv 2024.08]RealCustom++: Representing Images as Real-Word for Real-Time Customization [PDF, Page]

[arxiv 2024.08] MagicID: Flexible ID Fidelity Generation System[PDF]

[arxiv 2024.08] CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization[PDF]

[arxiv 2024.09] CustomContrast: A Multilevel Contrastive Perspective For Subject-Driven Text-to-Image Customization[PDF, Page]

[arxiv 2024.09]GroundingBooth: Grounding Text-to-Image Customization [PDF, Page]

[arxiv 2024.09]TextBoost: Towards One-Shot Personalization of Text-to-Image Models via Fine-tuning Text Encoder [PDF, Page]

[arxiv 2024.09]SaRA: High-Efficient Diffusion Model Fine-tuning with Progressive Sparse Low-Rank Adaptation [PDF, Page]

[arxiv 2024.09] Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation[PDF, Page]

[arxiv 2024.09] Imagine yourself: Tuning-Free Personalized Image Generation[PDF]

[arxiv 2024.10] Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models [PDF]

[arxiv 2024.10] Fusion is all you need: Face Fusion for Customized Identity-Preserving Image Synthesis [PDF]

[arxiv 2024.10]Event-Customized Image Generation[PDF]

[arxiv 2024.10] DisEnvisioner: Disentangled and Enriched Visual Prompt for Customized Image Generation [PDF,Page]

[arxiv 2024.10] HybridBooth: Hybrid Prompt Inversion for Efficient Subject-Driven Generation [PDF,Page]

[arxiv 2024.10] Learning to Customize Text-to-Image Diffusion In Diverse Context [PDF]

[arxiv 2024.10] FaceChain-FACT: Face Adapter with Decoupled Training for Identity-preserved Personalization [PDF,Page]

[arxiv 2024.10] MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models [PDF,Page]

[arxiv 2024.10] Unbounded: A Generative Infinite Game of Character Life Simulation [PDF,Page]

[arxiv 2024.10] How to Continually Adapt Text-to-Image Diffusion Models for Flexible Customization? [PDF,Page]

[arxiv 2024.10] RelationBooth: Towards Relation-Aware Customized Object Generation [PDF,Page]

[arxiv 2024.10] Novel Object Synthesis via Adaptive Text-Image Harmony [PDF,Page]

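Many of the personalization methods above descend from textual inversion: a single new token embedding is optimized against a frozen model so that it reproduces the concept. A toy numpy sketch of that core idea (a frozen random linear map stands in for the diffusion model; all names and sizes are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
d_embed, d_feat = 8, 16
M = rng.standard_normal((d_feat, d_embed))   # frozen "model" weights
v_true = rng.standard_normal(d_embed)        # embedding that would explain the concept
target = M @ v_true                          # stand-in for features of the concept images

v = np.zeros(d_embed)                        # the single learnable token embedding
lr = 0.01
losses = []
for _ in range(500):
    err = M @ v - target
    losses.append(float(err @ err))
    v -= lr * 2 * (M.T @ err)                # gradient of ||M v - target||^2 w.r.t. v
```

Only the embedding `v` is updated; `M` stays fixed throughout, which is what keeps the rest of the model's behavior untouched.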

Story-telling

[ECCV 2022] Story Dall-E: Adapting pretrained text-to-image transformers for story continuation [PDF, code]

[arxiv 22.11; Ailibaba] Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models [PDF, code]

[CVPR 2023] Make-A-Story: Visual Memory Conditioned Consistent Story Generation [PDF ]

[arxiv 2023.01]An Impartial Transformer for Story Visualization [PDF]

[arxiv 2023.02]Zero-shot Generation of Coherent Storybook from Plain Text Story using Diffusion Models [PDF]

[arxiv 2023.05]TaleCrafter: Interactive Story Visualization with Multiple Characters [PDF, Page]

[arxiv 2023.06]Intelligent Grimm -- Open-ended Visual Storytelling via Latent Diffusion Models [PDF, Page]

[arxiv 2023.08]Story Visualization by Online Text Augmentation with Context Memory[PDF]

[arxiv 2023.08]Text-Only Training for Visual Storytelling [PDF]

[arxiv 2023.08]StoryBench: A Multifaceted Benchmark for Continuous Story Visualization [PDF, Page]

[arxiv 2023.11]AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort [PDF, Page]

[arxiv 2023.12]Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control [PDF]

[arxiv 2023.12]CogCartoon: Towards Practical Story Visualization [PDF]

[arxiv 2024.03]TARN-VIST: Topic Aware Reinforcement Network for Visual Storytelling [PDF]

[arxiv 2024.05] Evolving Storytelling: Benchmarks and Methods for New Character Customization with Diffusion Models [PDF]

[arxiv 2024.07] Boosting Consistency in Story Visualization with Rich-Contextual Conditional Diffusion Models [PDF]

[arxiv 2024.07] SEED-Story: Multimodal Long Story Generation with Large Language Model [PDF,Page]

[arxiv 2024.07] MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence [PDF,Page]

[arxiv 2024.08]Story3D-Agent: Exploring 3D Storytelling Visualization with Large Language Models[PDF,Page]

[arxiv 2024.10] Storynizor: Consistent Story Generation via Inter-Frame Synchronized and Shuffled ID Injection [PDF]

Layout Generation

[arxiv 2022.08]Layout-Bridging Text-to-Image Synthesis [PDF]

[arxiv 2023.03]Unifying Layout Generation with a Decoupled Diffusion Model [PDF]

[arxiv 2023.02]LayoutDiffuse: Adapting Foundational Diffusion Models for Layout-to-Image Generation [PDF]

[arxiv 2023.03]LayoutDM: Discrete Diffusion Model for Controllable Layout Generation [PDF, Page]

[arxiv 2023.03]LayoutDiffusion: Improving Graphic Layout Generation by Discrete Diffusion Probabilistic Models [PDF]

[arxiv 2023.03]DiffPattern: Layout Pattern Generation via Discrete Diffusion[PDF]

[arxiv 2023.03]Freestyle Layout-to-Image Synthesis [PDF]

[arxiv 2023.04]Training-Free Layout Control with Cross-Attention Guidance [PDF, Page]

->[arxiv 2023.05]LayoutGPT: Compositional Visual Planning and Generation with Large Language Models [PDF]

->[arxiv 2023.05]Visual Programming for Text-to-Image Generation and Evaluation [PDF, Page]

[arxiv 2023.06]Relation-Aware Diffusion Model for Controllable Poster Layout Generation [PDF]

[arxiv 2023.08]LayoutLLM-T2I: Eliciting Layout Guidance from LLM for Text-to-Image Generation [PDF]

[arxiv 2023.08]Learning to Generate Semantic Layouts for Higher Text-Image Correspondence in Text-to-Image Synthesis [PDF]

[arxiv 2023.08]Dense Text-to-Image Generation with Attention Modulation [PDF, Page]

[arxiv 2023.11]Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation [PDF,Page]

[arxiv 2023.11]Check, Locate, Rectify: A Training-Free Layout Calibration System for Text-to-Image Generation [PDF]

[arxiv 2023.12]Reason out Your Layout: Evoking the Layout Master from Large Language Models for Text-to-Image Synthesis [PDF]

[arxiv 2024.02]Layout-to-Image Generation with Localized Descriptions using ControlNet with Cross-Attention Control [PDF]

[arxiv 2024.02]Multi-LoRA Composition for Image Generation [PDF,Page]

[arxiv 2024.03]NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging [PDF]

[arxiv 2024.03]Discriminative Probing and Tuning for Text-to-Image Generation [PDF,Page]

[arxiv 2024.03]DivCon: Divide and Conquer for Progressive Text-to-Image Generation [PDF]

[arxiv 2024.03]LayoutFlow: Flow Matching for Layout Generation [PDF]

[arxiv 2024.05] Enhancing Image Layout Control with Loss-Guided Diffusion Models [PDF]

[arxiv 2024.06]Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis [PDF,Page]

[arxiv 2024.09] Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation[PDF,Page]

[arxiv 2024.09] SpotActor: Training-Free Layout-Controlled Consistent Image Generation[PDF]

[arxiv 2024.09] IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation [PDF,Page]

[arxiv 2024.09] Scribble-Guided Diffusion for Training-free Text-to-Image Generation [PDF,Page]

[arxiv 2024.09] Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model [PDF,Page]

[arxiv 2024.10] Story-Adapter: A Training-free Iterative Framework for Long Story Visualization [PDF,Page]

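Several of the training-free layout entries above steer generation by editing cross-attention at sampling time. A minimal numpy sketch of the common trick of biasing a token's attention logits toward a layout mask (a generic illustration, not any single paper's exact method; the bias strength and sizes are made up):

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def bias_attention(logits, mask, strength=5.0):
    # Push a token's cross-attention toward pixels inside the layout mask
    # by adding a constant bias to the masked logits before the softmax.
    return softmax(logits + strength * mask)

rng = np.random.default_rng(1)
n_pixels = 64
logits = rng.standard_normal(n_pixels)        # one token's attention logits
mask = np.zeros(n_pixels)
mask[:16] = 1.0                               # target box = first 16 pixels
before = softmax(logits)[:16].sum()           # attention mass in the box, unbiased
after = bias_attention(logits, mask)[:16].sum()
```

Because the bias is applied before the softmax, the result is still a valid attention distribution; the mass is merely redistributed toward the box.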

SVG

[arxiv 2022.11; UCB] VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models [PDF]

[arxiv 2023.04]IconShop: Text-Based Vector Icon Synthesis with Autoregressive Transformers [PDF, Page]

[arxiv 2023.06]Image Vectorization: a Review [PDF]

[arxiv 2023.06]DiffSketcher: Text Guided Vector Sketch Synthesis through Latent Diffusion Models[PDF]

[arxiv 2023.09]Text-Guided Vector Graphics Customization [PDF,Page]

[arxiv 2023.09]Deep Geometrized Cartoon Line Inbetweening [PDF,Page]

[arxiv 2023.12]VecFusion: Vector Font Generation with Diffusion [PDF]

[arxiv 2023.12]StarVector: Generating Scalable Vector Graphics Code from Images[PDF, Page]

[arxiv 2023.12]SVGDreamer: Text Guided SVG Generation with Diffusion Model [PDF]

[arxiv 2024.02]StrokeNUWA: Tokenizing Strokes for Vector Graphic Synthesis [PDF]

[arxiv 2024.05] NIVeL: Neural Implicit Vector Layers for Text-to-Vector Generation [PDF]

Composition & Translation

[arxiv 2022; Google]Sketch-Guided Text-to-Image Diffusion Models [PDF, code]

[arxiv 2022.11; Microsoft]ReCo: Region-Controlled Text-to-Image Generation [PDF, code]

[arxiv 2022.11; Meta]SpaText: Spatio-Textual Representation for Controllable Image Generation [PDF, code]

[arxiv 2022.11; Seoul National University] DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model. [PROJECT]

[arxiv 2022.12]High-Fidelity Guided Image Synthesis with Latent Diffusion Models [PDF]

[arxiv 2022.12]Fine-grained Image Editing by Pixel-wise Guidance Using Diffusion Models [PDF]

[arxiv 2022; MSRA]Paint by Example: Exemplar-based Image Editing with Diffusion Models [PDF, code]

[arxiv 2022.12]Towards Practical Plug-and-Play Diffusion Models [PDF]

[arxiv 2023.01]Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models[PDF]

[arxiv 2023.02]Zero-shot Image-to-Image Translation [PDF, Page]

[arxiv 2023.02]Universal Guidance for Diffusion Models [PDF, Page]

[arxiv 2023.02]DiffFaceSketch: High-Fidelity Face Image Synthesis with Sketch-Guided Latent Diffusion Model [PDF, ]

[arxiv 2023.02]Text-Guided Scene Sketch-to-Photo Synthesis[PDF,]

*[arxiv 2023.02]--T2I-Adapter--: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models[PDF,Code]

[arxiv 2023.02]MultiDiffusion: Fusing Diffusion Paths for Controlled Image Generation [PDF, Page]

*[arxiv 2023.02] --controlNet-- Adding Conditional Control to Text-to-Image Diffusion Models [PDF]

*[arxiv 2023.02] --composer-- Composer: Creative and Controllable Image Synthesis with Composable Conditions [PDF]

[arxiv 2023.02]Modulating Pretrained Diffusion Models for Multimodal Image Synthesis [PDF]

[arxiv 2023.02]Region-Aware Diffusion for Zero-shot Text-driven Image Editing [PDF]

[arxiv 2023.03]Collage Diffusion [PDF]

*[arxiv 2023.01] GLIGEN: Open-Set Grounded Text-to-Image Generation [PDF, Page, Code]

[arxiv 2023.03]GlueGen: Plug and Play Multi-modal Encoders for X-to-image Generation [PDF]

*[arxiv 2023.03]FreeDoM: Training-Free Energy-Guided Conditional Diffusion Model [PDF]

[arxiv 2023.03]DS-Fusion: Artistic Typography via Discriminated and Stylized Diffusion [PDF]

*[arxiv 2023.03]PAIR-Diffusion: Object-Level Image Editing with Structure-and-Appearance Paired Diffusion Models [PDF, code]

[arxiv 2023.03]DiffCollage: Parallel Generation of Large Content with Diffusion Models [PDF,page]

[arxiv 2023.04]SketchFFusion: Sketch-guided image editing with diffusion model [PDF]

[arxiv 2023.04]Training-Free Layout Control with Cross-Attention Guidance [PDF, Page]

[arxiv 2023.04]HumanSD: A Native Skeleton-Guided Diffusion Model for Human Image Generation [PDF, Page]

->[arxiv 2023.04]DreamPose: Fashion Image-to-Video Synthesis via Stable Diffusion [PDF, Page]

-> [arxiv 2023.04]Inpaint Anything: Segment Anything Meets Image Inpainting [PDF, Page]

[arxiv 2023.04]Soundini: Sound-Guided Diffusion for Natural Video Editing [PDF]

->[arxiv 2023.04]Controllable Image Generation via Collage Representations [PDF]

[arxiv 2023.05]Guided Image Synthesis via Initial Image Editing in Diffusion Model [PDF]

[arxiv 2023.05]DiffSketching: Sketch Control Image Synthesis with Diffusion Models [PDF]

-> [arxiv 2023.05]Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models [PDF, Page]

-> [arxiv 2023.05]Break-A-Scene: Extracting Multiple Concepts from a Single Image [PDF, Page]

[arxiv 2023.05]Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models [PDF, Page]

[arxiv 2023.05]DiffBlender: Scalable and Composable Multimodal Text-to-Image Diffusion Models [PDF]

[arxiv 2023.05]MaGIC: Multi-modality Guided Image Completion [PDF]

[arxiv 2023.05]Text-to-image Editing by Image Information Removal [PDF]

[arxiv 2023.06]Cocktail: Mixing Multi-Modality Controls for Text-Conditional Image Generation [PDF, Page]

[arxiv 2023.06]Grounded Text-to-Image Synthesis with Attention Refocusing [PDF, Page, Code]

[arxiv 2023.06]Energy-Based Cross Attention for Bayesian Context Update in Text-to-Image Diffusion Models [PDF]

[arxiv 2023.06]TryOnDiffusion: A Tale of Two UNets [PDF]

->[arxiv 2023.06]Adding 3D Geometry Control to Diffusion Models [PDF]

[arxiv 2023.06]Linguistic Binding in Diffusion Models: Enhancing Attribute Correspondence through Attention Map Alignment [PDF]

[arxiv 2023.06]Continuous Layout Editing of Single Images with Diffusion Models [PDF]

[arxiv 2023.06]DreamEdit: Subject-driven Image Editing [PDF,Page]

[arxiv 2023.06]Decompose and Realign: Tackling Condition Misalignment in Text-to-Image Diffusion Models [PDF]

[arxiv 2023.06]Zero-shot spatial layout conditioning for text-to-image diffusion models [PDF]

[arxiv 2023.06]MVDiffusion: Enabling Holistic Multi-view Image Generation with Correspondence-Aware Diffusion [PDF, Page]

[arxiv 2023.07]BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion [PDF]

[arxiv 2023.08]LAW-Diffusion: Complex Scene Generation by Diffusion with Layouts [PDF]

[arxiv 2023.09]DreamCom: Finetuning Text-guided Inpainting Model for Image Composition [PDF]

[arxiv 2023.11]Cross-Image Attention for Zero-Shot Appearance Transfer[PDF, Page]

[arxiv 2023.12]SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control [PDF, Page]

[arxiv 2023.12]DreamInpainter: Text-Guided Subject-Driven Image Inpainting with Diffusion Models [PDF]

[arxiv 2023.12]InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models [PDF,Page]

[arxiv 2023.12]Disentangled Representation Learning for Controllable Person Image Generation [PDF]

[arxiv 2023.12]A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting [PDF, Page]

[arxiv 2023.12]FreeControl: Training-Free Spatial Control of Any Text-to-Image Diffusion Model with Any Condition [PDF,Page]

[arxiv 2023.12]FineControlNet: Fine-level Text Control for Image Generation with Spatially Aligned Text Control Injection [PDF,Page]

[arxiv 2023.12]Local Conditional Controlling for Text-to-Image Diffusion Models [PDF]

[arxiv 2023.12]SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing [PDF, Page]

[arxiv 2023.12]HD-Painter: High-Resolution and Prompt-Faithful Text-Guided Image Inpainting with Diffusion Models [PDF]

[arxiv 2023.12]Semantic Guidance Tuning for Text-To-Image Diffusion Models[PDF,Page]

[arxiv 2024.01]ReplaceAnything as you want: Ultra-high quality content replacement [PDF,Page]

[arxiv 2024.01]Compose and Conquer: Diffusion-Based 3D Depth Aware Composable Image Synthesis [PDF, Page]

[arxiv 2024.01]Spatial-Aware Latent Initialization for Controllable Image Generation [PDF]

[arxiv 2024.02]Repositioning the Subject within Image [PDF,Page]

[arxiv 2024.02]Cross-view Masked Diffusion Transformers for Person Image Synthesis [PDF]

[arxiv 2024.02]Image Sculpting: Precise Object Editing with 3D Geometry Control[PDF,Page]

[arxiv 2024.02]Outline-Guided Object Inpainting with Diffusion Models [PDF]

[arxiv 2024.03]Differential Diffusion: Giving Each Pixel Its Strength [PDF,Page]

[arxiv 2024.03]BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion [PDF,Page]

[arxiv 2024.03]SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior [PDF,Page]

[arxiv 2024.03]One-Step Image Translation with Text-to-Image Models [PDF, Page]

[arxiv 2024.03]LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model [PDF]

[arxiv 2024.03]FlexEdit: Flexible and Controllable Diffusion-based Object-centric Image Editing [PDF,Page]

[arxiv 2024.03]U-Sketch: An Efficient Approach for Sketch to Image Diffusion Models [PDF]

[arxiv 2024.03]ECNet: Effective Controllable Text-to-Image Diffusion Models [PDF]

[arxiv 2024.03]ObjectDrop: Bootstrapping Counterfactuals for Photorealistic Object Removal and Insertion [PDF,Page]

[arxiv 2024.04]LayerDiffuse: Transparent Image Layer Diffusion using Latent Transparency [PDF,Page]

[arxiv 2024.04]Move Anything with Layered Scene Diffusion [PDF]

[arxiv 2024.04]ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback [PDF,Page]

[arxiv 2024.04]Salient Object-Aware Background Generation using Text-Guided Diffusion Models [PDF]

[arxiv 2024.04]Ctrl-Adapter: An Efficient and Versatile Framework for Adapting Diverse Controls to Any Diffusion Model [PDF, Page]

[arxiv 2024.04]Enhancing Prompt Following with Visual Control Through Training-Free Mask-Guided Diffusion [PDF]

[arxiv 2024.04]ObjectAdd: Adding Objects into Image via a Training-Free Diffusion Modification Fashion [PDF]

[arxiv 2024.04]Anywhere: A Multi-Agent Framework for Reliable and Diverse Foreground-Conditioned Image Inpainting [PDF,Page]

[arxiv 2024.04]Paint by Inpaint: Learning to Add Image Objects by Removing Them First [PDF, Page]

[arxiv 2024.05]FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation [PDF]

[arxiv 2024.05]CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models [PDF,Page]

[arxiv 2024.06]Stable-Pose: Leveraging Transformers for Pose-Guided Text-to-Image Generation [PDF]

[arxiv 2024.06] FaithFill: Faithful Inpainting for Object Completion Using a Single Reference Image [PDF]

[arxiv 2024.06] AnyControl: Create Your Artwork with Versatile Control on Text-to-Image Generation [PDF,Page]

[arxiv 2024.07] Magic Insert: Style-Aware Drag-and-Drop [PDF,Page]

[arxiv 2024.07]MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis [PDF,Page]

[arxiv 2024.07] PartCraft: Crafting Creative Objects by Parts [PDF,Page]

[arxiv 2024.07] Every Pixel Has its Moments: Ultra-High-Resolution Unpaired Image-to-Image Translation via Dense Normalization [PDF,Page]

[arxiv 2024.07] FreeCompose: Generic Zero-Shot Image Composition with Diffusion Prior [PDF,Page]

[arxiv 2024.07] Sketch-Guided Scene Image Generation[PDF]

[arxiv 2024.07] Training-free Composite Scene Generation for Layout-to-Image Synthesis [PDF]

[arxiv 2024.07] Diffree: Text-Guided Shape Free Object Inpainting with Diffusion Model [PDF,Page]

[arxiv 2024.08] ControlNeXt: Powerful and Efficient Control for Image and Video Generation [PDF,Page]

[arxiv 2024.08] TraDiffusion: Trajectory-Based Training-Free Image Generation [PDF,Page]

[arxiv 2024.08] RepControlNet: ControlNet Reparameterization[PDF]

[arxiv 2024.08] Foodfusion: A Novel Approach for Food Image Composition via Diffusion Models [PDF]

[arxiv 2024.08]Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation[PDF,Page]

[arxiv 2024.08] GRPose: Learning Graph Relations for Human Image Generation with Pose Priors [PDF,Page]

[arxiv 2024.09] Skip-and-Play: Depth-Driven Pose-Preserved Image Generation for Any Objects [PDF,Page]

[arxiv 2024.09] Diffusion-Based Image-to-Image Translation by Noise Correction via Prompt Interpolation [PDF,Page]

[arxiv 2024.09]PixWizard: Versatile Image-to-Image Visual Assistant with Open-Language Instructions[PDF,Page]

[arxiv 2023.09] InstructDiffusion: A Generalist Modeling Interface for Vision Tasks [PDF,Page]

[arxiv 2024.10] Scene Graph Disentanglement and Composition for Generalizable Complex Image Generation [PDF]

[arxiv 2024.10] OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction [PDF,Page]

[arxiv 2024.10] 3DIS: Depth-Driven Decoupled Instance Synthesis for Text-to-Image Generation [PDF,Page]

[arxiv 2024.10] HiCo: Hierarchical Controllable Diffusion Model for Layout-to-image Generation [PDF,Page]

[arxiv 2024.10] TopoDiffusionNet: A Topology-aware Diffusion Model [PDF]

Image Variation

[arxiv 2023.08]IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models [PDF, Page]
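
The core mechanism behind image-prompt adapters such as IP-Adapter can be sketched compactly. Below is a hedged, toy NumPy sketch of its decoupled cross-attention: the image prompt gets its own key/value projections, and their attention output is added to the text cross-attention with a tunable weight. All shapes and tensors are illustrative stand-ins, not the real U-Net features.

```python
import numpy as np

# Toy sketch of IP-Adapter-style decoupled cross-attention (illustrative
# shapes; the real model projects CLIP image embeddings into a few extra
# tokens inside a diffusion U-Net).

def attention(Q, K, V):
    s = Q @ K.T / np.sqrt(Q.shape[-1])
    s -= s.max(-1, keepdims=True)          # numerically stable softmax
    P = np.exp(s)
    P /= P.sum(-1, keepdims=True)
    return P @ V

def decoupled_cross_attn(Q, K_txt, V_txt, K_img, V_img, scale=1.0):
    # Text attention plus separately-projected image attention, weighted.
    return attention(Q, K_txt, V_txt) + scale * attention(Q, K_img, V_img)

rng = np.random.default_rng(0)
Q = rng.standard_normal((10, 16))                  # latent queries
K_txt, V_txt = rng.standard_normal((2, 77, 16))    # text tokens
K_img, V_img = rng.standard_normal((2, 4, 16))     # image-prompt tokens

out = decoupled_cross_attn(Q, K_txt, V_txt, K_img, V_img, scale=0.5)
# scale = 0 recovers the original text-only cross-attention.
assert np.allclose(decoupled_cross_attn(Q, K_txt, V_txt, K_img, V_img, 0.0),
                   attention(Q, K_txt, V_txt))
```

Dialing `scale` up or down trades image-prompt fidelity against text controllability, which is why the adapter remains text compatible.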

Super-Resolution & Restoration & Higher-Resolution Generation

[arxiv 2022.12]ADIR: Adaptive Diffusion for Image Reconstruction [PDF]

[arxiv 2023.03]Denoising Diffusion Probabilistic Models for Robust Image Super-Resolution in the Wild [PDF]

[arxiv 2023.03]TextIR: A Simple Framework for Text-based Editable Image Restoration [PDF]

[arxiv 2023.03]Unlimited-Size Diffusion Restoration [PDF, code]

[arxiv 2023.03]DiffIR: Efficient Diffusion Model for Image Restoration [PDF]

[arxiv 2023.03]Inversion by Direct Iteration: An Alternative to Denoising Diffusion for Image Restoration [PDF]

[arxiv 2023.03]Implicit Diffusion Models for Continuous Super-Resolution [PDF]

[arxiv 2023.05]UDPM: Upsampling Diffusion Probabilistic Models [PDF]

[arxiv 2023.06]Image Harmonization with Diffusion Model [PDF]

[arxiv 2023.06]PartDiff: Image Super-resolution with Partial Diffusion Models [PDF]

[arxiv 2023.08]Patched Denoising Diffusion Models For High-Resolution Image Synthesis [PDF]

[arxiv 2023.08]DiffBIR: Towards Blind Image Restoration with Generative Diffusion Prior [PDF,Page]

[arxiv 2023.10] ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models [PDF, Page]

[arxiv 2023.11]Image Super-Resolution with Text Prompt Diffusion [PDF,Page]

[arxiv 2023.11]SinSR: Diffusion-Based Image Super-Resolution in a Single Step [PDF]

[arxiv 2023.11]SeeSR: Towards Semantics-Aware Real-World Image Super-Resolution [PDF]

[arxiv 2023.11]LFSRDiff: Light Field Image Super-Resolution via Diffusion Models [PDF]

[arxiv 2023.12]ElasticDiffusion: Training-free Arbitrary Size Image Generation [PDF,Code]

[arxiv 2023.12]UIEDP:Underwater Image Enhancement with Diffusion Prior [PDF]

[arxiv 2023.12]MagicScroll: Nontypical Aspect-Ratio Image Generation for Visual Storytelling via Multi-Layered Semantic-Aware Denoising [PDF,Page]

[arxiv 2024.01]Diffusion Models, Image Super-Resolution And Everything: A Survey

[arxiv 2024.01]Improving the Stability of Diffusion Models for Content Consistent Super-Resolution [PDF]

[arxiv 2024.01]Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild [PDF, Page]

[arxiv 2024.01]Spatial-and-Frequency-aware Restoration method for Images based on Diffusion Models [PDF]

[arxiv 2024.02]You Only Need One Step: Fast Super-Resolution with Stable Diffusion via Scale Distillation [PDF]

[arxiv 2024.02]Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation[PDF,Page]

[arxiv 2024.02]SAM-DiffSR: Structure-Modulated Diffusion Model for Image Super-Resolution [PDF]

[arxiv 2024.03]ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models [PDF]

[arxiv 2024.03]XPSR: Cross-modal Priors for Diffusion-based Image Super-Resolution [PDF]

[arxiv 2024.03]BlindDiff: Empowering Degradation Modelling in Diffusion Models for Blind Image Super-Resolution [PDF]

[arxiv 2024.04]Upsample Guidance: Scale Up Diffusion Models without Training [PDF]

[arxiv 2024.04]DeeDSR: Towards Real-World Image Super-Resolution via Degradation-Aware Stable Diffusion [PDF]

[arxiv 2024.04]BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion [PDF, Page]

[arxiv 2024.04]LTOS: Layout-controllable Text-Object Synthesis via Adaptive Cross-attention Fusions [PDF]

[arxiv 2024.05]CDFormer:When Degradation Prediction Embraces Diffusion Model for Blind Image Super-Resolution [PDF]

[arxiv 2024.05]Frequency-Domain Refinement with Multiscale Diffusion for Super Resolution [PDF]

[arxiv 2024.05] PatchScaler: An Efficient Patch-independent Diffusion Model for Super-Resolution [PDF, Page]

[arxiv 2024.05]Blind Image Restoration via Fast Diffusion Inversion[PDF,Page]

[arxiv 2024.06] FlowIE: Efficient Image Enhancement via Rectified Flow [PDF,Page]

[arxiv 2024.06] Hierarchical Patch Diffusion Models for High-Resolution Video Generation [PDF]

[arxiv 2024.06] Is One GPU Enough? Pushing Image Generation at Higher-Resolutions with Foundation Models [PDF,Page]

[arxiv 2024.06] Towards Realistic Data Generation for Real-World Super-Resolution[PDF]

[arxiv 2024.06] Crafting Parts for Expressive Object Composition [PDF,Page]

[arxiv 2024.06] LFMamba: Light Field Image Super-Resolution with State Space Model [PDF]

[arxiv 2024.06] ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance [PDF,Page]

[arxiv 2024.07] Layered Diffusion Model for One-Shot High Resolution Text-to-Image Synthesis[PDF]

[arxiv 2024.07] LightenDiffusion: Unsupervised Low-Light Image Enhancement with Latent-Retinex Diffusion Models [PDF,Page]

[arxiv 2024.07] AccDiffusion: An Accurate Method for Higher-Resolution Image Generation[PDF,Page]

[arxiv 2024.07] ∞-Brush: Controllable Large Image Synthesis with Diffusion Models in Infinite Dimensions [PDF]

[arxiv 2024.08] MegaFusion: Extend Diffusion Models towards Higher-resolution Image Generation without Further Tuning [PDF,Page]

[arxiv 2024.09] HiPrompt: Tuning-free Higher-Resolution Generation with Hierarchical MLLM Prompts [PDF,Page]

[arxiv 2024.09] FreeEnhance: Tuning-Free Image Enhancement via Content-Consistent Noising-and-Denoising Process [PDF,Page]

[arxiv 2024.09] Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs [PDF,Page]

[arxiv 2024.09] Degradation-Guided One-Step Image Super-Resolution with Diffusion Priors [PDF,Page]

[arxiv 2024.09] BurstM: Deep Burst Multi-scale SR using Fourier Space with Optical Flow [PDF]

[arxiv 2024.10] Distillation-Free One-Step Diffusion for Real-World Image Super-Resolution [PDF,Page]

[arxiv 2024.10] AP-LDM: Attentive and Progressive Latent Diffusion Model for Training-Free High-Resolution Image Generation [PDF,Page]

[arxiv 2024.10] Deep Compression Autoencoder for Efficient High-Resolution Diffusion Models [PDF,Page]

[arxiv 2024.10] Hi-Mamba: Hierarchical Mamba for Efficient Image Super-Resolution [PDF,Page]

[arxiv 2024.10] ConsisSR: Delving Deep into Consistency in Diffusion-based Image Super-Resolution [PDF]

[arxiv 2024.10] ClearSR: Latent Low-Resolution Image Embeddings Help Diffusion-Based Real-World Super Resolution Models See Clearer [PDF]

[arxiv 2024.10] Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation [PDF]

[arxiv 2024.10] LoRA-IR: Taming Low-Rank Experts for Efficient All-in-One Image Restoration [PDF]

[arxiv 2024.10] FlowDCN: Exploring DCN-like Architectures for Fast Image Generation with Arbitrary Resolution [PDF]
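
Several of the arbitrary-size and higher-resolution entries above (e.g. Unlimited-Size Diffusion Restoration, Patched Denoising Diffusion Models) share one trick worth sketching: run the model on fixed-size tiles each denoising step and average overlapping predictions back into the full canvas. The sketch below is a minimal, hypothetical version with a stand-in `denoise_tile` in place of a real diffusion model.

```python
import numpy as np

# Minimal sketch of overlapping-tile denoising for arbitrary-size images.
# `denoise_tile` is an illustrative stand-in for one model evaluation.

def tiled_denoise(img, tile=8, stride=4, denoise_tile=lambda p: p * 0.9):
    H, W = img.shape
    out = np.zeros_like(img, dtype=float)
    weight = np.zeros_like(img, dtype=float)
    for y in range(0, H - tile + 1, stride):
        for x in range(0, W - tile + 1, stride):
            # Accumulate each tile's prediction and how often each pixel
            # was covered, then average at the end.
            out[y:y+tile, x:x+tile] += denoise_tile(img[y:y+tile, x:x+tile])
            weight[y:y+tile, x:x+tile] += 1.0
    return out / np.maximum(weight, 1)

img = np.ones((16, 16))
res = tiled_denoise(img)
# For a linear per-tile model, overlap-averaging matches the global result.
assert np.allclose(res, 0.9)
```

Real systems refine this with feathered blending weights or latent-space averaging, but the coverage-and-normalize loop above is the common core.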

Translation

[arxiv 2024.10] CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation [PDF,Page]

Action transfer

[arxiv 2023.11]Learning Disentangled Identifiers for Action-Customized Text-to-Image Generation [PDF]

Style transfer

[arxiv 22.11; kuaishou] DiffStyler: Controllable Dual Diffusion for Text-Driven Image Stylization [PDF, code]

[ICLR 23] Text-Guided Diffusion Image Style Transfer with Contrastive Loss [Paper]

[arxiv 22.11; kuaishou&CAS] Inversion-Based Creativity Transfer with Diffusion Models [PDF, Code]

[arxiv 2022.12]Diff-Font: Diffusion Model for Robust One-Shot Font Generation [PDF]

[arxiv 2023.02]Structure and Content-Guided Video Synthesis with Diffusion Models [PDF, Page]

[arxiv 2023.03]Design Booster: A Text-Guided Diffusion Model for Image Translation with Spatial Layout Preservation [PDF]

[arxiv 2023.02]DiffFashion: Reference-based Fashion Design with Structure-aware Transfer by Diffusion Models [PDF]

[arxiv 2022.11]Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation[PDF]

[arxiv 2023.03]StyO: Stylize Your Face in Only One-Shot [PDF]

[arxiv 2023.03]Zero-Shot Contrastive Loss for Text-Guided Diffusion Image Style Transfer [PDF]

[arxiv 2023.04] One-Shot Stylization for Full-Body Human Images [PDF]

[arxiv 2023.06]StyleDrop: Text-to-Image Generation in Any Style [PDF, Page]

[arxiv 2023.07]General Image-to-Image Translation with One-Shot Image Guidance [PDF]

[arxiv 2023.08]DiffColor: Toward High Fidelity Text-Guided Image Colorization with Diffusion Models [PDF]

[arxiv 2023.08] StyleDiffusion: Controllable Disentangled Style Transfer via Diffusion Models [PDF]

[arxiv 2023.08]Scenimefy: Learning to Craft Anime Scene via Semi-Supervised Image-to-Image Translation [PDF, Page]

[arxiv 2023.09]StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation [PDF]

[arxiv 2023.09]DreamStyler: Paint by Style Inversion with Text-to-Image Diffusion Models [PDF]

[arxiv 2023.11]ControlStyle: Text-Driven Stylized Image Generation Using Diffusion Priors[PDF]

[arxiv 2023.11]ZipLoRA: Any Subject in Any Style by Effectively Merging LoRAs [PDF,Page]

[arxiv 2023.11]Soulstyler: Using Large Language Model to Guide Image Style Transfer for Target Object [PDF]

[arxiv 2023.11]InstaStyle: Inversion Noise of a Stylized Image is Secretly a Style Adviser[PDF]

[arxiv 2023.12]Portrait Diffusion: Training-free Face Stylization with Chain-of-Painting [PDF]

[arxiv 2023.12]StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter[PDF,Page]

[arxiv 2023.12]Style Injection in Diffusion: A Training-free Approach for Adapting Large-scale Diffusion Models for Style Transfer [PDF]

[arxiv 2023.12]Style Aligned Image Generation via Shared Attention [PDF,Page]

[arxiv 2024.01]FreeStyle : Free Lunch for Text-guided Style Transfer using Diffusion Models [PDF, Page]

[arxiv 2024.02]Control Color: Multimodal Diffusion-based Interactive Image Colorization [PDF, Page]

[arxiv 2024.02]One-Shot Structure-Aware Stylized Image Synthesis [PDF]

[arxiv 2024.02]Visual Style Prompting with Swapping Self-Attention [PDF,Page]

[arxiv 2024.03]DEADiff: An Efficient Stylization Diffusion Model with Disentangled Representations [PDF,Page]

[arxiv 2024.03]Implicit Style-Content Separation using B-LoRA [PDF,Page]

[arxiv 2024.03]Break-for-Make: Modular Low-Rank Adaptations for Composable Content-Style Customization [PDF]

[arxiv 2024.04]InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation [PDF]

[arxiv 2024.04]Tuning-Free Adaptive Style Incorporation for Structure-Consistent Text-Driven Style Transfer [PDF]

[arxiv 2024.04]Towards Highly Realistic Artistic Style Transfer via Stable Diffusion with Step-aware and Layer-aware Prompt [PDF]

[arxiv 2024.04]StyleBooth: Image Style Editing with Multimodal Instruction [PDF]

[arxiv 2024.04]FilterPrompt: Guiding Image Transfer in Diffusion Models [PDF]

[arxiv 2024.05]FreeTuner: Any Subject in Any Style with Training-free Diffusion[PDF]

[arxiv 2024.05] StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models [PDF]

[arxiv 2024.06] Stylebreeder: Exploring and Democratizing Artistic Styles through Text-to-Image Models [PDF,Page]

[arxiv 2024.07] StyleShot: A Snapshot on Any Style [PDF,Page]

[arxiv 2024.07] InstantStyle-Plus: Style Transfer with Content-Preserving in Text-to-Image Generation [PDF]

[arxiv 2024.07] Frequency-Controlled Diffusion Model for Versatile Text-Guided Image-to-Image Translation [PDF,Page]

[arxiv 2024.07] Magic Insert: Style-Aware Drag-and-Drop [PDF,Page]

[arxiv 2024.07]Ada-adapter:Fast Few-shot Style Personlization of Diffusion Model with Pre-trained Image Encoder [PDF]

[arxiv 2024.07] Artist: Aesthetically Controllable Text-Driven Stylization without Training [PDF,Page]

[arxiv 2024.08] StyleBrush: Style Extraction and Transfer from a Single Image [PDF]

[arxiv 2024.08] CSGO: Content-Style Composition in Text-to-Image Generation [PDF,Page]

[arxiv 2024.09]StyleTokenizer: Defining Image Style by a Single Instance for Controlling Diffusion Models[PDF,Page]

[arxiv 2024.09]Training-free Color-Style Disentanglement for Constrained Text-to-Image Synthesis[PDF]

[arxiv 2024.09] Mamba-ST: State Space Model for Efficient Style Transfer [PDF]

[arxiv 2024.10] Harnessing the Latent Diffusion Model for Training-Free Image Style Transfer [PDF]
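
Many of the training-free methods above (Style Injection in Diffusion, Style Aligned Image Generation via Shared Attention, Visual Style Prompting) hinge on one operation: swapping the content branch's self-attention keys/values for those of a reference style image. A toy NumPy sketch under that assumption, with random arrays standing in for U-Net features:

```python
import numpy as np

# Illustrative sketch of attention-swap style injection: content queries
# attend to the style image's keys/values instead of their own.

def attention(Q, K, V):
    s = Q @ K.T / np.sqrt(Q.shape[-1])
    s -= s.max(-1, keepdims=True)          # numerically stable softmax
    P = np.exp(s)
    P /= P.sum(-1, keepdims=True)
    return P @ V

rng = np.random.default_rng(0)
content = rng.standard_normal((10, 16))    # content-image features
style = rng.standard_normal((12, 16))      # reference style features

plain = attention(content, content, content)   # ordinary self-attention
injected = attention(content, style, style)    # K,V swapped to the style image

# Queries (and thus spatial structure) come from the content image, while
# the aggregated values carry the style statistics.
assert injected.shape == (10, 16)
```

In practice the swap is applied only at selected U-Net layers and timesteps, which is where individual papers differ.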

Downstream apps

[arxiv 2023.11]Text-to-Sticker: Style Tailoring Latent Diffusion Models for Human Expression [PDF]

[arxiv 2023.11]Paragraph-to-Image Generation with Information-Enriched Diffusion Model [PDF,Page]

[arxiv 2024.02]Text2Street: Controllable Text-to-image Generation for Street Views [PDF]

[arxiv 2024.02]FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes[PDF,Page]

[arxiv 2024.03]Text-to-Image Diffusion Models are Great Sketch-Photo Matchmakers [PDF]

[arxiv 2024.06] Coherent Zero-Shot Visual Instruction Generation [PDF,Page]

[arxiv 2024.10] Inverse Painting: Reconstructing The Painting Process[PDF,Page]

World model

[arxiv 2024.10] AVID: Adapting Video Diffusion Models to World Models [PDF,Page]

Mesh generation

[arxiv 2024.09] EdgeRunner: Auto-regressive Auto-encoder for Artistic Mesh Generation [PDF,Page]

Depth

[arxiv 2024.09] Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction [PDF,Page]

[arxiv 2024.09]Self-Distilled Depth Refinement with Noisy Poisson Fusion [PDF,Page]

Scaling

[arxiv 2024.10] FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models [PDF]

Disentanglement

[ICMR 2023]Not Only Generative Art: Stable Diffusion for Content-Style Disentanglement in Art Analysis [PDF]

Face ID

[arxiv 2022.12]HS-Diffusion: Learning a Semantic-Guided Diffusion Model for Head Swapping[PDF]

[arxiv 2023.06]Inserting Anybody in Diffusion Models via Celeb Basis [PDF, Page]

[arxiv 2023.07]DreamIdentity: Improved Editability for Efficient Face-identity Preserved Image Generation [PDF,Page]

[arxiv 2024.10] FuseAnyPart: Diffusion-Driven Facial Parts Swapping via Multiple Reference Images [PDF,Page]

Scene composition

[arxiv 2023.02]Mixture of Diffusers for Scene Composition and High Resolution Image Generation [PDF]

[arxiv 2023.02]Cross-domain Compositing with Pretrained Diffusion Models[PDF]

Handwriting

[arxiv 2023.03]WordStylist: Styled Verbatim Handwritten Text Generation with Latent Diffusion Models[PDF]

Speed

[arxiv 2023.05]FISEdit: Accelerating Text-to-image Editing via Cache-enabled Sparse Diffusion Inference [PDF]

[arxiv 2023.06]Fast Training of Diffusion Models with Masked Transformers [PDF]

[arxiv 2023.06]Fast Diffusion Model [PDF]

[arxiv 2023.06]Masked Diffusion Models are Fast Learners [PDF]

[arxiv 2023.10]Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference [PDF]

[arxiv 2023.11]UFOGen: You Forward Once Large Scale Text-to-Image Generation via Diffusion GANs [PDF]

[arxiv 2023.11]AdaDiff: Adaptive Step Selection for Fast Diffusion [PDF]

[arxiv 2023.11]MobileDiffusion: Subsecond Text-to-Image Generation on Mobile Devices [PDF]

[arxiv 2023.11]Manifold Preserving Guided Diffusion [PDF]

[arxiv 2023.11]LCM-LoRA: A Universal Stable-Diffusion Acceleration Module [PDF,Page]

[arxiv 2023.11]Adversarial Diffusion Distillation [PDF,Page]

[arxiv 2023.12]One-step Diffusion with Distribution Matching Distillation [PDF,Page]

[arxiv 2023.12]SwiftBrush: One-Step Text-to-Image Diffusion Model with Variational Score Distillation [PDF, Page]

[arxiv 2023.12]SpeedUpNet: A Plug-and-Play Hyper-Network for Accelerating Text-to-Image Diffusion Models [PDF]

[arxiv 2023.12]Not All Steps are Equal: Efficient Generation with Progressive Diffusion Models [PDF]

[arxiv 2024.01]Fast Inference Through The Reuse Of Attention Maps In Diffusion Models [PDF]

[arxiv 2024.02]SDXL-Lightning: Progressive Adversarial Diffusion Distillation[PDF,Page]

[arxiv 2024.03]DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models [PDF, Page]

[arxiv 2024.03]Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation [PDF]

[arxiv 2024.03]You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs [PDF]

[arxiv 2024.04]T-GATE: Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models [PDF,Page]

[arxiv 2024.04]BinaryDM: Towards Accurate Binarization of Diffusion Model [PDF, Page]

[arxiv 2024.04]LAPTOP-Diff: Layer Pruning and Normalized Distillation for Compressing Diffusion Models [PDF]

[arxiv 2024.04]Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis [PDF]

[arxiv 2024.05] Improved Distribution Matching Distillation for Fast Image Synthesis [PDF, Page]

[arxiv 2024.05]PaGoDA: Progressive Growing of a One-Step Generator from a Low-Resolution Diffusion Teacher [PDF]

[arxiv 2024.05]PipeFusion: Displaced Patch Pipeline Parallelism for Inference of Diffusion Transformer Models [PDF]

[arxiv 2024.05]Reward Guided Latent Consistency Distillation[PDF, Page]

[arxiv 2024.06]Diffusion Models Are Innate One-Step Generators [PDF]

[arxiv 2024.06] ViDiT-Q: Efficient and Accurate Quantization of Diffusion Transformers for Image and Video Generation[PDF, Page]

[arxiv 2024.06]Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation [PDF, Page]

[arxiv 2024.06]Invertible Consistency Distillation for Text-Guided Image Editing in Around 7 Steps [PDF, Page]

[arxiv 2024.06]Immiscible Diffusion: Accelerating Diffusion Training with Noise Assignment[PDF]

[arxiv 2024.07]Minutes to Seconds: Speeded-up DDPM-based Image Inpainting with Coarse-to-Fine Sampling [PDF, Page]

[arxiv 2024.07]Efficient Training with Denoised Neural Weights [PDF, Page]

[arxiv 2024.07]SlimFlow: Training Smaller One-Step Diffusion Models with Rectified Flow[PDF]

[arxiv 2024.08] TurboEdit: Text-Based Image Editing Using Few-Step Diffusion Models[PDF, Page]

[arxiv 2024.08] A Simple Early Exiting Framework for Accelerated Sampling in Diffusion Models[PDF]

[arxiv 2024.08]Low-Bitwidth Floating Point Quantization for Efficient High-Quality Diffusion Models [PDF]

[arxiv 2024.08]PFDiff: Training-free Acceleration of Diffusion Models through the Gradient Guidance of Past and Future [PDF]

[arxiv 2024.08] SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher[PDF]

[arxiv 2024.08]Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation [PDF, Page]

[arxiv 2024.09]VQ4DiT: Efficient Post-Training Vector Quantization for Diffusion Transformers[PDF]

[arxiv 2024.09]Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization [PDF, Page]

[arxiv 2024.09]FlowTurbo: Towards Real-time Flow-Based Image Generation with Velocity Refiner [PDF, Page]

[arxiv 2024.10] Simple and Fast Distillation of Diffusion Models [PDF,Page]

[arxiv 2024.10] Relational Diffusion Distillation for Efficient Image Generation [PDF,Page]

[arxiv 2024.10] Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer [PDF,Page]

[arxiv 2024.10] FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification [PDF,Page]

[arxiv 2024.10] Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices[PDF,Page]

[arxiv 2024.10] One Step Diffusion via Shortcut Models [PDF,Page]

[arxiv 2024.10] BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities [PDF,Page]

[arxiv 2024.10] One-Step Diffusion Distillation through Score Implicit Matching [PDF,Page]

[arxiv 2024.10] DiP-GO: A Diffusion Pruner via Few-step Gradient Optimization [PDF]

[arxiv 2024.10] Simplifying, stabilizing, and scaling continuous-time consistency models [PDF,Page]

[arxiv 2024.10] Fast constrained sampling in pre-trained diffusion models [PDF]

[arxiv 2024.10] Flow Generator Matching [PDF,Page]

[arxiv 2024.10] Multi-student Diffusion Distillation for Better One-step Generators [PDF,Page]

[arxiv 2024.10] Diff-Instruct*: Towards Human-Preferred One-step Text-to-image Generative Models [PDF,Page]
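
Most one/few-step entries above (Distribution Matching Distillation, Adversarial Diffusion Distillation, SwiftBrush, Hyper-SD) are variants of step distillation: compress a teacher that needs many small denoising updates into a student that jumps to the result in one evaluation. A contrived scalar illustration of the idea, not a real diffusion model:

```python
import numpy as np

# Toy step-distillation illustration: a many-step iterative "teacher" is
# replaced by a one-evaluation "student". The linear dynamics here are
# deliberately simple so the student can be fit exactly.

def teacher(x0, steps=50):
    x = x0
    for _ in range(steps):
        x = x - 0.1 * x          # many small denoising-like updates
    return x

# After n steps the linear teacher maps x -> (0.9 ** n) * x, so the
# distilled student is a single multiply.
student_gain = 0.9 ** 50
student = lambda x0: student_gain * x0

xs = np.linspace(-3, 3, 7)
assert np.allclose(student(xs), [teacher(x) for x in xs])
```

Real diffusion teachers are nonlinear, so the student is a neural network trained to match the teacher's multi-step output (or its distribution), which is what the distillation objectives in these papers differ on.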

Consistency models

[arxiv 2024.10] Simplifying, stabilizing, and scaling continuous-time consistency models [PDF,Page]

[arxiv 2023.10]Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference [PDF]

[arxiv 2024.10] Stable Consistency Tuning: Understanding and Improving Consistency Models [PDF,Page]

[arxiv 2024.10] Truncated Consistency Models [PDF,Page]
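
Consistency models rely on a skip parameterization that enforces the boundary condition f(x, sigma_min) = x by construction. A hedged NumPy sketch, with EDM-style coefficients and a toy stand-in network (sigma_data and the coefficient forms are illustrative choices, not the exact formulas of any one paper above):

```python
import numpy as np

# Sketch of the consistency-model skip parameterization:
# f(x, sigma) = c_skip(sigma) * x + c_out(sigma) * F(x, sigma),
# chosen so that f is the identity at the minimum noise level.

SIGMA_MIN, SIGMA_DATA = 0.002, 0.5   # illustrative constants

def c_skip(sigma):
    return SIGMA_DATA**2 / ((sigma - SIGMA_MIN) ** 2 + SIGMA_DATA**2)

def c_out(sigma):
    return SIGMA_DATA * (sigma - SIGMA_MIN) / np.sqrt(sigma**2 + SIGMA_DATA**2)

def consistency_fn(network, x, sigma):
    return c_skip(sigma) * x + c_out(sigma) * network(x, sigma)

toy_net = lambda x, sigma: np.tanh(x)   # stand-in for the trained model
x = np.linspace(-2, 2, 5)

# At sigma = sigma_min, c_skip = 1 and c_out = 0, so the model is the
# identity regardless of the network — the boundary condition holds.
assert np.allclose(consistency_fn(toy_net, x, SIGMA_MIN), x)
```

Training then pushes f to give the same output at adjacent noise levels along a trajectory, which is what "consistency" refers to.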

Limited data

[arxiv 2023.06]Decompose and Realign: Tackling Condition Misalignment in Text-to-Image Diffusion Models [PDF]

Study

[CVPR 2023]Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models [PDF]

[arxiv 2023.06]Understanding and Mitigating Copying in Diffusion Models [PDF, code]

[arxiv 2023.06]Intriguing Properties of Text-guided Diffusion Models [PDF]

[arxiv 2023.06]Stable Diffusion is Unstable [PDF]

[arxiv 2023.06]A Geometric Perspective on Diffusion Models [PDF]

[arxiv 2023.06]Emergent Correspondence from Image Diffusion [PDF]

[arxiv 2023.06]Evaluating Data Attribution for Text-to-Image Models [PDF, Page]

[arxiv 2023.06]Norm-guided latent space exploration for text-to-image generation [PDF]

[arxiv 2023.06]Training-free Diffusion Model Adaptation for Variable-Sized Text-to-Image Synthesis [PDF]

[arxiv 2023.07]On the Cultural Gap in Text-to-Image Generation [PDF]

[arxiv 2023.07]How to Detect Unauthorized Data Usages in Text-to-image Diffusion Models [PDF]

[arxiv 2023.08]Manipulating Embeddings of Stable Diffusion Prompts [PDF]

[arxiv 2023.10]Text-image Alignment for Diffusion-based Perception [PDF,Page]

[arxiv 2023.10]What Does Stable Diffusion Know about the 3D Scene? [PDF]

[arxiv 2023.11]Holistic Evaluation of Text-To-Image Models [PDF]

[arxiv 2023.11]On the Limitation of Diffusion Models for Synthesizing Training Datasets [PDF]

[arxiv 2023.12]Rich Human Feedback for Text-to-Image Generation [PDF]

[arxiv 2024.01]Resolution Chromatography of Diffusion Models [PDF]

[arxiv 2024.04]Bigger is not Always Better: Scaling Properties of Latent Diffusion Models [PDF]

[arxiv 2024.08] Not Every Image is Worth a Thousand Words: Quantifying Originality in Stable Diffusion [PDF]

[arxiv 2024.08]GradBias: Unveiling Word Influence on Bias in Text-to-Image Generative Models [PDF, Page]

[arxiv 2024.10] Magnet: We Never Know How Text-to-Image Diffusion Models Work, Until We Learn How Vision-Language Models Function [PDF,Page]

[arxiv 2024.10] Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models [PDF]

[arxiv 2024.10] Scaling Laws For Diffusion Transformers[PDF]

Evaluation

[arxiv 2024.01]Rethinking FID: Towards a Better Evaluation Metric for Image Generation [PDF]

[arxiv 2024.04]Evaluating Text-to-Visual Generation with Image-to-Text Generation [PDF,Page]

[arxiv 2024.04]Who Evaluates the Evaluations? Objectively Scoring Text-to-Image Prompt Coherence Metrics with T2IScoreScore (TS2) [PDF]

[arxiv 2024.05]FAIntbench: A Holistic and Precise Benchmark for Bias Evaluation in Text-to-Image Models [PDF]

[arxiv 2024.06]GAIA: Rethinking Action Quality Assessment for AI-Generated Videos[PDF]

[arxiv 2024.06]Words Worth a Thousand Pictures: Measuring and Understanding Perceptual Variability in Text-to-Image Generation [PDF]

[arxiv 2024.06]PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models [PDF]

[arxiv 2024.06]Holistic Evaluation for Interleaved Text-and-Image Generation [PDF]

[arxiv 2024.08]E-Bench: Subjective-Aligned Benchmark Suite for Text-Driven Video Editing Quality Assessment [PDF]

[arxiv 2024.10] GRADE: Quantifying Sample Diversity in Text-to-Image Models [PDF,Page]
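
The metric that papers such as "Rethinking FID" revisit is simple to state: model real and generated image features as Gaussians (mu1, C1) and (mu2, C2), and compute the Fréchet distance d² = ||mu1 − mu2||² + Tr(C1 + C2 − 2(C1·C2)^(1/2)). A self-contained NumPy sketch, with random vectors standing in for Inception features:

```python
import numpy as np

# Hedged sketch of FID's underlying Frechet distance between Gaussians.
# Random vectors stand in for the usual Inception features.

def psd_sqrt(M):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(np.clip(w, 0, None))) @ V.T

def frechet_distance(mu1, C1, mu2, C2):
    # Tr((C1 C2)^{1/2}) via the symmetric form C1^{1/2} C2 C1^{1/2}.
    s1 = psd_sqrt(C1)
    covmean = psd_sqrt(s1 @ C2 @ s1)
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(C1 + C2 - 2 * covmean))

def stats(X):
    return X.mean(0), np.cov(X, rowvar=False)

rng = np.random.default_rng(0)
real = rng.standard_normal((5000, 4))
fake = rng.standard_normal((5000, 4)) + 0.5   # shifted "generated" features

fid_same = frechet_distance(*stats(real), *stats(real))
fid_diff = frechet_distance(*stats(real), *stats(fake))
assert abs(fid_same) < 1e-6   # identical distributions: distance ~ 0
assert fid_diff > fid_same    # shifted distribution is farther away
```

The critiques in this section target the Inception feature space and the Gaussian assumption, not this distance formula itself.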

Feedback

[arxiv 2023.11]Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model [PDF]

[arxiv 2024.03]AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation [PDF]

[arxiv 2024.04]RL for Consistency Models: Faster Reward Guided Text-to-Image Generation [PDF]

[arxiv 2024.04]Aligning Diffusion Models by Optimizing Human Utility [PDF]

[arxiv 2024.07]FDS: Feedback-guided Domain Synthesis with Multi-Source Conditional Diffusion Models for Domain Generalization [PDF, Page]

[arxiv 2024.08]Towards Reliable Advertising Image Generation Using Human Feedback [PDF]

[arxiv 2024.08] I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing[PDF, Page]

[arxiv 2024.10] IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation [PDF,Page]

[arxiv 2024.10] Scalable Ranked Preference Optimization for Text-to-Image Generation [PDF,Page]

[arxiv 2024.10] PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference [PDF,Page]

Finetuning

[arxiv 2021.07] Low-rank Adaptation for Fast Text-to-Image Diffusion Fine-tuning [PDF, code]

[arxiv 2024.02]DoRA: Weight-Decomposed Low-Rank Adaptation [PDF]

[arxiv 2024.06]Spectrum-Aware Parameter Efficient Fine-Tuning for Diffusion Models [PDF]
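
The low-rank adaptation underlying the entries above is small enough to sketch directly: a frozen weight W is augmented with a trainable update (alpha / r) · B·A of rank r, with B zero-initialized so training starts from the base model. A minimal NumPy sketch (names and dimensions are illustrative):

```python
import numpy as np

# Minimal LoRA sketch: y = x @ (W + (alpha/r) * B @ A)^T, where W is
# frozen and only the low-rank factors A, B are trained.

def lora_forward(x, W, A, B, alpha):
    r = A.shape[0]
    delta = (alpha / r) * (B @ A)   # (d_out, d_in) update of rank <= r
    return x @ (W + delta).T

rng = np.random.default_rng(0)
d_in, d_out, r = 8, 6, 2
W = rng.standard_normal((d_out, d_in))   # frozen base weight
A = rng.standard_normal((r, d_in))       # trainable "down" projection
B = np.zeros((d_out, r))                 # "up" projection, zero-initialized
x = rng.standard_normal((4, d_in))

# With B zero-initialized, LoRA starts as an exact no-op on the base model.
assert np.allclose(lora_forward(x, W, A, B, alpha=8), x @ W.T)
```

Because only A and B are stored, a fine-tune costs O(r·(d_in + d_out)) parameters per layer instead of O(d_in·d_out), which is what makes the LoRA-merging and style-LoRA papers in earlier sections practical.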

Related

[arxiv 2022.04]VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance [PDF, Code]

[arxiv 2022.11]Investigating Prompt Engineering in Diffusion Models [PDF]

[arxiv 2022.11]Versatile Diffusion: Text, Images and Variations All in One Diffusion Model [PDF]

[arxiv 2022.11; ByteDance]Shifted Diffusion for Text-to-image Generation [PDF]

[arxiv 2022.11; ]3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models [PDF]

[ECCV 2022; Best Paper] Partial Distance: On the Versatile Uses of Partial Distance Correlation in Deep Learning. [PDF]

[arxiv 2022.12]SinDDM: A Single Image Denoising Diffusion Model [PDF]

[arxiv 2022.12] Diffusion Guided Domain Adaptation of Image Generators [PDF]

[arxiv 2022.12]Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis [PDF]

[arxiv 2022.12]Scalable Diffusion Models with Transformers[PDF]

[arxiv 2022.12] Generalized Decoding for Pixel, Image, and Language [PDF, Page]

[arxiv 2023.03]Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models [PDF]

[arxiv 2023.03]Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models [PDF, Page]

[arxiv 2023.03]Larger language models do in-context learning differently [PDF]

[arxiv 2023.03]One Transformer Fits All Distributions in Multi-Modal Diffusion at Scale [PDF]

[arxiv 2023.03]Align, Adapt and Inject: Sound-guided Unified Image Generation [PDF]

[arxiv 2023.11]ToddlerDiffusion: Flash Interpretable Controllable Diffusion Model [PDF]

[arxiv 2024.04]Many-to-many Image Generation with Auto-regressive Diffusion Models [PDF]

[arxiv 2024.04]On the Scalability of Diffusion-based Text-to-Image Generation [PDF]

[arxiv 2024.06]Frozen CLIP: A Strong Backbone for Weakly Supervised Semantic Segmentation [PDF]

[arxiv 2024.06]Diffusion Models in Low-Level Vision: A Survey [PDF]

[arxiv 2024.06]A Survey of Multimodal-Guided Image Editing with Text-to-Image Diffusion Models [PDF, Page]

[arxiv 2024.07] Replication in Visual Diffusion Models:A Survey and Outlook[PDF, Page]

[arxiv 2024.09]A Survey of Multimodal Composite Editing and Retrieval [PDF]

[arxiv 2024.09]Pushing Joint Image Denoising and Classification to the Edge [PDF]

[arxiv 2024.09] Alignment of Diffusion Models: Fundamentals, Challenges, and Future[PDF]

[arxiv 2024.09]Taming Diffusion Models for Image Restoration: A Review [PDF]

Architecture

[arxiv 2024.09]HydraViT: Stacking Heads for a Scalable ViT [PDF, Page]

[arxiv 2024.10] MaskMamba: A Hybrid Mamba-Transformer Model for Masked Image Generation[PDF]

[arxiv 2024.10] Dynamic Diffusion Transformer [PDF,Page]

[arxiv 2024.10] DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation [PDF]

[arxiv 2024.10] Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer [PDF,Page]

[arxiv 2024.10] GlobalMamba: Global Image Serialization for Vision Mamba [PDF,Page]

[arxiv 2024.10] MoH: Multi-Head Attention as Mixture-of-Head Attention[PDF,Page]

Data

[arxiv 2024.06]What If We Recaption Billions of Web Images with LLaMA-3? [PDF, Page]

Repository

DIFFUSERS: Hugging Face's state-of-the-art diffusion model repository. [DIFFUSERS]

[arxiv 2023.03]Text-to-image Diffusion Model in Generative AI: A Survey [PDF]

[arxiv 2023.04]Synthesizing Anyone, Anywhere, in Any Pose[PDF]

Real-to-CG

[arxiv 2024.09]Synergy and Synchrony in Couple Dances [PDF, Page]
