Combining Evolutionary Computing with Diffusion Models
- 🎨 Aesthetics Maximization/Minimization using LAION Aesthetics Predictor V2
- 📊 Multi-Objective Optimization with CLIP-IQA metrics
- 🛡️ Evading AI-Image Detection by optimizing against a fine-tuned SDXL AI-Image-Detector
- 🧭 Navigating the CLIP-Score Landscape for Prompt-Matching
| Notebook | Link |
|---|---|
| Genetic Algorithm | |
| Island Genetic Algorithm | |
| NSGA | |
Image results are saved to your Google Drive in the folder `evolutionary`. Each generation creates a new subfolder where its images are stored. You can change these folders in the notebook, for example along the lines of the sketch below.
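A minimal sketch of such a folder setup in Colab (the base path and per-generation naming scheme are illustrative, not the notebooks' actual code):

```python
import os
from google.colab import drive

# Mount Google Drive so results persist across Colab sessions.
drive.mount('/content/drive')

# Base folder for all results; change this to relocate them.
RESULTS_DIR = '/content/drive/MyDrive/evolutionary'

def generation_dir(generation: int) -> str:
    """Return (and create) the folder for one generation's images."""
    path = os.path.join(RESULTS_DIR, f'generation_{generation:03d}')
    os.makedirs(path, exist_ok=True)
    return path
```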
Optimizing the aesthetics predictor as a maximization problem, the algorithm reached a maximum Aesthetics score of 8.67. This is higher than the scores of the examples from the real LAION English Subset dataset, whose upper limit is marked by the red line. A wide variety of prompts (inspired by the Parti prompts) was used for the initial population.
Video: `ga_200gen_100pop_aesthetic.mp4`
Parameters:

```python
population_size = 100
num_generations = 200
batch_size = 1
elitism = 1  # number of top individuals carried over unchanged each generation

# Creates SDXL images from (pooled) prompt embeddings; few inference steps keep generation fast.
creator = SDXLPromptEmbeddingImageCreator(pipeline_factory=setup_pipeline, batch_size=batch_size, inference_steps=3)
# LAION Aesthetics Predictor V2 provides the fitness score.
evaluator = AestheticsImageEvaluator()
# Children are equal-weighted averages of both parents' embeddings.
crossover = PooledArithmeticCrossover(0.5, 0.5)
# Separate mutation settings for the prompt embeddings and the pooled embeddings,
# each clamped to the value range of the respective embedding type.
mutation_arguments = UniformGaussianMutatorArguments(mutation_rate=0.1, mutation_strength=2, clamp_range=(-900, 900))
mutation_arguments_pooled = UniformGaussianMutatorArguments(mutation_rate=0.1, mutation_strength=0.3, clamp_range=(-8, 8))
mutator = PooledUniformGaussianMutator(mutation_arguments, mutation_arguments_pooled)
selector = TournamentSelector(tournament_size=3)
```
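For intuition, here is what arithmetic crossover and uniform Gaussian mutation do in the embedding space. This is a plain-PyTorch sketch, not the library's implementation; the shapes and defaults mirror the SDXL prompt embeddings and the parameters above:

```python
import torch

def arithmetic_crossover(a: torch.Tensor, b: torch.Tensor, weight: float = 0.5) -> torch.Tensor:
    # The child is a weighted average of the two parent embeddings.
    return weight * a + (1.0 - weight) * b

def uniform_gaussian_mutation(x: torch.Tensor, rate: float = 0.1, strength: float = 2.0,
                              clamp_range: tuple = (-900.0, 900.0)) -> torch.Tensor:
    # Each element mutates with probability `rate` by adding Gaussian noise
    # scaled by `strength`; clamping keeps values inside the range the text
    # encoders actually produce.
    mask = torch.rand_like(x) < rate
    mutated = torch.where(mask, x + torch.randn_like(x) * strength, x)
    return mutated.clamp(*clamp_range)

# SDXL-sized prompt embeddings: (batch, 77 tokens, 2048 dims); the pooled
# embedding (batch, 1280) gets the same treatment with its own parameters.
parent_a = torch.randn(1, 77, 2048)
parent_b = torch.randn(1, 77, 2048)
child = uniform_gaussian_mutation(arithmetic_crossover(parent_a, parent_b))
```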
An Island GA was performed by creating random embeddings and mixing them with artist embeddings to obtain mixtures of styles and new ideas.
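Conceptually, an island GA evolves several populations independently and periodically migrates the best individuals between them. A library-agnostic toy sketch (the function names and the ring-migration policy are illustrative):

```python
import random

def island_ga(islands, fitness, evolve, generations=200, migration_interval=10):
    """Evolve several populations ("islands") in parallel."""
    for gen in range(1, generations + 1):
        # Each island evolves independently for one generation.
        islands = [evolve(pop) for pop in islands]
        # Periodically pass each island's best individual to the next island
        # in a ring, replacing a random resident there.
        if gen % migration_interval == 0:
            bests = [max(pop, key=fitness) for pop in islands]
            for i, pop in enumerate(islands):
                pop[random.randrange(len(pop))] = bests[i - 1]
    return islands

# Toy demo: individuals are floats, fitness is the value itself, and
# "evolution" is truncation selection plus Gaussian mutation.
def evolve(pop):
    best = sorted(pop, reverse=True)[:len(pop) // 2]
    return [x + random.gauss(0, 0.1) for x in best] * 2

islands = [[random.random() for _ in range(10)] for _ in range(4)]
islands = island_ga(islands, fitness=lambda x: x, evolve=evolve)
```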
More detailed results can be found in a separate repository dedicated to the experiments: https://github.com/malthee/evolutionary-diffusion-results
Currently supported evaluators:
- AestheticsImageEvaluator: uses the LAION Aesthetics Predictor V2 (blog: https://laion.ai/blog/laion-aesthetics/).
- CLIPScoreEvaluator: uses the torchmetrics implementation of CLIP-Score.
- (Single/Multi)CLIPIQAEvaluator: uses the torchmetrics implementation of CLIP Image Quality Assessment (see the sketch after this list).
- AIDetectionImageEvaluator: uses the original version from Hugging Face, or the one fine-tuned for SDXL-generated images.
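The CLIP-based evaluators build on torchmetrics; the following sketch shows the underlying metrics, following the torchmetrics documentation examples (the evaluator classes may wrap them differently):

```python
import torch
from torchmetrics.multimodal.clip_score import CLIPScore
from torchmetrics.multimodal import CLIPImageQualityAssessment

# CLIP-Score: how well an image matches a text prompt.
clip_score = CLIPScore(model_name_or_path="openai/clip-vit-base-patch16")
image = torch.randint(255, (3, 224, 224))  # random stand-in image
print(clip_score(image, "a photo of a cat"))

# CLIP-IQA: zero-shot quality assessment along configurable dimensions.
clip_iqa = CLIPImageQualityAssessment(prompts=("quality", "brightness"))
images = torch.randint(255, (2, 3, 224, 224)).float()
print(clip_iqa(images))  # one score per prompt dimension and image
```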
Currently supported creators working in the prompt-embedding space:
- SDXLPromptEmbeddingImageCreator: supports the SDXL pipeline and creates both prompt- and pooled-prompt-embeddings.
- SDPromptEmbeddingImageCreator: uses prompt-embeddings only; it is faster, but produces lower-quality results than SDXL. A sketch of the embedding round-trip follows below.
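For context, a minimal sketch of the embedding round-trip such creators presumably perform with diffusers' SDXL pipeline (the model ID, prompt, and step count are illustrative, not the repository's actual code):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Encode a prompt once into the embedding space the GA operates on.
prompt_embeds, neg_embeds, pooled_embeds, neg_pooled = pipe.encode_prompt(
    prompt="a watercolor landscape", device="cuda"
)
# prompt_embeds: (1, 77, 2048), pooled_embeds: (1, 1280)

# After crossover/mutation, decode the (modified) embeddings into an image.
image = pipe(
    prompt_embeds=prompt_embeds,
    negative_prompt_embeds=neg_embeds,
    pooled_prompt_embeds=pooled_embeds,
    negative_pooled_prompt_embeds=neg_pooled,
    num_inference_steps=3,
).images[0]
```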
There are multiple notebooks exploring the speed and quality of models for generation and fitness evaluation. These notebooks also support simple inference, so any model can be tried out easily.
- diffusion_model_comparison: tries out different diffusion models with varying arguments (inference steps, batch size) to find the optimal model for image generation in an evolutionary context (generation speed & quality).
- clip_evaluators: uses torchmetrics with CLIP-Score and CLIP-IQA. CLIP-Score could define the fitness for "prompt fulfillment" or "image alignment", while CLIP-IQA offers many possible metrics such as "quality", "brightness", or "happiness".
- ai_detection_evaluator: uses a pre-trained model for AI-image detection. This could serve as a fitness criterion to minimize the "AI-likeness" of images.
- aesthetics_evaluator: uses a pre-trained model from the maintainers of the LAION image dataset, which scores an image from 0 to 10 depending on how "aesthetic" it is. Could be used as a maximization criterion for image fitness.
- clamp_range: tests the usual prompt-embedding minimum and maximum values for different models, so that a clamp range can be set in the mutator, for example (a sketch of this follows the list). Uses the Parti prompts.
- crossover_mutation_experiments: tests different crossover and mutation strategies to see how they behave in the prompt-embedding space.
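As an illustration of how such a clamp range could be derived, one can encode a set of prompts and record the embedding extremes. The two prompts below stand in for the Parti prompts, and the model ID is illustrative:

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

prompts = ["a photo of a cat", "an oil painting of a stormy sea"]  # stand-ins

lo, hi = float("inf"), float("-inf")
for p in prompts:
    embeds, _, pooled, _ = pipe.encode_prompt(prompt=p, device="cuda")
    lo = min(lo, embeds.min().item())
    hi = max(hi, embeds.max().item())

# Mutations can then clamp to roughly (lo, hi) so that mutated embeddings
# stay within the range the text encoders actually produce.
print(f"observed prompt-embedding range: ({lo:.1f}, {hi:.1f})")
```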