DILLEMA (Diffusion Model and Large Language Model for Augmentation) is a framework that enhances the robustness of DL-based systems by generating diverse, realistic test images from existing datasets. DILLEMA leverages recent advances in text and vision models to generate accurate synthetic images as data augmentation, so that DL-based systems can be tested in scenarios and conditions that may not be represented in the existing test suite.
DILLEMA follows a five-step process:
- Image Captioning: converts a given image into a detailed textual description.
- Keywords Identification: identifies which elements of the image can be safely modified without altering its overall meaning.
- Alternatives Identification: explores different possibilities for modifying the elements flagged in the previous step, such as changing the color of objects or adjusting environmental conditions (e.g., weather).
- Counterfactual Caption Generation: creates new textual descriptions, or counterfactual captions, by applying the alternatives generated in the previous step.
- Counterfactual Image Generation: generates a modified image based on the counterfactual caption.
In the deployed implementation, we consolidated these into three processes: image captioning, counterfactual generation, and image generation. Counterfactual generation combines steps 2, 3, and 4; we did this purely for computational efficiency, as sketched below.
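As a mental model, the deployed pipeline can be viewed as three composable stages; the function names and placeholder bodies below are illustrative, not DILLEMA's actual API:

```python
# Illustrative sketch of the three deployed stages (hypothetical names,
# not DILLEMA's actual API).
def captioning(image):
    """Step 1: convert an image into a detailed textual description."""
    return "a red car parked on a sunny street"  # placeholder caption

def counterfactual(caption, task):
    """Steps 2-4: keywords -> alternatives -> counterfactual captions."""
    return ["a red car parked on a snowy street"]  # placeholder output

def imagegen(image, cf_caption):
    """Step 5: render a modified image guided by the counterfactual caption."""
    return image  # placeholder: the real stage calls a diffusion model

def dillema(image, task):
    caption = captioning(image)
    return [imagegen(image, cf) for cf in counterfactual(caption, task)]
```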
Environment installation: we use Conda, which has been tested and works properly with Python >= 3.10.
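For example (the environment name `dillema` is illustrative):

```bash
conda create -n dillema python=3.10
conda activate dillema
```

Then clone the repository and install the requirements: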
```bash
git clone https://github.com/irfanmasoudi/DILLEMA.git
cd DILLEMA
pip install -r requirements.txt
```
Counterfactual generation requires a pretrained LLM; we use LLaMA-2 quantized to 5-bit precision, but you can swap in another quantization. Download it with:
```bash
python3 downloadLLAMA.py
```
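To sanity-check the downloaded model, here is a minimal sketch using llama-cpp-python; the model path is an assumption, so adjust it to whatever `downloadLLAMA.py` actually writes:

```python
# Smoke test for a 5-bit quantized LLaMA-2 model. The model path is an
# assumed example, not necessarily what downloadLLAMA.py produces.
from llama_cpp import Llama

llm = Llama(model_path="models/llama-2-13b.Q5_K_M.gguf", n_ctx=2048)
out = llm("Describe a street scene in one sentence.", max_tokens=32)
print(out["choices"][0]["text"])
```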
SHIFT is the example dataset for the semantic segmentation task (a synthetic dataset for autonomous driving). ImageNet1K is the example dataset for image classification, which was used to train ResNet18, ResNet50, and ResNet152.
In the experiments we deployed the ImageNet1K and SHIFT datasets: `DILLEMA_captioning_imagenet.py` for ImageNet1K and `DILLEMA_captioning_shift.py` for SHIFT. The only differences are the dataset root path, the captioning result folder path, and the data loader.
```bash
cd captioning
python3 DILLEMA_captioning_imagenet.py
```
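For reference, captioning a single image can be sketched with a BLIP model from Hugging Face Transformers; BLIP is an assumed stand-in here, and the actual captioner is whatever `DILLEMA_captioning_imagenet.py` loads:

```python
# Captioning sketch (BLIP is an assumed stand-in for DILLEMA's captioner).
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-large")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-large")

image = Image.open("example.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
ids = model.generate(**inputs, max_new_tokens=50)
print(processor.decode(ids[0], skip_special_tokens=True))
```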
Counterfactual generation uses the same settings as image captioning. As an initial setting, it also requires the task specification, e.g. `task = "ImageNet image classification with 1000 labels"`.
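As a hypothetical sketch, the combined prompt covering steps 2-4 could look like the following; the actual template lives in the counterfactual scripts and may differ:

```python
# Hypothetical prompt assembly for the combined steps 2-4
# (keywords -> alternatives -> counterfactual captions).
task = "ImageNet image classification with 1000 labels"
label = "sports car"                            # ground-truth class of the image
caption = "a red car parked on a sunny street"  # from the captioning step

prompt = (
    f"Task: {task}\n"
    f"Label: {label}\n"
    f"Caption: {caption}\n"
    "1. List the words in the caption that can change without changing the label.\n"
    "2. Propose alternatives for each such word (colors, weather, background).\n"
    "3. Rewrite the caption once per alternative as a counterfactual caption."
)
print(prompt)
```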
For ImageNet1K in our experiments, adding the label to the LLM prompt gave better results than leaving it out. In the counterfactual directory you can also find a setting that limits the counterfactual results to 25 images per class, with a percentage-based iteration condition, in the script `DILLEMA_counterfactual_imagenet_limit25.py`. If you need to run the whole dataset, use:
```bash
cd counterfactual
python3 DILLEMA_counterfactual_imagenet.py
```
Errors can also occur when the LLM response does not comply with the format specification. In that case you can run the bash script `run.sh`, which restarts the process automatically:
```bash
#!/bin/bash
# Restart the counterfactual script whenever it exits (e.g., after a
# malformed LLM response); stop the loop with Ctrl-C.
while true
do
    python3 DILLEMA_counterfactual_imagenet.py || echo "Error... restarting..." >&2
    echo "Press Ctrl-C to quit." && sleep 1
done
```
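Make the script executable and start it (standard shell usage, nothing DILLEMA-specific):

```bash
chmod +x run.sh
./run.sh
```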
Controlling the diffusion model with a conditioning mechanism, combining spatial context and text descriptions, is an important component of DILLEMA's image generation.
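A minimal sketch of such conditioned generation with diffusers, assuming a Canny-edge ControlNet as the spatial condition; the actual conditioning used by `DILLEMA_imagegen_imagenet.py` may differ:

```python
# Conditioned image generation sketch (Canny ControlNet is an assumption).
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

condition = load_image("edges.png")  # spatial context, e.g. an edge map
image = pipe("a red car parked on a snowy street", image=condition).images[0]
image.save("counterfactual.png")
```

To run the full image-generation step on ImageNet1K: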
```bash
cd imagegen
python3 DILLEMA_imagegen_imagenet.py
```