Re-Align


Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization

Re-Align is a novel alignment framework that leverages image retrieval. It not only mitigates hallucinations more effectively than previous methods but also yields significant performance gains on general visual question-answering (VQA) tasks. Moreover, Re-Align remains robust and scalable across a wide range of VLM sizes and architectures.

Figure 1. Benchmark performance comparison (min-max normalized).

Key Highlights

  • Controlled Hallucination Injection: Using image retrieval, Re-Align deliberately injects controlled hallucinations into chosen responses, producing rejected responses that provide more plausible and natural preference signals about hallucinations.

  • Dual Preference Dataset: By incorporating both the retrieved image and the original input image, Re-Align constructs a dual-preference dataset.

  • Alignment via rDPO: Our proposed rDPO objective extends standard DPO with an additional visual preference optimization term, further enriching the alignment process with valuable visual preference signals (a rough sketch follows this list).
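
As a rough illustration of the idea (a sketch under our reading of the paper, not its exact formulation), rDPO keeps the standard DPO loss on chosen/rejected responses and adds an analogous image-conditioned term that prefers the original image v over the retrieved image v' when generating the chosen response y_w:

\mathcal{L}_{\mathrm{rDPO}} = \mathcal{L}_{\mathrm{DPO}} - \lambda\,\mathbb{E}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x, v)}{\pi_{\mathrm{ref}}(y_w \mid x, v)} - \beta \log \frac{\pi_\theta(y_w \mid x, v')}{\pi_{\mathrm{ref}}(y_w \mid x, v')}\right)\right]

Here \sigma is the sigmoid, \pi_\theta and \pi_{\mathrm{ref}} are the policy and frozen reference models, \beta is the DPO temperature (the --beta flag in the training command below), and \lambda is a weighting term we introduce for this sketch.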

News

  • [2025/2/18] 🔥 We released Re-Align, a novel alignment framework that leverages image retrieval to mitigate hallucinations in Vision Language Models. Explore our paper and website for more details.

Installation

  1. Create a virtual environment with Conda and activate it.
conda create -n re-align python=3.10 -y
conda activate re-align
  2. Clone the repository and install the required packages.
git clone https://github.com/taco-group/Re-Align.git
cd Re-Align
pip install --upgrade pip
pip install -e .
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
pip install trl
  3. Replace trl's dpo_trainer.py with the repository's dpo_trainer.py.
rm /home/username/anaconda3/envs/re-align/lib/python3.10/site-packages/trl/trainer/dpo_trainer.py
cp ./dpo_trainer.py /home/username/anaconda3/envs/re-align/lib/python3.10/site-packages/trl/trainer/
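
The hard-coded paths above assume a default Anaconda layout under /home/username. If your environment lives elsewhere, you can locate the installed trl trainer directory programmatically; a minimal sketch (TRL_TRAINER_DIR is our own variable name):

# Resolve the trl trainer directory inside the active environment
TRL_TRAINER_DIR=$(python -c "import trl.trainer, os; print(os.path.dirname(trl.trainer.__file__))")
cp ./dpo_trainer.py "$TRL_TRAINER_DIR/"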

Usage

After setting up the environment, you can start using Re-Align with the following instructions:

  1. Download the entire train2014 split from MSCOCO.
cd dataset
wget http://images.cocodataset.org/zips/train2014.zip
unzip train2014.zip
cd ..
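
Optionally, sanity-check the extraction; the COCO train2014 split contains 82,783 images:

# Expect 82783 files in dataset/train2014
ls dataset/train2014 | wc -l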
  2. Run Re-Align
    Use the following command to train llava-v1.6-vicuna-7b with Re-Align:
    • Quick Start:
    bash train_lora_rdpo.sh
    • Command:
    deepspeed --include=localhost:0,1,2,3 --master_port 60000 train_rdpo.py \
      --model_name_or_path liuhaotian/llava-v1.6-vicuna-7b \
      --data_path "./preference_data/pref_data.json" \
      --deepspeed "./deepspeed/zero2.json" \
      --per_device_train_batch_size 1 \
      --per_device_eval_batch_size 1 \
      --gradient_accumulation_steps 8 \
      --evaluation_strategy "no" \
      --save_strategy "epoch" \
      --save_total_limit 1 \
      --learning_rate 1e-6 \
      --weight_decay 0. \
      --warmup_ratio 0.03 \
      --lr_scheduler_type "cosine" \
      --bf16 True \
      --lisa_enable False \
      --lora_enable True \
      --beta 0.1 \
      --output_dir "./output/llava-vicuna-7b-rdpo-lora-1e-6-beta-0.1"
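
The command above assumes four local GPUs, giving an effective batch size of 4 GPUs × 1 per device × 8 accumulation steps = 32. On a single GPU, you can preserve that effective batch size by raising the accumulation steps; a sketch of one such variant (only --include, --gradient_accumulation_steps, and --output_dir differ from the command above):

deepspeed --include=localhost:0 --master_port 60000 train_rdpo.py \
  --model_name_or_path liuhaotian/llava-v1.6-vicuna-7b \
  --data_path "./preference_data/pref_data.json" \
  --deepspeed "./deepspeed/zero2.json" \
  --per_device_train_batch_size 1 \
  --per_device_eval_batch_size 1 \
  --gradient_accumulation_steps 32 \
  --evaluation_strategy "no" \
  --save_strategy "epoch" \
  --save_total_limit 1 \
  --learning_rate 1e-6 \
  --weight_decay 0. \
  --warmup_ratio 0.03 \
  --lr_scheduler_type "cosine" \
  --bf16 True \
  --lisa_enable False \
  --lora_enable True \
  --beta 0.1 \
  --output_dir "./output/llava-vicuna-7b-rdpo-lora-1e-6-beta-0.1-1gpu"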

Citation

We hope this code is helpful to your work. If you use our code or extend our work, please consider citing our paper:

@article{Xing2025Feb,
	author = {Xing, Shuo and Wang, Yuping and Li, Peiran and Bai, Ruizheng and Wang, Yueqi and Qian, Chengxuan and Yao, Huaxiu and Tu, Zhengzhong},
	title = {{Re-Align: Aligning Vision Language Models via Retrieval-Augmented Direct Preference Optimization}},
	journal = {arXiv},
	year = {2025},
	month = feb,
	eprint = {2502.13146},
	doi = {10.48550/arXiv.2502.13146}
}