Prepare the data required for each stage according to the following dataset card.
Stage | Data Name | Data Path |
---|---|---|
MiniGPT-4 DPO Training | Visual Genome, HA-DPO data, CCSBUAlign | ha_dpo/data/VG, ha_dpo/data/hadpo/minigpt4, ha_dpo/data/cc_sbu_align |
LLaVA-1.5 DPO Training | Visual Genome, HA-DPO data | ha_dpo/data/VG, ha_dpo/data/hadpo/llava-v1.5 |
InstructBLIP DPO Training | Visual Genome, HA-DPO data | ha_dpo/data/VG, ha_dpo/data/hadpo/InstructBLIP |
SHR Evaluation | Visual Genome, SHR annotation | ha_dpo/data/VG, ha_dpo/data/shr |
POPE Evaluation | COCO val2014, POPE annotation | ha_dpo/data/coco2014, ha_dpo/data/POPE |
For example, MiniGPT-4 model training requires the following data:
- MiniGPT-4 hallucination-aware positive-negative data
- Visual Genome
- CCSBUAlign
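As a minimal sketch, the dataset card can be turned into a quick check that reports which directories are still missing for a given stage. The stage keys and the `missing_dirs` helper are hypothetical names introduced here for illustration and are not part of the repository; the paths follow the per-dataset directory layouts described in this document.

```python
from pathlib import Path

# Stage -> required data directories, following the dataset card.
# The stage keys are illustrative labels, not identifiers used by the repo.
REQUIRED_DIRS = {
    "minigpt4_dpo": [
        "ha_dpo/data/VG",
        "ha_dpo/data/hadpo/minigpt4",
        "ha_dpo/data/cc_sbu_align",
    ],
    "llava15_dpo": ["ha_dpo/data/VG", "ha_dpo/data/hadpo/llava-v1.5"],
    "instructblip_dpo": ["ha_dpo/data/VG", "ha_dpo/data/hadpo/InstructBLIP"],
    "shr_eval": ["ha_dpo/data/VG", "ha_dpo/data/shr"],
    "pope_eval": ["ha_dpo/data/coco2014", "ha_dpo/data/POPE"],
}


def missing_dirs(stage, repo_root="."):
    """Return the required directories for `stage` that do not exist yet."""
    root = Path(repo_root)
    return [d for d in REQUIRED_DIRS[stage] if not (root / d).is_dir()]


if __name__ == "__main__":
    # Print a one-line readiness report per stage, relative to the current directory.
    for stage in REQUIRED_DIRS:
        missing = missing_dirs(stage)
        status = "ready" if not missing else "missing: " + ", ".join(missing)
        print(f"{stage}: {status}")
```

Running the script from the repository root prints one line per stage, which makes it easy to see what still needs to be downloaded before training or evaluation.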
We provide hallucination-aware positive-negative data targeted at each LVLM. Download the data from one of the following sources and put it under ha_dpo/data/hadpo:
MiniGPT-4 | LLaVA-1.5 | InstructBLIP |
---|---|---|
huggingface, opendatalab | huggingface, opendatalab | huggingface, opendatalab |
data structure
ha_dpo/data/hadpo
├── llava-v1.5
│   ├── desc_data.json
│   └── pope_data.json
├── InstructBLIP
│   ├── desc_data.json
│   └── pope_data.json
└── minigpt4
    ├── desc_data.json
    └── pope_data.json
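After downloading, a short script along these lines could confirm that each file is in place and parses as JSON. This is only a sketch: the `summarize_hadpo` helper is a hypothetical name, and nothing is assumed about the record schema beyond each file being valid JSON.

```python
import json
from pathlib import Path

MODELS = ["llava-v1.5", "InstructBLIP", "minigpt4"]
FILES = ["desc_data.json", "pope_data.json"]


def summarize_hadpo(root="ha_dpo/data/hadpo"):
    """Map 'model/file' to its number of top-level entries, or None if the file is absent."""
    summary = {}
    for model in MODELS:
        for name in FILES:
            key = f"{model}/{name}"
            path = Path(root) / model / name
            if not path.is_file():
                summary[key] = None
                continue
            with path.open(encoding="utf-8") as f:
                data = json.load(f)
            # Assumption: the top level is a list or dict; any other JSON value counts as one entry.
            summary[key] = len(data) if isinstance(data, (list, dict)) else 1
    return summary


if __name__ == "__main__":
    for key, count in summarize_hadpo().items():
        print(key, "missing" if count is None else f"{count} entries")
```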
Download the following data from Visual Genome v2 and put it under ha_dpo/data/VG:
- images (both part1 and part2)
- image meta-data
- region descriptions
data structure
ha_dpo/data/VG
├── image_data.json
├── region_descriptions.json
├── VG_100K
│   └── ...
└── VG_100K_2
    └── ...
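Because the images are split across VG_100K and VG_100K_2, code that loads an image has to look in both directories. The sketch below shows one way to do that; `find_vg_image` is a hypothetical helper, and it assumes the files are named `<image_id>.jpg`, which should be checked against the downloaded archives.

```python
from pathlib import Path


def find_vg_image(image_id, vg_root="ha_dpo/data/VG"):
    """Return the path of a Visual Genome image, or None if it is in neither part."""
    for part in ("VG_100K", "VG_100K_2"):
        # Assumption: each image is stored as "<image_id>.jpg" inside its part directory.
        candidate = Path(vg_root) / part / f"{image_id}.jpg"
        if candidate.is_file():
            return candidate
    return None
```

Returning None instead of raising lets the caller decide whether a missing image is an error or just a sign that one of the two image archives has not been extracted yet.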
Download the SHR human-annotated factual annotations from the download link and put them under ha_dpo/data/shr.
data structure
ha_dpo/data/shr
├── shr_factual_part1.jsonl
├── shr_factual_part2.jsonl
└── val_images_final.json
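The two factual-annotation parts are JSON Lines files, so they can be read line by line and concatenated. The following is a minimal sketch with hypothetical helper names (`read_jsonl`, `load_shr_factual`); it makes no assumption about the fields inside each record.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Parse a JSON Lines file into a list of records, skipping blank lines."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def load_shr_factual(shr_root="ha_dpo/data/shr"):
    """Concatenate both parts of the SHR factual annotations, part1 first."""
    records = []
    for name in ("shr_factual_part1.jsonl", "shr_factual_part2.jsonl"):
        records.extend(read_jsonl(Path(shr_root) / name))
    return records
```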
Clone the POPE repo from the official website and put it under ha_dpo/data.
data structure
ha_dpo/data/
└── POPE
    └── ...
POPE evaluation requires the COCO2014 images. Download the COCO2014 validation images from COCO and put them under ha_dpo/data/coco2014.
data structure
ha_dpo/data/coco2014
└── val2014
    └── ...
CCSBUAlign is the SFT data used in MiniGPT-4. We use this data only during MiniGPT-4 model training, to ensure the stability of preference learning. Refer to the official website to obtain the CCSBUAlign data, and put it under ha_dpo/data/cc_sbu_align.
data structure
ha_dpo/data/cc_sbu_align
├── filter_cap.json
└── image
    └── ...