DPO performance on other models #9
We did not explore DPO with LLaVA models. Could you share your results and example outputs before/after DPO so we can dig into it?
The following are the results on the MME benchmark. MME score: { perception, cognition, OCR }
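The numeric results themselves were not preserved in this thread. For context on how such numbers are typically produced, here is a small sketch of the common MME scoring convention, where each image carries two yes/no questions and a subtask score is accuracy (%) plus "accuracy+" (%), the latter counting an image only if both of its questions are answered correctly (the exact aggregation used above is an assumption):

```python
# MME-style subtask scoring sketch. Assumes each image has exactly two
# yes/no questions; max score per subtask is 200 (100 acc + 100 acc+).

def mme_subtask_score(results):
    """results: list of (correct_q1, correct_q2) booleans, one tuple per image."""
    n_images = len(results)
    n_questions = 2 * n_images
    n_correct = sum(a + b for a, b in results)       # per-question correct count
    n_both = sum(1 for a, b in results if a and b)   # images with both correct
    acc = 100.0 * n_correct / n_questions
    acc_plus = 100.0 * n_both / n_images
    return acc + acc_plus
```

The reported perception score would then be the sum over the perception subtasks and the cognition score the sum over the cognition subtasks.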
How many epochs did you train with DPO?
The above results are from 1 epoch of training for the 7B model and 3 epochs for the 13B model.
I'm sorry for not getting back to you sooner. We also recently explored DPO training on the LLaVA backbone and observed degraded MME performance; however, the scores on other benchmarks consistently improved.
We suspect that after DPO training the model can no longer follow the simple answer format required by MME, and we would like to investigate this later.
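For reference, the DPO objective being discussed can be written down independently of any training framework. This is a minimal sketch of the standard DPO loss (a framework like TRL would compute the same quantity over batches of token-level log-probabilities):

```python
import math

# DPO loss sketch: rewards a larger log-probability margin for the chosen
# response over the rejected one, measured relative to a frozen reference
# model. beta controls how sharply the margin is enforced.

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(logits)), written in a numerically stable form
    return math.log1p(math.exp(-logits))
```

Because the loss only constrains the preference margin, nothing in it anchors the model's output *format*, which is consistent with the hypothesis that DPO-trained models drift away from MME's terse yes/no answer style.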
Maybe you can add a prompt like this:
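The suggested prompt itself was not captured in this thread. As a purely hypothetical illustration of the idea: since MME questions expect a bare yes/no answer, one workaround is to append an explicit format instruction to each question at evaluation time (the suffix wording below is an assumption, not the commenter's prompt):

```python
# Hypothetical sketch: constrain answers to the yes/no format MME expects
# by appending a format instruction to each evaluation question.

MME_SUFFIX = " Please answer yes or no."

def format_mme_question(question: str) -> str:
    q = question.rstrip()
    # Avoid doubling the instruction if the question already carries it.
    return q if q.endswith(MME_SUFFIX.strip()) else q + MME_SUFFIX
```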
Hi all, we found a great repo with support and results for many other models: https://github.com/TideDra/VL-RLHF Performance is boosted almost consistently for LLaVA-Next series models. So my guess is that the current LLaVA-v1.5 series is too weak to serve as a starting model for DPO (possibly due to its lower resolution of 336, vs. Qwen-VL). The LLaVA-Next series is more powerful thanks to its image tiling mechanism. Check it out if you want to further explore DPO/RLHF with VLFeedback!
Do you have data on the performance of DPO with models other than Qwen-VL-Chat? I found that it degrades both perception and cognition on MME when used with LLaVA-1.5.