DPO performance on other models #9
We did not explore DPO with LLaVA models. Could you share your results and example outputs before/after DPO so we can dig into it?
The following are the results on the MME benchmark. MME score: { perception, cognition, OCR }
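The numeric results themselves were not preserved in this thread. For context on how such numbers are typically produced, here is a small sketch of the common MME scoring convention, where each image carries two yes/no questions and a subtask score is accuracy (%) plus "accuracy+" (%), the latter counting an image only if both of its questions are answered correctly (the exact aggregation used above is an assumption):

```python
# MME-style subtask scoring sketch. Assumes each image has exactly two
# yes/no questions; max score per subtask is 200 (100 acc + 100 acc+).

def mme_subtask_score(results):
    """results: list of (correct_q1, correct_q2) booleans, one tuple per image."""
    n_images = len(results)
    n_questions = 2 * n_images
    n_correct = sum(a + b for a, b in results)       # per-question correct count
    n_both = sum(1 for a, b in results if a and b)   # images with both correct
    acc = 100.0 * n_correct / n_questions
    acc_plus = 100.0 * n_both / n_images
    return acc + acc_plus
```

The reported perception score would then be the sum over the perception subtasks and the cognition score the sum over the cognition subtasks.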
How many epochs did you train with DPO?
The above results are from 1 epoch of training for the 7B model and 3 epochs for the 13B model.
I'm sorry for not getting back to you sooner. We also recently explored DPO training on the LLaVA backbone and observed degraded MME performance; however, the scores on other benchmarks consistently improved.
We suspect that after DPO training the model can no longer follow the simple answer format required by MME, and we would like to investigate this later.
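For reference, the DPO objective being discussed can be written down independently of any training framework. This is a minimal sketch of the standard DPO loss (a framework like TRL would compute the same quantity over batches of token-level log-probabilities):

```python
import math

# DPO loss sketch: rewards a larger log-probability margin for the chosen
# response over the rejected one, measured relative to a frozen reference
# model. beta controls how sharply the margin is enforced.

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    logits = beta * ((policy_chosen_logp - ref_chosen_logp)
                     - (policy_rejected_logp - ref_rejected_logp))
    # -log(sigmoid(logits)), written in a numerically stable form
    return math.log1p(math.exp(-logits))
```

Because the loss only constrains the preference margin, nothing in it anchors the model's output *format*, which is consistent with the hypothesis that DPO-trained models drift away from MME's terse yes/no answer style.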
Maybe you can add a prompt like this:
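The suggested prompt itself was not captured in this thread. As a purely hypothetical illustration of the idea: since MME questions expect a bare yes/no answer, one workaround is to append an explicit format instruction to each question at evaluation time (the suffix wording below is an assumption, not the commenter's prompt):

```python
# Hypothetical sketch: constrain answers to the yes/no format MME expects
# by appending a format instruction to each evaluation question.

MME_SUFFIX = " Please answer yes or no."

def format_mme_question(question: str) -> str:
    q = question.rstrip()
    # Avoid doubling the instruction if the question already carries it.
    return q if q.endswith(MME_SUFFIX.strip()) else q + MME_SUFFIX
```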
Hi all, we found a great repo with support and results for many other models: https://github.com/TideDra/VL-RLHF Performance is boosted almost consistently for LLaVA-Next series models. So my guess is that the current LLaVA-v1.5 series is too weak to serve as a starting model for DPO (possibly due to its lower resolution of 336, vs. Qwen-VL). The LLaVA-Next series is more powerful thanks to its image tiling mechanism. Check it out if you want to further explore DPO/RLHF with VLFeedback!
Do you have data on the performance of DPO with models other than Qwen-VL-Chat? I found that it degrades both perception and cognition on MME when used with LLaVA-1.5.