This is an unofficial repo for the paper: Improve Vision Language Model Chain-of-thought Reasoning
- [12/24 - 01/25] SFT and DPO pipelines, GPT distillation, inference, and evaluation.
- [10.22] We will provide a third-party implementation of the arXiv paper.
- ShareGPT4o-reasoning: 193k CoT predictions + filtered direct predictions
- ShareGPT4o-reasoning-dpo: 66k DPO data on 3 domains: A-OKVQA, math, and ChartQA
- Open-LLaVA-NeXT: same as https://github.com/xiaoachen98/Open-LLaVA-NeXT, used as our base model
- LLaVA-Reasoner-SFT-preview: SFT with direct + CoT data
- LLaVA-Reasoner-SFT: SFT with direct + CoT data (more math data than the preview)
- LLaVA-Reasoner-DPO-preview: DPO starting from SFT-preview
# set up the environment (fill in the required fields first)
source setup/setup_env.sh
# set up the training data
source setup/setup_train_data.sh
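
Because the environment setup relies on fields filled in by hand, a quick check before training can catch one that was left empty. The following is a minimal sketch and not part of the repo: `require_vars` is a hypothetical helper, and `SAVE_DIR` is the only variable name taken from the commands in this README. Add whichever other fields your setup defines.

```shell
# Hypothetical helper (not part of the repo): fail early if any of the
# named environment variables is unset or empty.
require_vars() {
  _rv_missing=0
  for _rv_name in "$@"; do
    # Look up the variable whose name is stored in $_rv_name (POSIX-compatible).
    eval "_rv_value=\${$_rv_name:-}"
    if [ -z "$_rv_value" ]; then
      echo "missing required variable: $_rv_name" >&2
      _rv_missing=1
    fi
  done
  return "$_rv_missing"
}

# SAVE_DIR is used by the training commands in this README;
# any other names depend on your own setup.
require_vars SAVE_DIR || echo "fill in the required fields and source the setup script again" >&2
```

Running the check right after sourcing the setup scripts surfaces a missing value immediately rather than partway through a training run.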
# SFT (direct + CoT, preview); run from the repository root
cd llava_reasoner
bash scripts_sft/sft_direct+cot_preview.sh \
$SAVE_DIR/sft/LLaVA-Reasoner-SFT-preview
# DPO (preview); run from the repository root
cd llava_reasoner
bash scripts_dpo/dpo_llava_reasoner_preview.sh \
$SAVE_DIR/dpo/LLaVA-Reasoner-DPO-preview
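
The SFT and DPO commands can be chained so that DPO only starts if SFT exits cleanly, which matters because the DPO-preview model is trained from the SFT-preview model. A minimal sketch, assuming it is run from the repository root after the setup scripts have been sourced. `run_stage` and `run_pipeline` are hypothetical helpers, not part of the repo. Whether the DPO script reads the SFT checkpoint from this path automatically is not stated here, so check the script's own settings.

```shell
# Hypothetical helper (not part of the repo): run one training script from
# inside llava_reasoner/ and report a failure to the caller.
run_stage() {
  # $1 = script path relative to llava_reasoner/
  # $2 = directory argument passed to the script
  # The subshell keeps the caller's working directory unchanged.
  ( cd llava_reasoner && bash "$1" "$2" ) || {
    echo "stage failed: $1" >&2
    return 1
  }
}

# Run SFT first; start DPO only if SFT succeeded.
run_pipeline() {
  run_stage scripts_sft/sft_direct+cot_preview.sh \
    "$SAVE_DIR/sft/LLaVA-Reasoner-SFT-preview" &&
  run_stage scripts_dpo/dpo_llava_reasoner_preview.sh \
    "$SAVE_DIR/dpo/LLaVA-Reasoner-DPO-preview"
}
```

Calling `run_pipeline` then runs both stages in order. Because each stage runs in a subshell, the `cd llava_reasoner` does not carry over, so the second stage also starts from the repository root.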
## Citation

@article{zhang2024improve,
  title={Improve vision language model chain-of-thought reasoning},
  author={Zhang, Ruohong and Zhang, Bowen and Li, Yanghao and Zhang, Haotian and Sun, Zhiqing and Gan, Zhe and Yang, Yinfei and Pang, Ruoming and Yang, Yiming},
  journal={arXiv preprint arXiv:2410.16198},
  year={2024}
}

Thanks to
- [Open-LLaVA-NeXT](https://github.com/xiaoachen98/Open-LLaVA-NeXT): for the base model and SFT training
- [LLaVA-Hound](https://github.com/RifleZhang/LLaVA-Hound-DPO/tree/main): for the DPO-related code