Official repository of "Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning".
[📖 Paper] [🤗 Models] [🤗 Datasets]
```bash
conda create -n mathpuma python=3.9 -y
conda activate mathpuma
pip install -r requirements.txt
```
The model weights for this project are hosted on Hugging Face.
Model | Download |
---|---|
Math-PUMA_Qwen2VL-1.5B | 🤗 Hugging Face |
Math-PUMA_Qwen2VL-7B | 🤗 Hugging Face |
Math-PUMA_DeepSeek-Math-VL-7B | 🤗 Hugging Face |
The training data used for these models is also available on Hugging Face. You can find the dataset by visiting this link.
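For example, the checkpoints and the dataset can be fetched with `huggingface_hub`. The repository IDs below are hypothetical placeholders; use the actual IDs behind the links above.

```python
# Download a checkpoint and the training data from the Hugging Face Hub.
# Both repo IDs are placeholders -- replace them with the IDs linked above.
from huggingface_hub import snapshot_download

model_dir = snapshot_download(repo_id="Math-PUMA/Math-PUMA_Qwen2VL-7B")                  # placeholder ID
data_dir = snapshot_download(repo_id="Math-PUMA/Math-PUMA_Data", repo_type="dataset")    # placeholder ID
print(model_dir, data_dir)
```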
We leverage the fine-tuning code from two repositories:
In `./train/deepseek_math/train_script.py` or `./train/qwen2/train_script.py`:

- Set `USE_KL` to `"true"`, and set the KL hyperparameters `ALPHA_KL`, `LAMBDA_KL`, and `TEMP_KL`.
- Set `TRAINABLE_PARTS` to `"aligner, vision_tower_low, vision_tower_high"`.
- Set `DATA_PATH`; note that the data files must contain the keys `image_url_2`, `instruction_2`, and `output_2`.
- Run `./train/deepseek_math/train_script.py` or `./train/qwen2/train_script.py` (a hedged example of these settings is sketched after this list).
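For concreteness, here is a minimal sketch of what these settings could look like, assuming they are plain variables near the top of `train_script.py`. The numeric values, the data path, and the file names in the example record are illustrative placeholders, not values from the repository.

```python
# Illustrative alignment-stage settings; only the USE_KL and TRAINABLE_PARTS values
# come from the instructions above -- everything else is a placeholder.
USE_KL = "true"
ALPHA_KL = 0.5        # KL hyperparameter (placeholder value)
LAMBDA_KL = 1.0       # KL hyperparameter (placeholder value)
TEMP_KL = 2.0         # KL hyperparameter (placeholder value)
TRAINABLE_PARTS = "aligner, vision_tower_low, vision_tower_high"
DATA_PATH = "/path/to/alignment_data.json"   # placeholder path

# Each record referenced by DATA_PATH must additionally provide the *_2 keys:
example_record = {
    "image_url_2": "images/sample_0001_pair.png",          # placeholder file name
    "instruction_2": "Solve the problem shown in the image.",
    "output_2": "The answer is ...",
    # ... plus the regular fields expected by train_script.py
}
```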
In `./train/deepseek_math/train_script.py` or `./train/qwen2/train_script.py`:

- Set `USE_KL` to `"false"`.
- Set `TRAINABLE_PARTS` to `"all"`.
- Set `DATA_PATH`.
- Run `./train/deepseek_math/train_script.py` or `./train/qwen2/train_script.py` (see the sketch after this list).
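Again as a minimal sketch, assuming the same variables in `train_script.py`; the data path is a placeholder.

```python
# Illustrative supervised fine-tuning settings; the data path is a placeholder.
USE_KL = "false"
TRAINABLE_PARTS = "all"
DATA_PATH = "/path/to/sft_data.json"   # placeholder path
```

The script is then launched the same way as in the previous stage (for example with `python ./train/qwen2/train_script.py`, or through whatever distributed launcher the script is configured for).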
Download images of MathVerse, MathVista, and We-Math, and put them into `./eval/data/<benchmark>/images`.
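If helpful, the folders can be created up front; the layout below is inferred from the path above, using the benchmark identifiers listed in the next step.

```python
# Create ./eval/data/<benchmark>/images for each benchmark before copying the images.
import os

for benchmark in ["mathverse", "mathvista", "wemath"]:
    os.makedirs(f"./eval/data/{benchmark}/images", exist_ok=True)
```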
In `./eval/evaluate/benchmark.py`:

- Set `benchmark` to one of `["mathverse", "mathvista", "wemath"]`.
- To evaluate a DeepSeek-Math-based MLLM, set `model_type` to `deepseek-vl`, set `is_customvlm` to `"false"`, and provide `model_path`; to evaluate a Qwen2-based MLLM or another customized MLLM, set `is_customvlm` to `"true"` and provide `model_path`.
- Run `./eval/evaluate/benchmark.py` (see the sketch after this list).
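As a minimal sketch, assuming these are module-level variables in `benchmark.py`; the checkpoint path is a placeholder.

```python
# Illustrative settings for evaluating a Qwen2-based (custom) MLLM; the path is a placeholder.
benchmark = "mathverse"                          # or "mathvista" / "wemath"
is_customvlm = "true"                            # Qwen2-based or other customized MLLM
model_path = "/path/to/Math-PUMA_Qwen2VL-7B"     # placeholder checkpoint path

# For the DeepSeek-Math-based MLLM instead:
# model_type = "deepseek-vl"
# is_customvlm = "false"
```

Then run `python ./eval/evaluate/benchmark.py`.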
If you find Math-PUMA useful for your research and applications, please kindly cite using this BibTeX:
```bibtex
@article{zhuang2024math,
  title={Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning},
  author={Zhuang, Wenwen and Huang, Xin and Zhang, Xiantao and Zeng, Jin},
  journal={arXiv preprint arXiv:2408.08640},
  year={2024}
}
```