Official repository of "Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning".
[📖 Paper] [🤗 Models] [🤗 Datasets]
```bash
conda create -n mathpuma python=3.9 -y
conda activate mathpuma
pip install -r requirements.txt
```
The model weights for this project are hosted on Hugging Face.
Model | Download |
---|---|
Math-PUMA_Qwen2VL-1.5B | 🤗 Hugging Face |
Math-PUMA_Qwen2VL-7B | 🤗 Hugging Face |
Math-PUMA_DeepSeek-Math-VL-7B | 🤗 Hugging Face |
The training data used for these models is also available on Hugging Face. You can find the dataset by visiting this link.
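For example, the checkpoints and the dataset can be fetched with `huggingface_hub`. The repository IDs below are hypothetical placeholders; use the actual IDs behind the links above.

```python
# Download a checkpoint and the training data from the Hugging Face Hub.
# Both repo IDs are placeholders -- replace them with the IDs linked above.
from huggingface_hub import snapshot_download

model_dir = snapshot_download(repo_id="Math-PUMA/Math-PUMA_Qwen2VL-7B")                  # placeholder ID
data_dir = snapshot_download(repo_id="Math-PUMA/Math-PUMA_Data", repo_type="dataset")    # placeholder ID
print(model_dir, data_dir)
```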
We leverage the fine-tuning code from two repositories:
In `./train/deepseek_math/train_script.py` or `./train/qwen2/train_script.py`:

- Set `USE_KL` to `"true"`, and set the KL hyperparameters `ALPHA_KL`, `LAMBDA_KL`, and `TEMP_KL`.
- Set `TRAINABLE_PARTS` to `"aligner, vision_tower_low, vision_tower_high"`.
- Set `DATA_PATH`; note that the data files must contain the keys `image_url_2`, `instruction_2`, and `output_2`.
- Run `./train/deepseek_math/train_script.py` or `./train/qwen2/train_script.py` (a hedged example of these settings is sketched after this list).
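For concreteness, here is a minimal sketch of what these settings could look like, assuming they are plain variables near the top of `train_script.py`. The numeric values, the data path, and the file names in the example record are illustrative placeholders, not values from the repository.

```python
# Illustrative alignment-stage settings; only the USE_KL and TRAINABLE_PARTS values
# come from the instructions above -- everything else is a placeholder.
USE_KL = "true"
ALPHA_KL = 0.5        # KL hyperparameter (placeholder value)
LAMBDA_KL = 1.0       # KL hyperparameter (placeholder value)
TEMP_KL = 2.0         # KL hyperparameter (placeholder value)
TRAINABLE_PARTS = "aligner, vision_tower_low, vision_tower_high"
DATA_PATH = "/path/to/alignment_data.json"   # placeholder path

# Each record referenced by DATA_PATH must additionally provide the *_2 keys:
example_record = {
    "image_url_2": "images/sample_0001_pair.png",          # placeholder file name
    "instruction_2": "Solve the problem shown in the image.",
    "output_2": "The answer is ...",
    # ... plus the regular fields expected by train_script.py
}
```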
In `./train/deepseek_math/train_script.py` or `./train/qwen2/train_script.py`:

- Set `USE_KL` to `"false"`.
- Set `TRAINABLE_PARTS` to `"all"`.
- Set `DATA_PATH`.
- Run `./train/deepseek_math/train_script.py` or `./train/qwen2/train_script.py` (see the sketch after this list).
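Again as a minimal sketch, assuming the same variables in `train_script.py`; the data path is a placeholder.

```python
# Illustrative supervised fine-tuning settings; the data path is a placeholder.
USE_KL = "false"
TRAINABLE_PARTS = "all"
DATA_PATH = "/path/to/sft_data.json"   # placeholder path
```

The script is then launched the same way as in the previous stage (for example with `python ./train/qwen2/train_script.py`, or through whatever distributed launcher the script is configured for).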
Download images of MathVerse, MathVista, and We-Math, and put them into `./eval/data/<benchmark>/images`.
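If helpful, the folders can be created up front; the layout below is inferred from the path above, using the benchmark identifiers listed in the next step.

```python
# Create ./eval/data/<benchmark>/images for each benchmark before copying the images.
import os

for benchmark in ["mathverse", "mathvista", "wemath"]:
    os.makedirs(f"./eval/data/{benchmark}/images", exist_ok=True)
```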
In `./eval/evaluate/benchmark.py`:

- Set `benchmark` to one of `["mathverse", "mathvista", "wemath"]`.
- To evaluate a DeepSeek-Math-based MLLM, set `model_type` to `deepseek-vl`, set `is_customvlm` to `"false"`, and provide `model_path`; to evaluate a Qwen2-based MLLM or another customized MLLM, set `is_customvlm` to `"true"` and provide `model_path`.
- Run `./eval/evaluate/benchmark.py` (see the sketch after this list).
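As a minimal sketch, assuming these are module-level variables in `benchmark.py`; the checkpoint path is a placeholder.

```python
# Illustrative settings for evaluating a Qwen2-based (custom) MLLM; the path is a placeholder.
benchmark = "mathverse"                          # or "mathvista" / "wemath"
is_customvlm = "true"                            # Qwen2-based or other customized MLLM
model_path = "/path/to/Math-PUMA_Qwen2VL-7B"     # placeholder checkpoint path

# For the DeepSeek-Math-based MLLM instead:
# model_type = "deepseek-vl"
# is_customvlm = "false"
```

Then run `python ./eval/evaluate/benchmark.py`.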
If you find Math-PUMA useful for your research and applications, please kindly cite using this BibTeX:
```bibtex
@article{zhuang2024math,
  title={Math-PUMA: Progressive Upward Multimodal Alignment to Enhance Mathematical Reasoning},
  author={Zhuang, Wenwen and Huang, Xin and Zhang, Xiantao and Zeng, Jin},
  journal={arXiv preprint arXiv:2408.08640},
  year={2024}
}
```