This repo includes the code for the paper PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning (NAACL 2024 Long Paper).
torch >= 2.0
transformers
- Download the pre-trained models (CodeT5-small/base/large from Hugging Face); a minimal loading sketch is shown below.
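For reference, a minimal sketch of loading a CodeT5 checkpoint through the transformers library (the `Salesforce/codet5-*` names are the public Hugging Face checkpoints; pick the size you need):

```python
# Download/load a pre-trained CodeT5 checkpoint from Hugging Face.
from transformers import RobertaTokenizer, T5ForConditionalGeneration

checkpoint = "Salesforce/codet5-base"  # or "Salesforce/codet5-small" / "Salesforce/codet5-large"
tokenizer = RobertaTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)
```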
├── Data
│   ├── GSM8K
│   │   ├── train-enhanced.json    # PaD-augmented GSM8K training data generated by gpt-3.5-turbo
│   │   └── test_add_code.json     # test data with PaD-augmented label code
│   ├── MultiArith                 # test data with PaD-augmented label code
│   ├── SVAMP                      # test data with PaD-augmented label code
│   └── ASDiv                      # test data with PaD-augmented label code
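For a quick look at the augmented data, a minimal sketch (assuming the files are plain JSON arrays; adapt the parsing if they turn out to be JSON Lines):

```python
# Quick inspection of the PaD-augmented data.
import json

with open("Data/GSM8K/train-enhanced.json", encoding="utf-8") as f:
    examples = json.load(f)

print(len(examples))   # number of augmented training examples
print(examples[0])     # one entry: question plus its PaD-augmented label code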
The data for the self-refinement task is available here.
Execute the following command to reproduce our models:
sh run_seq2seq.sh
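For orientation, a bare-bones sketch of the kind of seq2seq fine-tuning the script wraps, using the Hugging Face Trainer API. The field names `question`/`code`, the checkpoint, the output directory, and the hyperparameters are illustrative assumptions; run_seq2seq.sh remains the authoritative recipe.

```python
# Illustrative sketch only: minimal seq2seq fine-tuning of CodeT5.
import json
from datasets import Dataset
from transformers import (DataCollatorForSeq2Seq, RobertaTokenizer,
                          Seq2SeqTrainer, Seq2SeqTrainingArguments,
                          T5ForConditionalGeneration)

checkpoint = "Salesforce/codet5-base"
tokenizer = RobertaTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint)

with open("Data/GSM8K/train-enhanced.json", encoding="utf-8") as f:
    raw = json.load(f)  # assumption: a list of records with question/code fields

def preprocess(batch):
    # "question" and "code" are assumed field names; check the JSON keys first.
    enc = tokenizer(batch["question"], truncation=True, max_length=512)
    enc["labels"] = tokenizer(text_target=batch["code"],
                              truncation=True, max_length=256)["input_ids"]
    return enc

train_ds = Dataset.from_list(raw).map(preprocess, batched=True)

trainer = Seq2SeqTrainer(
    model=model,
    args=Seq2SeqTrainingArguments(
        output_dir="outputs/codet5-pad",   # hypothetical output directory
        per_device_train_batch_size=8,     # illustrative hyperparameters
        learning_rate=5e-5,
        num_train_epochs=10,
    ),
    train_dataset=train_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```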
Run the following script to generate your results:
sh run_seq2seq_test.sh
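A minimal generation sketch for a single question (`outputs/codet5-pad` is a hypothetical path to the fine-tuned checkpoint from the training step above):

```python
# Minimal inference sketch: generate a reasoning program for one question.
from transformers import RobertaTokenizer, T5ForConditionalGeneration

ckpt = "outputs/codet5-pad"
tokenizer = RobertaTokenizer.from_pretrained(ckpt)
model = T5ForConditionalGeneration.from_pretrained(ckpt)

question = ("Natalia sold clips to 48 of her friends in April, and then she sold "
            "half as many clips in May. How many clips did Natalia sell altogether "
            "in April and May?")
inputs = tokenizer(question, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))  # predicted program
```

In PaD, the generated program is executed by a Python interpreter to obtain the final numerical answer, so evaluation scores the executed result rather than the raw generated text.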