Reference code for the ACL 2022 paper - Answer-level Calibration for Free-form Multiple Choice Question Answering.
The code was written with, or depends on:
- Python 3.8
- PyTorch 1.9.1
- Transformers 4.11.3
- Create a virtualenv and install dependencies
virtualenv -p python3.8 venv
source venv/bin/activate
pip3 install -r requirements.txt
- Set up the environment
bash setup_env.sh
- Run a zero-shot experiment using
bash run_zs.sh ${gpudev} ${dataname} ${split}
- Run a k-shot experiment using
bash run_fs.sh ${gpudev} ${dataname} ${split} ${k}
Valid datanames are COPA, commonsenseqa, mctaco, piqa, socialiqa, winogrande, arc_easy, arc_challenge, dream, swag and hendrycks_test. For hendrycks_test, additionally pass the category:
bash run_zs.sh ${gpudev} ${dataname} ${split} "-data_config ${category}"
where the category is one of humanities, social_sciences, STEM and other.
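As a concrete illustration, a hypothetical dry run that expands the script arguments. The specific values below (GPU index 0, commonsenseqa, the split name, k=4) are placeholder choices for this sketch, not values prescribed by the repository:

```shell
# Placeholder values for illustration only; substitute your own.
gpudev=0                 # CUDA device index passed as ${gpudev}
dataname=commonsenseqa   # one of the valid datanames listed above
split=validation         # evaluation split (check the scripts for accepted names)
k=4                      # number of shots for the k-shot setting

# Print the fully expanded commands (dry run; drop the echo to actually execute).
echo bash run_zs.sh ${gpudev} ${dataname} ${split}
echo bash run_fs.sh ${gpudev} ${dataname} ${split} ${k}
```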
If you use this code, please consider citing:
[1] Sawan Kumar. 2022. Answer-level Calibration for Free-form Multiple Choice Question Answering. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 665–679, Dublin, Ireland. Association for Computational Linguistics. [bibtex]
For any clarification, comments, or suggestions please create an issue or contact [email protected]