This is the official repository for the CUPCase paper and dataset.
This repository consists of three main components:
- lm_eval - for the evaluation of on-premise models such as Llama 3.1, Meditron, and BioMistral
- gpt_medlm_evaluation - for the evaluation of API-based LLMs such as GPT-4o and MedLM-large
- utils, preprocess - general utilities and preprocessing for the CUPCase dataset

To use any of these components, follow the README file in its directory.

The dataset is available on Hugging Face: https://huggingface.co/datasets/ofir408/CupCase
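
As a minimal sketch, assuming the Hugging Face `datasets` library is installed (`pip install datasets`), the dataset can be loaded by its repository ID. The helper name `load_cupcase` is hypothetical and not part of this repository. Split and column names are not assumed here.

```python
# Repository ID taken from the dataset URL above.
DATASET_ID = "ofir408/CupCase"


def load_cupcase(split=None):
    """Hypothetical helper: download CUPCase from the Hugging Face Hub.

    With split=None, load_dataset returns a DatasetDict holding every
    available split; pass a split name to get a single Dataset.
    """
    # Imported lazily so that defining the helper does not require the library.
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split=split)
```

Calling `load_cupcase()` and printing the result shows the available splits, column names, and row counts.
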
The CUPCase paper was accepted to AAAI 2025 and will be published soon.