This is the codebase of the paper: CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought (arXiv).
Author List: Boxuan Zhang, Ruqi Zhang
[2025/02/21] 🔥 We are releasing CoT-UQ version 1.0 for running on Llama family models.
Update your environment with the required dependencies:

```shell
pip install -r requirement.txt
```
Datasets adopted in the paper are listed in `CoT-UQ/dataset/`. You can also place your own JSON-format dataset in `CoT-UQ/dataset/`, then add the loading logic for it in `CoT-UQ/utils.py`:
```python
if args.dataset.lower() == 'gsm8k':
    for idx, line in enumerate(json_data):
        q = line['question']
        a = float(line['answer'])
        id = 'temp_{}'.format(idx)
        questions.append(q)
        answers.append(a)
        ids.append(id)
Download the Llama family weights from https://huggingface.co/meta-llama.
`run_llama_pipeline.sh` is a script that executes all steps of our pipeline on the Llama family.
The components of our pipeline are:

- `inference_refining.py` refines the multi-step inference by extracting keywords and their corresponding importance scores to the final answer.
- `stepuq.py` integrates the crucial reasoning information into the two common UQ strategies, aggregated probabilities (AP) and self-evaluation (SE), respectively.
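To make the aggregated-probabilities side concrete, here is a minimal sketch of the idea, not the repository's implementation: the baseline `probas_mean` / `probas_min` scores reduce the token log-probabilities of a response to a single confidence, and a CoT-UQ-style variant reweights extracted keywords by their importance scores. The function names, and the assumption that you have per-token log-probabilities and per-keyword importance scores available, are illustrative.

```python
import math

# Baseline aggregated-probabilities (AP) confidences, assuming
# `token_logprobs` holds the log-probability of each generated token.
def probas_mean_confidence(token_logprobs):
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)

def probas_min_confidence(token_logprobs):
    # Confidence of the least likely token.
    return math.exp(min(token_logprobs))

# CoT-UQ-style variant (sketch): weight each extracted keyword's
# probability by its importance score before averaging.
def weighted_probas_mean(keyword_probs, importance_scores):
    total = sum(importance_scores)
    return sum(p * w for p, w in zip(keyword_probs, importance_scores)) / total
```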
For instance, to run the pipeline on Llama3.1-8B:

```shell
sh run_llama_pipeline.sh llama3-1_8B probas_mean hotpotQA output/llama-3.1-8B/
```
After running the pipeline, use `analyze_result.py` to compute performance metrics, such as the AUROC:

```shell
python analyze_result.py --uq_engine probas_mean --dataset hotpotQA --output_path output/llama-3.1-8B/
```
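For reference, AUROC here measures how well the confidence score separates correct from incorrect responses. A minimal, self-contained sketch of the metric (not the code in `analyze_result.py`) using the rank-based Mann-Whitney formulation:

```python
def auroc(correct, confidence):
    # AUROC via the Mann-Whitney U statistic: the probability that a
    # correct answer receives a higher confidence score than an
    # incorrect one (ties count as half a win).
    pos = [s for y, s in zip(correct, confidence) if y == 1]
    neg = [s for y, s in zip(correct, confidence) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUROC of 0.5 means the confidence score is uninformative; 1.0 means it ranks every correct answer above every incorrect one.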
- CoT-UQ consistently improves UQ performance across all tasks and datasets.
- This demonstrates that incorporating reasoning into uncertainty quantification enables LLMs to provide more calibrated assessments of the trustworthiness of their generated outputs.
- In general, CoT-UQ achieves greater improvements when applied to aggregated-probabilities (AP) strategies than to self-evaluation (SE) strategies, particularly for Probas-min, where it increases AUROC by up to 16.8%.
If you find our paper and repo useful, please cite our paper:

```bibtex
@article{zhang2025cot,
  title={CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought},
  author={Zhang, Boxuan and Zhang, Ruqi},
  journal={arXiv preprint arXiv:2502.17214},
  year={2025}
}
```