JudgeBlender

JudgeBlender: Ensembling Judgments for Automatic Relevance Assessment

This repo is being updated; more code and data will be available soon!

arXiv · Made with PyTorch · License: MIT

Abstract

The effective training and evaluation of retrieval systems require a substantial amount of relevance judgments, which are traditionally collected from human assessors – a process that is both costly and time-consuming. Large Language Models (LLMs) have shown promise in generating relevance labels for search tasks, offering a potential alternative to manual assessments. Current approaches often rely on a single LLM, such as GPT-4, which, despite being effective, is expensive and prone to intra-model biases that can favour systems leveraging similar models. In this work, we introduce JudgeBlender, a framework that employs smaller, open-source models to provide relevance judgments by combining evaluations across multiple LLMs (LLMBlender) or multiple prompts (PromptBlender). By leveraging the LLMJudge benchmark [18], we compare JudgeBlender with state-of-the-art methods and the top performers in the LLMJudge challenge. Our results show that JudgeBlender achieves competitive performance, demonstrating that very large models are often unnecessary for reliable relevance assessments.

Methods

PromptBlender

LLMBlender

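Both blenders follow the same high-level recipe: obtain a relevance label for each (query, passage) pair from several judges (one LLM queried with several prompts for PromptBlender, or several LLMs for LLMBlender) and aggregate the individual labels into a single judgment. The sketch below is only an illustration of that idea using a simple majority-vote aggregator; the function names, tie-breaking rule, and stub judges are assumptions for exposition, not the repo's actual implementation.

```python
# Illustrative sketch of blending relevance judgments (names and aggregation
# rule are assumptions, not the repo's actual API). Each "judge" maps a
# (query, passage) pair to a graded relevance label, and the per-judge labels
# are blended into one final judgment.

from collections import Counter
from typing import Callable, List

Judge = Callable[[str, str], int]  # (query, passage) -> relevance grade, e.g. 0-3


def majority_vote(labels: List[int]) -> int:
    """Return the most common label; ties broken by the smaller grade (assumed rule)."""
    counts = Counter(labels)
    top = max(counts.values())
    return min(grade for grade, c in counts.items() if c == top)


def blend_judgment(query: str, passage: str, judges: List[Judge]) -> int:
    """Run every judge on the same (query, passage) pair and blend the labels.

    For LLMBlender the judges would be different LLMs sharing one prompt;
    for PromptBlender, one LLM queried with different prompts.
    """
    labels = [judge(query, passage) for judge in judges]
    return majority_vote(labels)


if __name__ == "__main__":
    # Stub judges standing in for real LLM calls.
    judges = [lambda q, p: 2, lambda q, p: 3, lambda q, p: 2]
    print(blend_judgment("what is dense retrieval?", "Dense retrieval uses ...", judges))  # -> 2
```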

Baselines

The baseline models' judgments and prompts are available at https://llm4eval.github.io/LLM-as-a-rel/

Results

Cite

@article{rahmani2024judgeblender,
  title={JudgeBlender: Ensembling Judgments for Automatic Relevance Assessment},
  author={Rahmani, Hossein A and Yilmaz, Emine and Craswell, Nick and Mitra, Bhaskar},
  journal={arXiv preprint arXiv:2412.13268},
  year={2024}
}

Acknowledgments

This research is supported by the Engineering and Physical Sciences Research Council [EP/S021566/1] and the EPSRC Fellowship titled "Task Based Information Retrieval" [EP/P024289/1].
