# Benchmark hallucination detection models on RAG datasets

This folder contains notebooks used to evaluate popular hallucination detection models on various RAG (Context, Question, LLM Response) datasets.

The datasets used in the benchmark include ELI5 and HaluBench (see the table below).

The following table lists the models evaluated:

| Notebook | Description |
| --- | --- |
| Patronus Lynx | Evaluates the Patronus Lynx 70B model |
| Vectara HHEM | Evaluates Vectara's HHEM v2.1 model |
| Prometheus 2 | Evaluates the Prometheus 2 8x7B model |
| LLM as judge and TLM | Evaluates LLM-as-judge and the Trustworthy Language Model on the ELI5 dataset |
| LLM as judge and TLM | Evaluates LLM-as-judge and the Trustworthy Language Model on the HaluBench datasets |
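A typical evaluation loop for this kind of benchmark scores each (Context, Question, LLM Response) triple with a detector and measures how well the scores separate hallucinated from faithful responses, e.g. via AUROC. The sketch below is a minimal illustration of that pattern, not the exact code in these notebooks: `detect_hallucination` is a hypothetical lexical baseline standing in for any of the models above, the example records are fabricated, and only `roc_auc_score` (scikit-learn) is a real library call.

```python
import re

from sklearn.metrics import roc_auc_score


def _tokens(text: str) -> list[str]:
    """Lowercase word tokens, ignoring punctuation."""
    return re.findall(r"\w+", text.lower())


def detect_hallucination(context: str, question: str, response: str) -> float:
    """Hypothetical stand-in for a real detector (Lynx, HHEM, Prometheus 2, TLM, ...).

    Crude lexical baseline: the fraction of response tokens absent from the
    context (the question is unused here). Higher = more likely hallucinated.
    """
    context_tokens = set(_tokens(context))
    response_tokens = _tokens(response)
    if not response_tokens:
        return 0.0
    unsupported = sum(tok not in context_tokens for tok in response_tokens)
    return unsupported / len(response_tokens)


def evaluate(examples: list[dict]) -> float:
    """AUROC of detector scores against binary hallucination labels."""
    labels = [ex["is_hallucination"] for ex in examples]
    scores = [
        detect_hallucination(ex["context"], ex["question"], ex["response"])
        for ex in examples
    ]
    return roc_auc_score(labels, scores)


# Fabricated records illustrating the (Context, Question, LLM Response) format.
examples = [
    {
        "context": "The Eiffel Tower, completed in 1889, is 330 metres tall.",
        "question": "When was the Eiffel Tower completed?",
        "response": "It was completed in 1889.",
        "is_hallucination": False,
    },
    {
        "context": "The Eiffel Tower, completed in 1889, is 330 metres tall.",
        "question": "When was the Eiffel Tower completed?",
        "response": "It was completed in 1912 by Gustave Dupont.",
        "is_hallucination": True,
    },
]

print(f"AUROC: {evaluate(examples):.2f}")  # 1.00 for this toy pair
```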