Commit

description at top
jwmueller authored Jul 2, 2024
1 parent 87f0c06 commit 5cd8e19
Showing 1 changed file with 5 additions and 1 deletion.
6 changes: 5 additions & 1 deletion llm_evals_w_crowdlab/llm_evals_w_crowdlab.ipynb
@@ -7,7 +7,11 @@
"source": [
"# LLM Evals with humans, AI judges, and GPT token probabilities\n",
"\n",
-"This notebook demonstrates how to use Cleanlab's CROWDLAB to produce high-quality annotations from a mix of human and LLM evaluators, although this technique is useful anytime you are dealing with multiple fallible annotators. For a dataset, we use the famous MT-Bench collection of pairwise LLM completions, annotated by the authors of the papers and \"experts\" (graduate students). We show how you can use CROWDLAB to produce high-quality final labels as well as measure the quality of the annotators.\n"
+"This notebook demonstrates how to use Cleanlab's [CROWDLAB](https://cleanlab.ai/blog/multiannotator/) method for reliable LLM Evaluation with multiple judges (a mix of human and LLM evaluators).\n",
+"\n",
+"Here we consider the MT-Bench dataset, which contains: many user requests, two possible responses for each request from different LLM models, and annotations regarding which of the two responses is considered better. Each example has a varying number of judge annotations provided by authors of the original paper and other \"experts\" (graduate students). We use CROWDLAB to: produce high-quality final consensus annotations (to enable accurate LLM Evals) as well as measure the quality of the annotators. CROWDLAB relies on probabilistic predictions from any ML model -- here we use logprobs from GPT-4 applied in the LLM-as-judge framework.\n",
+"\n",
+"You can use the same technique for any LLM Evals involving multiple human/AI judges, to help your team better evaluate models.\n"
]
},
{
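The new description says CROWDLAB consumes probabilistic predictions derived from GPT-4 logprobs. A minimal sketch of that conversion step follows, assuming the judge answers with a single token per verdict and that only two verdicts ("A" or "B") are possible; the function name, the two-class setup, and the example logprob values are hypothetical and are not taken from the notebook.

```python
import math


def logprobs_to_pred_probs(logprob_a, logprob_b):
    """Renormalize two judge-token log-probabilities into a distribution.

    Hypothetical helper: assumes the judge emits one token per verdict
    and that only the verdicts "A" and "B" are considered.
    """
    # Subtract the max before exponentiating for numerical stability.
    m = max(logprob_a, logprob_b)
    exp_a = math.exp(logprob_a - m)
    exp_b = math.exp(logprob_b - m)
    total = exp_a + exp_b
    return [exp_a / total, exp_b / total]


# Made-up logprobs for two comparisons: (logprob of "A", logprob of "B").
example_logprobs = [(-0.1, -2.4), (-1.2, -0.36)]

# One row per example, one column per class; each row sums to 1.
pred_probs = [logprobs_to_pred_probs(a, b) for a, b in example_logprobs]
```

Rows of this shape (one per example, one column per class) are the kind of `pred_probs` input that cleanlab's `get_label_quality_multiannotator` accepts alongside the table of per-judge labels; whether the notebook builds them exactly this way is not shown in this diff.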
