
TLM-SimpleQA-Benchmark

This folder contains the dataset and the code to reproduce the SimpleQA benchmark results we published in our blog post.

API keys:

A Cleanlab API key is required to run this benchmark. Get one at https://app.cleanlab.ai/tlm
An OpenAI API key is also required to run this benchmark. Get one at https://platform.openai.com/api-keys
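Before opening the notebooks, you may want to confirm that both keys are available. The sketch below assumes the keys are supplied as environment variables. `OPENAI_API_KEY` is the variable the OpenAI Python client reads by default. `CLEANLAB_TLM_API_KEY` is an assumed name, so adjust it to however your Cleanlab client is configured. The helper `missing_api_keys` is hypothetical and is not part of this repository.

```python
import os

# Assumed environment variable names. OPENAI_API_KEY is the OpenAI client's
# default; CLEANLAB_TLM_API_KEY is an assumption about the Cleanlab setup.
REQUIRED_KEYS = ("CLEANLAB_TLM_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(env=None):
    """Return the names of required API keys that are unset or empty.

    `env` defaults to os.environ; pass a dict to check a custom mapping.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]
```

For example, calling `missing_api_keys()` in the first cell of a notebook and stopping if the returned list is non-empty surfaces a missing key before any API requests are made.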

To reproduce the benchmarks:

Use the get_tlm_responses.ipynb notebook to get and save the TLM responses and trustworthiness scores.
Use the evaluate_responses.ipynb notebook to evaluate the TLM responses.
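Once responses are graded, the trustworthiness scores let you check how accuracy changes when low-trust responses are filtered out. The sketch below is a hypothetical illustration, not necessarily the metric evaluate_responses.ipynb computes. It assumes each response has already been graded with SimpleQA-style labels (`CORRECT`, `INCORRECT`, `NOT_ATTEMPTED`) and paired with its TLM trustworthiness score. The `GradedResponse` record and `summarize` function are illustrative names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class GradedResponse:
    # Hypothetical record layout: a SimpleQA-style grade plus the TLM score.
    grade: str  # "CORRECT", "INCORRECT", or "NOT_ATTEMPTED"
    trustworthiness_score: float


def summarize(responses: Iterable[GradedResponse], threshold: float = 0.0) -> dict:
    """Accuracy over responses whose trustworthiness score is >= threshold.

    Responses below the threshold are treated as filtered out; `coverage`
    is the fraction of all responses that were kept.
    """
    rows: List[GradedResponse] = list(responses)
    kept = [r for r in rows if r.trustworthiness_score >= threshold]
    n_correct = sum(r.grade == "CORRECT" for r in kept)
    return {
        "coverage": len(kept) / len(rows) if rows else 0.0,
        "accuracy": n_correct / len(kept) if kept else 0.0,
    }
```

Sweeping `threshold` over a range of values shows the trade-off between how many questions are answered (coverage) and how often the kept answers are correct (accuracy).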