This folder contains the dataset and the code to reproduce the SimpleQA benchmark we published in our blog post.
API keys:
- A Cleanlab API key is required to run this benchmark. Get one at https://app.cleanlab.ai/tlm
- An OpenAI API key is also required. Get one at https://platform.openai.com/api-keys
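
Before opening the notebooks, it can help to confirm that both keys are visible to Python. The sketch below is a minimal check, assuming the keys are supplied through environment variables named `CLEANLAB_TLM_API_KEY` and `OPENAI_API_KEY`. These names are an assumption and are not stated in this folder; adjust them to whatever the notebooks actually read. `missing_keys` is a hypothetical helper, not part of the benchmark code.

```python
import os

# Assumed environment variable names; change them if the notebooks
# read the keys from somewhere else.
REQUIRED_KEYS = ("CLEANLAB_TLM_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=None):
    """Return the names of required API keys that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing API keys:", ", ".join(missing))
    else:
        print("All required API keys are set.")
```

If the check reports a missing key, set it in the shell that launches Jupyter so the notebook kernel inherits it.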
To reproduce the benchmarks:
- Use the `get_tlm_responses.ipynb` notebook to get and save the TLM responses and trustworthiness scores.
- Use the `evaluate_responses.ipynb` notebook to evaluate the TLM responses.
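
The two notebooks are meant to be run in order, since the evaluation step reads the responses saved by the first one. If you prefer to run them without opening the Jupyter interface, the following is one possible sketch that uses `jupyter nbconvert --execute`. It assumes `jupyter` and `nbconvert` are installed and that both notebooks sit in the current directory; `build_commands` and `run_all` are hypothetical helpers introduced here for illustration.

```python
import subprocess

# Notebook file names from this folder, in the order they should run.
NOTEBOOKS = ["get_tlm_responses.ipynb", "evaluate_responses.ipynb"]


def build_commands(notebooks=NOTEBOOKS):
    """Build one nbconvert command per notebook, preserving order."""
    return [
        ["jupyter", "nbconvert", "--to", "notebook", "--execute", "--inplace", nb]
        for nb in notebooks
    ]


def run_all(notebooks=NOTEBOOKS):
    """Execute each notebook in place; stop at the first failure."""
    for cmd in build_commands(notebooks):
        subprocess.run(cmd, check=True)
```

Calling `run_all()` executes both notebooks and overwrites them with their outputs. Running the first notebook calls the Cleanlab and OpenAI APIs, so both keys must be available to the process.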