feat: add rag evaluation (#136)
Closes #136
billmetangmo authored Jun 27, 2024
1 parent 828000d commit 965f9a4
Showing 5 changed files with 382 additions and 0 deletions.
30 changes: 30 additions & 0 deletions README.md
@@ -64,6 +64,36 @@ You can for example define icons by category (social object); ours are in `html/
devspace deploy
```

## Evaluation


### Base RAG evaluation dataset

The list of runs in `runs.csv` was built by fetching all runs since the beginning of the project with:
```
export LANGCHAIN_API_KEY=<key>
cd evals/
python3 rag-evals.py save_runs --days 400
```
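
For reference, here is a minimal sketch of what such a `save_runs` step can look like with the LangSmith SDK. The project name, CSV columns, and exact filtering are assumptions for illustration, not taken from `rag-evals.py`:
```
# Sketch only: fetch all runs for the project over the last N days and
# dump them to runs.csv. Column names and project name are assumptions.
import csv
from datetime import datetime, timedelta

from langsmith import Client  # reads LANGCHAIN_API_KEY from the environment


def save_runs(days: int, project_name: str = "tchoung-te") -> None:
    client = Client()
    runs = client.list_runs(
        project_name=project_name,
        start_time=datetime.now() - timedelta(days=days),
    )
    with open("runs.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["run_id", "input", "output"])
        for run in runs:
            writer.writerow([run.id, run.inputs, run.outputs])
```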

Then we use [lilac](https://docs.lilacml.com/) to surface the most interesting questions by clustering them by topic/category. The "Associations in France" cluster was chosen, and some rows were deleted as irrelevant.

> The clustering repartition is available here: [Clustering Repartition](https://github.com/mongulu-cm/tchoung-te/pull/127#issuecomment-2174444629)
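
A minimal sketch of the clustering step, assuming lilac's CSV source and `cluster` API as described in its docs (the column name `input` is a guess about `runs.csv`, and exact class names may differ by lilac version):
```
# Sketch only: load runs.csv into lilac and cluster the question column.
import lilac as ll

dataset = ll.create_dataset(
    ll.DatasetConfig(
        namespace="local",
        name="runs",
        source=ll.CSVSource(filepaths=["runs.csv"]),
    )
)
dataset.cluster("input")  # groups questions into topic/category clusters
ll.start_server()  # inspect clusters in the UI, then delete irrelevant rows
```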

Finally, run:
```
export LANGCHAIN_API_KEY=<key>
cd evals/
python3 rag.py ragas_eval tchoung-te --run_ids_file=runs.csv
python3 rag.py deepeval tchoung-te --run_ids_file=runs.csv
```
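
Under the hood, a Ragas pass over the saved runs can look roughly like this sketch; the metric choice and the `runs.csv` column names are assumptions, not taken from `rag.py`:
```
# Sketch only: score question/answer/contexts triples with Ragas.
# Requires an LLM key (e.g. OPENAI_API_KEY) for the judge model.
import pandas as pd
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_relevancy, faithfulness

runs = pd.read_csv("runs.csv")
dataset = Dataset.from_dict(
    {
        "question": runs["question"].tolist(),
        "answer": runs["answer"].tolist(),
        "contexts": [[c] for c in runs["contexts"].astype(str)],
    }
)
result = evaluate(dataset, metrics=[faithfulness, answer_relevancy])
print(result)  # aggregate score per metric
```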

### RAG offline evaluation


Whenever you change a parameter that can affect the RAG pipeline, you can re-run all the inputs in `evals/base_ragas_evaluation.csv` and track them with LangSmith. Then fetch the runs and execute the commands above. Since there are only 27 inputs, you can compare the results manually.
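
A minimal replay sketch, assuming the CSV has a `question` column and using `ask_rag` as a stand-in for the project's actual RAG entry point (both are assumptions):
```
# Sketch only: replay the evaluation questions so LangSmith records new runs.
import os

import pandas as pd

os.environ["LANGCHAIN_TRACING_V2"] = "true"  # LANGCHAIN_API_KEY must also be set


def ask_rag(question: str) -> str:
    """Placeholder: replace with a call to the real RAG chain."""
    raise NotImplementedError


inputs = pd.read_csv("evals/base_ragas_evaluation.csv")
for question in inputs["question"]:
    print(question, "->", ask_rag(question))
```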


## Contributors ✨

Thanks goes to these wonderful people ([emoji key](https://allcontributors.org/docs/en/emoji-key)):
1 change: 1 addition & 0 deletions etl/experiments/evals/.deepeval-cache.json

Large diffs are not rendered by default.

