
[rag evals][2/n] add more braintrust scoring fns for RAG eval #666

Merged

yanxi0830 merged 7 commits into rag_scoring_fn_1 from rag_scoring_fn_2 on Jan 2, 2025

Conversation

@yanxi0830 (Contributor) commented Dec 20, 2024

What does this PR do?

  • Add more braintrust scoring functions for RAG eval (a usage sketch follows below)
  • Add tests for evaluating against context
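
The new scoring functions wrap Braintrust's autoevals metrics. Below is a minimal sketch of invoking such context-aware metrics directly, assuming the autoevals package and its RAGAS-style scorers; exactly which metrics this PR registers as llama-stack scoring fns is an assumption, and the LLM-backed scorers need a judge model configured (e.g. via OPENAI_API_KEY):

```python
# Sketch only: the metric classes come from Braintrust's autoevals package;
# which of them this PR exposes as llama-stack scoring fns is an assumption.
from autoevals.ragas import ContextRelevancy, Faithfulness

row = {
    "input": "What is the capital of France?",
    "output": "The capital of France is Paris.",
    "context": "France is a country in Europe. Its capital city is Paris.",
}

# Each autoevals scorer is a callable returning a Score whose `score` field
# is in [0, 1]; `context` is the retrieved text being judged.
relevancy = ContextRelevancy()(
    row["output"], None, input=row["input"], context=row["context"]
)
faithfulness = Faithfulness()(
    row["output"], None, input=row["input"], context=row["context"]
)
print(relevancy.score, faithfulness.score)
```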

Test Plan

pytest -v -s -m braintrust_scoring_together_inference scoring/test_scoring.py
(test run screenshot omitted)

Example Output

(example output screenshots omitted)

Sources

Please link relevant resources if necessary.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Ran pre-commit to handle lint / formatting issues.
  • Read the contributor guideline,
    Pull Request section?
  • Updated relevant documentation.
  • Wrote necessary unit or integration tests.

@facebook-github-bot added the CLA Signed label Dec 20, 2024
@yanxi0830 changed the base branch from main to rag_scoring_fn_1 December 20, 2024 00:57
@yanxi0830 marked this pull request as ready for review December 20, 2024 01:11
…c eval pipeline (#668)

# What does this PR do?

- This PR adds the ability for users to evaluate retrieval and
generation separately, as well as end to end, by passing an AgentConfig
to the /eval API (a call sketch follows below)
- The memory_retrieval context will be stored in the "context" column,
which is used by scoring functions that can evaluate the retrieved
context.
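
A minimal call sketch of this flow; all identifiers below (client class, eval method, task/candidate fields, scoring fn ids) are illustrative assumptions about the API shape, not verbatim llama-stack signatures:

```python
# Sketch only: names are assumptions, not the exact llama-stack client API.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

# Hypothetical agent config with a memory (retrieval) tool; the retrieval
# step's output is what lands in the "context" column for scoring.
agent_config = {
    "model": "meta-llama/Llama-3.2-3B-Instruct",
    "instructions": "Answer using only the retrieved context.",
    "tools": [{"type": "memory"}],
}

response = client.eval.evaluate_rows(
    task_id="rag-eval",
    input_rows=[{"input_query": "What is the capital of France?"}],
    scoring_functions=[
        "braintrust::context-relevancy",  # judges retrieval via "context"
        "braintrust::factuality",         # judges the generated answer
    ],
    task_config={
        "type": "app",
        "eval_candidate": {"type": "agent", "config": agent_config},
    },
)
```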

## Test Plan
- E2E Test RAG Agent Notebook:
https://gist.github.com/yanxi0830/0377594d29958f9b6f9317ab049fa836

<img width="758" alt="image"
src="https://github.com/user-attachments/assets/58ed9db7-f07b-400a-931b-923b0d612902"
/>

<img width="682" alt="image"
src="https://github.com/user-attachments/assets/9ebd7fbd-2a6d-4c93-92fa-a9456fae2378"
/>



## Sources

Please link relevant resources if necessary.


## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.
@yanxi0830 merged commit 2da455f into rag_scoring_fn_1 Jan 2, 2025
2 checks passed
@yanxi0830 deleted the rag_scoring_fn_2 branch January 2, 2025 19:19
yanxi0830 added a commit that referenced this pull request Jan 2, 2025
…agentic eval pipeline (#664)

# What does this PR do?

- See #666 & #668

- Refactor BaseScoringFn to be just a minimal interface; add a new
RegistrableBaseScoring (a sketch of the split follows below)
- Refactor the data schema check
- To evaluate the retrieval component of RAG separately, some scoring
functions additionally require a "context" column
- Refactor braintrust eval (more scoring fns added & tested in a
follow-up PR)
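
A minimal sketch of the interface split described above; method names and types are illustrative of the BaseScoringFn / RegistrableBaseScoring split, not the exact llama-stack definitions:

```python
# Sketch only: signatures are assumptions, not verbatim llama-stack code.
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional


class BaseScoringFn(ABC):
    """Minimal interface: score one row (a dict of columns such as
    "input_query", "generated_answer", and now optionally "context")."""

    @abstractmethod
    async def score_row(
        self,
        input_row: Dict[str, Any],
        scoring_fn_identifier: Optional[str] = None,
    ) -> Dict[str, Any]: ...


class RegistrableBaseScoring(BaseScoringFn):
    """Adds a registry of scoring fn defs on top of the minimal interface,
    so a provider can register and look up fns (e.g. context-based ones)."""

    def __init__(self) -> None:
        self.supported_fn_defs_registry: Dict[str, Any] = {}

    def register_scoring_fn_def(self, fn_def: Any) -> None:
        self.supported_fn_defs_registry[fn_def.identifier] = fn_def

    async def score(
        self,
        input_rows: List[Dict[str, Any]],
        scoring_fn_identifier: Optional[str] = None,
    ) -> List[Dict[str, Any]]:
        # Score each row with the minimal per-row interface.
        return [await self.score_row(r, scoring_fn_identifier) for r in input_rows]
```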

## Test Plan

```
pytest -v -s -m llm_as_judge_scoring_together_inference scoring/test_scoring.py --judge-model meta-llama/Llama-3.2-3B-Instruct
pytest -v -s -m basic_scoring_together_inference scoring/test_scoring.py
pytest -v -s -m braintrust_scoring_together_inference scoring/test_scoring.py
```

<img width="847" alt="image"
src="https://github.com/user-attachments/assets/d099cb2d-6f9c-4bdf-9d0d-f388cf758c0f"
/>

```
pytest -v -s -m meta_reference_eval_together_inference eval/test_eval.py
pytest -v -s -m meta_reference_eval_together_inference_huggingface_datasetio eval/test_eval.py
```
<img width="850" alt="image"
src="https://github.com/user-attachments/assets/dce28fc3-0493-4d34-820a-567260873cc8"
/>



## Before submitting

- [ ] This PR fixes a typo or improves the docs (you can dismiss the
other checks if that's the case).
- [ ] Ran pre-commit to handle lint / formatting issues.
- [ ] Read the [contributor
guideline](https://github.com/meta-llama/llama-stack/blob/main/CONTRIBUTING.md),
      Pull Request section?
- [ ] Updated relevant documentation.
- [ ] Wrote necessary unit or integration tests.