[Enhancement]: Handle non-string outputs gracefully in auto_contains_json evaluator #1987

@aybruhm aybruhm commented Aug 13, 2024

Description

This PR enhances the auto_contains_json evaluator to handle non-string outputs more gracefully, so that any unhandled error results in a clear and informative failure.

Related Issue

Closes AGE-573

What to QA

  • Evaluation Run:
    • Run an evaluation that uses the contains_json evaluator and ensure it completes successfully.
    • Verify that evaluation results are accurate and consistent across multiple runs.

Acceptance Tests

Test 1: Evaluator Handles Non-String Output Gracefully

  • Precondition: Ensure the auto_contains_json evaluator is set up.
  • Action:
    1. Run the evaluator with an output (where output is the LLM response of the application) that is not a string (e.g., a dictionary or list).
  • Expected Outcome:
    • The evaluator should fail gracefully, providing a clear and informative error message.
    • The error message should indicate that the output was not a string and suggest possible resolutions.
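
The behaviour Test 1 describes could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: the function name `contains_json`, the `Result` container, and the wording of the error message are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Result:
    # Hypothetical result container; the real evaluator's return type may differ.
    value: Optional[bool]
    error: Optional[str] = None


def contains_json(output: Any) -> Result:
    """Report whether `output` contains a JSON object, failing gracefully on non-strings."""
    if not isinstance(output, str):
        # Fail with an explicit message instead of letting a TypeError escape.
        return Result(
            value=None,
            error=(
                f"Expected output to be a string, got {type(output).__name__}. "
                "Serialize the output (for example with json.dumps) before evaluating."
            ),
        )
    try:
        # Take the span from the first "{" to the last "}" and try to parse it.
        start = output.index("{")
        end = output.rindex("}") + 1
        json.loads(output[start:end])
        return Result(value=True)
    except ValueError:
        # str.index/rindex raise ValueError; json.JSONDecodeError is a subclass of it.
        return Result(value=False)
```

With this sketch, passing a dictionary yields a result whose `error` names the offending type and suggests a fix, while a string without JSON simply evaluates to `False`.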

Test 2: Evaluation Completes Successfully

  • Precondition: Ensure the auto_contains_json evaluator is set up.
  • Action:
    1. Run a typical evaluation using the auto_contains_json evaluator with a string output (where output is the LLM response of the application).
  • Expected Outcome:
    • The evaluation should complete without errors.
    • The evaluation process should not be interrupted, and all expected results should be produced.
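
One way to keep an evaluation run from being interrupted by a single bad output is to record per-item errors instead of raising. The sketch below is an assumption about how such a loop might look; `run_evaluation` and `is_json_string` are hypothetical names used only for illustration, not functions from this project.

```python
import json
from typing import Any, Callable, Dict, List


def is_json_string(output: Any) -> bool:
    # Hypothetical stand-in evaluator: raises TypeError on non-string input.
    if not isinstance(output, str):
        raise TypeError(f"expected str, got {type(output).__name__}")
    try:
        json.loads(output)
        return True
    except json.JSONDecodeError:
        return False


def run_evaluation(
    outputs: List[Any], evaluator: Callable[[Any], bool]
) -> List[Dict[str, Any]]:
    """Evaluate every output, recording an error entry rather than aborting the run."""
    results = []
    for output in outputs:
        try:
            results.append({"value": evaluator(output), "error": None})
        except Exception as exc:
            # One failing item must not stop the remaining items from being evaluated.
            results.append({"value": None, "error": str(exc)})
    return results
```

Every input produces exactly one result entry, so a non-string output shows up as an error row while the string outputs around it are still scored.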


vercel bot commented Aug 13, 2024

The latest updates on your projects.

Name | Status | Updated (UTC)
agenta | ✅ Ready | Aug 23, 2024 2:06pm
agenta-documentation | ✅ Ready | Aug 23, 2024 2:06pm

@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Aug 13, 2024
@aybruhm aybruhm requested a review from jp-agenta August 13, 2024 14:37
@dosubot dosubot bot added Backend enhancement New feature or request labels Aug 13, 2024
@jp-agenta jp-agenta left a comment

Thanks @aybruhm, quick question 👇
Shouldn't this apply to all non-RAG evaluators (not just contains_json)?

…i-to-playground' into feature/age-573-evaluators-fail-gracefully-when-we-send-a-dict-to-a-str-only

aybruhm commented Aug 20, 2024

> Thanks @aybruhm, quick question 👇 Shouldn't this apply to all non-RAG evaluators (not just contains_json)?

We are already doing that. The contains_json evaluator requires the value of data to be a JSON string (str), not any other type. See comment here.

…flect changes in test cases

- Added parameters in 'test_auto_json_diff' for BaseResponse compatibility
- Updated parameters in 'test_auto_contains_json' to align with recent changes
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Aug 22, 2024
@jp-agenta
QA'd in oss-local.
QA in cloud-staging pending.

@jp-agenta jp-agenta merged commit 532a4bb into feature/age-491-poc-1e-expose-running-evaluators-via-api-to-playground Aug 23, 2024
11 checks passed
@jp-agenta jp-agenta deleted the feature/age-573-evaluators-fail-gracefully-when-we-send-a-dict-to-a-str-only branch August 23, 2024 14:29