Content safety evals aggregate max from conversations #39083

Open · MilesHolland wants to merge 3 commits into main
Conversation

MilesHolland (Member) commented:

Adds a new private argument to the base and RAI base evaluator classes which determines how per-turn numeric results from a conversation are aggregated into a single overall result. Note that this has no impact on the across-multiple-evaluations aggregation performed by the evaluate function.

Current options are mean (the default), max, and min. Content safety evaluators have their default aggregation set to max.

Added tests to ensure that the aggregation options all work as intended, and that the content safety evals have the proper value set.

Note that the aggregate content safety evaluator does not need to set anything, since its internal evaluators handle their own aggregation methods.
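To make the behavior concrete, here is a minimal illustrative sketch of the aggregation logic described above. This is not the SDK's actual implementation: the enum and helper names are borrowed from the class names mentioned in the review below and may not match the final API.

```python
# Illustrative sketch only -- not azure-ai-evaluation's actual code.
from enum import Enum
from statistics import mean
from typing import List


class ConversationNumericAggregationType(Enum):
    """How per-turn numeric scores of a conversation collapse into one value."""
    MEAN = "mean"
    MAX = "max"
    MIN = "min"


def aggregate_conversation_scores(
    per_turn_scores: List[float],
    aggregation_type: ConversationNumericAggregationType = ConversationNumericAggregationType.MEAN,
) -> float:
    """Reduce per-turn scores to a single conversation-level score."""
    if aggregation_type is ConversationNumericAggregationType.MAX:
        return max(per_turn_scores)
    if aggregation_type is ConversationNumericAggregationType.MIN:
        return min(per_turn_scores)
    return mean(per_turn_scores)


# A content safety evaluator defaulting to MAX means a single high-severity
# turn drives the overall conversation score:
scores = [1.0, 2.0, 7.0]
print(aggregate_conversation_scores(scores, ConversationNumericAggregationType.MAX))  # 7.0
print(aggregate_conversation_scores(scores))  # MEAN default: ~3.33
```

Defaulting content safety evaluators to MAX is the point of the change: averaging would dilute a single harmful turn, whereas taking the maximum surfaces the worst severity observed anywhere in the conversation.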

@MilesHolland MilesHolland requested a review from a team as a code owner January 8, 2025 20:50
@MilesHolland changed the title from "Jan25/eval/improvement/cs convo takes max" to "Content safety evals aggregate max from conversations" on Jan 8, 2025
@github-actions bot added the Evaluation label (Issues related to the client library for Azure AI Evaluation) on Jan 8, 2025
@azure-sdk (Collaborator) commented:

API change check

API changes are not detected in this pull request.

@nagkumar91 nagkumar91 requested a review from Copilot January 9, 2025 21:18

Copilot reviewed 5 out of 10 changed files in this pull request and generated no comments.

Files not reviewed (5)
  • sdk/evaluation/azure-ai-evaluation/tests/unittests/data/evaluate_test_data_conversation.jsonl: Language not supported
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_violence.py: Evaluated as low risk
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_hate_unfairness.py: Evaluated as low risk
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_sexual.py: Evaluated as low risk
  • sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_self_harm.py: Evaluated as low risk
Comments suppressed due to low confidence (3)

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_eval.py:77

  • [nitpick] Update the docstring to reflect the correct class name if it is renamed to ConversationNumericAggregationType.
:type conversation_aggregation_type: ~azure.ai.evaluation._constants._ConversationNumericAggregationType

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_rai_svc_eval.py:41

  • Missing period at the end of the docstring.
Default is ~azure.ai.evaluation._constants.ConversationNumericAggregationType.MEAN.

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_rai_svc_eval.py:42

  • The conversation_aggregation_type parameter should be explicitly mentioned in the constructor's docstring.
:type conversation_aggregation_type: ~azure.ai.evaluation._constants.ConversationNumericAggregationType
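For reference, a constructor docstring entry addressing the last two comments could look like the sketch below. The class and parameter names echo those quoted above, but the exact wording and signature the PR ends up with may differ.

```python
class RaiServiceEvaluatorBase:  # class name shown only to frame the docstring
    def __init__(self, *, conversation_aggregation_type=None):
        """
        :param conversation_aggregation_type: The type of aggregation to apply to
            per-turn numeric results of a conversation. Default is
            ~azure.ai.evaluation._constants.ConversationNumericAggregationType.MEAN.
        :type conversation_aggregation_type:
            ~azure.ai.evaluation._constants.ConversationNumericAggregationType
        """
```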
Labels: Evaluation (Issues related to the client library for Azure AI Evaluation)