Content safety evals aggregate max from conversations #39083

MilesHolland · 2025-01-08T20:50:39Z

Adds a new private argument to the base and RAI base evaluator classes that determines how numeric results from a conversation are aggregated into a single overall result. Note that this has no impact on the across-multiple-evaluations aggregation performed by the evaluate function.

Current options are mean (the default), max, and min. Content safety evaluators have their default aggregation set to max.
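As a rough illustration of the three options, here is a minimal sketch of per-conversation aggregation. The names `AggregationType` and `aggregate_conversation` are hypothetical stand-ins made up for this example; they are not the SDK's actual enum or method, and the real implementation in `_base_eval.py` may differ.

```python
from enum import Enum
from statistics import mean
from typing import Callable, Dict, List


class AggregationType(Enum):
    """Hypothetical stand-in for the aggregation enum this PR introduces."""

    MEAN = "mean"
    MAX = "max"
    MIN = "min"


# Map each option to the function that collapses per-turn scores into one value.
_AGGREGATORS: Dict[AggregationType, Callable[[List[float]], float]] = {
    AggregationType.MEAN: mean,
    AggregationType.MAX: max,
    AggregationType.MIN: min,
}


def aggregate_conversation(
    per_turn_scores: List[float],
    how: AggregationType = AggregationType.MEAN,  # mean is the default, as in the PR
) -> float:
    """Collapse the numeric scores from one conversation into a single result."""
    return _AGGREGATORS[how](per_turn_scores)


# Example: one high-severity turn among otherwise benign turns.
scores = [0.0, 0.0, 6.0]
mean_result = aggregate_conversation(scores)
max_result = aggregate_conversation(scores, AggregationType.MAX)
min_result = aggregate_conversation(scores, AggregationType.MIN)
```

With these made-up scores, mean gives 2.0 while max gives 6.0, so a single high-severity turn is preserved by max but diluted by mean.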

Added tests to ensure that the aggregation options all work as intended, and that the content safety evals have the proper value set.

Note that the aggregate content safety evaluator does not need to set anything, since its internal evaluators handle their own aggregation methods.
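The delegation described above can be sketched as follows. This is an assumption-laden toy, not the SDK's code: `TurnEvaluator`, `CompositeEvaluator`, and the keyword-counting scorers are all hypothetical and exist only to show that a composite can merge results without choosing an aggregation itself.

```python
from typing import Callable, Dict, List


class TurnEvaluator:
    """Hypothetical evaluator: scores each turn, then applies its own aggregation."""

    def __init__(
        self,
        name: str,
        score_turn: Callable[[str], float],
        aggregate: Callable[[List[float]], float],
    ) -> None:
        self.name = name
        self._score_turn = score_turn
        self._aggregate = aggregate

    def __call__(self, conversation: List[str]) -> Dict[str, float]:
        per_turn = [self._score_turn(turn) for turn in conversation]
        return {self.name: self._aggregate(per_turn)}


class CompositeEvaluator:
    """Hypothetical composite: merges sub-evaluator results, sets no aggregation."""

    def __init__(self, evaluators: List[TurnEvaluator]) -> None:
        self._evaluators = evaluators

    def __call__(self, conversation: List[str]) -> Dict[str, float]:
        merged: Dict[str, float] = {}
        for evaluator in self._evaluators:
            # Each sub-evaluator has already aggregated its own per-turn scores.
            merged.update(evaluator(conversation))
        return merged


# Toy scorers that count a keyword per turn; both sub-evaluators use max.
violence = TurnEvaluator("violence", lambda t: float(t.count("fight")), max)
self_harm = TurnEvaluator("self_harm", lambda t: float(t.count("hurt")), max)

composite = CompositeEvaluator([violence, self_harm])
result = composite(["hello", "fight fight", "hurt"])
```

Because each sub-evaluator returns an already-aggregated value, the composite only needs to combine dictionaries.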

azure-sdk · 2025-01-09T20:09:43Z

API change check

API changes are not detected in this pull request.

Copilot reviewed 5 out of 10 changed files in this pull request and generated no comments.

Files not reviewed (5)

sdk/evaluation/azure-ai-evaluation/tests/unittests/data/evaluate_test_data_conversation.jsonl: Language not supported
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_violence.py: Evaluated as low risk
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_hate_unfairness.py: Evaluated as low risk
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_sexual.py: Evaluated as low risk
sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_content_safety/_self_harm.py: Evaluated as low risk

Comments suppressed due to low confidence (3)

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_eval.py:77

[nitpick] Update the docstring to reflect the correct class name if it is renamed to ConversationNumericAggregationType.

:type conversation_aggregation_type: ~azure.ai.evaluation._constants._ConversationNumericAggregationType

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_rai_svc_eval.py:41

Missing period at the end of the docstring.

Default is ~azure.ai.evaluation._constants.ConversationNumericAggregationType.MEAN.

sdk/evaluation/azure-ai-evaluation/azure/ai/evaluation/_evaluators/_common/_base_rai_svc_eval.py:42

The conversation_aggregation_type parameter should be explicitly mentioned in the constructor's docstring.

:type conversation_aggregation_type: ~azure.ai.evaluation._constants.ConversationNumericAggregationType

MilesHolland added 2 commits January 8, 2025 15:44

add convo agg type, and have harm evals use max

cb3a6aa

analysis

3ee1a9f

MilesHolland requested a review from a team as a code owner January 8, 2025 20:50

MilesHolland changed the title ~~Jan25/eval/improvement/cs convo takes max~~ Content safety evals aggregate max from conversations Jan 8, 2025

github-actions bot added the Evaluation label (Issues related to the client library for Azure AI Evaluation) Jan 8, 2025

correct enum name in docs

a0caaf0

nagkumar91 approved these changes Jan 8, 2025

nagkumar91 requested a review from Copilot January 9, 2025 21:18

Copilot AI reviewed Jan 9, 2025
