
refactor: SummarizationAccuracyMetrics transform to handle multiple target outputs more efficiently #317

Merged: 41 commits, Aug 8, 2024

Conversation

@kirupang-code (Contributor) commented Jul 31, 2024

  • Edited the logic in SummarizationAccuracyMetrics so that the target_output field no longer has to be duplicated in order to be evaluated against each model_output. When there are multiple target_outputs, we take the max score from self.compute_metric.
  • Added a target_output_keys_provider parameter to support computing scores given a key that maps to a list of possible target outputs.
  • Updated the ordering of the subclass transforms' parameters to keep them consistent with their parent transforms; otherwise parameters get mixed up (e.g. input_keys + model_output_keys raised TypeError: can only concatenate str (not "list") to str). I also had to add a default value in BertScore because the parameter now follows one with a default value.
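The max-over-targets behavior described in the first bullet can be sketched roughly as follows. The function name, record keys, and the toy overlap metric are illustrative, not fmeval's actual API:

```python
from typing import Any, Callable, Dict, List

def score_against_targets(
    record: Dict[str, Any],
    model_output_key: str,
    target_output_keys_provider: str,
    compute_metric: Callable[[str, str], float],
) -> float:
    """Score one model output against every target output and keep the max."""
    model_output = record[model_output_key]
    targets: List[str] = record[target_output_keys_provider]
    # No need to duplicate the record once per target: one pass, keep the best.
    return max(compute_metric(target, model_output) for target in targets)

# Toy word-overlap metric, purely for demonstration.
def overlap(target: str, model_output: str) -> float:
    t, m = set(target.split()), set(model_output.split())
    return len(t & m) / len(t)

record = {"model_output": "the cat sat", "possible_targets": ["a cat sat", "dogs run"]}
best = score_against_targets(record, "model_output", "possible_targets", overlap)
# best is 2/3: "a cat sat" shares two of its three words with the model output.
```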

@xiaoyi-cheng xiaoyi-cheng changed the title fix: refactor SummarizationAccuracyMetrics transform to handle multiple target outputs more efficiently refactor: SummarizationAccuracyMetrics transform to handle multiple target outputs more efficiently Jul 31, 2024
@kirupang-code kirupang-code changed the title refactor: SummarizationAccuracyMetrics transform to handle multiple target outputs more efficiently WIP - refactor: SummarizationAccuracyMetrics transform to handle multiple target outputs more efficiently Jul 31, 2024
@kirupang-code kirupang-code changed the title WIP - refactor: SummarizationAccuracyMetrics transform to handle multiple target outputs more efficiently refactor: SummarizationAccuracyMetrics transform to handle multiple target outputs more efficiently Jul 31, 2024
src/fmeval/transforms/summarization_accuracy_metrics.py (review thread, outdated and resolved):
allow_duplicate_input_keys: bool,
target_output_keys: Optional[List[str]] = None,
Contributor:
Let's keep this as a positional argument so that 1) the argument order doesn't change, which would break things for existing users, and 2) the default configuration isn't invalid; currently, if you use the default options, you'll get a validation error.

For point 1), I really mean that we don't want the signatures of the concrete subclasses to change. Obviously, no one is directly calling this ABC's init method.

src/fmeval/transforms/summarization_accuracy_metrics.py (review thread, outdated and resolved):
bertscore_model: Union[BertscoreHelperModel, ActorHandle],
target_output_keys: Optional[List[str]] = None,
target_output_keys_provider: str = "",
bertscore_model: Union[BertscoreHelperModel, ActorHandle] = BertscoreHelperModel(BERTSCORE_DEFAULT_MODEL),
Contributor:

We need to put target_output_keys_provider as the last argument, or else existing calls to this method will break.
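The breakage being avoided here is the usual positional-argument hazard: inserting a parameter anywhere but last silently shifts every later argument for existing positional callers. A schematic illustration with simplified, hypothetical signatures:

```python
# Old signature (simplified): existing callers pass arguments positionally.
def old_init(target_output_keys, model_output_keys, bertscore_model):
    return {"targets": target_output_keys, "outputs": model_output_keys,
            "model": bertscore_model}

# Inserting the new parameter anywhere but last shifts every later argument.
def new_init(target_output_keys, target_output_keys_provider=None,
             model_output_keys=None, bertscore_model=None):
    return {"targets": target_output_keys, "provider": target_output_keys_provider,
            "outputs": model_output_keys, "model": bertscore_model}

old_call = old_init(["t"], ["m"], "model")
new_call = new_init(["t"], ["m"], "model")
# The unchanged positional call now binds ["m"] to target_output_keys_provider
# and "model" to model_output_keys -- no error, just silently wrong bindings.
```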

Contributor:

Discussed with @kirupang-code offline. We need to use this ordering (which is backwards-incompatible) in order for deserialization of these object instances to work properly (since we rely on a specific ordering of the arguments to __init__). The existing calls to BertScore.__init__ in our own code have been updated as necessary, but users who interact with BertScore objects directly will be impacted.

We should keep track of this before the next release.
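A minimal sketch of why deserialization pins the argument order, assuming (as the comment implies) that instances are rebuilt by replaying saved values positionally into __init__. The class and helper here are hypothetical stand-ins:

```python
class BertScoreLike:
    # Parameter order here is a contract: serialized state is replayed in order.
    def __init__(self, target_output_keys=None, model_output_keys=None,
                 bertscore_model=None, target_output_keys_provider=""):
        self.state = (target_output_keys, model_output_keys,
                      bertscore_model, target_output_keys_provider)

def deserialize(cls, saved_values):
    # Positional replay: correct only if __init__'s parameter order is stable.
    return cls(*saved_values)

obj = deserialize(BertScoreLike, (["t"], ["m"], "model-handle", "targets_key"))
```

If the parameter order changed between serialization and deserialization, the saved values would bind to the wrong attributes with no error raised, which is why the ordering change above had to be accepted despite being backwards-incompatible.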

output_keys,
allow_duplicates=allow_duplicate_input_keys,
)
self.target_output_keys = target_output_keys
self.model_output_keys = model_output_keys
self.target_output_keys_provider = str(target_output_keys_provider)
Contributor:
Why is this needed?


@@ -128,6 +128,31 @@ def test_bert_score_call_with_bertscore_model_object():
mock_bertscore_model.get_helper_scores.assert_called_once_with("Hello there!", "Hi")


def test_bert_score_call_with_target_output_keys_provider():
Contributor:

Please add a unit test where there are multiple target outputs, and verify that the max score is returned.
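A hedged sketch of the shape such a test could take; the helper and the exact-match metric are illustrative, not the real transform or BertScore model:

```python
def max_score_for(targets, model_output, metric):
    """Mirror of the transform's behavior: best score across all targets."""
    return max(metric(t, model_output) for t in targets)

def test_max_score_returned_for_multiple_targets():
    # Two candidate targets; only one matches the model output exactly,
    # so the max (1.0) should win over the non-matching target's 0.0.
    exact_match = lambda t, m: 1.0 if t == m else 0.0
    score = max_score_for(["hello there", "goodbye"], "hello there", exact_match)
    assert score == 1.0

test_max_score_returned_for_multiple_targets()
```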

@kirupang-code (Contributor, Author) commented Aug 2, 2024

Had to add # type: ignore to target_output_keys_provider: its type is Optional, but we validate that its value is not None before it is used. Discussed with Daniel. The errors without # type: ignore:

3. Mypy
=======
src/fmeval/transforms/summarization_accuracy_metrics.py:89:69: error: List item 0 has incompatible type "str | None"; expected "str"  [list-item]
src/fmeval/transforms/summarization_accuracy_metrics.py:111:25: error: Invalid index type "str | None" for "dict[str, Any]"; expected type "str"  [index]
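This is the usual friction with validated Optionals: mypy does not combine the two runtime checks, so the provider still types as "str | None" at the use site even though the code path guarantees it is a str. A minimal reproduction with hypothetical names:

```python
from typing import List, Optional

def resolve_target_keys(target_output_keys: Optional[List[str]],
                        target_output_keys_provider: Optional[str]) -> List[str]:
    if target_output_keys is None and target_output_keys_provider is None:
        raise ValueError("Provide target_output_keys or target_output_keys_provider.")
    if target_output_keys is not None:
        return target_output_keys
    # At runtime the provider cannot be None here, but mypy still sees
    # "str | None", hence the suppression used in the actual change:
    return [target_output_keys_provider]  # type: ignore[list-item]
```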

xiaoyi-cheng (Contributor) previously approved these changes Aug 6, 2024

@xiaoyi-cheng left a comment: lgtm

@xiaoyi-cheng xiaoyi-cheng merged commit bc5a15f into aws:main Aug 8, 2024
3 checks passed
3 participants