-
Notifications
You must be signed in to change notification settings - Fork 46
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
refactor: SummarizationAccuracyMetrics
transform to handle multiple target outputs more efficiently
#317
Conversation
SummarizationAccuracyMetrics
transform to handle multiple target outputs more efficientlySummarizationAccuracyMetrics
transform to handle multiple target outputs more efficiently
SummarizationAccuracyMetrics
transform to handle multiple target outputs more efficientlySummarizationAccuracyMetrics
transform to handle multiple target outputs more efficiently
SummarizationAccuracyMetrics
transform to handle multiple target outputs more efficientlySummarizationAccuracyMetrics
transform to handle multiple target outputs more efficiently
allow_duplicate_input_keys: bool, | ||
target_output_keys: Optional[List[str]] = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep this as a positional argument so that 1) the argument order doesn't change, which would break things for existing users, and 2) the default configuration isn't invalid; currently, if you use the default options, you'll get a validation error.
For point 1), I really mean that we don't want the signatures of the concrete subclasses to change. Obviously, no one is directly calling this ABC's init method.
bertscore_model: Union[BertscoreHelperModel, ActorHandle], | ||
target_output_keys: Optional[List[str]] = None, | ||
target_output_keys_provider: str = "", | ||
bertscore_model: Union[BertscoreHelperModel, ActorHandle] = BertscoreHelperModel(BERTSCORE_DEFAULT_MODEL), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to put target_output_keys_provider
as the last argument, or else existing calls to this method will break.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed with @kirupang-code offline. We need to use this ordering (which is backwards-incompatible) in order for deserialization of these object instances to work properly (since we rely on a specific ordering of the arguments to __init__
). The existing calls to BertScore.__init__
in our own code have been updated as necessary, but users who interact with BertScore
objects directly will be impacted.
We should keep track of this before the next release.
output_keys, | ||
allow_duplicates=allow_duplicate_input_keys, | ||
) | ||
self.target_output_keys = target_output_keys | ||
self.model_output_keys = model_output_keys | ||
self.target_output_keys_provider = str(target_output_keys_provider) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
bertscore_model: Union[BertscoreHelperModel, ActorHandle], | ||
target_output_keys: Optional[List[str]] = None, | ||
target_output_keys_provider: str = "", | ||
bertscore_model: Union[BertscoreHelperModel, ActorHandle] = BertscoreHelperModel(BERTSCORE_DEFAULT_MODEL), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed with @kirupang-code offline. We need to use this ordering (which is backwards-incompatible) in order for deserialization of these object instances to work properly (since we rely on a specific ordering of the arguments to __init__
). The existing calls to BertScore.__init__
in our own code have been updated as necessary, but users who interact with BertScore
objects directly will be impacted.
We should keep track of this before the next release.
@@ -128,6 +128,31 @@ def test_bert_score_call_with_bertscore_model_object(): | |||
mock_bertscore_model.get_helper_scores.assert_called_once_with("Hello there!", "Hi") | |||
|
|||
|
|||
def test_bert_score_call_with_target_output_keys_provider(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a unit test where there are multiple target outputs, and verify that the max score is returned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had to add # type: ignore
to target_output_keys_provider because of its Optional type since we validate that its value is not None. Discussed with Daniel. Errors without the # type: ignore
:
3. Mypy
=======
src/fmeval/transforms/summarization_accuracy_metrics.py:89:69: error: List item 0 has incompatible type "str | None"; expected "str" [list-item]
src/fmeval/transforms/summarization_accuracy_metrics.py:111:25: error: Invalid index type "str | None" for "dict[str, Any]"; expected type "str" [index]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
SummarizationAccuracyMetrics
so that thetarget_output
field wouldn't have to be duplicated in order to be evaluated for eachmodel_output
. If we have multipletarget_outputs
we take the max score fromself.compute_metric
.target_output_keys_provider
parameter to support a way to compute scores given a key to a list of possible target outputs.BertScore
because it follows a parameter with a default value.input_keys + model_output_keys, TypeError: can only concatenate str (not "list") to str