feat: Add AnswerRecallSingleHitEvaluator
#7394
Conversation
Pull Request Test Coverage Report for Build 8363291949 (Details)
💛 - Coveralls
For precision, recall, and F1 metrics on answers, we need the tokenized versions of the answers. That way we can calculate how many tokens of the ground-truth answer are also mentioned in the predicted answer, and vice versa.
["10th century", "the first half of the 10th century", "10th", "10th"], | ||
], | ||
) | ||
assert result == {"scores": [1.0, 1.0, 0.5, 1.0, 0.75, 1.0], "average": 0.875} |
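
To make the comment above concrete, here is a minimal sketch of token-level recall, assuming plain whitespace tokenization; the evaluator's actual tokenizer may differ:

# Minimal sketch, assuming whitespace tokenization (illustrative only,
# not the evaluator's actual implementation).
def token_recall(ground_truth: str, predicted: str) -> float:
    gt_tokens = set(ground_truth.split())
    pred_tokens = set(predicted.split())
    if not gt_tokens:
        return 0.0
    # Fraction of ground-truth tokens that also appear in the prediction.
    return len(gt_tokens & pred_tokens) / len(gt_tokens)

# Example: ground truth "10th century" vs. prediction "10th" gives 0.5,
# consistent with the 0.5 score in the test assertion above.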
For the SingleHit metric we assume that there is only one correct output; only for MultiHit do we allow multiple correct outputs. This distinction makes more sense for documents and retrieval: if it is enough to retrieve a single relevant document per query from your database and you want to evaluate retrieval, the SingleHit metric is used. Following our naming, that would be DocumentRecallSingleHitEvaluator. If there is more than one relevant document and they should all be retrieved, the MultiHit metric is used.
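
To illustrate the distinction, a hedged sketch of single-hit versus multi-hit recall for document retrieval; the function names are illustrative, not Haystack's actual API:

# Illustrative only; not the actual DocumentRecall*Evaluator implementations.
def recall_single_hit(relevant_ids: set[str], retrieved_ids: list[str]) -> float:
    # 1.0 as soon as any single relevant document was retrieved, else 0.0.
    return 1.0 if relevant_ids & set(retrieved_ids) else 0.0

def recall_multi_hit(relevant_ids: set[str], retrieved_ids: list[str]) -> float:
    # Fraction of all relevant documents that were retrieved.
    if not relevant_ids:
        return 0.0
    return len(relevant_ids & set(retrieved_ids)) / len(relevant_ids)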
Closing in favor of #7399.
Related Issues
Part of #6064.
Proposed Changes:
Add the AnswerRecallSingleHitEvaluator component. It can be used to calculate the recall single-hit metric.
How did you test it?
I added tests.
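
For context, a hypothetical usage sketch inferred from the test assertion above; the import path, parameter names, and exact run() signature are assumptions, since this component was never merged (see #7399):

# All names below are assumptions inferred from the test, not a merged API.
from haystack.components.evaluators import AnswerRecallSingleHitEvaluator  # assumed import path

evaluator = AnswerRecallSingleHitEvaluator()
result = evaluator.run(
    ground_truth_answers=[["10th century"]],  # parameter names are assumptions
    predicted_answers=[["10th"]],
)
print(result)  # e.g. {"scores": [0.5], "average": 0.5}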
Notes for the reviewer
I didn't add the component to the package __init__.py on purpose, to avoid conflicts with future PRs. When all the evaluators are done, I'll update it.
Checklist
The PR title uses one of the conventional commit types: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.