
feat: Add AnswerRecallSingleHitEvaluator #7394

Closed
silvanocerza wants to merge 1 commit from the recall-single-hit-evaluator branch

Conversation

silvanocerza
Contributor

Related Issues

Part of #6064.

Proposed Changes:

Add the AnswerRecallSingleHitEvaluator component. It can be used to calculate the Recall single-hit metric (see the sketch below).
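
Illustrative only and not code from this PR: a minimal sketch of one plausible reading of a single-hit recall score over answers, returning the same {"scores": [...], "average": ...} shape that appears in the test assertion quoted later in the review. The function name and the case-insensitive containment match are assumptions made for illustration, not the component's actual implementation.

```python
# Illustrative sketch only: the matching rule (case-insensitive containment) and
# the function name are assumptions, not the implementation proposed in this PR.
from statistics import mean
from typing import Dict, List


def answer_recall_single_hit(
    ground_truth_answers: List[List[str]],
    predicted_answers: List[List[str]],
) -> Dict[str, object]:
    scores = []
    for truths, predictions in zip(ground_truth_answers, predicted_answers):
        # Single-hit: one matched ground-truth answer is enough for a full score.
        hit = any(
            truth.lower() in prediction.lower()
            for truth in truths
            for prediction in predictions
        )
        scores.append(1.0 if hit else 0.0)
    return {"scores": scores, "average": mean(scores)}


print(answer_recall_single_hit(
    ground_truth_answers=[["Berlin"], ["10th century", "10th"]],
    predicted_answers=[["It is Berlin."], ["in the 11th century"]],
))
# {'scores': [1.0, 0.0], 'average': 0.5}
```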

How did you test it?

I added tests.

Notes for the reviewer

I didn't add the component in the package __init__.py on purpose to avoid conflicts with future PRs.
When all the evaluators are done I'll update it.

Checklist

@silvanocerza silvanocerza self-assigned this Mar 20, 2024
@silvanocerza silvanocerza requested review from a team as code owners March 20, 2024 17:26
@silvanocerza silvanocerza requested review from dfokina and masci and removed request for a team March 20, 2024 17:26
@github-actions github-actions bot added the type:documentation (Improvements on the docs), topic:tests, and 2.x (Related to Haystack v2.0) labels Mar 20, 2024
@silvanocerza silvanocerza requested review from shadeMe, julian-risch, a team and davidsbatista and removed request for masci, a team and davidsbatista March 20, 2024 17:26
@coveralls
Collaborator

Pull Request Test Coverage Report for Build 8363291949

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.03%) to 89.289%

Totals
  • Change from base Build 8362383368: 0.03%
  • Covered Lines: 5427
  • Relevant Lines: 6078

💛 - Coveralls

Member

@julian-risch julian-risch left a comment


For precision, recall and F1 metrics on answers we need the tokenized versions of answers. That way we can calculate how many tokens of the ground truth answer are also mentioned in the predicted answer and vice versa.
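
As a concrete illustration of the token-based calculation described above (not code from this PR): tokenize both answers, count the overlapping tokens, and derive recall, precision, and F1. Whitespace tokenization is a simplifying assumption here; SQuAD-style implementations typically also lowercase and strip punctuation.

```python
# Minimal sketch of token-level answer metrics; whitespace tokenization is a
# simplifying assumption, not the tokenizer Haystack would use.
from collections import Counter


def token_metrics(ground_truth: str, prediction: str) -> dict:
    truth_tokens = ground_truth.lower().split()
    pred_tokens = prediction.lower().split()
    # Tokens present in both answers, respecting multiplicities.
    overlap = sum((Counter(truth_tokens) & Counter(pred_tokens)).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / len(pred_tokens)  # share of predicted tokens that are correct
    recall = overlap / len(truth_tokens)    # share of ground-truth tokens that are covered
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


print(token_metrics("the first half of the 10th century", "10th century"))
# {'precision': 1.0, 'recall': 0.2857..., 'f1': 0.4444...}
```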

["10th century", "the first half of the 10th century", "10th", "10th"],
],
)
assert result == {"scores": [1.0, 1.0, 0.5, 1.0, 0.75, 1.0], "average": 0.875}
Member

@julian-risch julian-risch Mar 21, 2024


For the SingleHit metric we assume that there is only one correct output. Only for MultiHit we allow multiple correct outputs. This makes more sense for documents and retrieval. If it is enough to retrieve a single relevant document per query from your database and you want to evaluate retrieval, the SingleHit metric is used. Following our naming that would be DocumentRecallSingleHitEvaluator. If there is more than one relevant document and they should all be retrieved, the MultiHit metric is used.
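
To make the distinction concrete, a minimal sketch of the two document-level recall variants described above (illustrative only, not the API of DocumentRecallSingleHitEvaluator): single-hit scores a query 1.0 as soon as at least one relevant document is retrieved, while multi-hit scores the fraction of relevant documents that were retrieved.

```python
# Illustrative sketch of the two recall flavours for retrieval evaluation;
# documents are represented by plain string IDs for simplicity.
from typing import List, Set


def recall_single_hit(relevant: Set[str], retrieved: List[str]) -> float:
    # 1.0 as soon as any relevant document shows up in the retrieved list.
    return 1.0 if any(doc in relevant for doc in retrieved) else 0.0


def recall_multi_hit(relevant: Set[str], retrieved: List[str]) -> float:
    # Fraction of the relevant documents that were actually retrieved.
    return len(relevant & set(retrieved)) / len(relevant) if relevant else 0.0


relevant_docs = {"doc_1", "doc_7", "doc_9"}
retrieved_docs = ["doc_7", "doc_4", "doc_2"]
print(recall_single_hit(relevant_docs, retrieved_docs))  # 1.0 (doc_7 was retrieved)
print(recall_multi_hit(relevant_docs, retrieved_docs))   # 0.333... (1 of 3 relevant docs)
```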

@silvanocerza
Contributor Author

Closing in favor of #7399.

@silvanocerza silvanocerza deleted the recall-single-hit-evaluator branch March 21, 2024 16:44