feat: Add AnswerRecallSingleHitEvaluator
#7394
Conversation
Pull Request Test Coverage Report for Build 8363291949 (Details)
💛 - Coveralls
For precision, recall, and F1 metrics on answers, we need the tokenized versions of the answers. That way we can calculate how many tokens of the ground-truth answer are also mentioned in the predicted answer, and vice versa.
["10th century", "the first half of the 10th century", "10th", "10th"], | ||
], | ||
) | ||
assert result == {"scores": [1.0, 1.0, 0.5, 1.0, 0.75, 1.0], "average": 0.875} |
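
To make the comment above concrete, here is a minimal sketch of token-level recall, assuming plain whitespace tokenization; the evaluator's actual tokenizer may differ:

# Minimal sketch, assuming whitespace tokenization (illustrative only,
# not the evaluator's actual implementation).
def token_recall(ground_truth: str, predicted: str) -> float:
    gt_tokens = set(ground_truth.split())
    pred_tokens = set(predicted.split())
    if not gt_tokens:
        return 0.0
    # Fraction of ground-truth tokens that also appear in the prediction.
    return len(gt_tokens & pred_tokens) / len(gt_tokens)

# Example: ground truth "10th century" vs. prediction "10th" gives 0.5,
# consistent with the 0.5 score in the test assertion above.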
For the SingleHit metric we assume that there is only one correct output; only for MultiHit do we allow multiple correct outputs. This distinction makes more sense for documents and retrieval: if it is enough to retrieve a single relevant document per query from your database and you want to evaluate retrieval, the SingleHit metric is used. Following our naming, that would be DocumentRecallSingleHitEvaluator. If there is more than one relevant document and they should all be retrieved, the MultiHit metric is used.
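
To illustrate the distinction, a hedged sketch of single-hit versus multi-hit recall for document retrieval; the function names are illustrative, not Haystack's actual API:

# Illustrative only; not the actual DocumentRecall*Evaluator implementations.
def recall_single_hit(relevant_ids: set[str], retrieved_ids: list[str]) -> float:
    # 1.0 as soon as any single relevant document was retrieved, else 0.0.
    return 1.0 if relevant_ids & set(retrieved_ids) else 0.0

def recall_multi_hit(relevant_ids: set[str], retrieved_ids: list[str]) -> float:
    # Fraction of all relevant documents that were retrieved.
    if not relevant_ids:
        return 0.0
    return len(relevant_ids & set(retrieved_ids)) / len(relevant_ids)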
Closing in favor of #7399.
Related Issues
Part of #6064.
Proposed Changes:
Add the AnswerRecallSingleHitEvaluator component. It can be used to calculate the recall single-hit metric.
How did you test it?
I added tests.
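
For context, a hypothetical usage sketch inferred from the test assertion above; the import path, parameter names, and exact run() signature are assumptions, since this component was never merged (see #7399):

# All names below are assumptions inferred from the test, not a merged API.
from haystack.components.evaluators import AnswerRecallSingleHitEvaluator  # assumed import path

evaluator = AnswerRecallSingleHitEvaluator()
result = evaluator.run(
    ground_truth_answers=[["10th century"]],  # parameter names are assumptions
    predicted_answers=[["10th"]],
)
print(result)  # e.g. {"scores": [0.5], "average": 0.5}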
Notes for the reviewer
I didn't add the component to the package __init__.py on purpose, to avoid conflicts with future PRs. When all the evaluators are done, I'll update it.
Checklist
The PR title uses one of the conventional commit types: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test:.