-
Hi, I'm looking for some help to clear up my understanding of the evaluation. The Metrics section of A Replication Study of Dense Passage Retriever says that "we measured effectiveness in terms of top-k retrieval accuracy, defined as the fraction of questions that have a correct answer span in the top-k retrieved contexts at least once." What is the definition of a "correct answer"? Or to put it another way: for the NQ dataset, how can we determine whether the passages retrieved from the Wikipedia corpus are correct, given that it's highly improbable a passage will precisely match the annotated short or long answer? Thanks!
-
Yes, the passage does not have to precisely match the annotated short/long answer. We process passages to find those that contain some variant of one of the gold answers, up to normalization, as an exact match. This set is by no means exhaustive or perfect, but it is the best one can do in a compute-friendly way. I hope that clarifies things!
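To make the matching concrete, here is a minimal sketch of this kind of normalized exact-match check and the resulting top-k accuracy. The helper names (`normalize`, `has_answer`, `top_k_accuracy`) and the SQuAD-style normalization (lowercasing, stripping punctuation and articles) are illustrative assumptions; the actual evaluation scripts may normalize slightly differently.

```python
import re
import string

def normalize(text):
    """SQuAD-style normalization: lowercase, drop punctuation and
    articles (a/an/the), collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def has_answer(passage, gold_answers):
    """True if any gold answer, after normalization, appears as a
    contiguous token sequence in the normalized passage."""
    p_tokens = normalize(passage).split()
    for ans in gold_answers:
        a_tokens = normalize(ans).split()
        if not a_tokens:
            continue
        for i in range(len(p_tokens) - len(a_tokens) + 1):
            if p_tokens[i:i + len(a_tokens)] == a_tokens:
                return True
    return False

def top_k_accuracy(retrieved_per_question, gold_per_question, k):
    """Fraction of questions with at least one answer-bearing
    passage among the top k retrieved contexts."""
    hits = sum(
        any(has_answer(p, golds) for p in passages[:k])
        for passages, golds in zip(retrieved_per_question, gold_per_question)
    )
    return hits / len(gold_per_question)
```

So a retrieved passage counts as a hit whenever some normalized variant of a gold answer string occurs verbatim inside it, which is why no exact match against the annotated span itself is needed.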