
Unified getter for the relevance level #254

Open
TheMrSheldon opened this issue Jan 22, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@TheMrSheldon

Is your feature request related to a problem? Please describe.
ir_datasets centralizes a lot of information about datasets. However, when using evaluation measures with binary relevance (like MAP, MRR, ...), one needs to find the correct relevance level for each dataset, and this step is easy to miss. Is it correct that ir_datasets currently does not track the minimum relevance level?

Describe the solution you'd like
Would it be possible to add a function `dataset.get_relevance_level() -> int` that returns the minimum relevance level for the dataset (e.g., 1 for TREC DL '19 doc and 2 for TREC DL '19 passage)? Some datasets (e.g., ANTIQUE) also recommend a remapping of the graded relevance labels. Could this be performed automatically? For example, during the download of ANTIQUE the qrels would be remapped from the 1–4 range to 0–3, and the relevance level for ANTIQUE would then be reported as 2 (the standard relevance level of 3, likewise reduced by 1).
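For concreteness, a minimal sketch of how the requested getter might be used; the method name, its placement on the dataset object, and its return value are all hypothetical:

```python
import ir_datasets

dataset = ir_datasets.load("msmarco-passage/trec-dl-2019")

# Hypothetical accessor -- does not exist in ir_datasets today.
# It would return the minimum label counted as relevant for binary
# measures, e.g. 2 for TREC DL '19 passage and 1 for TREC DL '19 doc.
relevance_level = dataset.get_relevance_level()
```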

Describe alternatives you've considered
To my knowledge, this currently has to be done manually.

Additional context
Such a function could then be used in conjunction with PyTerrier or pytrec_eval so that users do not need to manually find and hard-code the `relevance_level` for every dataset they use. This could greatly reduce the risk of incomparable evaluation results when some people forget to set the correct `relevance_level` and others don't.
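As a sketch of the status quo that such a getter would replace, the threshold currently has to be looked up in the dataset documentation and hard-coded when building the evaluator (this assumes the `relevance_level` keyword supported by recent pytrec_eval versions):

```python
import ir_datasets
import pytrec_eval

dataset = ir_datasets.load("msmarco-passage/trec-dl-2019/judged")

# Build the qrels dict pytrec_eval expects: {query_id: {doc_id: label}}.
qrels = {}
for qrel in dataset.qrels_iter():
    qrels.setdefault(qrel.query_id, {})[qrel.doc_id] = qrel.relevance

# The 2 below must currently be found in the dataset documentation and
# hard-coded by hand -- exactly the step the proposed getter would remove.
evaluator = pytrec_eval.RelevanceEvaluator(
    qrels, {"map", "recip_rank"}, relevance_level=2)
# results = evaluator.evaluate(run)  # run: {query_id: {doc_id: score}}
```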

@TheMrSheldon TheMrSheldon added the enhancement New feature or request label Jan 22, 2024
@seanmacavaney
Collaborator

This sounds like a good addition, and I'm in favor of adding a `dataset.qrels.binary_relevance_cutoff()` function (or similar), especially considering how frequently this causes folks problems.

The current solution is to provide this information via the "official measures". The sister project, ir-measures, specifies the minimum relevance threshold directly in the measure's name and passes it down when invoking pytrec_eval. See an example here: https://ir-datasets.com/msmarco-passage#msmarco-passage/trec-dl-2019
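For example, with ir-measures the threshold travels inside the measure name, so there is no separate `relevance_level` argument to forget:

```python
import ir_datasets
import ir_measures
from ir_measures import AP, RR

dataset = ir_datasets.load("msmarco-passage/trec-dl-2019/judged")

# AP(rel=2) counts only labels >= 2 as relevant, matching the official
# TREC DL '19 passage setting; ir_datasets qrels can be passed straight in.
measures = [AP(rel=2), RR(rel=2)]
# `run` would be a {query_id: {doc_id: score}} dict or a parsed run file:
# results = ir_measures.calc_aggregate(measures, dataset.qrels_iter(), run)
```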

[Screenshot: official measures listed on the linked ir-datasets page]

However, the official measure documentation isn't very complete (e.g., it's not documented for ANTIQUE), and in some cases, datasets don't have measure(s) that can be clearly marked as official.

I'm far more hesitant to perform any mapping of the data directly. From a software design perspective, this seems like the job of the evaluator, not the data provider. This would also be a breaking change for anybody who is already using an unmapped version of the qrels.
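For completeness, ANTIQUE-style remapping can already be done on the consumer side without changing the distributed qrels, e.g. (a sketch; the offset of 1 follows the remapping discussed above):

```python
import ir_datasets

dataset = ir_datasets.load("antique/test")

# Shift ANTIQUE's 1-4 labels to 0-3 at evaluation time, leaving the
# qrels shipped by ir_datasets untouched; the cutoff of 3 becomes 2.
qrels = {}
for qrel in dataset.qrels_iter():
    qrels.setdefault(qrel.query_id, {})[qrel.doc_id] = qrel.relevance - 1
```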
