Describe the bug
When running Fisher's Randomization Test multiple times on the same data with the same parameters, the significance results vary between runs. This happens even when the random seed is fixed and only a single thread is used.
To Reproduce
Run

```python
ranx.compare(
    qrels,
    runs,
    metrics=["precision@1", "recall@20"],
    stat_test="fisher",
    max_p=0.05,
    n_permutations=1000,
    make_comparable=True,
    threads=1,
    random_seed=0,
)
```

`qrels` has a few thousand entries; each element in `runs` has about a thousand entries.

When running `compare()` multiple times, the significance assessments differ slightly between runs. On my data, this can be observed after three to five `compare()` runs.
Expected behavior
I expect multiple `compare()` runs on the same data with the same parameters to always produce exactly the same significance results.
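For reference, here is a minimal, self-contained sketch of the determinism being expected: a seeded two-sided Fisher randomization (sign-flip) test over paired per-query scores. This is independent of ranx's internals (the function name, data, and implementation below are illustrative, not ranx's code); it only shows that when the RNG is seeded locally, repeated calls with the same inputs must yield identical p-values.

```python
import random

def fisher_randomization_test(scores_a, scores_b, n_permutations=1000, seed=0):
    """Two-sided Fisher randomization (sign-flip) test on paired scores.
    With a fixed seed, repeated calls return the identical p-value."""
    rng = random.Random(seed)  # local RNG: no hidden global state
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs) / len(diffs))
    count = 0
    for _ in range(n_permutations):
        # Randomly flip the sign of each paired difference.
        permuted = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(sum(permuted) / len(permuted)) >= observed:
            count += 1
    return count / n_permutations

# Illustrative per-query metric scores for two systems.
a = [0.9, 0.8, 0.7, 0.6, 0.95]
b = [0.5, 0.6, 0.65, 0.4, 0.7]
p1 = fisher_randomization_test(a, b, seed=0)
p2 = fisher_randomization_test(a, b, seed=0)
assert p1 == p2  # same seed, same data -> identical p-value
```

Any run-to-run variation under a fixed seed and a single thread suggests the permutation RNG is not (or not only) controlled by `random_seed`.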