Issue with MVBench Evaluation #227

Backdrop9019 · 2024-08-24T05:59:19Z

There seems to be a problem with how MVBench checks answers. Currently, correctness is decided by splitting the prediction on spaces and comparing only the first segment (word) with the correct answer. As a result, any prediction whose first segment is just a closing parenthesis “)” is counted as correct.
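To illustrate how this can happen, here is a minimal sketch of a first-token check. It assumes the comparison is a substring test of the prediction's first token against the ground-truth option label such as “(A)”. The function `naive_check` is hypothetical and is not the repository's exact code.

```python
def naive_check(pred: str, gt: str) -> bool:
    """Hypothetical first-token check (an assumed simplification, not the repo's code)."""
    # Keep only the first space-separated token of each string.
    pred_option = pred.lower().split(" ")[0].replace(".", "")
    gt_option = gt.lower().split(" ")[0]
    # Substring test: any fragment of "(a)" is accepted as a match.
    return pred_option in gt_option


# A bare ")" is a substring of every option label, so it always "matches".
print(naive_check(")", "(A) The man opens the door."))   # True
print(naive_check(")", "(C) The man closes the door."))  # True
print(naive_check("(B) ...", "(A) ..."))                 # False
```

Under this assumption, an empty first token (for example from a prediction that starts with a space) would also be accepted, since the empty string is a substring of every label.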

I believe it is essential to add a step that verifies that the option letter in the prediction (e.g. A, B, C) matches the letter of the correct answer.
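One possible way to do this is sketched below. It assumes predictions and answers begin with an option label of the form “(A)” or “A)”. The names `extract_option` and `strict_check` are hypothetical. A prediction counts as correct only when a letter can be extracted from both strings and the two letters agree, so a bare “)” is rejected.

```python
import re
from typing import Optional

# Assumed label format: an optional "(", one letter, then ")", e.g. "(A)" or "A)".
_OPTION_RE = re.compile(r"\(?\s*([A-Za-z])\s*\)")


def extract_option(text: str) -> Optional[str]:
    """Return the uppercase option letter at the start of text, or None if absent."""
    m = _OPTION_RE.match(text.strip())
    return m.group(1).upper() if m else None


def strict_check(pred: str, gt: str) -> bool:
    """Correct only if both strings yield a letter and the letters are equal."""
    pred_letter = extract_option(pred)
    gt_letter = extract_option(gt)
    return pred_letter is not None and pred_letter == gt_letter


print(strict_check(")", "(A) The man opens the door."))     # False
print(strict_check("A) The man", "(A) The man opens."))     # True
print(strict_check("(B) The man", "(A) The man opens."))    # False
```

Requiring the closing parenthesis is a deliberate choice in this sketch: it avoids reading the article “a” at the start of a free-form answer as option A, at the cost of rejecting bare answers such as “A.”.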

While running your code, I accidentally added an extra space to the answer prompt, making it “best option: ( ”, and noticed a large jump in the reported accuracy.

It would be great if the evaluation method could be made more robust!
