There appears to be an issue with the evaluation method in MVBench. Currently, correctness is checked by splitting the prediction and comparing only its first segment (word) with the correct answer. However, with this approach, a prediction that consists of just a closing parenthesis “)” is treated as correct.
I believe it is essential to add a step that verifies that the letter of the predicted answer option is correct.
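To make the suggestion concrete, here is a minimal sketch of what a stricter check could look like. Both functions are my own illustration, not the repository's code: `loose_check` is a hypothetical stand-in for a first-word substring comparison of the kind described above, and `extract_option_letter` / `strict_check` are assumed names for a possible fix. The sketch also assumes that answers are formatted like "(A) some text".

```python
import re
from typing import Optional


def loose_check(pred: str, gt: str) -> bool:
    """Hypothetical stand-in for a first-word substring comparison."""
    pred_option = pred.lower().split(" ")[0]
    gt_option = gt.lower().split(" ")[0]
    # Substring matching: ")" is contained in "(a)", so it passes.
    return pred_option.replace(".", "") in gt_option or gt_option in pred_option


def extract_option_letter(text: str) -> Optional[str]:
    """Return the leading option letter, e.g. 'a' from '(A) ...', else None.

    Accepts '(a)', 'a)', 'a.', 'a:' or a lone letter; a letter followed by
    ordinary words (e.g. 'A dog') is not treated as an option.
    """
    match = re.match(r"\s*\(?([a-z])(?:[).:]|\s*$)", text.lower())
    return match.group(1) if match else None


def strict_check(pred: str, gt: str) -> bool:
    """Count a prediction as correct only if its option letter matches."""
    pred_letter = extract_option_letter(pred)
    gt_letter = extract_option_letter(gt)
    return pred_letter is not None and pred_letter == gt_letter


if __name__ == "__main__":
    gt = "(A) dog"
    print(loose_check(")", gt))         # True: the false positive
    print(strict_check(")", gt))        # False
    print(strict_check("(A) dog", gt))  # True
    print(strict_check("(B) cat", gt))  # False
```

The key design choice is to compare an extracted letter for equality rather than test for substring containment, so predictions without a recognizable option letter are always scored as wrong.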
While running your code, I accidentally added an extra space to the answer prompt, making it “best option: ( ”, and noticed a significant increase in performance.
It would be great if the evaluation method could be made more robust!