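In teval.evaluators.review_evaluator.py, the answer is located by searching for ":" (the ASCII half-width colon), but the instruction in the test samples uses "Answer:" (the Chinese full-width colon). Most of Qwen1.5-14B-Chat's outputs therefore look like "Answer:A", "Answer:B", "Answer:C", and so on. With the code below, find(":") does not match the full-width colon and returns -1, so the slice starts at index 0 and the whole string is kept. The extracted result is then "A", the first character of "Answer". In other words, on the Review metric almost every Qwen1.5-14B-Chat prediction is read as A, which does not match what the model actually answered. Hard-coding the check this way distorts the measured results.

Code:

    pred_data = pred_data[pred_data.find(":") + 1:]
    pred_data = pred_data.strip()
    if len(pred_data) > 0 and pred_data[0] in ['A', 'B', 'C', 'D', 'E']:

Test sample instruction: “你的输出应遵循以下格式:\n```\nAnswer:[在此处插入你的选择,从A、B、C、D和E中选择。这应该是一个字符。]\” (in English: "Your output should follow this format: Answer:[insert your choice here, chosen from A, B, C, D and E; this should be a single character]")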
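For illustration, here is a minimal sketch of one way the extraction could accept both colon forms. `extract_choice` and `VALID_CHOICES` are hypothetical names that are not part of T-Eval, and the sketch assumes the prediction is a plain string such as "Answer:B". Fixing the instruction text in the data would be an equally valid route.

```python
import re
from typing import Optional

# Hypothetical names for illustration only; not part of T-Eval.
VALID_CHOICES = ("A", "B", "C", "D", "E")


def extract_choice(pred_data: str) -> Optional[str]:
    """Return the choice letter that follows the first colon, or None."""
    # Split once on either the half-width ":" or the full-width ":".
    parts = re.split(r"[::]", pred_data, maxsplit=1)
    if len(parts) < 2:
        # No colon of either kind: do not fall back to the raw text,
        # otherwise the "A" of "Answer" would be read as a choice.
        return None
    candidate = parts[1].strip()
    if candidate and candidate[0] in VALID_CHOICES:
        return candidate[0]
    return None


if __name__ == "__main__":
    for sample in ["Answer:B", "Answer: C", "Answer"]:
        print(repr(sample), "->", extract_choice(sample))
```

One design choice to be aware of: unlike the quoted code, this sketch returns None when no colon is present, so a bare "B" would not be counted. Whether that is the desired behaviour depends on how strictly the output format should be enforced.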
Thanks for pointing this out. We will fix this issue in the next version of the data.