Questions about ARC datasets #71

Zoeyyao27 · 2023-12-07T08:31:39Z

While reproducing the results, I found that Llama-2-7B cannot output the answer in the desired format, so it fails under exact match. How do you get the model to generate answers in that format? Do you add a prompt, or do you just rely on past_key_values?
In Section 4.3, you said: "We first concatenate all question-answer pairs from the ARC-[Challenge, Easy] datasets, feed the continuous stream to Llama-2-[7,13,70]B-Chat models, and assess model completions at each answer position using an exact match criterion". Is this done as follows: input [q1] -> output [a1] -> past_key_values[q1 + a1] + input [q2] -> output [a2] -> ...?
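To make the question concrete, here is a minimal sketch of the loop I have in mind. It is my own assumption, not your evaluation script: `generate` is a hypothetical stand-in for a Llama-2-Chat call that reuses past_key_values, the growing `context` string stands in for the KV cache, the gold answer (rather than the model's prediction) is appended to the stream, and "exact match" is assumed to be a whitespace-stripped string comparison.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: generate(context, question) returns the model's
# completion for `question` given everything already in the stream.
# In a real run this would wrap a Llama-2-Chat model reusing past_key_values.
Generate = Callable[[str, str], str]


def streaming_exact_match(qa_pairs: List[Tuple[str, str]], generate: Generate) -> float:
    """Feed QA pairs as one continuous stream and score each answer position."""
    context = ""  # stands in for the accumulated KV cache (assumption)
    correct = 0
    for question, gold in qa_pairs:
        prediction = generate(context, question)
        # Assumed exact-match criterion: compare after stripping whitespace.
        if prediction.strip() == gold.strip():
            correct += 1
        # Assumption: append the gold answer, following
        # "concatenate all question-answer pairs".
        context += f"{question}\n{gold}\n"
    return correct / len(qa_pairs) if qa_pairs else 0.0
```

If instead the model's own prediction is what gets cached before the next question, only the `context += ...` line would change.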
Could you provide the evaluation script for the ARC datasets?
