Low output score - reasoning output written to final output #29

Open
devishree23 opened this issue Jun 24, 2024 · 1 comment

Comments

@devishree23

I am trying to reproduce your results from the paper. I am using the Llama3 70B GPTQ model on the WebQSP dataset with the Freebase KG. However, I am getting much lower results than those reported in the paper: an exact match score of just 0.189.

One reason for the difference could be the LLM used, but based on the error analysis we performed, it seems that the LLM's reasoning output is also being written to the final output. Is this by design, or is it a bug? Most of the reasoning output is just "yes" or "no" and does not contain the answer to the question. In the reasoning chains, however, we can see that the required answer is derived from the KG.
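
For reference, this is roughly how we score exact match on our end. It is only a minimal sketch with our own helper names (nothing from this repo), but it shows why a final output that contains only the reasoning verdict cannot match the gold answers:

```python
# Minimal sketch of our exact-match scoring (helper names are ours, not from this repo):
# a prediction counts as correct if any gold answer string appears in the
# normalized prediction. A final output that is just "yes"/"no" never matches.

def normalize(text: str) -> str:
    """Lowercase and strip surrounding whitespace and trailing punctuation."""
    return text.strip().strip(".").lower()

def exact_match(prediction: str, gold_answers: list[str]) -> bool:
    """Return True if any normalized gold answer appears in the normalized prediction."""
    pred = normalize(prediction)
    return any(normalize(ans) in pred for ans in gold_answers)

# The final output we observe is just the reasoning verdict, so it scores 0:
print(exact_match("yes", ["Jamaican English", "Jamaican Creole English Language"]))  # False
```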

Please let us know your thoughts. Any help would be appreciated.
Thank you

@GasolSun36
Collaborator

Can you send me the exact model and commands you ran? The llama2-70b-chat model we use here doesn't work that badly. Did you make any changes to the exemplars?
