I am currently looking at creating sample direct templates from the GSM-Symbolic data. I would also like to note that it would be difficult to create our own samples with a Python script, because the script would have to be smart about the different types of proper names in the questions and assign relevant random alternatives to them.
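To illustrate the difficulty, here is a minimal sketch of a hand-written template. The word problem, the pools `PERSON_NAMES` and `FRUITS`, and the function `generate` are all made up for illustration and are not taken from the GSM-Symbolic data. Even this toy case needs a separate pool for each kind of proper name or noun, which is what makes a generic script hard to write.

```python
import random

# Hypothetical substitution pools: one pool per category of name in the question.
PERSON_NAMES = ["Ava", "Liam", "Noah", "Mia"]
FRUITS = ["apples", "pears", "plums"]


def generate(rng):
    """Build one question/answer pair from a fixed, hand-written template."""
    name = rng.choice(PERSON_NAMES)
    fruit = rng.choice(FRUITS)
    start = rng.randint(5, 20)
    # Keep eaten strictly below start so the answer stays a positive integer.
    eaten = rng.randint(1, start - 1)
    question = (
        f"{name} has {start} {fruit} and eats {eaten} of them. "
        f"How many {fruit} does {name} have left?"
    )
    return {"question": question, "answer": start - eaten}


if __name__ == "__main__":
    print(generate(random.Random(0)))
```

Passing in a seeded `random.Random` keeps every sample reproducible, which helps when comparing outputs later.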
Based on the discussions with @andreaskoepf, I agree that the less difficult way to create our own templates would be to feed the GSM templates to an LLM and let it generate similar samples for us. The challenge is that we cannot guarantee it will not hallucinate, so we probably want a way to automate validation of the samples we get from the LLM.
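One possible shape for such an automated check is sketched below. It assumes each LLM-produced sample is a dict with the hypothetical keys `question`, `solution` and `answer`, and that the solution text ends with a `#### <number>` line as GSM8K answers do. It only catches structural problems and mismatched final answers, not wrong reasoning.

```python
import re

# GSM8K-style final answer marker at the end of the solution text.
FINAL_ANSWER = re.compile(r"####\s*(-?\d[\d,]*)\s*$")


def extract_final_answer(solution):
    """Return the integer after the trailing '####' marker, or None if absent."""
    match = FINAL_ANSWER.search(solution.strip())
    if match is None:
        return None
    return int(match.group(1).replace(",", ""))


def is_valid_sample(sample):
    """Cheap sanity checks for one generated sample (hypothetical schema)."""
    question = sample.get("question")
    solution = sample.get("solution")
    answer = sample.get("answer")
    if not isinstance(question, str) or not question.strip():
        return False
    if not isinstance(solution, str):
        return False
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(answer, bool) or not isinstance(answer, int):
        return False
    return extract_final_answer(solution) == answer
```

Samples that fail this check could be discarded or sent back to the LLM with the reason for the failure.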
In light of how tedious both approaches are, I am first thinking of writing a script that creates a direct template from the original GSM template, just for fun, while waiting to hear from the main author of the paper on whether we can use their templates wholesale. I have sent him an email and am hoping he will reply before the end of the week.
We created a PoC for an LLM-based (Sonnet) conversion of the Apple GSM-Symbolic dataset entries into Python code; it can be found in the gsm_symbolic.ipynb notebook.
Next steps:
try conversion of all 100 templates with retry on error (possibly with error feedback), inspect some outputs, tune prompt
verify generated scripts for 3 random entries manually
verify that original questions are close to the generated ones (small edit distance) and have same solution
generate 100 entries per generator and verify that results are always int
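The last two verification steps could be sketched roughly as follows. This assumes each generated script exposes a function that takes a `random.Random` and returns a dict with `question` and `answer` keys, which is a hypothetical interface and not necessarily what the notebook produces. It also uses `difflib.SequenceMatcher` as a stand-in similarity measure rather than a true edit distance, and the `min_ratio` threshold is an arbitrary placeholder.

```python
import difflib
import random


def always_int(generate, n=100, seed=0):
    """Draw n samples from a generator and check every answer is an int."""
    rng = random.Random(seed)
    for _ in range(n):
        answer = generate(rng)["answer"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(answer, bool) or not isinstance(answer, int):
            return False
    return True


def close_to_original(original_question, regenerated_question, min_ratio=0.9):
    """Approximate 'small edit distance' with a similarity ratio in [0, 1]."""
    matcher = difflib.SequenceMatcher(None, original_question, regenerated_question)
    return matcher.ratio() >= min_ratio


def same_solution(original_answer, regenerated_answer):
    """The regenerated original must reproduce the original numeric answer."""
    return original_answer == regenerated_answer
```

A generator would pass only if `always_int` holds and, when run with the original variable values, both `close_to_original` and `same_solution` hold.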
Apple research released their code for GSM8K-symbolic in the apple/ml-gsm-symbolic repository. The dataset construction is described in the paper GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models.
Unfortunately Apple released their code under a special Apple license.
All code included in reasoning-gym should be open-source under Apache 2.0.
We will therefore re-implement GSM-Symbolic for reasoning-gym.
GSM8K on HF: openai/gsm8k