Add GSM-symbolic (Apache 2.0 conform replication) #13

Closed
andreaskoepf opened this issue Jan 26, 2025 · 2 comments
@andreaskoepf
Contributor

andreaskoepf commented Jan 26, 2025

Apple Research released their code for GSM-Symbolic in the apple/ml-gsm-symbolic repository. The dataset construction is described in the paper GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models.

Unfortunately, Apple released their code under a special Apple license.
All code included in reasoning-gym should be open source under Apache 2.0.

We will therefore create our own replication of GSM-Symbolic for reasoning-gym.

GSM8K on HF: openai/gsm8k
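
For anyone working from the original GSM8K data: its reference solutions end with a line of the form `#### <final answer>`. A minimal sketch of pulling that value out is shown here; the function name is hypothetical, and it assumes the final answer is an integer (possibly with thousands separators), which is worth checking per entry.

```python
import re

# GSM8K reference solutions end with "#### <final answer>".
# Assumption: the final answer is an integer, optionally with commas.
_FINAL_ANSWER = re.compile(r"####\s*(-?[\d,]+)\s*$")


def extract_final_answer(solution: str) -> int:
    """Return the integer after the trailing '####' marker (hypothetical helper)."""
    match = _FINAL_ANSWER.search(solution.strip())
    if match is None:
        raise ValueError("no '#### <integer>' marker found at end of solution")
    return int(match.group(1).replace(",", ""))


# Illustrative string in the GSM8K answer format (not a real dataset entry).
example = "She buys 3 packs of 4 pens, so 3 * 4 = 12 pens.\n#### 12"
print(extract_final_answer(example))
```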

andreaskoepf changed the title from "Add GSM8K-symbolic" to "Add GSM-symbolic (Apache 2.0 conform replication)" on Jan 26, 2025
@Adefioye
Contributor

I am currently looking at creating sample direct templates using the GSM-Symbolic data. I would also like to note that it would be difficult to use a Python script to create our own samples, since the script would have to be smart about recognizing the different types of proper names in the questions and assigning relevant random alternatives to them.
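
To make the difficulty concrete, here is a rough sketch of what a single hand-written symbolic template could look like. Everything in it (the name pools, the template text, the `generate` function) is invented for illustration and is not taken from the Apple templates; the point is that each category of proper name needs its own pool of sensible alternatives.

```python
import random

# Hypothetical pools; a real replication would need one pool per name category
# (people, cities, shops, ...) and would have to know which category applies.
PERSON_NAMES = ["Ava", "Liam", "Sofia", "Noah"]
ITEMS = ["apples", "pencils", "marbles"]

TEMPLATE = (
    "{name} buys {n_boxes} boxes of {item}. Each box holds {per_box} {item}. "
    "{name} gives away {given} {item}. How many {item} does {name} have left?"
)


def generate(rng: random.Random) -> dict:
    """Fill the template with random names and numbers and compute the answer."""
    n_boxes = rng.randint(2, 9)
    per_box = rng.randint(3, 12)
    total = n_boxes * per_box
    given = rng.randint(1, total - 1)  # keep the answer strictly positive
    question = TEMPLATE.format(
        name=rng.choice(PERSON_NAMES),
        item=rng.choice(ITEMS),
        n_boxes=n_boxes,
        per_box=per_box,
        given=given,
    )
    return {"question": question, "answer": total - given}


sample = generate(random.Random(0))
print(sample["question"], "->", sample["answer"])
```

Passing an explicit `random.Random` keeps each sample reproducible from its seed.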

Based on the discussions with @andreaskoepf, I agree that the less difficult way to create our own templates would be to feed the GSM templates to an LLM and let it generate similar samples for us. The challenge is that we cannot guarantee it will not hallucinate, so we probably want a way to automate the validation of the samples we get from the LLM.
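
One possible shape for such automated validation, as a sketch only: assume (hypothetically) that we ask the LLM to return each sample as a dict with `question`, `variables`, `formula`, and `answer`. We can then check that every variable value actually appears in the question text and that the formula, evaluated over those variables, reproduces the claimed answer. The sample format and all names below are assumptions, not an agreed design.

```python
import ast
import operator
import re

# Only plain arithmetic is allowed in formulas; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.FloorDiv: operator.floordiv,
}


def _eval(node, variables):
    """Evaluate a small arithmetic AST without calling eval()."""
    if isinstance(node, ast.Expression):
        return _eval(node.body, variables)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.Name):
        return variables[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _eval(node.left, variables)
        right = _eval(node.right, variables)
        return _OPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, variables)
    raise ValueError(f"unsupported expression element: {ast.dump(node)}")


def validate_sample(sample: dict) -> list:
    """Return a list of problems found; an empty list means the sample passed."""
    problems = []
    numbers_in_text = set(re.findall(r"\d+", sample["question"]))
    for name, value in sample["variables"].items():
        if str(value) not in numbers_in_text:
            problems.append(f"variable {name}={value} does not appear in the question")
    try:
        tree = ast.parse(sample["formula"], mode="eval")
        computed = _eval(tree, sample["variables"])
    except (ValueError, KeyError, SyntaxError, ZeroDivisionError) as exc:
        problems.append(f"formula could not be evaluated: {exc}")
    else:
        if computed != sample["answer"]:
            problems.append(f"claimed answer {sample['answer']} != computed {computed}")
    return problems
```

This only catches inconsistencies between the stated numbers, the formula, and the answer; it cannot tell whether the formula itself matches what the question asks, so some manual spot checks would still be needed.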

Given how tedious both approaches are, I am first thinking of writing a script to create a direct template from the original GSM template, just for fun, and perhaps waiting to hear from the main author of the paper to see whether we can use their templates wholesale. I have sent him an email and hope he will reply before the end of the week.

@andreaskoepf
Contributor Author

We created a PoC for an LLM-based (Sonnet) conversion of the Apple GSM-Symbolic dataset entries into Python code; it can be found in the gsm_symbolic.ipynb notebook.

Next steps:

  • try converting all 100 templates with retry on error (possibly with error feedback), inspect some outputs, and tune the prompt
  • manually verify the generated scripts for 3 random entries
  • verify that the original questions are close to the generated ones (small edit distance) and have the same solution
  • generate 100 entries per generator and verify that the results are always int
