Add GSM-symbolic (Apache 2.0 conform replication) #13

andreaskoepf · 2025-01-26T22:54:00Z

Apple Research released their code for GSM-symbolic in the apple/ml-gsm-symbolic repository. The dataset construction is described in the paper "GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models".

Unfortunately, Apple released their code under a special Apple license.
All code included in reasoning-gym should be open-source under Apache 2.0.

We will therefore create an independent replication of GSM-symbolic for reasoning-gym.

GSM8K on HF: openai/gsm8k

Adefioye · 2025-01-29T03:46:40Z

I am currently looking at creating sample direct templates using the GSM-symbolic data. I would also like to note that it would be difficult to write a Python script that creates our own samples, since the script would have to be smart about which kinds of proper names appear in the questions and assign suitable random alternatives to them.
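To make the idea of a "direct template" concrete, here is a minimal sketch of what a hand-written one might look like. The question text, the `NAMES` and `ITEMS` pools, and the function name `generate_example` are all made up for illustration and are not taken from the Apple templates; a real template would need separate pools for each kind of proper name that appears in a question.

```python
import random

# Hypothetical pools; real templates would need one pool per entity type
# (people, cities, shops, ...), which is the hard part mentioned above.
NAMES = ["Ava", "Liam", "Noah", "Mia"]
ITEMS = ["apples", "pencils", "marbles"]


def generate_example(rng: random.Random) -> dict:
    """Fill a fixed question template with random names and numbers."""
    name = rng.choice(NAMES)
    item = rng.choice(ITEMS)
    start = rng.randint(10, 50)
    given = rng.randint(1, start - 1)  # never give away more than available
    bought = rng.randint(1, 20)
    question = (
        f"{name} has {start} {item}. {name} gives away {given} of them "
        f"and then buys {bought} more. How many {item} does {name} have now?"
    )
    answer = start - given + bought  # computed from the same variables
    return {"question": question, "answer": answer}


example = generate_example(random.Random(0))
print(example["question"])
print(example["answer"])
```

Passing in a seeded `random.Random` keeps every sample reproducible, which should make later verification easier.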

Based on the discussions with @andreaskoepf, I agree that the less difficult way to create our own templates would be to feed the GSM templates to an LLM and let it generate similar samples for us. The challenge is that we cannot guarantee it will not hallucinate, so we probably want a way to automate the validation of the samples we get from the LLM.
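One possible shape for such automated validation, as a rough sketch only: execute the LLM output, compare its answer with the known answer of the original question, and feed any error back into the next attempt. The `llm` callable, the prompt handling, and the convention that the generated code defines a `generate_original()` function returning a dict with an `"answer"` key are all assumptions made for this example, not an existing interface.

```python
from typing import Callable, Optional


def validate_generator_source(source: str, expected_answer: int) -> Optional[str]:
    """Return None if the generated code passes the checks, else an error message."""
    namespace: dict = {}
    try:
        # NOTE: LLM output is untrusted code; run it only inside a sandbox.
        exec(source, namespace)
        # Assumed convention: the code defines generate_original().
        answer = namespace["generate_original"]()["answer"]
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    if answer != expected_answer:
        return f"expected answer {expected_answer}, got {answer!r}"
    return None


def convert_with_retry(
    llm: Callable[[str], str],  # hypothetical: prompt in, Python source out
    prompt: str,
    expected_answer: int,
    max_attempts: int = 3,
) -> Optional[str]:
    """Ask the LLM for code, retrying with error feedback until it validates."""
    feedback = ""
    for _ in range(max_attempts):
        source = llm(prompt + feedback)
        error = validate_generator_source(source, expected_answer)
        if error is None:
            return source
        feedback = f"\n\nThe previous attempt failed with: {error}\nPlease fix it."
    return None  # give up; this entry needs manual inspection
```

Checking only the original answer will not catch every hallucination, so it would probably be combined with further checks on the randomized outputs.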

In light of the tediousness of these two approaches, I am first thinking of writing a script that creates a direct template from the original GSM template, just for fun, and then perhaps waiting to hear from the main author of the paper on whether we can use their templates wholesale. I have sent him an email and hope he will reply before the end of the week.

andreaskoepf · 2025-01-30T10:09:14Z

We created a PoC for an LLM-based (Sonnet) conversion of the Apple GSM-symbolic dataset entries into Python code; it can be found in the gsm_symbolic.ipynb notebook.

Next steps:

- try conversion of all 100 templates with retry on error (possibly with error feedback), inspect some outputs, tune prompt
- verify generated scripts for 3 random entries manually
- verify that original questions are close to the generated ones (small edit distance) and have the same solution
- generate 100 entries per generator and verify that the results are always integers
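
The automated checks in the list above could be sketched roughly as follows. This assumes each converted template exposes two hypothetical callables, `generate_original()` (reproducing the source question) and `generate_random(rng)`, each returning a dict with `"question"` and `"answer"` keys. It also uses the `difflib.SequenceMatcher` ratio from the standard library as a stand-in for an edit-distance measure; the `0.9` threshold is an arbitrary placeholder.

```python
import difflib
import random


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; a stand-in for a normalized edit distance."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def check_generator(
    generate_original,   # hypothetical: () -> {"question": str, "answer": int}
    generate_random,     # hypothetical: (rng) -> {"question": str, "answer": int}
    original_question: str,
    original_answer: int,
    n_samples: int = 100,
    min_similarity: float = 0.9,  # placeholder threshold
) -> list:
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    original = generate_original()
    ratio = similarity(original["question"], original_question)
    if ratio < min_similarity:
        problems.append(f"question similarity too low: {ratio:.2f}")
    if original["answer"] != original_answer:
        problems.append(
            f"answer mismatch: expected {original_answer}, got {original['answer']!r}"
        )
    # Sample with fixed seeds so that any failure can be reproduced.
    for seed in range(n_samples):
        answer = generate_random(random.Random(seed))["answer"]
        if not isinstance(answer, int) or isinstance(answer, bool):
            problems.append(f"seed {seed}: non-integer answer {answer!r}")
    return problems
```

Returning a list of problems instead of raising on the first failure makes it easier to inspect all 100 templates in one pass and decide which ones need a retry.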

andreaskoepf changed the title ~~Add GSM8K-symbolic~~ Add GSM-symbolic (Apache 2.0 conform replication) Jan 26, 2025

andreaskoepf assigned andreaskoepf and Adefioye Jan 29, 2025

andreaskoepf closed this as completed Feb 6, 2025
