Commit 9a52e8a

Grade School Math example (#694)
* Grade School Math example

Signed-off-by: Ed Snible <[email protected]>

* Be resilient to malformed JSON

Signed-off-by: Ed Snible <[email protected]>

---------

Signed-off-by: Ed Snible <[email protected]>
1 parent e66e0de commit 9a52e8a

3 files changed: +153 -0 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -153,6 +153,7 @@ pdl-live/package-lock.json
 
 # Demo files
 pdl-rag-demo.db
+test.jsonl
 
 # Built docs
 _site

examples/gsm8k/README.md

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@

# Grade School Math

This demo measures how many questions a model answers correctly on
[Grade School Math](https://github.com/openai/grade-school-math),
an open source AI dataset from 2021.

Before running the example, you must download the dataset:

```bash
curl https://raw.githubusercontent.com/openai/grade-school-math/refs/heads/master/grade_school_math/data/test.jsonl > test.jsonl
```

To run the example, use `pdl --stream none gsm8.pdl`.

As written, the example attempts the first 50 questions in the dataset
using `ollama/granite3.2:8b`. If you are using Ollama, first pull the model:

```bash
ollama pull granite3.2:8b
```

You can change the model, the model host, and the number of questions tested by editing `gsm8.pdl`.
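
As a rough illustration of what the downloaded file provides: the PDL program reads `TEST.question` and `TEST.answer` from each record and expects the final number to follow `#### `. The Python sketch below mirrors that loading and extraction step outside of PDL. The two sample records are invented for illustration, `load_tests` and `ground_truth` are hypothetical helper names, and the use of `fullmatch` is an assumption about how the pattern is applied.

```python
import io
import json
import re

# Same pattern the PDL file uses for the expected answer.
GROUND_TRUTH_RE = re.compile(r"(.|\n)*#### (?P<answer>([0-9])*)\n*")


def load_tests(stream, max_iterations=50):
    """Parse JSONL (one JSON object per line) and keep the first records."""
    tests = [json.loads(line) for line in stream if line.strip()]
    return tests[:max_iterations]


def ground_truth(test):
    """Return the digits after '#### ' in test['answer'], or None if absent."""
    match = GROUND_TRUTH_RE.fullmatch(test["answer"])
    return match.group("answer") if match else None


# Invented records shaped like the fields the PDL file reads
# (not taken from the real dataset).
SAMPLE = io.StringIO(
    json.dumps({"question": "What is 2 + 3?", "answer": "2 + 3 = 5\n#### 5"}) + "\n"
    + json.dumps({"question": "What is 4 * 6?", "answer": "4 * 6 = 24\n#### 24"}) + "\n"
)

tests = load_tests(SAMPLE, max_iterations=1)
for test in tests:
    print(test["question"], "->", ground_truth(test))
```

To try this against the real download, replace `SAMPLE` with `open("test.jsonl", encoding="utf-8")`.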

examples/gsm8k/gsm8.pdl

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
#!/usr/bin/env pdl

# Grade School Math https://github.com/openai/grade-school-math is an
# open source AI dataset from 2021.
#
# https://github.com/openai/grade-school-math/blob/master/grade_school_math/data/test.jsonl
# is a file with 1319 questions and answers.
#
#

description: Grade School Math example
defs:
  # The Grade School Math Dataset
  ALL_TESTS:
    read: ./test.jsonl
    parser: jsonl

  # How many problems to evaluate. The entire dataset is 1319 problems.
  # MAX_ITERATIONS: 1319
  MAX_ITERATIONS: 50

  # PDL variables that hold statistics
  SUCCESSES: 0
  FAILURES: 0
  TESTS: ${ ALL_TESTS[:MAX_ITERATIONS] }
text:
  # First phase: ask LLM the Grade School Math questions
  - for:
      TEST: ${ TESTS }
    repeat:
      # Ask the LLM for the answer
      # - model: ollama/granite-code:8b
      model: ollama/granite3.2:8b
      # First, get LLM to answer the question
      input: |
        Question: ${ TEST.question }
        Answer:
    join:
      as: array
    contribute: []
    def: ALL_LLM_FULL_A
  # For debugging, print first phase result
  #- lang: python
  #  code: |
  #    print(f"ALL_LLM_FULL_A={ALL_LLM_FULL_A}")
  #    result = "dummy"
  #  contribute: []

  # Second phase: Simplify the results
  - for:
      LLM_FULL_ANSWER: ${ ALL_LLM_FULL_A }
    repeat:
      # Next, get LLM to convert its answer into a single JSON key/value
      # - model: ollama/granite-code:8b
      model: ollama/granite3.2:8b
      input: | # 'input' is the prompt
        Generate the final answer from the conclusion of this text as JSON with a single key named answer.
        ${ LLM_FULL_ANSWER }
    join:
      as: array
    contribute: []
    def: SIMPLIFIED_LLM_ANSWERS

  # Third phase: Compare with Grade School Math ground truth
  - for:
      TEST: ${ TESTS }
      LLM_FULL_ANSWER: ${ ALL_LLM_FULL_A }
      SIMPLIFIED_LLM_ANSWER: ${ SIMPLIFIED_LLM_ANSWERS }
    repeat:
      lastOf:
        # Convert the JSON string to JSON. (We do this in a separate step so
        # we have access to the original for debugging.)
        - data: ${ SIMPLIFIED_LLM_ANSWER }
          parser: json
          def: JSON_SIMPLIFIED_LLM_ANSWER
        # - lang: python
        #   code: |
        #     print(f"JSON_SIMPLIFIED_LLM_ANSWER={JSON_SIMPLIFIED_LLM_ANSWER}")
        #     result = "dummy"

        # Strip any prefix or suffix from the number (dollar signs, units, etc.)
        # and place it in the JSON format { "answer": ... }
        - data: ${ JSON_SIMPLIFIED_LLM_ANSWER.answer|string if 'answer' in JSON_SIMPLIFIED_LLM_ANSWER else ("MISSING 'answer' in " + LLM_FULL_ANSWER) }
          parser:
            regex: "[^0-9]*(?P<answer>[0-9]+).*$"
            spec:
              answer: str
          def: EXTRACTED_SIMPLIFIED_LLM_ANSWER
        # (In case the simplified answer did not contain digits.)
        - if: ${ EXTRACTED_SIMPLIFIED_LLM_ANSWER == None }
          then:
            def: EXTRACTED_SIMPLIFIED_LLM_ANSWER
            data:
              answer: "none"
        #- lang: python
        #  code: |
        #    print(f"EXTRACTED_SIMPLIFIED_LLM_ANSWER={EXTRACTED_SIMPLIFIED_LLM_ANSWER}")
        #    result = "dummy"
        #  contribute: []

        # Extract the expected answer, which in this test data always follows "#### ",
        # into { "answer": ... }
        - data: ${ TEST.answer }
          parser:
            regex: "(.|\n)*#### (?P<answer>([0-9])*)\n*"
            spec:
              answer: str
          def: EXTRACTED_GROUND_TRUTH
        #- lang: python
        #  code: |
        #    print(f"EXTRACTED_GROUND_TRUTH={EXTRACTED_GROUND_TRUTH}")
        #    result = "dummy"
        #  contribute: []

        # Did we get the expected answer?
        - if: ${ EXTRACTED_SIMPLIFIED_LLM_ANSWER.answer == EXTRACTED_GROUND_TRUTH.answer }
          then:
            lastOf:
              - defs:
                  SUCCESSES: ${ SUCCESSES + 1 }
              - "LLM got right answer for '${ LLM_FULL_ANSWER }' which was simplified to '${ SIMPLIFIED_LLM_ANSWER }' which was extracted to '${ EXTRACTED_SIMPLIFIED_LLM_ANSWER.answer }'\n"
          else:
            lastOf:
              - defs:
                  FAILURES: ${ FAILURES + 1 }
              - "WRONG! Wanted ${ EXTRACTED_GROUND_TRUTH.answer } / LLM said '${ LLM_FULL_ANSWER }' which was simplified to '${ SIMPLIFIED_LLM_ANSWER }' which was extracted to '${ EXTRACTED_SIMPLIFIED_LLM_ANSWER.answer }'\n"
  - "Finished, ${ SUCCESSES } successes and ${ FAILURES } failures"
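
The third phase above can be approximated in plain Python to see how the extraction regex and the scoring behave on individual answers. This is only a sketch under stated assumptions, not how PDL executes the program: `extract_llm_answer` and `score` are hypothetical names, the pattern is assumed to be applied as a full match, and treating malformed JSON like a missing `answer` key is a choice made for this sketch.

```python
import json
import re

# Same pattern the PDL file applies to the simplified LLM answer.
LLM_ANSWER_RE = re.compile(r"[^0-9]*(?P<answer>[0-9]+).*$")


def extract_llm_answer(simplified, full_answer=""):
    """Return the first run of digits in the simplified answer, or "none"."""
    try:
        parsed = json.loads(simplified)
    except json.JSONDecodeError:
        parsed = {}  # assumption: malformed JSON is handled like a missing key
    if isinstance(parsed, dict) and "answer" in parsed:
        text = str(parsed["answer"])
    else:
        # Mirrors the fallback string built in the PDL expression.
        text = "MISSING 'answer' in " + full_answer
    match = LLM_ANSWER_RE.fullmatch(text)
    return match.group("answer") if match else "none"


def score(pairs):
    """Count matches between extracted LLM answers and expected answers."""
    successes = failures = 0
    for simplified, expected in pairs:
        if extract_llm_answer(simplified) == expected:
            successes += 1
        else:
            failures += 1
    return successes, failures


# Invented model outputs, paired with the expected digits.
results = score([
    ('{"answer": "$18"}', "18"),  # currency prefix is stripped
    ('{"answer": "7"}', "8"),     # wrong number
    ("oops, not JSON", "3"),      # malformed JSON counts as a failure
])
print(f"Finished, {results[0]} successes and {results[1]} failures")
```

One consequence of keeping only the first run of digits is that a decimal answer such as `"3.5"` is extracted as `"3"`, so the comparison is only meaningful for whole-number answers.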
