We are building a Python library of procedural dataset generators and algorithmically verifiable reasoning environments for training reasoning models with reinforcement learning (RL).
The goal is to generate virtually infinite data with adjustable complexity.
Algorithmic verification makes it possible to train on tasks like Rubik's Cube or Countdown, which have many correct solutions.
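Because many different answers can be correct, a verifier has to check the properties of an answer rather than compare it to a single reference string. As an illustration, here is a minimal sketch of a Countdown-style checker. It assumes the rules are: combine the given numbers with +, -, * and / (each number used at most once) to reach the target. The function verify_countdown and its 1.0/0.0 scoring are hypothetical and not part of reasoning-gym's API.

```python
# Hypothetical sketch of a Countdown verifier, not reasoning-gym's implementation.
import ast
import operator
from collections import Counter
from fractions import Fraction

# Operators a Countdown answer may use (assumption: + - * / only).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node: ast.AST) -> Fraction:
    """Evaluate a parsed expression exactly, rejecting anything but integers and + - * /."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body)
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return Fraction(node.value)
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    raise ValueError(f"unsupported syntax: {ast.dump(node)}")


def verify_countdown(expression: str, numbers: list[int], target: int) -> float:
    """Return 1.0 if expression reaches target using each given number at most once."""
    try:
        tree = ast.parse(expression, mode="eval")
        value = _evaluate(tree)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    used = Counter(n.value for n in ast.walk(tree) if isinstance(n, ast.Constant))
    if used - Counter(numbers):  # some number was used more often than it is available
        return 0.0
    return 1.0 if value == target else 0.0
```

Both "1 + 2 + 3 + 4" and "4 * 3 - 2 * 1" score 1.0 for the numbers [1, 2, 3, 4] and target 10, while "(1 + 2) * 3 + 1" scores 0.0 because it uses 1 twice. Parsing with ast instead of calling eval keeps untrusted model output from executing arbitrary code, and Fraction avoids floating-point rounding in divisions.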
- Clone the project and change into its directory
git clone https://github.com/open-thought/reasoning-gym.git
cd reasoning-gym
- Create a virtual environment (here we use conda)
conda create --name reasoning_gym python=3.11 -y
conda activate reasoning_gym
- Install the project in editable mode together with its dependencies
pip install -e .
- Install development dependencies
pip install -r requirements-dev.txt
NOTE: If you only want to use the reasoning_gym APIs, install the package from PyPI instead:
pip install reasoning-gym
Example:
import reasoning_gym

data = reasoning_gym.create_dataset('leg_counting', size=10, seed=42)
for i, x in enumerate(data):
    print(f'{i}: q="{x["question"]}", a="{x["answer"]}"')
    print('metadata:', x['metadata'])
    # use the dataset's `score_answer` method for algorithmic verification
    assert data.score_answer(answer=x['answer'], entry=x) == 1.0
Output:
0: q="How many legs are there in total if you have 1 sea slug, 1 deer?", a="4"
metadata: {'animals': {'sea slug': 1, 'deer': 1}, 'total_legs': 4}
1: q="How many legs are there in total if you have 2 sheeps, 2 dogs?", a="16"
metadata: {'animals': {'sheep': 2, 'dog': 2}, 'total_legs': 16}
2: q="How many legs are there in total if you have 1 crab, 2 lobsters, 1 human, 1 cow, 1 bee?", a="42"
...
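The example reflects the general recipe for a procedural generator: a seeded random number generator samples the problem parameters, the answer is computed from those same parameters, and the parameters are kept as metadata. The stdlib-only sketch below imitates that recipe for leg counting. The LEGS table, make_entry, make_dataset and score_answer are illustrative stand-ins, not reasoning-gym's implementation, and their output will not match the listing above.

```python
# Hypothetical sketch of a seeded leg-counting generator, not reasoning-gym's code.
import random

# Assumed leg counts for a few animals (illustrative table).
LEGS = {"bee": 6, "cow": 4, "crab": 10, "deer": 4, "dog": 4,
        "human": 2, "lobster": 10, "sea slug": 0, "sheep": 4}


def make_entry(rng: random.Random, max_species: int = 3, max_count: int = 3) -> dict:
    """Sample one question; max_species and max_count act as difficulty knobs."""
    species = rng.sample(sorted(LEGS), k=rng.randint(1, max_species))
    animals = {name: rng.randint(1, max_count) for name in species}
    total = sum(LEGS[name] * count for name, count in animals.items())
    listing = ", ".join(f"{count} {name}" for name, count in animals.items())  # no pluralization
    return {
        "question": f"How many legs are there in total if you have {listing}?",
        "answer": str(total),
        "metadata": {"animals": animals, "total_legs": total},
    }


def make_dataset(size: int, seed: int) -> list[dict]:
    rng = random.Random(seed)  # one seeded RNG makes the whole dataset reproducible
    return [make_entry(rng) for _ in range(size)]


def score_answer(answer: str, entry: dict) -> float:
    return 1.0 if answer.strip() == entry["answer"] else 0.0
```

Calling make_dataset twice with the same seed yields identical entries, which is what makes a "virtually infinite" stream of tasks reproducible across training runs.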
See the Dataset Gallery for a complete list of available datasets with examples.
- SimpleEquationsDataset: Generate linear equations with one variable to solve (e.g. "3*x + 2 = 14")
- PolynomialEquationsDataset: Generate polynomial equations with one variable to solve (e.g. "-6h**4 + 4h*2 - 5h = 0")
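Sampling the solution before building the equation is one simple way to guarantee that every generated linear equation has a clean integer answer. The sketch below assumes the form a*x + b = c; the function names and metadata layout are hypothetical, not those of SimpleEquationsDataset. Its scorer substitutes the proposed value back into the equation, so equivalent spellings such as "4", "4.0" or "8/2" are all accepted.

```python
# Hypothetical sketch of a linear-equation generator and verifier.
import random
from fractions import Fraction


def make_linear_equation(rng: random.Random, max_value: int = 10) -> dict:
    """Build 'a*x + b = c' by sampling the solution x first, so it is always an integer."""
    x = rng.randint(-max_value, max_value)
    a = rng.choice([n for n in range(-max_value, max_value + 1) if n != 0])  # a != 0: unique solution
    b = rng.randint(-max_value, max_value)
    c = a * x + b
    sign = "+" if b >= 0 else "-"
    return {
        "question": f"{a}*x {sign} {abs(b)} = {c}",
        "answer": str(x),
        "metadata": {"a": a, "b": b, "c": c},
    }


def score_linear(answer: str, entry: dict) -> float:
    """Accept any numeric spelling of the solution by substituting it back into a*x + b."""
    m = entry["metadata"]
    try:
        x = Fraction(answer.strip())
    except (ValueError, ZeroDivisionError):
        return 0.0
    return 1.0 if m["a"] * x + m["b"] == m["c"] else 0.0
```

With a = 3, b = 2 and x = 4 the question reads "3*x + 2 = 14".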
- BasicArithmeticDataset: Generate arithmetic expressions with configurable complexity and operators (+, -, *, /)
- CalendarArithmeticDataset: Generate arithmetic problems that involve navigating a calendar
- ChainSum: Generate addition/subtraction chains with configurable length and digit counts
- FractionSimplificationDataset: Generate fraction simplification tasks with configurable complexity
- GCDDataset: Generate Greatest Common Divisor problems with a configurable number of integers
- LCMDataset: Generate Least Common Multiple problems with a configurable number of integers
- LegCountingDataset: Generate animal leg counting word problems with various animals
- PrimeFactorizationDataset: Generate prime factorization tasks with configurable number ranges
- TimeIntervalsDataset: Generate time interval calculation tasks with various formats (time, date, datetime) and complexities
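Several arithmetic generators expose difficulty knobs such as the number of integers involved. A minimal sketch of that idea for GCD and LCM tasks follows, assuming num_terms and max_value as the knobs (hypothetical names, not the library's configuration fields). It relies on math.gcd and math.lcm, which accept any number of arguments since Python 3.9.

```python
# Hypothetical sketch of a GCD/LCM generator; names and knobs are assumptions.
import math
import random

# Task kind -> (wording used in the question, function computing the answer).
TASKS = {
    "gcd": ("Greatest Common Divisor (GCD)", math.gcd),
    "lcm": ("Least Common Multiple (LCM)", math.lcm),
}


def make_entry(rng: random.Random, kind: str = "gcd", num_terms: int = 2, max_value: int = 100) -> dict:
    """Sample num_terms integers in [1, max_value]; both parameters control difficulty."""
    label, compute = TASKS[kind]
    numbers = [rng.randint(1, max_value) for _ in range(num_terms)]
    return {
        "question": f"Find the {label} of {', '.join(map(str, numbers))}.",
        "answer": str(compute(*numbers)),  # math.gcd/math.lcm take many arguments (3.9+)
        "metadata": {"kind": kind, "numbers": numbers},
    }


def score_integer_answer(answer: str, entry: dict) -> float:
    try:
        return 1.0 if int(answer.strip()) == int(entry["answer"]) else 0.0
    except ValueError:  # non-integer text scores zero
        return 0.0
```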
- BaseConversionDataset: Convert numbers between different bases (binary, hex, etc.)
- CaesarCipherDataset: Encrypt/decrypt text using a Caesar cipher with configurable rotation
- LetterCountingDataset: Count letter occurrences in text spans
- NumberFilteringDataset: Filter numbers based on comparison with a threshold
- NumberSortingDataset: Sort lists of numbers in ascending or descending order
- WordSortingDataset: Sort words in ascending or descending order using ASCII/Unicode ordering
- LetterJumbleDataset: Unscramble words whose letters have been randomly jumbled
- SentenceReorderingDataset: Reorder a sentence whose words have been randomly shuffled
- SpellBackwardDataset: Spell individual words backward (e.g. "sun" -> "nus")
- WordSequenceReversalDataset: Reverse word order in text spans
- WordLadderDataset: Generate word ladder puzzles where one word is transformed into another by changing one letter at a time
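For word ladders, a breadth-first search over one-letter substitutions yields a shortest reference solution, while the verifier should accept any ladder that follows the rules. The sketch below assumes those rules are: start and end at the given words, change exactly one letter per step, and use only words from a vocabulary (shortest length not required). The tiny WORDS set and both function names are hypothetical.

```python
# Hypothetical sketch: word ladder solver (BFS) and rule-based verifier.
import string
from collections import deque
from typing import Optional


def shortest_ladder(start: str, end: str, vocabulary: set[str]) -> Optional[list[str]]:
    """Breadth-first search, so the first ladder reaching end is a shortest one."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        word = path[-1]
        if word == end:
            return path
        for i in range(len(word)):
            for ch in string.ascii_lowercase:
                candidate = word[:i] + ch + word[i + 1:]
                if candidate in vocabulary and candidate not in seen:
                    seen.add(candidate)
                    queue.append(path + [candidate])
    return None


def is_valid_ladder(path: list[str], start: str, end: str, vocabulary: set[str]) -> bool:
    """Accept any ladder whose steps change exactly one letter and stay in the vocabulary."""
    if not path or path[0] != start or path[-1] != end:
        return False
    for prev, curr in zip(path, path[1:]):
        if len(curr) != len(prev) or curr not in vocabulary:
            return False
        if sum(a != b for a, b in zip(prev, curr)) != 1:
            return False
    return True


WORDS = {"cold", "cord", "card", "ward", "warm", "word", "worm"}
```

For cold to warm, both cold, cord, card, ward, warm and cold, cord, word, worm, warm are valid four-step ladders, which is exactly the situation where comparing against one stored answer would wrongly reject a correct response.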
- BFDataset: Generate BF programs of varying difficulty, from simple string printing to loops and conditional logic
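Ground-truth answers for BF tasks can be obtained by running the program. Here is a minimal interpreter sketch, assuming the common conventions of 8-bit wrapping cells, an unbounded tape, and end of input read as 0; run_bf is a hypothetical helper, not the interpreter reasoning-gym uses. The step limit guards against generated programs that never halt.

```python
# Hypothetical sketch of a BF interpreter for computing reference outputs.
from collections import defaultdict


def run_bf(program: str, input_data: str = "", max_steps: int = 1_000_000) -> str:
    """Interpret a BF program with 8-bit wrapping cells and an unbounded tape."""
    # Pre-compute matching bracket positions so loops jump in O(1).
    jumps, stack = {}, []
    for pos, ch in enumerate(program):
        if ch == "[":
            stack.append(pos)
        elif ch == "]":
            if not stack:
                raise ValueError(f"unmatched ']' at {pos}")
            open_pos = stack.pop()
            jumps[open_pos], jumps[pos] = pos, open_pos
    if stack:
        raise ValueError(f"unmatched '[' at {stack[-1]}")

    tape = defaultdict(int)
    ptr = pc = steps = 0
    chars = iter(input_data)
    out = []
    while pc < len(program):
        steps += 1
        if steps > max_steps:  # guard against non-terminating programs
            raise RuntimeError("step limit exceeded")
        ch = program[pc]
        if ch == ">":
            ptr += 1
        elif ch == "<":
            ptr -= 1
        elif ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ".":
            out.append(chr(tape[ptr]))
        elif ch == ",":
            tape[ptr] = ord(next(chars, "\0")) % 256  # end of input reads as 0 (assumed convention)
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]  # skip past the matching ']'
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]  # jump back to just after the matching '['
        pc += 1  # any other character is treated as a comment
    return "".join(out)


print(run_bf("++++++++[>++++++++<-]>+."))  # prints "A" (8 * 8 + 1 = 65)
```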
- NumberSequenceDataset: Generate number sequences with discoverable patterns
- ColorCubeRotationDataset: Generate 3D spatial reasoning tasks with colored cube rotations and orientation tracking
- RubiksCubeDataset: Generate Rubik's Cube configurations and check correct solutions
- FigletFontDataset: Generate random words in different "Figlet" fonts for reasoning about the structure of letters
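A number-sequence generator can pick a hidden rule, unroll it, and hold back the last term as the answer. The rule family and function name below are hypothetical examples; the operations used by NumberSequenceDataset may differ.

```python
# Hypothetical sketch of a rule-based number-sequence generator.
import random

# Assumed rule family: each term is derived from the previous term and a parameter k.
RULES = {
    "add_k": lambda prev, k: prev + k,
    "times_k": lambda prev, k: prev * k,
    "add_k_then_double": lambda prev, k: (prev + k) * 2,
}


def make_sequence_entry(rng: random.Random, shown_terms: int = 5) -> dict:
    rule = rng.choice(sorted(RULES))
    k = rng.randint(2, 5)
    seq = [rng.randint(1, 9)]
    while len(seq) < shown_terms + 1:  # one extra term is held back as the answer
        seq.append(RULES[rule](seq[-1], k))
    return {
        "question": ", ".join(map(str, seq[:-1])) + ", ?",
        "answer": str(seq[-1]),
        "metadata": {"rule": rule, "k": k, "sequence": seq},
    }
```

Showing enough terms matters: with too few, more than one rule can explain the prefix, and a verifier that compares against the held-back term would then penalize a reasonable alternative continuation.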
- PropositionalLogicDataset: Generate propositional logic reasoning problems
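Answers to propositional reasoning problems can be verified mechanically by enumerating truth assignments. This sketch checks whether a set of premises entails a conclusion, with formulas represented as Python functions over an assignment; that representation and the function names are assumptions for illustration, not how PropositionalLogicDataset encodes formulas.

```python
# Hypothetical sketch: truth-table entailment check for propositional formulas.
from itertools import product
from typing import Callable

# A formula is modelled as a function from a truth assignment to a truth value.
Formula = Callable[[dict[str, bool]], bool]


def implies(a: bool, b: bool) -> bool:
    return (not a) or b


def entails(premises: list[Formula], conclusion: Formula, variables: list[str]) -> bool:
    """Brute-force truth table: the conclusion must hold in every model of the premises."""
    for values in product([False, True], repeat=len(variables)):
        env = dict(zip(variables, values))
        if all(p(env) for p in premises) and not conclusion(env):
            return False  # counter-model found
    return True


# Modus ponens: P -> Q, P entails Q
print(entails([lambda e: implies(e["P"], e["Q"]), lambda e: e["P"]], lambda e: e["Q"], ["P", "Q"]))  # True
```

The truth table has 2^n rows for n variables, which is cheap for the handful of variables typical of generated puzzles. Affirming the consequent (P -> Q and Q, therefore P) returns False, since P = False, Q = True satisfies both premises but not the conclusion.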
- FamilyRelationshipsDataset: Generate family relationship reasoning tasks with family trees
- QuantumLockDataset: Generate puzzles that involve stateful arithmetic and require finding a correct sequence of operations
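Puzzles that ask for a sequence of operations typically have several correct sequences, so a verifier can replay the proposed presses instead of comparing strings, and a breadth-first search over states can produce a shortest reference solution. The sketch assumes a simplified lock: a light that starts red and toggles on every press, and a button B that only works while the light is green. These rules, the button set and the function names are hypothetical and may differ from QuantumLockDataset.

```python
# Hypothetical sketch of a stateful "lock" puzzle: BFS solver plus replay verifier.
from collections import deque
from typing import Callable, Optional

# A button maps (value, light_is_green) to a new value, or None if it cannot be pressed now.
Button = Callable[[int, bool], Optional[int]]


def press_all(start: int, presses: list[str], buttons: dict[str, Button]) -> Optional[int]:
    """Replay presses from a red light; every press toggles the light. None = invalid sequence."""
    value, green = start, False
    for name in presses:
        if name not in buttons:
            return None
        new_value = buttons[name](value, green)
        if new_value is None:
            return None
        value, green = new_value, not green
    return value


def shortest_solution(start: int, target: int, buttons: dict[str, Button],
                      max_abs: int = 10_000) -> Optional[list[str]]:
    """Breadth-first search over (value, light) states, so the first hit is a shortest sequence."""
    initial = (start, False)
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        (value, green), presses = queue.popleft()
        if value == target:
            return presses
        for name, button in buttons.items():
            new_value = button(value, green)
            if new_value is None or abs(new_value) > max_abs:
                continue
            state = (new_value, not green)
            if state not in seen:
                seen.add(state)
                queue.append((state, presses + [name]))
    return None


BUTTONS: dict[str, Button] = {
    "A": lambda v, green: v + 2,
    "B": lambda v, green: v * 2 if green else None,  # assumed: usable only while green
    "C": lambda v, green: v - 1,
}
```

Starting at 0 with target 8, the search returns a four-press solution, and the replay function also accepts other valid sequences such as A, B, A, A.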
- SudokuDataset: Generate 9x9 Sudoku puzzles with a configurable number of empty cells
- MiniSudokuDataset: Generate 4x4 Mini Sudoku puzzles with configurable difficulty
- MazeDataset: Generate a maze with a start and a goal
- CountdownDataset: Generate number game tasks where numbers and operators must be combined to reach a target value
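For grid puzzles the verifier can check the constraints directly instead of comparing with a stored grid, which also covers puzzles with more than one valid completion. Below is a minimal sketch for n x n Sudoku (n = 4 for Mini Sudoku, n = 9 for the standard board), assuming empty cells are encoded as 0; the encoding and the function name are hypothetical.

```python
# Hypothetical sketch of a constraint-based Sudoku / Mini Sudoku verifier.
from math import isqrt


def is_valid_sudoku_solution(puzzle: list[list[int]], solution: list[list[int]]) -> bool:
    """Check an n x n solution against its puzzle; 0 marks an empty cell (assumed encoding)."""
    n = len(puzzle)
    box = isqrt(n)
    if box * box != n or len(solution) != n or any(len(row) != n for row in solution):
        return False
    # Every given clue must be kept unchanged.
    if any(puzzle[r][c] not in (0, solution[r][c]) for r in range(n) for c in range(n)):
        return False
    digits = set(range(1, n + 1))
    rows = [set(row) for row in solution]
    cols = [{solution[r][c] for r in range(n)} for c in range(n)]
    boxes = [
        {solution[r][c] for r in range(br, br + box) for c in range(bc, bc + box)}
        for br in range(0, n, box)
        for bc in range(0, n, box)
    ]
    # Each row, column and box must contain every digit 1..n exactly once.
    return all(group == digits for group in rows + cols + boxes)
```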
Ideas for future generators:
- More complex math tasks (algebra, geometry)
- Algorithmic tasks (counting, sorting, re-ordering)
- Logic riddles
- Inductive logic programming tasks
- ARC-AGI synthetic riddles
If you have ideas for additional procedural dataset generators, please open an issue in the GitHub repository or contact us in the #reasoning-gym channel of the GPU-Mode Discord server.