Improve readme #57

Merged 5 commits on Jul 9, 2024
README.md (29 additions & 16 deletions)

# HerbBenchmarks.jl

A collection of useful program synthesis benchmarks.

## Benchmark structure

Each benchmark has its own folder (see the example layout after this list), including:
- A README with relevant resources for the benchmark (papers, code repos, etc.) and a brief description of the data structure used in the data set.
- `data.jl`: Contains the benchmark data set. A data set consists of one or more problems (or tasks). Each problem is made up of one or more input-output examples. For more details on what a data set should look like, see [Benchmark data](#benchmark-data).
- `citation.bib`: Reference to cite the benchmark.
- `grammar.jl`: One or more program grammar(s).
- `<benchmark_name>_primitives.jl`: Implementation of all primitives and the custom interpret function (if one exists).
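As a concrete illustration, a hypothetical benchmark folder named `My_benchmark_2024` would be laid out as follows:

```
src/data/My_benchmark_2024/
├── README.md
├── citation.bib
├── data.jl
├── grammar.jl
└── My_benchmark_2024_primitives.jl
```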

## Benchmark data

In `data.jl`, a data set follows a specific structure:
- Each data set is represented by `Problem`s.
- A problem has a unique **identifier**, e.g., `"problem_100"`.
- A problem contains a list of `IOExample`s. The input `in` is of type `Dict{Symbol, Any}`, with `Symbol`s following the naming convention `_arg_1_`, `_arg_2_`, etc.

```julia
# Example problem with three input-output examples
problem_100 = Problem("problem_100", [
    IOExample(Dict{Symbol, Any}(:_arg_1_ => StringState("369K 16 Oct 17:30 JCR-Menu.ppt", 1)), StringState("16 Oct", nothing)),
    IOExample(Dict{Symbol, Any}(:_arg_1_ => StringState("732K 11 Oct 17:59 guide.pdf", 1)), StringState("11 Oct", nothing)),
    IOExample(Dict{Symbol, Any}(:_arg_1_ => StringState("582K 18 Oct 12:13 make-01.pdf", 1)), StringState("18 Oct", nothing))
])
```

## How to use:
HerbBenchmarks is not yet complete and lacks crucial benchmarking functionality. However, if you want to test on a single problem and grammar, you can do the following.
Select your favourite benchmark; here, we use the string transformation benchmark.
```Julia
using HerbSpecification, HerbGrammar

using HerbBenchmarks.String_transformations_2020

# The grammar and problem IDs have to match
grammar = String_transformations_2020.grammar_string
problem = String_transformations_2020.problem_100

# Print out the grammar and problem in readable format
println("grammar:", grammar)
println("problem:", problem.examples)
```
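If you want to inspect the examples one at a time instead of printing the whole list, you can iterate over them. This is a minimal sketch, assuming each `IOExample` exposes its input dictionary as the field `in` and its expected output as `out`, matching the constructor calls shown in [Benchmark data](#benchmark-data):

```julia
# Print every input-output example of the problem separately.
for (i, example) in enumerate(problem.examples)
    println("Example $i")
    println("  input:  ", example.in)   # Dict{Symbol, Any} with keys :_arg_1_, :_arg_2_, ...
    println("  output: ", example.out)
end
```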

For some benchmarks, there is only a single grammar for all problems.
src/data/Abstract_Reasoning_2019/README.md (10 additions & 1 deletion)
# Abstraction and Reasoning Corpus (ARC) 2019

A benchmark on ARC. ARC tasks are pairs of coloured input-output grids. The size of a grid varies from a single cell to 30 x 30 cells, with each cell taking one of 10 possible values (colours).

The `Grid` contains:
- `width` of the grid.
- `height` of the grid.
- `data`: A two-dimensional matrix representing the grid. Grid cells can have values from 0 to 9.

Each `Problem` is a list of examples, with each example consisting of an input `Grid` and an output `Grid`.
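As an illustration of this structure, the sketch below builds a tiny input-output pair by hand. It is only a sketch: it assumes the benchmark module follows the folder name (`HerbBenchmarks.Abstract_Reasoning_2019`) and that `Grid` has a positional constructor `Grid(width, height, data)` matching the fields listed above.

```julia
using HerbSpecification
using HerbBenchmarks.Abstract_Reasoning_2019  # assumed module name, following the folder name

# A hypothetical 2 x 2 task: recolour every 0 (black) cell to 5 (grey).
input_grid  = Grid(2, 2, [0 1; 1 0])   # assumed positional constructor
output_grid = Grid(2, 2, [5 1; 1 5])

# Wrap the pair as an IOExample, following the _arg_1_ naming convention.
example = IOExample(Dict{Symbol, Any}(:_arg_1_ => input_grid), output_grid)
```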

For more information, please see the full description in the [ARC repo](https://github.com/fchollet/ARC/tree/master).


src/data/Pixels_2020/README.md (9 additions & 1 deletion)
# Pixels


The pixels data set contains problems on learning to draw ASCII art on a canvas.
A `PixelState` has the fields:
- `matrix`: A two-dimensional grid of boolean values representing the canvas.
- `position`: A tuple (x, y) representing a cursor that points to the current position in the grid.

Each problem is represented by one input-output example. The input is a `PixelState` with a blank canvas (matrix of zeros) and the cursor pointing to the top left position of the canvas. The output is a `PixelState` with a drawing of ASCII characters. The canvas size is the same for input and output.
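For illustration, the following sketch constructs such a blank input canvas by hand. It assumes a positional constructor `PixelState(matrix, position)` matching the fields above, and that the module follows the folder name (`HerbBenchmarks.Pixels_2020`).

```julia
using HerbBenchmarks.Pixels_2020  # assumed module name, following the folder name

width, height = 5, 5

# Blank canvas: all cells false, cursor at the top-left position (1, 1).
blank_canvas = falses(height, width)
input_state  = PixelState(blank_canvas, (1, 1))  # assumed positional constructor
```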

See
> Cropper, Andrew, and Sebastijan Dumančić. "Learning large logic programs by going beyond entailment." arXiv preprint arXiv:2004.09855 (2020).