Merge pull request #67 from Herb-AI/dev
Finally merge recent progress to `master` from `dev`
ReubenJ authored Jan 30, 2025
2 parents c37099d + 0bd62ba commit 47aba81
Showing 18 changed files with 5,448 additions and 9,036 deletions.
6 changes: 3 additions & 3 deletions Project.toml
@@ -1,7 +1,7 @@
name = "HerbBenchmarks"
uuid = "eadf8b74-d38a-4b1a-a063-8d36e493d376"
authors = ["jaapjong <[email protected]>", "Tilman Hinnerichs <[email protected]>", "Sebastijan Dumancic <[email protected]>"]
version = "0.2.1"
version = "0.2.3"

[deps]
FilePathsBase = "48062228-2e41-5def-b9a4-89aafe57970f"
@@ -16,8 +16,8 @@ SExpressions = "eaa8e424-c5f6-11e8-1b3d-d576ba0eee97"

[compat]
HerbCore = "^0.3.0"
HerbGrammar = "^0.3.0"
HerbSpecification = "^0.1.0"
HerbGrammar = "0.5"
HerbSpecification = "^0.2.0"
julia = "^1.8"

[extras]
45 changes: 29 additions & 16 deletions README.md
@@ -5,20 +5,33 @@

# HerbBenchmarks.jl

A collection of useful program synthesis benchmarks. Each folder contains a different benchmark and a README.
A collection of useful program synthesis benchmarks.

A benchmark has:
- A readme, including description of data fields and dimensionality
- (multiple) sets of input output examples
- file that references where we found that
- evaluation function/interpretation function
- program grammar
## Benchmark structure

Optional:
- a data grammar or data generation scripts, ideally with a notion of increasing complexity
- download script to download that specific dataset, if too big for Github
- test-/train-split getter functions if specified in the original data
- landmarks for planning problems
Each benchmark has its own folder, including:
- A README with relevant resources for the benchmark (papers, code repos, etc.) and a brief description of the data structure used in the data set.
- `data.jl`: Contains the benchmark data set. A data set consists of one or more problems (or tasks). Each problem is made up of one or more input-output examples. For more details on how a data set should be structured, see [Benchmark data](#benchmark-data).
- `citation.bib`: Reference to cite the benchmark.
- `grammar.jl`: One or more program grammar(s).
- `<benchmark_name>_primitives.jl`: Implementation of all primitives and the custom interpret function (if one exists).
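
As a rough illustration of the last two items, grammars are typically built with HerbGrammar's `@csgrammar`. The rules and the `concat`/`uppercase` primitives below are hypothetical, not taken from any actual benchmark:

```julia
using HerbGrammar

# Hypothetical toy grammar for a string-transformation benchmark (illustration only).
# `_arg_1_` is the benchmark input variable; `concat` and `uppercase` stand in for
# primitives a benchmark would define in its `<benchmark_name>_primitives.jl`.
grammar_example = @csgrammar begin
    Start = String
    String = _arg_1_
    String = concat(String, String)
    String = uppercase(String)
end
```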

## Benchmark data

In `data.jl`, a data set follows a specific structure:
- Each data set is represented by `Problem`s.
- A problem has a unique **identifier**, e.g., `"problem_100"`.
- A problem contains a list of `IOExample`s. The input `in` is of type `Dict{Symbol, Any}`, with `Symbol`s following the naming convention `_arg_1_`, `_arg_2_`, etc.

```julia
# Example
problem_100 = Problem("problem_100", [
    IOExample(Dict{Symbol, Any}(:_arg_1_ => StringState("369K 16 Oct 17:30 JCR-Menu.ppt", 1)), StringState("16 Oct", nothing)),
    IOExample(Dict{Symbol, Any}(:_arg_1_ => StringState("732K 11 Oct 17:59 guide.pdf", 1)), StringState("11 Oct", nothing)),
    IOExample(Dict{Symbol, Any}(:_arg_1_ => StringState("582K 18 Oct 12:13 make-01.pdf", 1)), StringState("18 Oct", nothing))
])
```

## How to use:
HerbBenchmarks is not yet complete and still lacks crucial benchmarking functionality. However, if you want to test on a single problem and grammar, you can do the following
@@ -27,15 +40,15 @@ Select your favourite benchmark, we use the string transformation benchmark from
```julia
using HerbSpecification, HerbGrammar

using HerbBenchmarks.PBE_SLIA_Track_2019
using HerbBenchmarks.String_transformations_2020

# The id has to match
grammar = PBE_SLIA_Track_2019.grammar_11604909
problem = PBE_SLIA_Track_2019.problem_11604909
grammar = String_transformations_2020.grammar_string
problem = String_transformations_2020.problem_100

# Print out the grammar and problem in readable format
println("grammar:", grammar)
println("problem:", problem.examples)
```
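
As a quick follow-up, you can inspect a single input-output example. This sketch assumes the `in`/`out` fields of `IOExample` from HerbSpecification and the `problem.examples` access and `:_arg_1_` key shown above:

```julia
# Look at the first input-output example of the selected problem
ex = problem.examples[1]
println("input: ", ex.in[:_arg_1_])
println("expected output: ", ex.out)
```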

For some benchmarks there is only a single grammar for all problems.
Some benchmarks have only a single grammar for all problems.
11 changes: 10 additions & 1 deletion src/data/Abstract_Reasoning_2019/README.md
@@ -1,5 +1,14 @@
# Abstraction and Reasoning Corpus (ARC) 2019

A benchmark on ARC. For more information please see the full description mentioned in the repo [https://github.com/fchollet/ARC](https://github.com/fchollet/ARC/tree/master).
A benchmark on ARC. ARC tasks are pairs of coloured input-output grids. The size of a grid varies from a single cell to 30 x 30 cells, with each cell having one of 10 possible values (colours).

The `Grid` contains:
- `width` of the grid.
- `height` of the grid.
- `data`: A two-dimensional matrix representing the grid. Grid cells can have values from 0 to 9.

Each `Problem` is a list of examples, with each example consisting of an input `Grid` and an output `Grid`.
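
As an illustration only (the module name, the `Grid(width, height, data)` constructor order, and the task itself are assumptions, not taken from the benchmark code), an example pair could look like this:

```julia
using HerbSpecification
# Module name assumed from the folder name; adjust if it differs.
using HerbBenchmarks.Abstract_Reasoning_2019

# Hypothetical 2x2 task: invert the colours (0 <-> 9, 1 <-> 8)
input_grid  = Grid(2, 2, [0 1; 1 0])
output_grid = Grid(2, 2, [9 8; 8 9])
example = IOExample(Dict{Symbol, Any}(:_arg_1_ => input_grid), output_grid)
```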

For more information, please see the full description in the [ARC repo](https://github.com/fchollet/ARC/tree/master).


10 changes: 9 additions & 1 deletion src/data/Pixels_2020/README.md
@@ -1,7 +1,15 @@
# Pixels

In the pixels dataset, every problem has a single example. The input is a blank canvas, represented by a 2d grid with boolean values.
In the pixels dataset,
every problem has a single example. The input is a blank canvas, represented by a 2d grid with boolean values.
The output is a drawing of (a combination of) ASCII characters on the same size grid.

The pixels data set contains problems on learning to draw ASCII art on a canvas.
A `PixelState` has the fields:
- `matrix`: a two-dimensional grid of boolean values to represent the canvas.
- `position`: A tuple (x, y) representing a cursor that points to the current position in the grid.

Each problem is represented by one input-output example. The input is a `PixelState` with a blank canvas (matrix of zeros) and the cursor pointing to the top left position of the canvas. The output is a `PixelState` with a drawing of ASCII characters. The canvas size is the same for input and output.
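
A minimal sketch of such a pair (the module name, the `PixelState(matrix, position)` field order, and 1-based cursor coordinates are assumptions):

```julia
using HerbSpecification
# Module name assumed from the folder name; adjust if it differs.
using HerbBenchmarks.Pixels_2020

# Input: blank 3x3 canvas, cursor at the top-left cell
input_state = PixelState(falses(3, 3), (1, 1))

# Output: the same-size canvas with a drawn diagonal
canvas = falses(3, 3)
canvas[1, 1] = canvas[2, 2] = canvas[3, 3] = true
output_state = PixelState(canvas, (3, 3))

example = IOExample(Dict{Symbol, Any}(:_arg_1_ => input_state), output_state)
```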

See
> Cropper, Andrew, and Sebastijan Dumančić. "Learning large logic programs by going beyond entailment." arXiv preprint arXiv:2004.09855 (2020).