Improve readme #57

Merged 5 commits on Jul 9, 2024
README.md (29 additions & 16 deletions)

# HerbBenchmarks.jl

A collection of useful program synthesis benchmarks.

## Benchmark structure

Each benchmark has its own folder (see the example layout after this list), including:
- A README with relevant resources for the benchmark (papers, code repos, etc.) and a brief description of the data structure used in the data set.
- `data.jl`: Contains the benchmark data set. A data set consists of one or more problems (or tasks). Each problem is made up of one or more input-output examples. For more details on what a data set should look like, see [Benchmark data](#benchmark-data).
- `citation.bib`: Reference to cite the benchmark.
- `grammar.jl`: One or more program grammar(s).
- `<benchmark_name>_primitives.jl`: Implementation of all primitives and the custom interpret function (if one exists).
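As a concrete illustration, a hypothetical benchmark folder named `My_benchmark_2024` would be laid out as follows:

```
src/data/My_benchmark_2024/
├── README.md
├── citation.bib
├── data.jl
├── grammar.jl
└── My_benchmark_2024_primitives.jl
```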

## Benchmark data

In `data.jl`, a data set follows a specific structure:
- Each data set is represented by `Problem`s.
- A problem has a unique **identifier**, e.g., `"problem_100"`.
- A problem contains a list of `IOExample`s. The input `in` is of type `Dict{Symbol, Any}`, with `Symbol`s following the naming convention `_arg_1_`, `_arg_2_`, etc.

```julia
# Example problem with three input-output examples
problem_100 = Problem("problem_100", [
    IOExample(Dict{Symbol, Any}(:_arg_1_ => StringState("369K 16 Oct 17:30 JCR-Menu.ppt", 1)), StringState("16 Oct", nothing)),
    IOExample(Dict{Symbol, Any}(:_arg_1_ => StringState("732K 11 Oct 17:59 guide.pdf", 1)), StringState("11 Oct", nothing)),
    IOExample(Dict{Symbol, Any}(:_arg_1_ => StringState("582K 18 Oct 12:13 make-01.pdf", 1)), StringState("18 Oct", nothing))
])
```

## How to use:
HerbBenchmarks is not yet complete and lacks crucial benchmarking functionality. However, if you want to test on a single problem and grammar, you can do the following.
Select your favourite benchmark; here, we use the string transformation benchmark.
```Julia
using HerbSpecification, HerbGrammar

using HerbBenchmarks.String_transformations_2020

# The grammar and problem IDs have to match
grammar = String_transformations_2020.grammar_string
problem = String_transformations_2020.problem_100

# Print out the grammar and problem in readable format
println("grammar:", grammar)
println("problem:", problem.examples)
```
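If you want to inspect the examples one at a time instead of printing the whole list, you can iterate over them. This is a minimal sketch, assuming each `IOExample` exposes its input dictionary as the field `in` and its expected output as `out`, matching the constructor calls shown in [Benchmark data](#benchmark-data):

```julia
# Print every input-output example of the problem separately.
for (i, example) in enumerate(problem.examples)
    println("Example $i")
    println("  input:  ", example.in)   # Dict{Symbol, Any} with keys :_arg_1_, :_arg_2_, ...
    println("  output: ", example.out)
end
```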

For some benchmarks, there is only a single grammar for all problems.
src/data/Abstract_Reasoning_2019/README.md (10 additions & 1 deletion)
# Abstraction and Reasoning Corpus (ARC) 2019

A benchmark on ARC. ARC tasks are pairs of coloured input-output grids. The size of a grid varies from a single cell to 30 x 30 cells, with each cell taking one of 10 possible values (colours).

The `Grid` contains:
- `width` of the grid.
- `height` of the grid.
- `data`: A two-dimensional matrix representing the grid. Grid cells can have values from 0 to 9.

Each `Problem` is a list of examples, with each example consisting of an input `Grid` and an output `Grid`.
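As an illustration of this structure, the sketch below builds a tiny input-output pair by hand. It is only a sketch: it assumes the benchmark module follows the folder name (`HerbBenchmarks.Abstract_Reasoning_2019`) and that `Grid` has a positional constructor `Grid(width, height, data)` matching the fields listed above.

```julia
using HerbSpecification
using HerbBenchmarks.Abstract_Reasoning_2019  # assumed module name, following the folder name

# A hypothetical 2 x 2 task: recolour every 0 (black) cell to 5 (grey).
input_grid  = Grid(2, 2, [0 1; 1 0])   # assumed positional constructor
output_grid = Grid(2, 2, [5 1; 1 5])

# Wrap the pair as an IOExample, following the _arg_1_ naming convention.
example = IOExample(Dict{Symbol, Any}(:_arg_1_ => input_grid), output_grid)
```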

For more information, please see the full description in the [ARC repo](https://github.com/fchollet/ARC/tree/master).


src/data/Pixels_2020/README.md (9 additions & 1 deletion)
# Pixels


The pixels data set contains problems on learning to draw ASCII art on a canvas.
A `PixelState` has the fields:
- `matrix`: A two-dimensional grid of boolean values representing the canvas.
- `position`: A tuple (x, y) representing a cursor that points to the current position in the grid.

Each problem is represented by one input-output example. The input is a `PixelState` with a blank canvas (matrix of zeros) and the cursor pointing to the top left position of the canvas. The output is a `PixelState` with a drawing of ASCII characters. The canvas size is the same for input and output.
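For illustration, the following sketch constructs such a blank input canvas by hand. It assumes a positional constructor `PixelState(matrix, position)` matching the fields above, and that the module follows the folder name (`HerbBenchmarks.Pixels_2020`).

```julia
using HerbBenchmarks.Pixels_2020  # assumed module name, following the folder name

width, height = 5, 5

# Blank canvas: all cells false, cursor at the top-left position (1, 1).
blank_canvas = falses(height, width)
input_state  = PixelState(blank_canvas, (1, 1))  # assumed positional constructor
```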

See
> Cropper, Andrew, and Sebastijan Dumančić. "Learning large logic programs by going beyond entailment." arXiv preprint arXiv:2004.09855 (2020).