Skip to content

comby-tools/comby-reducer

Repository files navigation

comby-reducer

A program and data format reducer for arbitrary language syntax. Produces human comprehensible output. Define declarative transformations with ease.

reduce-gif-900

Quick start

Install the comby-reducer binary on your path with npm:

npm install -g @comby-tools/comby-reducer

Invoke it like this:

comby-reducer <file-to-reduce> --transforms ./transforms -- <crashing-program> @@

To feed the file input to <crasing-=program> via stdin, invoke it like this instead:

comby-reducer <file-to-reduce> --stdin --transforms ./transforms -- <crashing-program>

Alternative local install

Install comby-reducer in a local directory at ./node_modules/.bin/comby-reducer. If you see some warnings just ignore them.

npm install @comby-tools/comby-reducer

Example

Let's say you just ran a program that crashed a compiler and want to find a smaller example program that triggers the same crash. We'll simulate how to find a smaller example program with comby-reducer.

Step 1. Clone the repository

git clone https://github.com/comby-tools/comby-reducer

In example/program.c you'll find the program we'll reduce:

#include<stdio.h>
#include<string.h>

int main(int argc, char **argv) {
  if (argv[1]) {
      printf("I can't believe it's not butter");
  }
  // But I want to believe it's not butter...
  memset(NULL, 1, 1);
}

The memset statement causes a crash when we run this program. There's some junk in there that we don't need to trigger the crash. Let's get started.

Step 2. Go into the example directory

cd example

Next, we'll use a "pretend compiler" that crashes when it "compiles" our program (in reality, our "compiler" crashes when it runs a valid C program, not when actually compiling it, but we'll suspend our greater knowledge for now).

Step 3: Run this command to crash the compiler

./compiler.sh program.c

You'll see something like this at the end: ./compiler.sh: line 7: 41936 Segmentation fault: 11 ./program

Step 4: Reduce the program

comby-reducer program.c --file /tmp/in.c --lang .c --transforms ../transforms -- ./compiler.sh @@

You should see:

[+] Loaded 22 transformation rules
[+] Did pass 0 pass
[+] Did pass 1 pass
[+] Did pass 2 pass
[+] Did pass 3 pass
[+] Result:
#include<string.h>
void main() {
  memset(NULL, 1, 1);
}

Nice, our program is smaller! comby-reduce found that a smaller valid program keeps crashing our "compiler", but without the cruft.

Let's break down the command invocation:

  • The part after -- is the command we want to run that causes a crash. In our case, ./compiler.sh @@

    • The @@ part is substituted with a file containing a program (like program.c)
    • To feed input from stdin, remove the @@ and add the --stdin command line flag.
  • --file /tmp/in.c says that the @@ we substitute should be named /tmp/in.c. The .c extension may matter if our compiler expects a file with a .c extesion, for example. comby-reducer will try borrow the extension of the original file but --file exists to give you control over the file name that your program sees.

  • --lang .c says that the language we want to reduce is C-like. comby-reducer uses language definitions to parse input according to some language. This matters so that our transforms can accurately match strictly code blocks and avoids bothering with not-actually-code-syntax that come up in comments and strings. This may not be a big deal. You can use --lang .generic if you have some DSL or smart contract language. Here's the list of specific language parsers.

  • --transforms <dir> loads transform definitions from .toml files in the specified dir (default dir is transforms). Transforms are specified in a TOML format using comby syntax. See Usage below for more details.

Usage

Transformations

comby-reducer makes it easy to write rules for transformation using comby syntax. A handful of defaults are included in transforms/config.toml that will probably get you very far already. Here are some examples.

[empty_paren]
match='(:[1])'
rewrite='()'
rule='where nested'

This transform matches any content between balanced parentheses (including newlines) and deletes the content. The :[1] is a variable that can be used in the rewrite part. By default, comby-reducer will try to apply this transformation at the top-level of a file, wherever it sees (...). The rule='where nested' tells comby-reducer that it should also attempt to reduce nested matches of (...) inside other matched (...). In general, parentheses are a common syntax to nest expressions in programs, so it makes sense to add rule='where nested'.

Another transform preserves the first comma-separated element inside parentheses:

[preserve_first_paren_element]
match='(:[1],:[2])'
rewrite='(:[1])'

Program syntax often use call or function-like syntax that comma-separate parameters or arguments inside parenthes. This transformation attempts to remove elements in such syntax. This transform doesn't have a rule part, since it might not be as fruitful to attempt nested reductions inside of :[1] or :[2]. But, we could easily add it.

A last example uses a special form :[var:e] which matches "expression-like" syntax.

[remove_first_expression_for_colon_sep_space]
match=':[1:e], :[2:e]'
rewrite=':[2]'

Expression-like syntax matches contiguous non-whitespace characters like foo or foo.bar, as well as contiguous character sequences that include valid code block structures like balanced parentheses in function(foo, bar) (notice how whitespace is allowed inside the parentheses). The transform above will attempt to remove expression-like syntax between commas, which often separate expressions inside objects, records, or lists.

More info. You can learn more about the underlying matching engine at comby.dev. You can try out transformations on comby.live to check that a transformation behaves the way you want it to.

Limitations and known issues.

  • Although regular expression matching is possible with :[...] syntax in comby, it's not yet possible to write regular expression holes in comby-reducer transforms.

  • Some inputs may trigger a stack overflow in node. Post a GH issue with the input if you run into this.

Tips

Customize crash criteria with scripts

comby-reducer expects a program to exit with signal 139 or 134 to consider it a crash. Many programs that crash won't exit with these values, however. For example, the Solidity compiler exits with a signal 1. Even more challenging, the exit signal 1 may mean that the program crashes, or that the program doesn't compile (and we want the program to still compile). The exit signal 1 is not a reliable way to know that the program crashed "for real". What to do?

It'll depend on your program, but you generally want to define some criteria that constitutes a valid crash, and wrap that logic in a script. For Solidity, a valid program that crashes the compiler will emit something like:

Internal compiler error during compilation:
/solidity/libsolidity/ast/Types.h(797): Throw in function virtual std::unique_ptr<ReferenceType>

We can use this information in a script, and exit with the expected crash code to signal a crash. Here's one I used before, called compile.sh, that will exit the script with signal 139 when it sees the Internal compiler error message:

#!/bin/bash

RESULT=$(~/solidity/build/solc/solc $1 2>&1)
MATCH=$(echo $RESULT | grep -c "Internal compiler error")
if [ $MATCH == 0 ]; then
        exit 0 # no match, program doesn't cause expected crash
fi

exit 139

You can get very fancy with your script, and can use it further refine program reduction. For more inspiration read up on interestingness tests covered by C-reduce.

Output

Output the final reduced program by piping the comby-reducer command to a file. The final program is sent to stdout, the informative messages are printed to stderr.

Options

Some additional command line flags:

--record is an optional flag that emits the program at each step of a successful reduction, in the form <num>.step, in the current directory. You can replay the transformations by running comby-reducer-replay in the current directory. See more on comby-reducer-replay below.

--lang <extension> is a flag that determines how the source file is parsed. Using an extension like .c or .go will make comby-reducer parse the input according to that language.

click to expand the list of accepted extensions
.s        Assembly
.sh       Bash
.c        C
.cs       C#
.css      CSS
.dart     Dart
.dyck     Dyck
.clj      Clojure
.elm      Elm
.erl      Erlang
.ex       Elixir
.f        Fortran
.fsx      F#
.go       Go
.html     HTML
.hs       Haskell
.java     Java
.js       JavaScript
.jsx      JSX
.json     JSON
.jsonc    JSONC
.gql      GraphQL
.dhall    Dhall
.jl       Julia
.kt       Kotlin
.tex      LaTeX
.lisp     Lisp
.nim      Nim
.ml       OCaml
.paren    Paren
.pas      Pascal
.php      PHP
.py       Python
.re       Reason
.rb       Ruby
.rs       Rust
.scala    Scala
.sql      SQL
.swift    Swift
.txt      Text
.ts       TypeScript
.tsx      TSX
.xml      XML
.generic  Generic

--transforms <dir> will use .toml transform definitions in the specified dir.

--debug will emit the reduced program after each step, and the transformation that succeeded to stderr.

comby-reducer-replay

comby-reducer-replay is the answer to "How was my program reduced?". comby-reducer-replay is installed along with comby-reducer and should be available based on how you installed it.

After running comby-reducer with --record, simply run comby-reducer-replay in the current directory, and step through the transformed program at each step (left and right arrow keys). Try running comby-reducer-replay inside replay-example to step through a recording of a previous crash reduction for a Solidity compiler bug.

By default replays will use git diff to render changes. To override the default, a custom diff command can be entered on the command-line like this:

comby-reducer-replay colordiff -y

where colordiff -y shows a side-by-side colored diff of changes. Underneath the hood, the .step files will be appended to the command, like colordiff -y 000.step 001.step

Some sensible default flags are included for common diff tools, which you can explore by entering only the name of the tool and no other extra command line flags:

comby-reducer-replay git               # the default
comby-reducer-replay patdiff           # an enhanced patience diff tool
comby-reducer-replay colordiff         # colordiff, configured to render side-by-side
comby-reducer-replay diff              # plain old diff, configured to render side-by-side

I recommend installing patdiff for an enhanced viewing experience. patdiff simply understands diffs a bit better. To get patdiff, you'll have to:

  • Install opam with

    sh <(curl -sL https://raw.githubusercontent.com/ocaml/opam/master/shell/install.sh)

  • Run eval $(opam env)

  • Run opam install patdiff

And patdiff should now be available on your path.

Development

  • Install npm.
  • Install npx.
npm i typescript @types/node minimist @types/minimist @iarna/toml @types/iarna__toml
npx tsc