MUBench : Pipeline

The MUBench Pipeline allows running experiments on API-misuse detectors to measure their precision and recall. To enable platform-independent execution of experiments, we recommend using the pipeline command within the MUBench Interactive Shell. Check pipeline -h for details about the available subcommands and options.

Computing Resources

Specific requirements depend on the detector you evaluate, but our minimum recommendations are:

  • CPUs: ≥2
  • Memory: ≥8.0 GB

Hint: Docker limits the computing resources available to experiment runs. You can adjust these limits in Docker's advanced preferences.

Hint: You also have to make this memory available to the JVM for the detector run, by passing --java-options Xmx8G to the pipeline command.
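For example, a run that grants the detector's JVM 8 GB of memory might look like this (using the demo detector and dataset from the example further below):

mubench> pipeline run ex2 DemoDetector --datasets TSE17-ExPrecision --java-options Xmx8G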

Experiments

The MUBench Pipeline supports the following experiments to measure the precision and recall of API-misuse detectors:

  • Recall Upper Bound Experiment (ex1)

    Measures the recall upper bound of a detector. It provides the detector with hand-crafted examples of correct usage corresponding to the known misuses in the dataset, as a reference for identifying misuses. Requires reviews of all potential hits, i.e., findings in the same method as a known misuse, to determine the recall upper bound.

  • Precision Experiment (ex2)

    Measures the precision of a detector. It runs the detector on real-world projects and requires reviews of the top-N findings per target project to determine the precision.

  • Recall Experiment (ex3)

    Measures the recall of a detector. It runs the detector on real-world projects and requires reviews of all potential hits, i.e., findings in the same method as a known misuse, to determine the recall.

Run Experiments

The base command to run an experiment is

mubench> pipeline run <E> <D> --datasets <DS>

Where

  • <E> is the id of the experiment to run (ex1, ex2, or ex3),
  • <D> is the id of the detector to evaluate, and
  • <DS> is the id of the dataset to run the experiment on.

Example: mubench> pipeline run ex2 DemoDetector --datasets TSE17-ExPrecision

Hint: The --datasets filter is optional. We recommend always using a filter, since running on the entire benchmark requires considerable disk space and time.

The first time the pipeline runs a detector on a certain project, the project is cloned from version control and compiled. These preparation steps may take a while. Subsequently, MUBench uses the local clone and the previously compiled classes, such that experiments may run offline and need very little preparation time.

Hint: You may run the preparation steps individually. See pipeline -h for details.
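For illustration, assuming checkout and compile are among the subcommands listed by pipeline -h, preparing a dataset ahead of time without running any detector might look like this:

mubench> pipeline checkout --datasets TSE17-ExPrecision
mubench> pipeline compile --datasets TSE17-ExPrecision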

The pipeline stores detector findings after execution and subsequently skips running a detector on a project version it has run on before, unless the detector or the project version changed in the meantime. To force the pipeline to rerun the detector, use the --force-detect option.
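For example, to discard the stored findings and rerun the demo detector on the precision dataset:

mubench> pipeline run ex2 DemoDetector --datasets TSE17-ExPrecision --force-detect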

Check pipeline run -h for further details.

If you want to publish detector findings to a review site, you may run

mubench> pipeline publish <E> <D> --datasets <DS> -s <R> -u <RU> -p <RP>

Where

  • <E>, <D>, and <DS> are as above.
  • <R> is the URL of your review site, and
  • <RU> and <RP> are the username and password to access your review site with.

Example: mubench> pipeline publish ex2 DemoDetector --datasets TSE17-ExPrecision -s http://artifact.page/tse17/ -u sven -p 1234

Running pipeline publish implicitly calls pipeline run.

Check pipeline publish -h for further details.

Experiment Data

When running experiments, MUBench persists experiment data on the host machine. This data is stored in Docker Volumes, which are mounted into the experiment environment:

  • mubench-checkouts stores the project checkouts and compiled classes (mount point: /mubench/checkouts)
  • mubench-findings stores the detector-run information and detector findings (mount point: /mubench/findings)

You can manually browse this data in a MUBench Interactive Shell at the designated mount points.

Hint: If you do not mount a volume to these mount points, any respective data is lost when you exit the MUBench Interactive Shell.
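For example, a sketch of starting the shell with both volumes mounted (here, <mubench-image> is a placeholder for whichever image you use to start the MUBench Interactive Shell):

docker run -it --rm -v mubench-checkouts:/mubench/checkouts -v mubench-findings:/mubench/findings <mubench-image>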

Default Configuration

You can specify defaults for command-line arguments by creating a ./default.config file in YAML format. Values for all command-line arguments that begin with -- may be specified in this file, using the argument's full name as the key. To set command-line flags by default, use True as their value. For an example of how to do this, see our example default.config (a minimal sketch follows below).
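For illustration, a minimal sketch of such a file, assuming the keys mirror the argument names used above (the exact value formats may differ; consult the example default.config):

datasets:
  - TSE17-ExPrecision
java-options:
  - Xmx8G
force-detect: True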

Argument values specified on the command line always take precedence over respective default values.

To use your default.config for the MUBench Pipeline, add -v /.../your.default.config:/mubench/mubench.pipeline/default.config to the Docker command running MUBench.
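For example, with a hypothetical host path /path/to/your.default.config (replace it with the actual location of your file) and the same placeholder image name as above:

docker run -it --rm -v /path/to/your.default.config:/mubench/mubench.pipeline/default.config <mubench-image>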