This project contains a corpus of data simulating API evolution in Java. The corpus includes two versions of a library containing synthetic "dummy" API classes that have evolved between the two versions.
Backward compatibility is a common concern when libraries evolve. Many tools exist to assess it, but how well those tools perform is hard to answer without a proper benchmark, and any benchmark needs test data. Such test data is provided here.
In this updated version, several new features were introduced to Kamil Jezek and Jens Dietrich's original benchmark. These improvements include the atomization of all the breaking-change files for a more granular evaluation, and the incorporation of a new tool named Roseau alongside the still-maintained ones. In addition, the precision, recall, and execution time of each tool are now assessed, providing insight into each tool's ability to report true positives while avoiding false positives, as well as into its overall performance.
The results are not only presented in the `benchmark.csv` file but also visualized in the bar charts `precision_recall_chart.png` and `execution_times_chart.png` to facilitate visual comparison. These metrics are additionally organized in separate `.csv` files for convenient data access, namely `precisions.csv`, `recalls.csv`, and `execution_times.csv`. All of these files are generated automatically upon running the `./benchmark.sh` command, preserving the same user-friendly benchmark usage.
The corpus contains two main directories, `lib-v1` and `lib-v2`, holding the API examples in their first version and their evolved counterparts in the second version. Each directory contains thousands of packages, where each package simulates one API change. The change is encoded in the package name.

The `client` directory contains a simple client application that invokes the API from the library. Its purpose is to simulate usage of the API.
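For illustration, one change package might look like this (a sketch only; the exact class and member names used in the corpus may differ):

```java
// Sketch of one simulated change; class and member names are illustrative.
// lib-v1/src/accessModifierClazzAccessDecrease/Clazz.java -- first version:
package accessModifierClazzAccessDecrease;

public class Clazz {                       // public in v1
    public String call() { return "ok"; }
}

// lib-v2 declares the same class with decreased access:
//     class Clazz { ... }                 // package-private in v2
//
// The client exercises the change, e.g.:
//     new accessModifierClazzAccessDecrease.Clazz().call();
// which compiles and runs against v1 but breaks against v2.
```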
One can build the corpus simply by typing:

```
ant jar
```

It produces three JAR files, `lib-v1.jar`, `lib-v2.jar`, and `client.jar`, containing all API classes and the client compiled to bytecode.
The data may be used as-is to benchmark third-party tools or for any other experiments. We provide a benchmark of several tools below.
A set of other materials is provided to help with using the corpus. First, a table with API incompatibility results may be generated. The generation is invoked simply by typing:

```
./build_ground_truth.sh
```

This script tries to compile and run the client with both library versions and generates a CSV file with the results. The file lists each API change one by one and reports whether the change is source and/or binary compatible.
Example showing unboxing of a constant and an access-modifier change, where "1" means compatible and "0" means incompatible:
Change | Source | Binary |
---|---|---|
dataTypeIfazeConstantUnboxing | 1 | 0 |
accessModifierClazzAccessDecrease | 0 | 0 |
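To see why the unboxing change is source compatible but binary incompatible, a sketch along the following lines helps (the actual corpus code may differ in names and values):

```java
// lib-v1: the interface constant is a boxed Integer
// (interface fields are implicitly public static final).
package dataTypeIfazeConstantUnboxing;

public interface Ifaze {
    Integer CONST = Integer.valueOf(42);   // field descriptor: Ljava/lang/Integer;
}

// lib-v2 declares the same constant unboxed:
//     int CONST = 42;                     // field descriptor: I
//
// A client statement such as `int x = Ifaze.CONST;` compiles against
// both versions (source compatible; unboxing covers v1), but a client
// compiled against v1 references the field by its old descriptor, so
// running it against v2 throws NoSuchFieldError (binary incompatible).
```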
Third-party tools may be benchmarked. Run:

```
./benchmark.sh
```

It invokes the tested tools, feeds them the corpus data, and checks their ability to discover the API changes. The results are stored in the `output/benchmark.csv` table, where "1" means that a change was detected and "0" means that it was not.
The set of tested tools may be extended. Just edit the script `tools/run.sh` and add lines invoking the tool. Make sure the output of the tool is stored in the directory `output/reports/` in textual form. The benchmark script can then grab the report and parse the results. Note that a detected API change is determined simply by string matching against the report.
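Conceptually, the matching step can be pictured like this (a minimal sketch, not the benchmark's actual code; the report file name is made up):

```java
// Minimal sketch of the string-matching idea: a change counts as
// detected if its name appears anywhere in the tool's textual report.
import java.nio.file.Files;
import java.nio.file.Path;

public class MatchSketch {
    public static void main(String[] args) throws Exception {
        // "sometool.txt" is a hypothetical report file name.
        String report = Files.readString(Path.of("output/reports/sometool.txt"));
        String change = "accessModifierClazzAccessDecrease";
        System.out.println(change + " detected: " + report.contains(change));
    }
}
```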
Similarly, the set of test data may be extended simply by adding new changes to the `src` directories. The new data only needs to follow the naming pattern already used.
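For instance, a new change could be added as a matching pair of sources like the following (the name `methodRemovedClazz` is invented here, but it follows the existing pattern):

```java
// Hypothetical new change; the package name is invented for illustration.
// lib-v1/src/methodRemovedClazz/Clazz.java -- first version:
package methodRemovedClazz;

public class Clazz {
    public void oldMethod() { }   // present in v1, removed in v2
}

// lib-v2 ships the same class without oldMethod(), and the client
// calls oldMethod() so that the removal is exercised by the benchmark.
```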
Tools included in the benchmark:
- Japicmp (https://github.com/siom79/japicmp)
- Revapi (http://revapi.org/)
- Roseau (https://github.com/alien-tools/roseau)
Metric | japicmp | revapi | roseau |
---|---|---|---|
Precision | 97.24% | 84.13% | 98.36% |
Recall | 58.24% | 96.15% | 98.90% |