This is an artifact package for ISSTA 2023 paper "To Kill A Mutant — An Empirical Study of Mutation Testing Kills".
To support reuse, this document also describes the general data structure and provides other documentation for the experiment.
Requirements:
- Docker must be installed on your system.
- Python 3 is required, with the following packages installed: matplotlib, pandas, and numpy. (Please note, some systems may have both Python 2 and Python 3 installed; be sure to use Python 3 in the command line for this project.)
To demonstrate how the experiment was run, we prepared 10 separate Docker images, one for each of the 10 subject programs studied in the paper. The time required to run the experiment differs between subject programs; estimates are given in the table below.
Upon completion of these steps, two outputs are generated in the current directory: a data file named 'ProjectName.csv' and a Venn diagram named 'ProjectName.png'.
Please be aware that these time estimates apply to ARM machines, on which the provided ARM-based Docker images are expected to run significantly faster. The estimates come from trials run in an ARM-based Docker container on a 2021 MacBook Pro with the M1 Pro chip.
If you are using an x86 machine, such as a 2016 MacBook Pro with an Intel chip or a Windows machine, you can still run the 10 provided ARM-based Docker images with the latest version of Docker. However, the experiment may take considerably longer, because Docker runs these images in emulated mode. For instance, it may take approximately 80 minutes to run the experiment for 'commons-validator' on MacBook Pro 2021.
To demonstrate that the experiment is not limited to ARM-based machines, we also provide one x86-based Docker image for commons-validator in the Detailed Instructions section. Procedures for building Docker images for the other subject programs are provided in the Data Structure section.
Subject name | Estimated time |
---|---|
gson | 10min |
commons-io | 1h 40min |
commons-lang | 2h |
commons-csv | 7min |
joda-time | 1h 2min |
jsoup | 44min |
xmlgraphics | 7min |
commons-text | 35min |
commons-jexl | 2h 40min (expect about 30 GB of memory to be used) |
commons-validator | 4min (8min on a Surface Book 2) |
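For planning a full replication, the per-project estimates in the table can be added up. The following is a minimal sketch, assuming the ARM-based timings listed above; the helper names `to_minutes` and `total_minutes` are illustrative and not part of the artifact.

```python
import re

# Estimates copied from the table above (ARM-based timings).
ESTIMATES = {
    "gson": "10min",
    "commons-io": "1h 40min",
    "commons-lang": "2h",
    "commons-csv": "7min",
    "joda-time": "1h 2min",
    "jsoup": "44min",
    "xmlgraphics": "7min",
    "commons-text": "35min",
    "commons-jexl": "2h 40min",
    "commons-validator": "4min",
}


def to_minutes(text):
    """Convert strings such as '1h 40min', '2h', or '7min' to minutes."""
    match = re.fullmatch(r"(?:(\d+)h)?\s*(?:(\d+)min)?", text.strip())
    if match is None or not any(match.groups()):
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes = (int(g) if g else 0 for g in match.groups())
    return hours * 60 + minutes


def total_minutes(estimates):
    """Sum all estimates, assuming the projects are run one after another."""
    return sum(to_minutes(value) for value in estimates.values())


if __name__ == "__main__":
    total = total_minutes(ESTIMATES)
    print(f"Sequential total: {total // 60}h {total % 60}min")
```

The total assumes the ten containers run sequentially on one ARM machine; on x86 machines under emulation the figure would be considerably higher.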
Pull the ARM-based image:
docker pull qinfendeheichi/validator-replication:latest
Run the container with a name (exp_container):
docker run --name exp_container qinfendeheichi/validator-replication
Copy the CSV file and Venn diagram from the container to the current directory:
docker cp exp_container:/commons-validator/project/commons-validator.csv .
docker cp exp_container:/commons-validator/project/commons-validator.png .
Pull the x86-based image:
docker pull qinfendeheichi/validator-replication-amd:latest
Run the container with a name (exp_container_amd):
docker run --name exp_container_amd qinfendeheichi/validator-replication-amd
Copy the CSV file and Venn diagram from the container to the current directory:
docker cp exp_container_amd:/commons-validator/project/commons-validator.csv .
docker cp exp_container_amd:/commons-validator/project/commons-validator.png .
docker pull qinfendeheichi/gson-replication:latest
docker run --name gson qinfendeheichi/gson-replication
docker cp gson:/gson/gson.csv .
docker cp gson:/gson/gson.png .
docker pull qinfendeheichi/commons-io-replication:latest
docker run --name commons-io qinfendeheichi/commons-io-replication
docker cp commons-io:/commons-io/project/commons-io.csv .
docker cp commons-io:/commons-io/project/commons-io.png .
docker pull qinfendeheichi/commons-lang-replication:latest
docker run --name commons-lang qinfendeheichi/commons-lang-replication
docker cp commons-lang:/commons-lang/project/commons-lang.csv .
docker cp commons-lang:/commons-lang/project/commons-lang.png .
docker pull qinfendeheichi/commons-csv-replication:latest
docker run --name commons-csv qinfendeheichi/commons-csv-replication
docker cp commons-csv:/commons-csv/project/commons-csv.csv .
docker cp commons-csv:/commons-csv/project/commons-csv.png .
docker pull qinfendeheichi/joda-time-replication:latest
docker run --name joda-time qinfendeheichi/joda-time-replication
docker cp joda-time:/joda-time/project/joda-time.csv .
docker cp joda-time:/joda-time/project/joda-time.png .
docker pull qinfendeheichi/jsoup-replication:latest
docker run --name jsoup qinfendeheichi/jsoup-replication
docker cp jsoup:/jsoup/project/jsoup.csv .
docker cp jsoup:/jsoup/project/jsoup.png .
docker pull qinfendeheichi/xmlgraphics-replication:latest
docker run --name xmlgraphics qinfendeheichi/xmlgraphics-replication
docker cp xmlgraphics:/xmlgraphics/project/xmlgraphics.csv .
docker cp xmlgraphics:/xmlgraphics/project/xmlgraphics.png .
docker pull qinfendeheichi/commons-text-replication:latest
docker run --name commons-text qinfendeheichi/commons-text-replication
docker cp commons-text:/commons-text/project/commons-text.csv .
docker cp commons-text:/commons-text/project/commons-text.png .
docker pull qinfendeheichi/commons-jexl-replication:latest
docker run --name commons-jexl qinfendeheichi/commons-jexl-replication
docker cp commons-jexl:/commons-jexl/project/commons-jexl.csv .
docker cp commons-jexl:/commons-jexl/project/commons-jexl.png .
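The per-project commands above follow one pattern: pull, run, then copy the CSV and PNG out of the container. The following is a rough sketch of that pattern in Python, not a script shipped with the artifact. The function names are hypothetical, and the image naming (`<project>-replication`) is an assumption taken from the `docker run` lines above. It does not cover commons-validator, whose image and container names differ.

```python
import subprocess

# gson stores its results directly under /gson rather than /gson/project,
# as in the gson commands above.
NO_PROJECT_DIR = {"gson"}


def replication_commands(project, container=None):
    """Build the pull/run/cp commands for one subject program."""
    container = container or project
    # Assumption: the image is named <project>-replication, as in the run lines.
    image = f"qinfendeheichi/{project}-replication"
    base = f"/{project}" if project in NO_PROJECT_DIR else f"/{project}/project"
    return [
        ["docker", "pull", f"{image}:latest"],
        ["docker", "run", "--name", container, image],
        ["docker", "cp", f"{container}:{base}/{project}.csv", "."],
        ["docker", "cp", f"{container}:{base}/{project}.png", "."],
    ]


def run_project(project, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in replication_commands(project):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_project("commons-csv")  # dry run: only prints the four commands
```

Defaulting to a dry run keeps the sketch safe to try, since several of the real experiments take hours to complete.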
In this section, we use this artifact to back up key claims described in the paper.
From the Getting Started section, we obtain a CSV file (details explained here) and a Venn diagram. We now extract the relevant data from the CSV file to replicate the claims made in the paper. (The original data presented in the paper are here.)
The essential Python scripts for this project can be found in the 'tools/RQs' directory. Please ensure that the CSV file and Python scripts are situated in the same directory.
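Before running the analysis, it can help to confirm that the prerequisites are in place. The following is a minimal sketch, assuming the scripts are named `RQ1.py` through `RQ4.py` as in the commands in this section; the helper names `missing_packages` and `missing_files` are illustrative and not part of the artifact.

```python
import importlib.util
import sys
from pathlib import Path

REQUIRED_PACKAGES = ("matplotlib", "pandas", "numpy")
RQ_SCRIPTS = ("RQ1.py", "RQ2.py", "RQ3.py", "RQ4.py")


def missing_packages(packages=REQUIRED_PACKAGES):
    """Return the packages that cannot be found by the current interpreter."""
    return [name for name in packages if importlib.util.find_spec(name) is None]


def missing_files(directory, csv_name, scripts=RQ_SCRIPTS):
    """Return the CSV file or scripts that are absent from the directory."""
    directory = Path(directory)
    return [name for name in (csv_name, *scripts) if not (directory / name).is_file()]


if __name__ == "__main__":
    if sys.version_info.major < 3:
        print("Python 3 is required")
    print("Missing packages:", missing_packages())
    print("Missing files:", missing_files(".", "commons-csv.csv"))
```

Both lists should be empty when the environment is ready for the analysis scripts.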
It's important to note that the experimental process may exhibit some level of flakiness, which could be due to:
- Flaky tests or tests that result in flaky coverage.
- Constraints in PIT resources, leading to flakiness when labeling certain TIMED_OUT or MEMORY_ERROR mutants.
The results obtained from the data analysis, as detailed in the study, are anticipated to be very similar, but not necessarily identical, to those achieved in a separate experimental run. This consistency underscores the overall conclusions and assertions presented in the paper.
How many tests execute the mutation and what are their failure rates?
To get this data for a project, run:
python RQ1.py commons-csv.csv
Replace "commons-csv.csv" with another csv file's name for another subject project.
The output for the given subject program corresponds to Table 2 in the paper.
As stated above, some differences are expected because of test flakiness, but the results should not deviate much from those reported in the paper.
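One simple way to judge whether a reproduced number is close to a reported one is the relative difference. The sketch below is only an illustration: the function names are hypothetical, and the 5% threshold is an arbitrary placeholder, not a tolerance stated in the paper.

```python
def relative_difference(reported, reproduced):
    """Return |reproduced - reported| / |reported|, handling a zero baseline."""
    if reported == 0:
        return 0.0 if reproduced == 0 else float("inf")
    return abs(reproduced - reported) / abs(reported)


def within_tolerance(reported, reproduced, tolerance=0.05):
    """Check a reproduced value against a reported one.

    The default tolerance of 5% is an assumption chosen for illustration.
    """
    return relative_difference(reported, reproduced) <= tolerance


if __name__ == "__main__":
    # Placeholder numbers, not taken from the paper or from any real run.
    print(within_tolerance(reported=100, reproduced=103))
    print(within_tolerance(reported=200, reproduced=230))
```

A suitable threshold depends on how flaky a given subject program is, so the value should be chosen per project.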
How does a test kill a mutant?
python RQ2.py commons-csv.csv
The output corresponds to Table 3 in the paper.
The Venn diagram generated in the previous section corresponds to Figure 3 in the paper.
Are certain mutation operators more prone to certain test failure causes?
python RQ3.py commons-csv.csv
The output corresponds to Table 4 in the paper.
What types of uncaught exceptions kill mutants?
python RQ4.py commons-csv.csv
A figure corresponding to Figure 4 in the paper is displayed first. After you close the figure, the data corresponding to Table 5 in the paper are printed.
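The four analysis scripts can also be run in sequence for one CSV file. The following is a small sketch under that assumption; `rq_commands` and `run_all` are hypothetical helpers, not part of the artifact. Note that RQ4 opens a figure window, so a real run pauses at that step until the window is closed.

```python
import subprocess
import sys

RQ_SCRIPTS = ("RQ1.py", "RQ2.py", "RQ3.py", "RQ4.py")


def rq_commands(csv_file, python=sys.executable):
    """Build one command per research question for the given CSV file."""
    return [[python, script, csv_file] for script in RQ_SCRIPTS]


def run_all(csv_file, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in rq_commands(csv_file):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all("commons-csv.csv")  # dry run: only prints the four commands
```

Using `sys.executable` runs the scripts with the same Python 3 interpreter that runs the sketch, which avoids accidentally invoking Python 2 on systems that have both installed.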