This repository contains the Python scripts required to extract metrics for the Defects4J project, two '.csv' files containing the training data, and a machine learning model built using Weka v3.8.
- Eclipse Plugin CodePro, SourceMonitor
- Weka version 3.8
Each Python script documents its usage in comments at the beginning of the file.
In general, most of the Python scripts take the path of the input folder as a command-line argument.
Example: python DDU.py filepath
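As a rough sketch of this calling convention (not the actual contents of DDU.py), a script that takes a folder path might be organized as below. The function name `collect_source_files` and the `.java` filter are assumptions made for illustration only.

```python
import os
import sys


def collect_source_files(folder, extension=".java"):
    """Return sorted paths of files under `folder` that end with `extension`."""
    matches = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if name.endswith(extension):
                matches.append(os.path.join(root, name))
    return sorted(matches)


def main(argv):
    """Validate the single folder argument and list the files found in it."""
    # Expect exactly one argument after the script name: the folder to process.
    if len(argv) != 2:
        print("usage: python script.py <folder>", file=sys.stderr)
        return 2
    if not os.path.isdir(argv[1]):
        print(f"not a folder: {argv[1]}", file=sys.stderr)
        return 1
    for path in collect_source_files(argv[1]):
        print(path)
    return 0
```

A real script would typically end with `sys.exit(main(sys.argv))` under an `if __name__ == "__main__":` guard, so that `python script.py filepath` runs it directly.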
Two .csv files:
- TrainingData_2.csv contains the overall data. It is used for testing the Static, Dynamic, Test and Bug metrics separately. Labels: Good, Bad and Unknown.
- Training_2_GoodBad.csv contains only the relevant metrics. It can be used to test the final best model. Labels: Good and Bad.
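Before modeling, it can help to check how many rows carry each label in either file. The following is a minimal sketch using only the standard library; the column name `Label` is a hypothetical placeholder, so substitute the real header of the label column from the CSV.

```python
import csv
from collections import Counter


def count_labels(lines, label_column):
    """Count rows per label value.

    `lines` is any iterable of CSV text lines (an open file or a list),
    whose first line is the header row.
    """
    reader = csv.DictReader(lines)
    return Counter(row[label_column] for row in reader)
```

For example, `with open("TrainingData_2.csv", newline="") as f: print(count_labels(f, "Label"))` would print the distribution, assuming the label column really is named `Label`.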
- Load the training set Training_2_GoodBad.csv in Weka.
- In the attribute list, check the File attribute and press 'Remove'. It is not a relevant attribute for modeling.
- Go to the Classify tab. Load 'BestModel' from the model folder. Right-click the model and choose the option "Re-apply this model's configuration".
- Under Test options, choose Cross-validation and set the number of folds to 7 (for a good result).
- Press Start.
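Weka builds the cross-validation folds internally, so nothing below is needed to follow the steps above. Purely to illustrate what 7-fold cross-validation does with the rows, here is a simplified sketch in plain Python; the function names and the fixed seed are assumptions, and it does not reproduce Weka's own fold construction.

```python
import random


def k_fold_indices(n_rows, k=7, seed=1):
    """Split row indices 0..n_rows-1 into k shuffled, near-equal folds."""
    if not 2 <= k <= n_rows:
        raise ValueError("k must be between 2 and the number of rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, starting at position i.
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_rows, k=7, seed=1):
    """Yield (train, test) index lists, holding out one fold at a time."""
    folds = k_fold_indices(n_rows, k, seed)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, test
```

Each row is used for testing exactly once and for training in the other six rounds, which is why the reported accuracy covers the whole training set.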
- Model: Random Forests
- Attributes: Based on correlation matrix
- Labels: Good, Bad
- Validation: 7-fold cross-validation
Result
Correctly Classified: 90.2736%
Incorrectly Classified: 8.7349%
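For reference, these two percentages are simple ratios over the evaluated instances. The sketch below shows the arithmetic on plain label lists; the function name is hypothetical and this is not Weka's evaluation code.

```python
def classification_rates(actual, predicted):
    """Return (correct %, incorrect %) for two equal-length label sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    correct = sum(a == p for a, p in zip(actual, predicted))
    correct_pct = 100.0 * correct / len(actual)
    return correct_pct, 100.0 - correct_pct
```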
1. Lines
2. Max Complexity
3. Max Depth
4. num_of_tests
5. num_of_passed_tests
6. num_of_failed_tests
7. CBO (Coupling between Objects)
8. IFC (Information Flow complexity)
9. Density
10. Diversity
11. Uniqueness
12. DDU
13. No of Modified
14. No of Chunks
15. No of Failing tests
16. No of Repair Actions
17. Exception Type
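Attributes 9 to 12 are related: DDU is usually described as the product of Density, Diversity and Uniqueness computed over a test-coverage (activity) matrix. The sketch below assumes the commonly cited definitions of that test-suite diagnosability metric (normalized density, Gini-Simpson diversity over identical test rows, and the fraction of distinct component columns). The repository's DDU.py may compute these values differently, so treat this as an illustration only.

```python
from collections import Counter


def ddu(activity):
    """Compute (density, diversity, uniqueness, DDU) for a 0/1 activity matrix.

    `activity[i][j]` is 1 when test i covers component j, else 0.
    The formulas are assumed definitions, not taken from this repository.
    """
    n_tests = len(activity)
    n_components = len(activity[0]) if activity else 0
    if n_tests < 2 or n_components == 0:
        raise ValueError("need at least two tests and one component")

    # Density: fraction of covered cells, normalized so that 0.5 scores best.
    rho = sum(map(sum, activity)) / (n_tests * n_components)
    density = 1.0 - abs(1.0 - 2.0 * rho)

    # Diversity: Gini-Simpson index over groups of tests with identical rows.
    row_groups = Counter(tuple(row) for row in activity)
    same_pairs = sum(n * (n - 1) for n in row_groups.values())
    diversity = 1.0 - same_pairs / (n_tests * (n_tests - 1))

    # Uniqueness: share of components whose coverage column is distinct.
    distinct_columns = {tuple(col) for col in zip(*activity)}
    uniqueness = len(distinct_columns) / n_components

    return density, diversity, uniqueness, density * diversity * uniqueness
```

Under these assumed definitions, two tests that each cover a different single component give a DDU of 1.0, while two tests that both cover everything give 0.0.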
- Collect more metrics, especially dynamic ones.
- Try normalizing or using PCA visualization to choose the label thresholds.
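As one possible starting point for the normalization idea, a metric column could be rescaled to the range [0, 1] before comparing candidate thresholds. Min-max scaling is only one choice, assumed here for illustration, and the function name is hypothetical.

```python
def min_max_normalize(values):
    """Rescale numbers linearly to [0, 1]; a constant column maps to all 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Avoid dividing by zero when every value is identical.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

With all metrics on the same scale, a threshold chosen for one metric becomes easier to compare against the others.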