Exploring the Relationship between Design Metrics and Software Diagnostics using Machine Learning

This repository contains the Python scripts required to extract metrics for the Defects4J project, two '.csv' files containing the training data, and a machine learning model built using Weka v3.8.

Requirements:

Eclipse Plugin CodePro, SourceMonitor
Weka version 3.8

Python Scripts:

Each Python script includes comments describing its usage at the beginning of the file. In general, most of the scripts take the path of the folder to process as an argument. Example: python DDU.py filepath
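As an illustration of the calling convention described above, here is a minimal sketch of a script that accepts a folder path as its only argument. It is not the code of DDU.py or of any other script in this repository: the function names and the choice to look for '.java' files are assumptions made for the example.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the single positional folder-path argument."""
    parser = argparse.ArgumentParser(
        description="Hypothetical metric-extraction entry point."
    )
    parser.add_argument("folder", help="path of the folder to analyse")
    return parser.parse_args(argv)


def collect_files(folder, suffix=".java"):
    """Return the sorted paths of all files under `folder` ending in `suffix`.

    The '.java' default is an assumption; a missing folder yields an empty list.
    """
    matches = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if name.endswith(suffix):
                matches.append(os.path.join(root, name))
    return sorted(matches)


def main(argv=None):
    """Print every matching file found under the folder given on the command line."""
    args = parse_args(argv)
    if not os.path.isdir(args.folder):
        raise SystemExit(f"not a folder: {args.folder}")
    for path in collect_files(args.folder):
        print(path)
```

Calling main(["path/to/project"]) from Python, or wiring main() to the command line, would mirror the 'python script.py filepath' pattern.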

Datasets:

Two .csv files:

TrainingData_2.csv contains the overall data. It is used for testing the Static, Dynamic, Test and Bug metrics separately. Labels: Good, Bad and Unknown.
Training_2_GoodBad.csv contains only the relevant metrics. It can be used to test the final best model. Labels: Good and Bad.
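Before loading either file into Weka, it can help to check how the labels are distributed. The sketch below does this with the Python standard library on a small in-memory sample. The column names 'Lines' and 'Label' and all sample values are made up for illustration; the real header names should be read from the .csv files.

```python
import csv
import io
from collections import Counter


def read_rows(handle):
    """Read CSV rows as dictionaries keyed by the header line."""
    return list(csv.DictReader(handle))


def label_counts(rows, label_column):
    """Count how many rows carry each label value."""
    return Counter(row[label_column] for row in rows)


# Tiny in-memory stand-in for a dataset file; columns and values are invented.
SAMPLE = io.StringIO(
    "File,Lines,Label\n"
    "A.java,120,Good\n"
    "B.java,640,Bad\n"
    "C.java,75,Good\n"
)

rows = read_rows(SAMPLE)
counts = label_counts(rows, "Label")
print(dict(counts))
```

For the real data, the same functions can be applied to a handle opened with open("TrainingData_2.csv", newline=""), passing whatever the label column is actually called.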

How to run the Model:

Load the training set Training_2_GoodBad.csv in Weka.
In the attribute list, check the File attribute and press 'Remove'. It is not a relevant attribute for modeling.
Go to the Classify tab. Load the 'BestModel' from the Model folder. Right-click on the model and choose the option 'Re-apply this model's configuration'.
Choose k-fold cross-validation. Set the number of folds to 7 (for a good result).
Press Start.
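The steps above are performed in the Weka GUI. To make the two data-handling ideas concrete (dropping the File attribute and splitting the rows into 7 folds), here is a rough standard-library sketch. It is an assumption-laden illustration and does not reproduce Weka's behaviour: Weka's own fold assignment will differ, and the helper names are hypothetical.

```python
import random


def drop_column(rows, name):
    """Return copies of the row dictionaries without the given column."""
    return [{k: v for k, v in row.items() if k != name} for row in rows]


def k_fold_indices(n_rows, k=7, seed=0):
    """Shuffle the row indices and deal them into k folds of near-equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def train_test_splits(n_rows, k=7, seed=0):
    """Yield (train, test) index lists, using each fold once as the test set."""
    folds = k_fold_indices(n_rows, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Example: 70 rows split into 7 folds gives 7 rounds of 60 training / 10 test rows.
for train, test in train_test_splits(70, k=7):
    print(len(train), len(test))
```

Each row is used for testing exactly once, which is what makes the averaged result an estimate over the whole dataset.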

Model: Random Forests
Attributes: Based on correlation matrix
Labels: Good, Bad
Validation: 7-fold cross-validation

Result
Correctly Classified: 90.2736%
Incorrectly Classified: 8.7349%
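For readers who want to recompute figures of this kind outside Weka, the sketch below shows how the two percentages can be derived from a list of actual and predicted labels. It assumes every instance receives a prediction; the function name and the sample labels are made up for the example.

```python
def classification_percentages(actual, predicted):
    """Return (percent correct, percent incorrect) for two equal-length label lists."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("at least one instance is required")
    total = len(actual)
    correct = sum(a == p for a, p in zip(actual, predicted))
    incorrect = total - correct
    return 100.0 * correct / total, 100.0 * incorrect / total


# Invented labels: three of the four predictions match.
actual = ["Good", "Bad", "Good", "Bad"]
predicted = ["Good", "Bad", "Bad", "Bad"]
print(classification_percentages(actual, predicted))  # (75.0, 25.0)
```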

List of Important Metrics

1. Lines
2. Max Complexity
3. Max Depth
4. num_of_tests
5. num_of_passed_tests
6. num_of_failed_tests
7. CBO (Coupling Between Objects)
8. IFC (Information Flow Complexity)
9. Density
10. Diversity
11. Uniqueness
12. DDU
13. No of Modified
14. No of Chunks
15. No of Failing tests
16. No of Repair Actions
17. Exception Type
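Density, Diversity, Uniqueness and DDU in the list above are related: in the commonly published formulation, DDU is the product of a normalized density, a diversity score and a uniqueness score, all computed from a test-by-component activity matrix. The sketch below follows that formulation as an assumption. It is not taken from the repository's DDU.py, which may compute these values differently.

```python
from collections import Counter


def density(matrix):
    """Fraction of (test, component) cells in the activity matrix that are 1."""
    n_tests, n_components = len(matrix), len(matrix[0])
    return sum(map(sum, matrix)) / (n_tests * n_components)


def normalized_density(matrix):
    """Rescale density so that 0.5 scores 1.0 and both extremes score 0.0."""
    return 1.0 - abs(1.0 - 2.0 * density(matrix))


def diversity(matrix):
    """Gini-Simpson index over groups of tests with identical activity rows."""
    n_tests = len(matrix)
    if n_tests < 2:
        return 0.0
    groups = Counter(tuple(row) for row in matrix)
    repeats = sum(n * (n - 1) for n in groups.values())
    return 1.0 - repeats / (n_tests * (n_tests - 1))


def uniqueness(matrix):
    """Number of distinct component columns divided by the number of components."""
    columns = list(zip(*matrix))
    return len(set(columns)) / len(columns)


def ddu(matrix):
    """Product of normalized density, diversity and uniqueness."""
    return normalized_density(matrix) * diversity(matrix) * uniqueness(matrix)


# Rows are tests, columns are components; 1 means the test executed the component.
activity = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
]
print(ddu(activity))
```

In this example every test and every component has a distinct pattern, so diversity and uniqueness are both 1 and the result is determined by the density term alone.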

TODO

Collect more metrics, especially dynamic ones.
Try normalizing the data or using PCA visualization to choose the label thresholds.
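One simple way to approach the normalization item is min-max scaling, which maps each metric to the range 0 to 1 so that metrics with different units become comparable. The sketch below is only one possible choice and is an assumption of this example, not something the repository already does.

```python
def min_max_normalize(values):
    """Scale a list of numbers linearly so the minimum maps to 0.0 and the maximum to 1.0.

    A constant column carries no information, so it is mapped to all zeros.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Invented values standing in for one metric column.
print(min_max_normalize([10, 20, 30]))  # [0.0, 0.5, 1.0]
```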
