An R package of JIRA defect datasets and tool suites for explainable software analytics.
The JIRA defect datasets can be referenced as:
Author = {Yatish, Suraj and Jiarpakdee, Jirayus and Thongtanunam, Patanamon and Tantithamthavorn, Chakkrit},
Title = {Mining Software Defects: Should We Consider Affected Releases?},
Booktitle = {The International Conference on Software Engineering (ICSE)},
Year = {2019}
To prepare the execution environment, please run the command below in a terminal.
Then, install Rnalytica with the devtools R package:
To load the library:
To list all 131 defect datasets:
To load a defect dataset from the Rnalytica R package:
Data = loadDefectDataset('groovy-1_5_7','jira')
To visualize pairwise correlations among the input metrics as a hierarchical cluster analysis plot:
plotVarClus(dataset = Data$data, metrics = Data$indep)
To automatically remove irrelevant metrics and mitigate correlated metrics with AutoSpearman:
AutoSpearman(dataset = Data$data, metrics = Data$indep)
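To give an intuition for what "mitigating correlated metrics" means, the following is a minimal, dependency-free Python sketch of a Spearman-based filter. It is not the Rnalytica implementation of AutoSpearman: the greedy drop rule, the 0.7 threshold, the helper names (`average_ranks`, `spearman`, `drop_correlated`), and the toy values are all assumptions made for illustration.

```python
from itertools import combinations
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the window over all entries tied with the current value.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant column: rho is undefined, treated as 0 here
    return cov / (var_x * var_y) ** 0.5


def drop_correlated(table, threshold=0.7):
    """Greedily drop the later metric of each pair with |rho| >= threshold.

    The threshold and the "drop the later one" rule are illustrative
    assumptions, not the selection rule used by Rnalytica.
    """
    kept = list(table)
    for a, b in combinations(list(table), 2):
        if a in kept and b in kept and abs(spearman(table[a], table[b])) >= threshold:
            kept.remove(b)
    return kept


# Toy values, made up for this example (not taken from any defect dataset).
toy = {
    "CountLine": [10, 20, 30, 40, 50],
    "AvgLineCode": [8, 15, 26, 33, 45],  # rises monotonically with CountLine
    "MaxCyclomatic": [3, 1, 4, 1, 5],
}
print(drop_correlated(toy))  # ['CountLine', 'MaxCyclomatic']
```

In this toy table, `AvgLineCode` is perfectly rank-correlated with `CountLine` and is dropped, while `MaxCyclomatic` is only weakly correlated and is kept.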
To call AutoSpearman from Python through rpy2:
import rpy2
from rpy2.robjects.packages import importr
from rpy2.robjects import r, pandas2ri, StrVector
import pandas as pd
Rnalytica = importr('Rnalytica')
features_names = ["CountDeclMethodPrivate","AvgLineCode","CountLine","MaxCyclomatic","CountDeclMethodDefault",
data_train = pd.read_csv("datasets/activemq-5.0.0.csv")
X_train = data_train[features_names]
results = Rnalytica.AutoSpearman(dataset = X_train, metrics = rpy2.robjects.StrVector(features_names))
['CountDeclMethodPrivate' 'CountDeclMethodDefault' 'AvgEssential'
'CountDeclClassVariable' 'CountDeclClassMethod' 'AvgLineComment'
'AvgCyclomaticModified' 'CountDeclClass' 'CountDeclMethodProtected'
'CountDeclInstanceVariable' 'CountDeclMethodPublic' 'RatioCommentToCode'
'AvgLineBlank' 'PercentLackOfCohesion' 'MaxInheritanceTree'
'CountClassDerived' 'CountClassCoupled' 'CountClassBase'
'CountInput_Mean' 'CountInput_Min' 'CountOutput_Min' 'MaxNesting_Min'
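
Once the selected metric names are available as a plain Python list, the training table can be restricted to those columns. The sketch below shows the idea with the standard library only; the in-memory CSV, its values, and the `selected` list are hypothetical stand-ins, not data from the package. With pandas, `X_train[selected]` would be the equivalent step.

```python
import csv
import io

# Hypothetical stand-in for a defect dataset CSV; the values are made up.
raw = io.StringIO(
    "CountLine,AvgLineCode,MaxCyclomatic\n"
    "120,8,4\n"
    "45,5,2\n"
)

# Stand-in for the metric names returned by the selection step.
selected = ["CountLine", "MaxCyclomatic"]

# Keep only the selected columns and convert their values to floats.
rows = list(csv.DictReader(raw))
reduced = [{name: float(row[name]) for name in selected} for row in rows]
print(reduced)
```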