TRIAD

This repository contains the implementation and evaluation program for our ICSE'2024 paper "TRIAD: Automated Traceability Recovery based on Biterm-enhanced Deduction of Transitive Links among Artifacts".

Traceability allows stakeholders to extract and comprehend the trace links among software artifacts introduced across the software life cycle, providing significant support for software engineering tasks. Despite its proven benefits, software traceability is challenging to recover and maintain manually, so many approaches for automated traceability recovery have been proposed. Most rely on textual similarities among software artifacts, such as those based on Information Retrieval (IR). However, artifacts at different abstraction levels usually have different textual descriptions, which can greatly hinder the performance of IR-based approaches (e.g., a requirement in natural language may have little textual similarity to a Java class). In this work, we leverage consensual biterms and transitive relationships (i.e., inner- and outer-transitive links) based on intermediate artifacts to improve IR-based traceability recovery.

  • We first extract and filter biterms from all source, intermediate, and target artifacts.
  • We then use the consensual biterms from the intermediate artifacts to enrich the texts of both source and target artifacts.
  • Finally, we deduce outer- and inner-transitive links to adjust the text similarities between source and target artifacts.
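
To make the last step more concrete, here is a minimal sketch of how an outer-transitive path (S→I→T) could raise a weak direct similarity between a source and a target artifact. The class name, the method names, and the combination rule (product of the two hops, then the maximum with the direct score) are illustrative assumptions; they are not taken from the TRIAD code or the paper, which define their own adjustment.

```java
class OuterTransitiveSketch {

    // Strongest two-hop path s -> i -> t, scored as the product of both hop similarities.
    // ASSUMPTION: the product rule is only for illustration.
    static double bestPath(double[] sourceToIntermediate, double[][] intermediateToTarget, int target) {
        double best = 0.0;
        for (int i = 0; i < sourceToIntermediate.length; i++) {
            best = Math.max(best, sourceToIntermediate[i] * intermediateToTarget[i][target]);
        }
        return best;
    }

    // Returns adjusted source-to-target similarities: each direct score is raised
    // to the best path score whenever the path score is higher.
    static double[][] adjust(double[][] direct, double[][] sourceToIntermediate,
                             double[][] intermediateToTarget) {
        double[][] adjusted = new double[direct.length][];
        for (int s = 0; s < direct.length; s++) {
            adjusted[s] = new double[direct[s].length];
            for (int t = 0; t < direct[s].length; t++) {
                double path = bestPath(sourceToIntermediate[s], intermediateToTarget, t);
                adjusted[s][t] = Math.max(direct[s][t], path);
            }
        }
        return adjusted;
    }

    public static void main(String[] args) {
        double[][] direct = {{0.10, 0.40}};                 // 1 source x 2 targets
        double[][] sourceToIntermediate = {{0.90, 0.20}};   // 1 source x 2 intermediates
        double[][] intermediateToTarget = {{0.80, 0.10}, {0.30, 0.50}};
        double[][] adjusted = adjust(direct, sourceToIntermediate, intermediateToTarget);
        // The first target gains from the strong path through the first intermediate;
        // the second target keeps its direct score.
        System.out.println(adjusted[0][0] + " " + adjusted[0][1]);
    }
}
```

In this toy input the first target has a low direct similarity (0.10) but is reachable through an intermediate artifact that is close to both ends, so its adjusted score rises to roughly 0.72.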

Figure: The framework of TRIAD.

We conducted a comprehensive empirical evaluation on five systems widely used in the literature. The results show that our approach outperforms four state-of-the-art approaches by over 15% in AP and over 10% in MAP on average, and reveal how its performance is affected by different conditions of source, intermediate, and target artifacts.
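
For readers unfamiliar with the two metrics, the sketch below computes AP and MAP using the common IR definitions. This is an assumption: the README does not spell out the exact formulas used in the evaluation, and the class and method names are hypothetical. The sketch also assumes that every true link appears somewhere in the ranked list.

```java
class AveragePrecisionSketch {

    // relevance[k] is true when the candidate link at rank k + 1
    // (sorted by descending similarity) is a true link.
    static double averagePrecision(boolean[] relevance) {
        int hits = 0;
        double precisionSum = 0.0;
        for (int k = 0; k < relevance.length; k++) {
            if (relevance[k]) {
                hits++;
                precisionSum += (double) hits / (k + 1); // precision at this cut-off
            }
        }
        // ASSUMPTION: all true links are in the list, so hits equals the number of true links.
        return hits == 0 ? 0.0 : precisionSum / hits;
    }

    // Mean of the per-query AP values, with one ranked list per source artifact.
    static double meanAveragePrecision(boolean[][] relevancePerQuery) {
        if (relevancePerQuery.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (boolean[] relevance : relevancePerQuery) {
            sum += averagePrecision(relevance);
        }
        return sum / relevancePerQuery.length;
    }

    public static void main(String[] args) {
        boolean[] ranking = {true, false, true};
        System.out.println(averagePrecision(ranking)); // (1/1 + 2/3) / 2
        System.out.println(meanAveragePrecision(new boolean[][]{ranking, {false, true}}));
    }
}
```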

Running TRIAD

Environment Required

  • Java 11
  • Maven for dependency management

Running for RQ1: To what extent does TRIAD exceed the performance of baseline approaches?

  • Run main() in src/main/java/RunWithBaseline.java.
  • Set the project to evaluate with the projectEnum parameter.
  • Set the IR model to evaluate with the irEnum parameter.
  • The four baselines are IR-ONLY, TAROT, LIA, and COMET.

Running for RQ2: What is the individual impact of biterms and of outer- and inner-transitive links on performance?

  • Run main() in src/main/java/RunTRIAD.java.
  • Set the project to evaluate with the projectEnum parameter.
  • Set the IR model to evaluate with the irEnum parameter.

Code Structure

├── RunWithBaselines.java             <- Run result for RQ1.
│
├── RunTRIAD.java                     <- Run result for RQ2.
│
├── approach                          <- TRIAD and the baseline approaches.
│   ├── TRAID.java                    <- Implementation of TRIAD.
│   ├── TRAID_NoBiterm.java           <- Implementation of TRIAD without biterms.
│   ├── TAROT.java                    <- Implementation of TAROT.
│   ├── COMET.java                    <- Implementation of COMET.
│   └── LIA.java                      <- Implementation of LIA.
│
├── experiment                        <- Contains all information about the experiment.
│   ├── preprocess                    <- Dataset preprocessing, including text preprocessing and biterm extraction.
│   ├── project                       <- Information of evaluated projects.
│   ├── transitive                    <- Two types of transitive strategies.
│   │   ├── OuterTransitive.java      <- Only consider outer-transitive links (e.g., S1→I1→T1).
│   │   └── OuterInnerTransitive.java <- Consider outer-inner combined transitive links (e.g., S1→S2→I1→T1 and S1→I1→I2→T1).
│   ├── enum                          <- Enum types used in this project.
│   └── Result.java                   <- Result of each approach.
│
├── model                             <- Three IR models (i.e., VSM, LSI, and JSD).
│   ├── VSM                           <- VSM model.
│   ├── LSI                           <- LSI model.
│   └── JSD                           <- JSD model.
│
├── document                          <- Entity classes that model artifacts and links.
│
└── util                              <- Utility classes used in the project.

Datasets

Overview of the five evaluated systems:

| Dataset | Source | Intermediate | Target | S→I | I→T | S→T |
|---|---|---|---|---|---|---|
| Dronology | Requirement: 58 | Design Definitions: 144 | Source Code: 184 | Req→DD: 132 | DD→Src: 563 | Req→Src: 393 |
| WARC | Non-Func. Reqs: 21 | Specifications: 89 | Func. Reqs: 42 | NFR→SRS: 58 | SRS→FRS: 78 | NFR→FRS: 45 |
| EasyClinic | Use Case: 30 | Interaction Descr.: 20 | Code Descr.: 47 | UC→ID: 132 | ID→CD: 563 | UC→CD: 393 |
| EBT | Requirement: 44 | Test Case Descr.: 25 | Source Code: 50 | Req→TC: 51 | TC→Src: 93 | Req→Src: 98 |
| LibEST | Requirement: 52 | Test Code: 21 | Source Code: 14 | Req→Test: 352 | Test→Src: 108 | Req→Src: 204 |

Using Your Own Data

Step 1: Add an entry for the project in src/main/java/experiment/enums/ProjectEnum.java.

Step 2: Create an entity class for the project in src/main/java/experiment/project.

Step 3: Create folders for the project in dataset and copy the artifacts into them.

Step 4: Preprocess the artifacts.

Step 5: Extract biterms.

  • Extracting biterms from artifacts written in natural language (refer to WARC)
  • Extracting biterms from artifacts written in a programming language (refer to Dronology and LibEST)
    • We only provide implementations for extracting biterms from C (i.e., LibEST) and Java (i.e., Dronology) code. To extract biterms from another programming language, take the following steps:
      1. Parse the code files with an available parser to get identifier names (i.e., class names, method names, invoked method names, field names and their types, and parameter names and their types) and comments;
      2. Extract candidate biterms from the identifier names by combining any two split terms sequentially.
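
The second step above can be sketched roughly as follows. This is not the repository's implementation: the class and method names are hypothetical, biterms are represented as plain space-separated strings for simplicity, and "combining any two split terms sequentially" is read here as pairing every earlier term with every later term of the same identifier. If only adjacent terms should be paired, the inner loop can be limited to j = i + 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class BitermSketch {

    // Splits an identifier such as "getUserName", "HTTPRequest", or "parse_http_request"
    // into lower-case terms, breaking at underscores and camel-case boundaries.
    static List<String> splitIdentifier(String identifier) {
        String[] parts = identifier.split(
                "_+|(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])");
        List<String> terms = new ArrayList<>();
        for (String part : parts) {
            if (!part.isEmpty()) {
                terms.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    // Pairs each term with every later term, preserving the original order.
    // ASSUMPTION: this pairing rule is one reading of the step description.
    static List<String> candidateBiterms(String identifier) {
        List<String> terms = splitIdentifier(identifier);
        List<String> biterms = new ArrayList<>();
        for (int i = 0; i < terms.size(); i++) {
            for (int j = i + 1; j < terms.size(); j++) {
                biterms.add(terms.get(i) + " " + terms.get(j));
            }
        }
        return biterms;
    }

    public static void main(String[] args) {
        System.out.println(candidateBiterms("getUserName"));        // [get user, get name, user name]
        System.out.println(candidateBiterms("parse_http_request")); // [parse http, parse request, http request]
    }
}
```

Identifiers that split into a single term (for example "id") yield no candidate biterms. The identifier names themselves would come from the parser used in the first step.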

Step 6: Run TRIAD.