The DataSet Explorer (DSE) tool supports annotators during the code smell annotation procedure.
Development of the DSE tool started as part of the Clean CaDET project, which was funded by the Science Fund of the Republic of Serbia.
Maintainability is an aspect of software quality that refers to the ease with which software can be modified to correct faults, improve performance, or adjust to a new environment. Software maintainability can be negatively impacted by code smells, which are structures in code that indicate issues in software design or implementation. Software engineering experts agree that detecting and removing harmful code smells is important for high-quality code. Machine learning (ML) models could be used to detect code smells, but the models must be trained on high-quality datasets to be accurate and useful to software engineers.
Datasets created automatically using heuristic-based tools can contain both false positives and false negatives. Semi-automated approaches require experts to validate the annotations made by a tool, but because experts review only the instances the tool flags, these datasets may still contain false negatives. A fully manual approach, on the other hand, is challenging in its own way. Inconsistent annotations, small dataset size, an unrealistic smell-to-non-smell ratio, and poor smell coverage all hinder dataset quality. These issues arise mainly from the time-consuming nature of manual annotation and from annotators' disagreements caused by ambiguous and vague smell definitions.
To speed up and ease the manual code smell annotation, we developed the DataSet Explorer (DSE) tool. This tool supports annotators during the annotation procedure by providing various functionalities described in detail here.
The DSE tool can be used by annotators and ML researchers who aim to build high-quality datasets for training ML code smell detection models.
Set up and get started with the DSE tool by following these instructions.
We outline notable resources that can assist researchers in using our implementation:
- Back-end source code - A repository hosting the source code of the DSE server application
- Front-end source code - A repository hosting the source code of the web UI
- General documentation - Wiki pages explaining our DSE design and supported features
Our project team consists of professors and teaching assistants from the Faculty of Technical Sciences, Novi Sad, Serbia. We are part of the Chair of Informatics, an organizational unit that has traditionally been the local center of excellence for both artificial intelligence and software engineering research.
- The people who make up the Clean CaDET Core are listed here.