2021
DOI: 10.1007/s42979-020-00408-4

A Clustering Approach Towards Cross-Project Technical Debt Forecasting

Abstract: Technical debt (TD) describes quality compromises that can yield short-term benefits but may negatively affect the quality of software products in the long run. A wide range of tools and techniques have been introduced over the years to help developers determine and manage TD. However, being able to also predict its future evolution is of equal importance, to avoid its accumulation and, in turn, the undesirable event of making the project unmaintainable. Although recent research endeavors h…

Cited by 12 publications (18 citation statements)
References 65 publications
“…We have aggregated the matched cluster prototypes from different repository sets by taking the mean of the matched prototypes for each cluster; the result is presented in Figure 1 (where the prototypes are normalized relative to each other for better visualization). The metrics on the radar plots are numbered in the following order: issue metrics, then commit metrics, covering the full history (1–7 on the radar plots), the past month (8–14), the past two weeks (15–21), and the latest date (22–28). Compared to the results generated on random data, the discrepancy for c1 shows relatively consistent results in terms of cosine distance between the cluster prototypes.…”
Section: Aggregation of the Results
confidence: 99%
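The aggregation step quoted above (averaging matched prototypes per cluster, then comparing them via cosine distance) can be sketched in plain Python. The function names and data layout are illustrative assumptions, not the cited authors' implementation:

```python
import math

def aggregate_prototypes(prototype_sets):
    """Mean of matched cluster prototypes across repository sets.

    prototype_sets: a list of sets; each set is a list of prototype
    vectors (one per cluster), with rows already matched so that row i
    refers to the same cluster in every set.
    """
    n_sets = len(prototype_sets)
    n_clusters = len(prototype_sets[0])
    return [
        [sum(s[c][m] for s in prototype_sets) / n_sets
         for m in range(len(prototype_sets[0][c]))]
        for c in range(n_clusters)
    ]

def cosine_distance(a, b):
    """1 minus cosine similarity between two prototype vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

For example, averaging two one-cluster sets `[[1.0, 2.0]]` and `[[3.0, 4.0]]` yields the prototype `[2.0, 3.0]`, and two collinear vectors have cosine distance 0.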
“…Tsoukalas et al (2021) [10] divided 27 software projects from the Technical Debt Dataset [11] into six clusters of similar projects with respect to their technical debt aspects using the K-means algorithm, and built cluster-specific technical debt forecasting models using regression algorithms. As clustering metrics, they used the effort in minutes to fix code smells, bugs, and vulnerability issues, together with the number of lines of code, bugs, and smells, as well as cyclomatic complexity.…”
Section: B. Clustering Software Repositories
confidence: 99%
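The clustering step described in this citation can be illustrated with a minimal, self-contained Lloyd's K-means sketch in pure Python (the cited work reportedly applies K-means to per-project TD metrics; the helper below and its inputs are hypothetical, not the authors' code):

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means: points is a list of equal-length
    metric vectors (e.g. one vector of TD metrics per project)."""
    rng = random.Random(seed)
    # initialize centroids from k distinct input points
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # update step: recompute each centroid as its cluster mean
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(dim) / len(members)
                                for dim in zip(*members)]
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
              for p in points]
    return labels, centroids
```

In the cited setup, one would then fit a separate regression-based forecasting model on the projects assigned to each cluster.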
“…Finally, regarding our decision in Section 4.1 to filter out classes whose number of past versions is below the threshold of 100, we acknowledge that a greater (or smaller) threshold value would result in a smaller (or greater) number of software classes being considered for the next step of the approach. However, we relied on this value as a "rule of thumb," after performing dedicated experiments within the context of not only the present study but also in our previous related empirical studies, 5,19,42 in order to assess what would be the minimum number of samples that would result in an acceptable forecasting error. We point out that this threshold can also be adapted to specific needs, allowing the user to decide (based on their expertise) what an acceptable time frame is, having in mind however that choosing a very small amount of past history would result in an insufficient amount of data, thus affecting the accuracy of the produced forecasting models.…”
Section: Limitations and Threats to Validity
confidence: 99%
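The filtering rule discussed in this excerpt (dropping classes with fewer than 100 past versions before forecasting) amounts to a simple threshold filter; a sketch under the assumption that version counts per class are available as a mapping, with illustrative names:

```python
def filter_classes(version_counts, threshold=100):
    """Keep only classes with at least `threshold` recorded past
    versions, so each retained class has enough history to train a
    forecasting model. The default of 100 mirrors the rule of thumb
    reported in the cited study; the class names are hypothetical."""
    return {name: count for name, count in version_counts.items()
            if count >= threshold}
```

A larger threshold trades breadth (fewer classes retained) for longer per-class histories, which is exactly the tension the quoted passage describes.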
“…The research on TD identification has also gained momentum with the introduction of this new dataset. 34–37 Using the Technical Debt Dataset, 33 we conducted an exploratory study on 33 open-source Java projects in which 57,528 refactoring activities covering 29 different refactoring types were detected at commit level by the RMiner tool, 669 fault-inducing commits and 8538 fault-fixing commits were identified by SZZ, and 37,553 code smells were extracted from the commits by Ptidej. Using the links between all these measures, commits, and files of the software projects, we address the following main research question: "To what extent is refactoring related to code debt indicators (code smells and faults) in the software projects?"…”
Section: Analyzed Relation
confidence: 99%
“…The data includes commit and file-based information about code smells, refactorings, all the Jira issues, and the fault-inducing commits extracted with the SZZ algorithm. The research on TD identification has also gained momentum with the introduction of this new dataset. 34–37 Using the Technical Debt Dataset, 33 we conducted an exploratory study on 33 open-source Java projects in which 57,528 refactoring activities covering 29 different refactoring types were detected at commit level by the RMiner tool, 669 fault-inducing commits and 8538 fault-fixing commits were identified by SZZ, and 37,553 code smells were extracted from the commits by Ptidej.…”
Section: Introduction
confidence: 99%