2021
DOI: 10.1007/s42979-020-00408-4

A Clustering Approach Towards Cross-Project Technical Debt Forecasting

Abstract: Technical debt (TD) describes quality compromises that can yield short-term benefits but may negatively affect the quality of software products in the long run. A wide range of tools and techniques have been introduced over the years to help developers determine and manage TD. However, being able to also predict its future evolution is of equal importance, to avoid its accumulation and, in turn, the undesirable event of making the project unmaintainable. Although recent research endeavors h…

Cited by 12 publications (18 citation statements)
References 65 publications
“…We have aggregated the matched cluster prototypes from different repository sets by taking the mean of the matched prototypes for each cluster; the result is presented in Figure 1 (where the prototypes are normalized relative to each other for better visualization). The metrics on the radar plots are numbered in the following order: issue metrics, then commit metrics, covering the full history (1–7 on the radar plots), the past month (8–14), the past two weeks (15–21), and the latest date (22–28). Compared to the results generated on random data, the discrepancy for c1 shows relatively consistent results in terms of cosine distance between the cluster prototypes.…”
Section: Aggregation of the Results
confidence: 99%
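The aggregation step quoted above (averaging matched prototypes per cluster, then comparing them via cosine distance) can be sketched in plain Python. The function names and data layout are illustrative assumptions, not the cited authors' implementation:

```python
import math

def aggregate_prototypes(prototype_sets):
    """Mean of matched cluster prototypes across repository sets.

    prototype_sets: a list of sets; each set is a list of prototype
    vectors (one per cluster), with rows already matched so that row i
    refers to the same cluster in every set.
    """
    n_sets = len(prototype_sets)
    n_clusters = len(prototype_sets[0])
    return [
        [sum(s[c][m] for s in prototype_sets) / n_sets
         for m in range(len(prototype_sets[0][c]))]
        for c in range(n_clusters)
    ]

def cosine_distance(a, b):
    """1 minus cosine similarity between two prototype vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

For example, averaging two one-cluster sets `[[1.0, 2.0]]` and `[[3.0, 4.0]]` yields the prototype `[2.0, 3.0]`, and two collinear vectors have cosine distance 0.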
“…Tsoukalas et al (2021) [10] divided 27 software projects from the Technical Debt Dataset [11] into six clusters of similar projects with respect to their technical debt aspects using the K-means algorithm, and built cluster-specific technical debt forecasting models using regression algorithms. As clustering metrics, they used the effort in minutes to fix code smells, bugs, and vulnerability issues, together with the number of lines of code, bugs, and smells, as well as cyclomatic complexity.…”
Section: B. Clustering Software Repositories
confidence: 99%
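The clustering step described in this citation can be illustrated with a minimal, self-contained Lloyd's K-means sketch in pure Python (the cited work reportedly applies K-means to per-project TD metrics; the helper below and its inputs are hypothetical, not the authors' code):

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means: points is a list of equal-length
    metric vectors (e.g. one vector of TD metrics per project)."""
    rng = random.Random(seed)
    # initialize centroids from k distinct input points
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # update step: recompute each centroid as its cluster mean
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = [sum(dim) / len(members)
                                for dim in zip(*members)]
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
              for p in points]
    return labels, centroids
```

In the cited setup, one would then fit a separate regression-based forecasting model on the projects assigned to each cluster.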
“…Finally, regarding our decision in Section 4.1 to filter out classes whose number of past versions is below the threshold of 100, we acknowledge that a greater (or smaller) threshold value would result in a smaller (or greater) number of software classes being considered for the next step of the approach. However, we relied on this value as a "rule of thumb," after performing dedicated experiments within the context of not only the present study but also in our previous related empirical studies, 5,19,42 in order to assess what would be the minimum number of samples that would result in an acceptable forecasting error. We point out that this threshold can also be adapted to specific needs, allowing the user to decide (based on their expertise) what an acceptable time frame is, having in mind however that choosing a very small amount of past history would result in an insufficient amount of data, thus affecting the accuracy of the produced forecasting models.…”
Section: Limitations and Threats to Validity
confidence: 99%
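The filtering rule discussed in this excerpt (dropping classes with fewer than 100 past versions before forecasting) amounts to a simple threshold filter; a sketch under the assumption that version counts per class are available as a mapping, with illustrative names:

```python
def filter_classes(version_counts, threshold=100):
    """Keep only classes with at least `threshold` recorded past
    versions, so each retained class has enough history to train a
    forecasting model. The default of 100 mirrors the rule of thumb
    reported in the cited study; the class names are hypothetical."""
    return {name: count for name, count in version_counts.items()
            if count >= threshold}
```

A larger threshold trades breadth (fewer classes retained) for longer per-class histories, which is exactly the tension the quoted passage describes.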
“…The research on TD identification has also gained momentum with the introduction of this new dataset. 34–37 Using the Technical Debt Dataset, 33 we conducted an exploratory study on 33 open-source Java projects in which 57,528 refactoring activities covering 29 different refactoring types were detected at commit level by the RMiner tool, 669 fault-inducing commits and 8538 fault-fixing commits were identified by SZZ, and 37,553 code smells were extracted from the commits by Ptidej. Using the links between all these measures, commits, and files of the software projects, we address the following main research question: "To what extent is refactoring related to code debt indicators (code smells and faults) in the software projects?"…”
Section: Analyzed Relation
confidence: 99%
“…The data includes commit and file-based information about code smells, refactorings, all the Jira issues, and the fault-inducing commits extracted with the SZZ algorithm. The research on TD identification has also gained momentum with the introduction of this new dataset. 34–37 Using the Technical Debt Dataset, 33 we conducted an exploratory study on 33 open-source Java projects in which 57,528 refactoring activities covering 29 different refactoring types were detected at commit level by the RMiner tool, 669 fault-inducing commits and 8538 fault-fixing commits were identified by SZZ, and 37,553 code smells were extracted from the commits by Ptidej.…”
Section: Introduction
confidence: 99%