Identifying self-admitted technical debt in open source projects using text mining

Huang, Qiao; Shihab, Emad; Xia, Xin; Lo, David; Li, Shanping

doi:10.1007/s10664-017-9522-4

Cited by 158 publications

(166 citation statements)

References 41 publications

Citation types: Supporting, Mentioning (166), Contrasting


“…Using these makes the results more generic when building a vocabulary from several sources. Here, we discuss our findings and compare our results to other previous works, where the one we chose as our baseline had selected predictor terms manually for analyzing commit messages (Yan et al 2018), and others looked into predictors built from source code comments (Huang et al 2018;Potdar and Shihab 2014). Finally, we look into possible threats to the validity of our work.…”

Section: Rq4: How Well Does the Best Model Perform In Cross-project T… (mentioning)

confidence: 67%

“…This practice has also been employed in the industry to find technical debt (Laitila 2019;SonarQube 2019). Looking at the predictors in Huang et al (2018), we can see both similarities and differences between our work. Here, the authors have identified different features from source code comments, which are all single stemmed terms.…”

Section: Comparing the Predictor Terms With Source Code Level Predict… (mentioning)

confidence: 69%

“…Finding predictors by analyzing messages at source code level has been looked into at earlier research (Huang et al 2018;Potdar and Shihab 2014). This practice has also been employed in the industry to find technical debt (Laitila 2019;SonarQube 2019).…”

Section: Comparing the Predictor Terms With Source Code Level Predict… (mentioning)

confidence: 99%

“…Previous studies have researched SATD by analyzing commit metrics, like lines of code added and number of files changed (Yan et al 2018), and source code comments where the debt is admitted (Huang et al 2018). Previous work has so far left commit messages largely out of the picture, when predicting self-admitted technical debt appearance.…”

Section: Introduction (mentioning)

confidence: 99%


Predicting technical debt from commit contents: reproduction and extension with automated feature selection

Rantala, Mäntylä. 2020. Software Qual J


Self-admitted technical debt refers to sub-optimal development solutions that are expressed in written code comments or commits. We reproduce and improve on a prior work by Yan et al. (2018) on detecting commits that introduce self-admitted technical debt. We use multiple natural language processing methods: Bag-of-Words, topic modeling, and word embedding vectors. We study 5 open-source projects. Our NLP approach uses logistic Lasso regression from Glmnet to automatically select the best predictor words. A manually labeled dataset from prior work that identified self-admitted technical debt from code level commits serves as ground truth. Our approach achieves a +0.15 better area under the ROC curve than the prior work when comparing only commit message features, and a +0.03 better result overall when replacing manually selected features with automatically selected words. In both cases, the improvement was statistically significant (p < 0.0001). Our work has four main contributions: comparing different NLP techniques for SATD detection, improving results over previous work, showing how to generate generalizable predictor words when using multiple repositories, and producing a list of words correlating with SATD. As a concrete result, we release a list of the predictor words that correlate positively with SATD, as well as the datasets and scripts we used, to enable replication studies and to aid in the creation of future classifiers.
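The word-selection idea in this abstract (Bag-of-Words features fed to an L1-penalised logistic regression, so that uninformative words get a weight of exactly zero) can be illustrated with a minimal sketch. This is not the authors' Glmnet pipeline: it is a pure-Python stand-in that uses proximal gradient descent, and the function names, the toy commit messages, and the values of `lam`, `lr`, and `epochs` are all assumptions made for illustration.

```python
import math
from collections import Counter


def bag_of_words(messages):
    """Build a sorted vocabulary and a dense count matrix from raw messages."""
    tokenised = [m.lower().split() for m in messages]
    vocab = sorted({tok for toks in tokenised for tok in toks})
    index = {tok: j for j, tok in enumerate(vocab)}
    rows = []
    for toks in tokenised:
        row = [0.0] * len(vocab)
        for tok, count in Counter(toks).items():
            row[index[tok]] = float(count)
        rows.append(row)
    return vocab, rows


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the proximal step for an L1 penalty)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_l1_logistic(X, y, lam=0.05, lr=0.5, epochs=500):
    """Fit logistic regression with an L1 penalty on the weights (not the intercept)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad_b += err / n
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
        b -= lr * grad_b
        # Gradient step followed by soft-thresholding drives weak weights to exactly 0.
        w = [soft_threshold(wj - lr * gj, lr * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def selected_words(vocab, w):
    """Words whose weight survived the L1 penalty, strongest first."""
    kept = [(tok, wj) for tok, wj in zip(vocab, w) if wj != 0.0]
    return sorted(kept, key=lambda pair: -abs(pair[1]))


# Toy, invented commit messages; label 1 marks a hypothetical SATD commit.
messages = [
    "temporary hack fix later",
    "todo hack remove workaround",
    "add unit tests",
    "update documentation",
    "todo fix later workaround",
    "add documentation tests",
]
labels = [1, 1, 0, 0, 1, 0]

vocab, X = bag_of_words(messages)
weights, intercept = fit_l1_logistic(X, labels)
for word, weight in selected_words(vocab, weights):
    print(f"{word:15s} {weight:+.3f}")
```

Words with a positive surviving weight play the role of the "predictor words that correlate positively with SATD"; raising `lam` keeps fewer of them. A real replication would use Glmnet or an equivalent library with cross-validated penalty strength rather than this hand-rolled loop.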


“…A growing community holds that software quality practices to improve systems' sustainability (e.g., refactoring) is ultimately a business decision [10]. Even in the domain of open source software there is a trend into exploiting the concept of technical debt as intentional, hence strategic: see for instance the study on self-admitted technical debt found in [7].…”

Section: Introduction (mentioning)

confidence: 99%

The Strategic Technical Debt Management Model: An Empirical Proposal

Ciancarini, Russo. 2020. IFIP Advances in Information and Communication Technology


Increasing development complexity in software applications raises major concerns about technical debt management, also in Open Source environments. A strategic management perspective provides organizations with an action map to pursue business' targets with limited resources. This article presents the Strategic Technical Debt Management Model (STDMM) to provide practitioners with an actionable roadmap to manage their technical debt properly, considering both social and technical aspects. To do so, we pursued a theoretical mapping, exploiting a set of interviews of 124 carefully selected and well-informed domain experts of the IT financial sector.


Self‐admitted technical debt detection by learning its comprehensive semantics via graph neural networks

Liu et al. 2022. Softw Pract Exp


The goal of software development is to deliver high-quality, defect-free software products, but resource and time constraints often cause developers to submit incomplete or temporary patches and take on an additional burden later. Identifying self‐admitted technical debt (SATD) to improve code quality has therefore been investigated in recent years. However, missing syntactic structure information and an imbalanced class distribution degrade SATD identification performance. To address this issue, we present a graph neural network based SATD identification model (GNNSI) to improve the performance. Specifically, we obtain the structure information of the missing SATD in a compositional way to obtain different feature maps for different comments, and use focal loss to handle the imbalance between SATD and non‐SATD classes in the comments. Extensive experiments on 10 open source projects show that GNNSI outperforms the baselines and can help developers better predict SATD.
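The abstract names focal loss as the remedy for the imbalance between SATD and non‐SATD comments. As a hedged illustration, the sketch below implements the standard binary focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), in plain Python. The defaults `gamma=2.0` and `alpha=0.25` are commonly used values and are assumptions here, not settings reported for GNNSI, and the function names are hypothetical.

```python
import math


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one comment.

    p is the predicted probability of the SATD class, y is the label (1 or 0).
    gamma and alpha are assumed defaults, not values taken from the paper.
    """
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    p_t = min(max(p_t, 1e-12), 1.0)             # guard against log(0)
    # (1 - p_t) ** gamma shrinks the loss of examples that are already easy.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def mean_focal_loss(probs, labels, gamma=2.0, alpha=0.25):
    """Average focal loss over a batch of predictions."""
    losses = [focal_loss(p, y, gamma, alpha) for p, y in zip(probs, labels)]
    return sum(losses) / len(losses)


# A confidently correct prediction contributes far less than a badly wrong one.
print(focal_loss(0.9, 1), focal_loss(0.1, 1))
```

With `gamma=0` the modulating factor disappears and the expression reduces to class-weighted cross-entropy, which is a convenient sanity check when tuning.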
