2020
DOI: 10.1007/s11219-020-09520-3

Predicting technical debt from commit contents: reproduction and extension with automated feature selection

Abstract: Self-admitted technical debt refers to sub-optimal development solutions that are expressed in written code comments or commits. We reproduce and improve on a prior work by Yan et al. (2018) on detecting commits that introduce self-admitted technical debt. We use multiple natural language processing methods: Bag-of-Words, topic modeling, and word embedding vectors. We study 5 open-source projects. Our NLP approach uses logistic Lasso regression from Glmnet to automatically select best predictor words. A manual…
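The idea of letting a Lasso-penalized logistic regression pick predictor words from a bag-of-words matrix can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract names Glmnet, whereas this sketch substitutes a plain proximal-gradient (ISTA) solver written in pure Python. The commit messages, labels, function names, and the values of `lam`, `step`, and `iters` are all hypothetical choices made for the example.

```python
import math
from collections import Counter

def bag_of_words(messages):
    """Build a sorted vocabulary and a word-count matrix (one row per message)."""
    vocab = sorted({w for m in messages for w in m.lower().split()})
    index = {w: j for j, w in enumerate(vocab)}
    rows = []
    for m in messages:
        row = [0.0] * len(vocab)
        for w, c in Counter(m.lower().split()).items():
            row[index[w]] = float(c)
        rows.append(row)
    return vocab, rows

def soft_threshold(x, t):
    """Proximal operator of the L1 penalty: shrink x toward zero by t."""
    return math.copysign(max(abs(x) - t, 0.0), x)

def fit_lasso_logistic(X, y, lam=0.05, step=0.1, iters=5000):
    """L1-penalized logistic regression via proximal gradient descent.
    The intercept is left unpenalized. lam, step, iters are illustrative."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob minus label
            grad_b += err / n
            for j, xj in enumerate(xi):
                if xj:
                    grad_w[j] += err * xj / n
        b -= step * grad_b
        # Gradient step followed by soft-thresholding yields exact zeros.
        w = [soft_threshold(wj - step * gj, step * lam)
             for wj, gj in zip(w, grad_w)]
    return w, b

# Hypothetical toy data: 1 = message admits technical debt, 0 = it does not.
messages = [
    "todo fix this hack later",
    "temporary hack for build",
    "todo clean up workaround",
    "add unit tests for parser",
    "update documentation for parser",
    "add release notes",
]
labels = [1, 1, 1, 0, 0, 0]

vocab, X = bag_of_words(messages)
weights, bias = fit_lasso_logistic(X, labels)

# Words whose coefficient survived the L1 penalty are the selected predictors.
selected = {word for word, wj in zip(vocab, weights) if wj != 0.0}
print(sorted(selected))
```

Words with a positive weight push a message toward the debt class and words with a negative weight push it away. Raising `lam` removes more words from the model, which is the sense in which the penalty performs automatic feature selection.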

citations
Cited by 9 publications
(3 citation statements)
references
References 55 publications
“…One of the threats to construct validity in the study concerns the potentially different interpretations of discussed topics between interviewees and researchers. Because we focus on SATD in this study and most Code Comments [6], [7], [12], [14], [15], [38], [39], [40], [41], [42], [43], [44], [45], [46], [47], [48], [49], [50], [51], [52], [53], [54], [55], [56], [57], [58], [59], [60], [61], [62], [63], [64], [65], [66] Issue Trackers [3], [12], [16] Commit Messages [12] Pull Requests [12] Automated Differentiation Between Fixed and Unfixed SATD -Automated Tracing Between SATD in Different Sources [11], [12], [36], [37] and Code and Related Development Tasks -Automated SATD Prioritization [9], [67], …”
Section: Threats To Validity 61 Construct Validity
mentioning
confidence: 99%
“…For the commit messages, we used the same dataset that was used in the study described in [18]. This dataset consists of 73,625 messages, of which 1,876 are classified as SATD.…”
Section: Commits Messages
mentioning
confidence: 99%
“…Rantala and Mäntylä [18] replicating and extending the work introduced by Yan et al [16], they used 1876 commits messages extracted from five repositories (Camel, Log4J, Hadoop, Gerrit, and Tomcat) that were pre-labeled as SATD, and three techniques of NLP (bag-of-words, latent Dirichlet allocation, and word embedding), to predict self-admitted technical debt from commit messages. The main contribution of this study, the bag-of-words technique, is the best performance with a median (AUC 0.7411).…”
mentioning
confidence: 99%