Using Natural Language Processing to Automatically Detect Self-Admitted Technical Debt

Maldonado, Everton da S.; Shihab, Emad; Tsantalis, Nikolaos

doi:10.1109/tse.2017.2654244

Cited by 147 publications

(125 citation statements)

References 47 publications

Supporting

Mentioning

120

Contrasting

“…Besides a simple pattern-matching of keywords in comments [10], [40], different approaches for detecting SATD-related comments have been proposed in the literature. Specifically, Maldonado et al. [11] used a Natural Language Processing approach to classify SATD. Also, Ren et al. [41] proposed the use of CNN to classify SATD, outperforming previously-proposed approaches.…”

Section: A Self-admitted Technical Debt (Satd) and Its Removal

mentioning

confidence: 99%
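
The "simple pattern-matching of keywords in comments" mentioned in the statement above could look roughly like the following minimal sketch. The keyword list and the function name `looks_like_satd` are illustrative assumptions, not the patterns used by any of the cited works, and this is not the NLP classifier of Maldonado et al.

```python
import re

# Hypothetical keyword list, chosen only for illustration; the cited
# studies define their own patterns.
ILLUSTRATIVE_KEYWORDS = ("todo", "fixme", "hack", "workaround")

# Match any keyword as a whole word, ignoring case.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, ILLUSTRATIVE_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)


def looks_like_satd(comment: str) -> bool:
    """Return True if the comment contains one of the illustrative keywords."""
    return _PATTERN.search(comment) is not None


if __name__ == "__main__":
    for text in ("// TODO: refactor this later", "// returns the user id"):
        print(text, "->", looks_like_satd(text))
```

A keyword matcher like this is cheap but brittle: it misses debt described in other words, which is the gap the classification approaches in the statement aim to close.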

Automatically Learning Patterns for Self-Admitted Technical Debt Removal

Zampetti

Serebrenik

Penta

2020

2020 IEEE 27th International Conference on Software Analysis, Evolution and Reengineering (SANER)

“…Since our dataset is unbalanced (i.e., only a small percentage of commits are CI skipped), we would like to put our results in context by comparing it to a baseline that takes this imbalanced data into account. Similar to prior work [9], [32], we calculate the performance of the baseline model as follows: the precision of this baseline model is calculated by taking the total number of CI skip commits over the total number of commits of each project. For example, project jMotif-GI has a total number of 345 commits, of those, only 42 commits are commits that are explicitly labeled as CI skip commits.…”

Section: Rq1: How Effective Is Our Rule-based Technique In Detecting

mentioning

confidence: 99%
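
The baseline precision described in the statement above can be worked through with a minimal sketch. The function name `baseline_precision` is an assumption made for illustration; the commit counts are the jMotif-GI figures given in the quote.

```python
def baseline_precision(ci_skip_commits: int, total_commits: int) -> float:
    """Precision of the baseline: CI skip commits over all commits of a project."""
    if total_commits <= 0:
        raise ValueError("total_commits must be positive")
    if not 0 <= ci_skip_commits <= total_commits:
        raise ValueError("ci_skip_commits must be between 0 and total_commits")
    return ci_skip_commits / total_commits


if __name__ == "__main__":
    # jMotif-GI figures from the quoted statement: 42 CI skip commits of 345.
    precision = baseline_precision(42, 345)
    print(f"{precision:.3f}")  # prints 0.122
```

Only the precision of the baseline is defined in the quoted text, so the sketch stops there.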

“…In some cases, the list of file types we use may not be comprehensive. We also provide a list of all the file extensions that are used in our study.…”

Section: Internal Validity

mentioning

confidence: 99%

Which Commits Can Be CI Skipped?

Mujahid

Shihab

Rilling

2021

IEEE Trans. Software Eng.

Self Cite

Continuous Integration (CI) frameworks, such as Travis CI, automatically build and run tests whenever a new commit is submitted/pushed. Although there are many advantages in using CI, e.g., speeding up the release cycle and automating the test execution process, it has been noted that the CI process can take a very long time to complete. One of the possible reasons for such delays is the fact that some commits (e.g., changes to readme files) unnecessarily kick off the CI process. Therefore, the goal of this paper is to automate the process of determining which commits can be CI skipped. We start by examining the commits of 58 Java projects and identify commits that were explicitly CI skipped by developers. Based on the manual investigation of 1,813 explicitly CI skipped commits, we first devise an initial model of a CI skipped commit and use this model to propose a rule-based technique that automatically identifies commits that should be CI skipped. To evaluate the rule-based technique, we perform a study on unseen datasets extracted from ten projects and show that the devised rule-based technique is able to detect and label CI skip commits, achieving Area Under the Curve (AUC) values between 0.56 and 0.98 (average of 0.73). Additionally, we show that, on average, our technique can reduce the number of commits that need to trigger the CI process by 18.16%. We also qualitatively triangulated our analysis on the importance of skipping the CI process through a survey with 40 developers. The survey results showed that 75% of the surveyed developers consider it to be nice, important, or very important to have a technique that automatically flags CI skip commits. To operationalize our technique, we develop a publicly available prototype tool, called CI-SKIPPER, that can be integrated with any git repository and automatically marks commits that can be CI skipped.

“…In [10], customers' accesses to businesses' URLs are analyzed using a word2vec-based method to propose better services to customers. Finally, NLP is also used to detect design and requirement debts [13] from comments of ten open source projects.…”

Section: Related Work

mentioning

confidence: 99%