Background.
Self-admitted technical debt (SATD) is a special kind of technical debt that is intentionally introduced and explicitly marked by code comments. Such debt reduces software quality and increases the cost of subsequent maintenance. Therefore, it is necessary to find and resolve these debts in time. Recently, many automatic approaches have been proposed to identify SATD.
Problem.
Popular IDEs support a number of predefined task annotation tags for indicating SATD in comments, and these tags are used in many projects. However, this clear prior knowledge is neglected by existing approaches when identifying SATD.
Objective.
We aim to investigate how far we have really progressed in the field of SATD identification by comparing existing approaches with a simple approach that leverages the predefined task tags to identify SATD.
Method.
We first propose a simple heuristic approach that fuzzily Matches task Annotation Tags (MAT) in comments to identify SATD. By nature, MAT is an unsupervised approach: it does not need any data to train a prediction model and is easy to understand. Then, we examine the real progress in SATD identification by comparing MAT against existing approaches.
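The tag-matching idea can be illustrated with a minimal sketch. The tag set (TODO, FIXME, XXX, HACK) and the word-boundary matching rule below are assumptions for illustration; the paper's exact fuzzy-matching details may differ.

```python
import re

# Assumed set of IDE task annotation tags; illustrative only.
TASK_TAGS = ("todo", "fixme", "xxx", "hack")

def is_satd(comment: str) -> bool:
    """Return True if the comment fuzzily contains a task annotation tag.

    'Fuzzy' here means case-insensitive matching of a tag as a standalone
    word, so "TODO:", "Fixme", and "XXX -" all match, but words that merely
    contain a tag's letters (e.g. "hacker" inside prose) do not.
    """
    text = comment.lower()
    return any(re.search(rf"\b{tag}\b", text) for tag in TASK_TAGS)

print(is_satd("// TODO: refactor this method"))  # True
print(is_satd("// computes the checksum"))       # False
```

Because the rule is a handful of string matches, it needs no training data and its decisions are trivially explainable, which is exactly the appeal of MAT as a baseline.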
Result.
The experimental results reveal that: (1) MAT achieves similar or even superior SATD identification performance compared with existing approaches, regardless of whether non-effort-aware or effort-aware evaluation indicators are considered; (2) the SATD (and non-SATD) comments correctly identified by existing approaches overlap heavily with those identified by MAT; and (3) supervised approaches misclassify many SATD comments marked with task tags as non-SATD, which can easily be corrected by combining them with MAT.
Conclusion.
It appears that the problem of SATD identification has been (unintentionally) over-complicated by our community; that is, the real progress in SATD comment identification is not what it might have been envisaged to be. We hence suggest that, when many task tags are used in the comments of a target project, future SATD identification studies should use MAT as an easy-to-implement baseline to demonstrate the usefulness of any newly proposed approach.
Many studies have explored methods of deriving thresholds for object-oriented (OO) metrics. Unsupervised methods are mainly based on the distributions of metric values, while supervised methods principally rest on the relationships between metric values and the defect-proneness of classes. The objective of this study is to empirically examine whether there are effective threshold values for OO metrics by analyzing existing threshold derivation methods with a large-scale meta-analysis. Based on five representative threshold derivation methods (VARL, ROC, BPP, MFM, and MGM) and 3268 releases from 65 Java projects, we first employ statistical meta-analysis and sensitivity analysis techniques to derive thresholds for 62 OO metrics on the training data. Then, we investigate the predictive performance of the five candidate thresholds for each metric on the validation data to explore which of them can serve as the threshold. Finally, we evaluate their predictive performance on the test data. The experimental results show that 26 of the 62 metrics have a threshold effect, and the thresholds derived by meta-analysis achieve promising GM values and significantly outperform almost all five representative (baseline) thresholds.
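To make the threshold-derivation idea concrete, here is a minimal sketch of one family of methods the abstract mentions: picking, for a single OO metric, the cutoff that maximizes the geometric mean (GM) of recall and specificity, in the spirit of the ROC/MGM baselines. The function name, the toy data, and the "flag classes at or above the cutoff" convention are assumptions for illustration; the study's actual procedures may differ.

```python
import numpy as np

def derive_threshold(metric_values, defective):
    """Scan candidate cutoffs for one metric and return the one that
    maximizes GM = sqrt(recall * specificity) on the given data."""
    values = np.asarray(metric_values, dtype=float)
    labels = np.asarray(defective, dtype=bool)
    best_t, best_gm = None, -1.0
    for t in np.unique(values):
        pred = values >= t  # classes at or above the cutoff flagged defect-prone
        tp = np.sum(pred & labels)
        tn = np.sum(~pred & ~labels)
        recall = tp / max(labels.sum(), 1)
        specificity = tn / max((~labels).sum(), 1)
        gm = (recall * specificity) ** 0.5
        if gm > best_gm:
            best_t, best_gm = t, gm
    return best_t, best_gm

# Toy data: classes with larger metric values tend to be defective.
t, gm = derive_threshold([1, 2, 3, 10, 12, 15], [0, 0, 0, 1, 1, 1])
print(t, gm)  # 10.0 1.0
```

A metric exhibits a "threshold effect" in this sense only when such a cutoff also predicts well on held-out data, which is why the study validates candidate thresholds on separate validation and test sets rather than on the training releases alone.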