AutoSpearman: Automatically Mitigating Correlated Software Metrics for Interpreting Defect Models

Jiarpakdee, Jirayus; Tantithamthavorn, Chakkrit

doi:10.1109/icsme.2018.00018

Cited by 51 publications

(26 citation statements)

References 76 publications


“…As Table 10 shows, some of our features are likely to be correlated, e.g., LineCountText and LengthText. To mitigate correlated metrics, we used AutoSpearman [35], an automated metric selection approach based on correlation analyses, with a threshold of 0.7.…”

Section: Non-documentation Links (mentioning)

confidence: 99%

Contextual Documentation Referencing on Stack Overflow

Baltes

Robillard

2022

IEEE Trans. Software Eng.

Self Cite


Software engineering is knowledge-intensive and requires software developers to continually search for knowledge, often on community question answering platforms such as Stack Overflow. Such information sharing platforms do not exist in isolation, and part of the evidence that they exist in a broader software documentation ecosystem is the common presence of hyperlinks to other documentation resources found in forum posts. With the goal of helping to improve the efficiency of information flow on Stack Overflow, we conducted a study to answer the question of how and why documentation is referenced in Stack Overflow threads. We sampled and classified 759 links from two different domains, regular expressions and Android development, to qualitatively and quantitatively analyze the links' context and purpose. We found that links on Stack Overflow serve a wide range of distinct purposes. This observation has major corollaries, including our observation that links to documentation resources are a reflection of the information needs typical to a technology domain. We contribute a framework and method to analyze the context and purpose of Stack Overflow links, a public dataset of annotated links, and a description of five major observations about linking practices on Stack Overflow, with detailed links to evidence, implications, and a conceptual framework to capture the relations between the five observations.
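The citation statement for this entry describes a correlation-based metric selection with a threshold of 0.7. The sketch below illustrates that general idea only and is not the AutoSpearman implementation: it computes Spearman rank correlations in plain Python and greedily drops one metric from each pair whose absolute correlation reaches the threshold. The tie-breaking rule (drop the metric with the higher mean correlation to the others), the function names, the `LinkCount` metric, and all data values are assumptions made for illustration; only `LineCountText` and `LengthText` are names taken from the quoted statement.

```python
import math
from itertools import combinations


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def drop_correlated(metrics, threshold=0.7):
    """Greedy filter (illustrative assumption, not the published algorithm)."""
    kept = list(metrics)
    while True:
        rho = {
            (a, b): abs(spearman(metrics[a], metrics[b]))
            for a, b in combinations(kept, 2)
        }
        over = {pair: r for pair, r in rho.items() if r >= threshold}
        if not over:
            return kept

        def mean_corr(name):
            related = [r for pair, r in rho.items() if name in pair]
            return sum(related) / len(related)

        # Take the most strongly correlated pair and drop one member of it.
        a, b = max(over, key=over.get)
        kept.remove(a if mean_corr(a) >= mean_corr(b) else b)


# Hypothetical values, invented for this example.
data = {
    "LineCountText": [1, 3, 2, 5, 4, 6],
    "LengthText": [10, 35, 22, 60, 41, 80],
    "LinkCount": [2, 1, 3, 1, 2, 3],
}
selected = drop_correlated(data, threshold=0.7)
print(selected)
```

With these invented values, `LineCountText` and `LengthText` rank the six posts identically, so only one of the two survives the filter, while the uncorrelated `LinkCount` is kept.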


“…Prior work points out that software metrics are often correlated [22,32,33,35,36,74,77,85]. However, little is known about the prevalence of correlated metrics in the publicly-available defect datasets.…”

Section: Correlated Metrics and Concerns In The Literature (mentioning)

confidence: 99%

“…Metrics of prior studies are often correlated [22,32,33,35,36,74,77,85]. For example, Herraiz et al. [33] and Gil et al. [22] point out that code complexity (CC) is often correlated with code size (size).…”

Section: Introduction (mentioning)

confidence: 99%

The Impact of Correlated Metrics on the Interpretation of Defect Models

Jiarpakdee

Tantithamthavorn

Hassan

2021

IEEE Trans. Software Eng.

Self Cite


Defect models are analytical models for building empirical theories related to software quality. Prior studies often derive knowledge from such models using interpretation techniques, e.g., ANOVA Type-I. Recent work raises concerns that correlated metrics may impact the interpretation of defect models. Yet, the impact of correlated metrics in such models has not been investigated. In this paper, we investigate the impact of correlated metrics on the interpretation of defect models and the improvement of the interpretation of defect models when removing correlated metrics. Through a case study of 14 publicly-available defect datasets, we find that (1) correlated metrics have the largest impact on the consistency, the level of discrepancy, and the direction of the ranking of metrics, especially for ANOVA techniques. On the other hand, we find that removing all correlated metrics (2) improves the consistency of the produced rankings regardless of the ordering of metrics (except for ANOVA Type-I); (3) improves the consistency of ranking of metrics among the studied interpretation techniques; (4) impacts the model performance by less than 5 percentage points. Thus, when one wishes to derive sound interpretation from defect models, one must (1) mitigate correlated metrics especially for ANOVA analyses; and (2) avoid using ANOVA Type-I even if all correlated metrics are removed.


“…Among the three main feature selection methods, filter methods are preferred to wrapper and embedded methods in applications where the computational efficiency, classifier independence, simplicity, ease of use and the stability of the results are required. Therefore, filter feature selection remains an interesting topic in many recent research areas such as biomarker identification for cancer prediction and drug discovery, text classification and predicting defective software [3,4,5,10,11,16,18] and has growing interest in big data applications [19]; according to the Google Scholar search results, the number of research papers published related to filter methods in year 2018 is ∼1,800 of which ∼170 are in gene selection area.…”

Section: Introduction (mentioning)

confidence: 99%

“…The nearby pixels in images can be grouped together based on their spatial locality to improve selection of pixels for image classification. In software data, software metrics can be grouped according to their granularity in the code to improve the prediction of defective software [11,18]. In Sect.…”

Section: Introduction (mentioning)

confidence: 99%

A Framework for Feature Selection to Exploit Feature Group Structures

Perera

Chan

Karunasekera

2020

Advances in Knowledge Discovery and Data Mining


Filter feature selection methods play an important role in machine learning tasks when low computational costs, classifier independence or simplicity is important. Existing filter methods predominantly focus only on the input data and do not take advantage of the external sources of correlations within feature groups to improve the classification accuracy. We propose a framework which facilitates supervised filter feature selection methods to exploit feature group information from external sources of knowledge and use this framework to incorporate feature group information into minimum Redundancy Maximum Relevance (mRMR) algorithm, resulting in GroupMRMR algorithm. We show that GroupMRMR achieves high accuracy gains over mRMR (up to ∼35%) and other popular filter methods (up to ∼50%). GroupMRMR has same computational complexity as that of mRMR, therefore, does not incur additional computational costs. Proposed method has many real world applications, particularly the ones that use genomic, text and image data whose features demonstrate strong group structures.

