2017 IEEE/ACM 14th International Conference on Mining Software Repositories (MSR) 2017
DOI: 10.1109/msr.2017.18

A Large-Scale Study of the Impact of Feature Selection Techniques on Defect Classification Models

Citations: cited by 97 publications (117 citation statements)
References: 48 publications
“…We also observe that wrapper-based feature selection techniques have the least impact on model performance for both classification techniques. This finding is consistent with Ghotra et al [24] who find that the performance of defect models is impacted by at most 2%pts for the AUC measure when applying wrapper-based feature selection techniques. We observe that filter-based feature selection techniques (except for consistency-based feature selection techniques) have the highest impact on the performance of defect models, since filter-based techniques tend to overly remove metrics that share a strong relationship with the outcome.…”
Section: An Evaluation Of Autospearman | supporting
confidence: 92%
“…Since it is impractical to study all of these techniques, we would like to select a manageable set of feature selection techniques for our study. Similar to Ghotra et al [24], we select two commonly-used families of feature selection techniques, i.e., filter-based feature selection techniques and wrapper-based feature selection techniques. Thus, embedded-based feature selection techniques are excluded from our analysis, as they are rarely explored in software engineering.…”
Section: Stepwise Regression (Both Directions) | mentioning
confidence: 99%
“…The most discriminator terms could provide a deep insight into each feature. Recently, a large‐scale study conducted on 30 feature selection techniques and apply on 18 data sets. They found that correlation based ranking search technique perform well to select the important features.…”
Section: Methods | mentioning
confidence: 99%
“…We use principal component analysis (PCA) [8] to account for multi-collinearity amongst features [8], and has been extensively used in the domain of defect prediction [41] [42]. PCA creates independent linear combinations of the features that account for most of the co-variation of the features.…”
Section: Principal Component Analysis | mentioning
confidence: 99%
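
The idea in the last statement above, that PCA builds linear combinations of the features capturing most of their co-variation, can be illustrated with a minimal sketch. This is not the citing authors' implementation: the function name `first_principal_component`, the toy metric values, and the use of power iteration (in place of a full eigendecomposition) are all assumptions made for illustration, and only the leading component is computed.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, explained_variance_ratio) of the leading component.

    Assumes at least two rows and at least one non-constant feature.
    """
    n = len(rows)
    d = len(rows[0])
    # Center each feature (column) on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the features.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeated multiplication by the covariance matrix
    # converges to the eigenvector with the largest eigenvalue, provided
    # the start vector is not orthogonal to it (a limitation of this sketch).
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # The Rayleigh quotient gives the variance captured along v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    total_variance = sum(cov[i][i] for i in range(d))
    return v, eigenvalue / total_variance


# Hypothetical toy data: two perfectly collinear metrics (second = 2 * first).
metrics = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, ratio = first_principal_component(metrics)
print(direction, ratio)
```

Because the two toy metrics are perfectly collinear, a single component should account for essentially all of their variance, which is the sense in which PCA absorbs multi-collinearity among features.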