2020
DOI: 10.3390/sym12071147

Impact of Feature Selection Methods on the Predictive Performance of Software Defect Prediction Models: An Extensive Empirical Study

Abstract: Feature selection (FS) is a feasible solution for mitigating the high-dimensionality problem, and many FS methods have been proposed in the context of software defect prediction (SDP). However, empirical studies on the impact and effectiveness of FS methods on SDP models often lead to contradictory experimental results and inconsistent findings. These contradictions can be attributed to study limitations such as small datasets, limited FS search methods, and unsuitable prediction models in t…

Cited by 49 publications
(51 citation statements)
References 49 publications
“…Table 3 and Table 4 present the experimental results of spam models with one of the feature selection methods, Information Gain, which is a form of dimensionality reduction technique. This is to further improve the performance of the spam models (ensemble and base classifiers) as feature selection has been known to improve prediction models [48][49][50][51]. The heterogeneous ensemble method still outperforms the baseline classifiers on all performance metrics on both datasets.…”
Section: Results (mentioning)
confidence: 99%
“…In [42], GWO was converted into binary then include two-phase mutation to compute the most informative features. Binary GWO (BGWO) was applied in many areas such as oil and gas [43], software defect problems [44], and the medical domain [45] [46]. A brief review for GWO for feature selection can be found in [46].…”
Section: Introduction (mentioning)
confidence: 99%
“…Many ML methods in detecting phishing websites have been used and reported with relatively low detection accuracy values and high false-positive rates [23,24]. This can be due to the existence of data quality issues such as class imbalance that have adverse effects on ML method performance [25,26,27]. The dynamism of phishing websites also calls for more sophisticated ML techniques with a high detection rate of phishing and low false-positive rates [28].…”
Section: Introduction (mentioning)
confidence: 99%