Impact of Feature Selection Methods on the Predictive Performance of Software Defect Prediction Models: An Extensive Empirical Study

Balogun, Abdullateef Oluwagbemiga; Basri, Shuib; Mahamad, Saipunidzam; Abdulkadir, Said Jadid; Almomani, Malek Ahmad; Adeyemo, Victor Elijah; Al-Tashi, Qasem; Mojeed, Hammed A.; Imam, Abdullahi Abubakar; Bajeh, Amos Orenyi

doi:10.3390/sym12071147

Cited by 49 publications

(51 citation statements)

References 49 publications


“…Table 3 and Table 4 present the experimental results of spam models with one of the feature selection methods, Information Gain, which is a form of dimensionality reduction technique. This is to further improve the performance of the spam models (ensemble and base classifiers) as feature selection has been known to improve prediction models [48][49][50][51]. The heterogeneous ensemble method still outperforms the baseline classifiers on all performance metrics on both datasets.…”

Section: Results (mentioning; confidence: 99%)

Heterogeneous Ensemble with Combined Dimensionality Reduction for Social Spam Detection

Oladepo, Bajeh, Balogun et al. 2021

Int. J. Interact. Mob. Technol.

Self Cite


This study presents a novel framework based on a heterogeneous ensemble method and a hybrid dimensionality reduction technique for spam detection in micro-blogging social networks. A hybrid of Information Gain (IG) and Principal Component Analysis (PCA) was implemented as the dimensionality reduction step for selecting important features, and a heterogeneous ensemble of Naïve Bayes (NB), K Nearest Neighbor (KNN), Logistic Regression (LR) and Repeated Incremental Pruning to Produce Error Reduction (RIPPER) classifiers, combined by Average of Probabilities (AOP), was used for spam detection. The proposed framework was applied to the MPI_SWS and SAC’13 Tip spam datasets, and the resulting models were evaluated on accuracy, precision, recall, F-measure, and area under the curve (AUC). In the experiments, the proposed framework (Ensemble + IG + PCA) outperformed the other methods tested on the studied spam datasets. Specifically, it achieved an average accuracy of 87.5%, an average precision of 0.877, an average recall of 0.845, an average F-measure of 0.872 and an average AUC of 0.943. It also performed better than some existing methods. Consequently, this study shows that addressing high dimensionality in spam datasets, here by combining IG and PCA with a heterogeneous ensemble method, can produce a more effective method for detecting spam content.
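The abstract names two concrete building blocks: IG for ranking features and Average of Probabilities (AOP) for fusing the base classifiers. The following is a minimal pure-Python sketch of both, assuming discrete (already discretized) feature values. It is a generic illustration, not the authors' implementation; the PCA step and the NB/KNN/LR/RIPPER base classifiers are omitted, and the function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature, labels):
    """IG of a discrete feature X w.r.t. labels Y: H(Y) - sum_v P(X=v) * H(Y | X=v)."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(labels) - conditional


def average_of_probabilities(prob_rows):
    """Soft vote for one instance: average the per-class probability vectors
    produced by several base classifiers and return (argmax class index, averages)."""
    k = len(prob_rows)
    n_classes = len(prob_rows[0])
    avg = [sum(row[c] for row in prob_rows) / k for c in range(n_classes)]
    return max(range(n_classes), key=lambda c: avg[c]), avg
```

For instance, averaging four hypothetical [ham, spam] probability vectors [0.4, 0.6], [0.7, 0.3], [0.2, 0.8] and [0.45, 0.55] gives roughly [0.4375, 0.5625], so the spam class (index 1) is predicted even though one base classifier disagreed.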


“…In [42], GWO was converted into binary then include two-phase mutation to compute the most informative features. Binary GWO (BGWO) was applied in many areas such as oil and gas [43], software defect problems [44], and the medical domain [45] [46]. A brief review for GWO for feature selection can be found in [46].…”

Section: Introduction (mentioning; confidence: 99%)

Hybrid Binary Grey Wolf With Harris Hawks Optimizer for Feature Selection

et al. 2021

Self Cite


Despite the Grey Wolf Optimizer's (GWO) superior performance in many areas, stagnation in local optima can still be a concern. Several significant GWO factors can be explored to enhance feature selection for classification; in particular, two conflicting concepts must be considered when using or designing a metaheuristic: exploration of the search space and exploitation of the best solutions found. Balancing exploration and exploitation well improves the search algorithm's performance. To achieve a good balance, this paper proposes a binary hybrid of GWO and Harris Hawks Optimization (HHO) that forms a memetic approach called HBGWOHHO. A sigmoid transfer function maps the continuous search space to a binary one, as required by the binary nature of feature selection. A wrapper based on k-Nearest Neighbor (kNN) is used to evaluate the goodness of the selected features. To validate the performance of the proposed method, 18 standard UCI benchmark datasets were used. The proposed hybrid method was compared with Binary Grey Wolf Optimizer (BGWO), Binary Particle Swarm Optimization (BPSO), Binary Harris Hawks Optimizer (BHHO), Binary Genetic Algorithm (BGA) and Binary Hybrid BWOPSO. The findings revealed that the proposed method was effective in improving the performance of the BGWO algorithm: it outperforms BGWO in terms of accuracy, selected feature size, and computational time. Similarly, compared with the BPSO and BGA feature selection algorithms, HBGWOHHO achieved better accuracy with smaller selected feature subsets in much lower computational time.
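The abstract mentions two mechanics shared by binary wrapper optimizers of this kind: a sigmoid transfer function that turns a continuous search-agent position into a 0/1 feature mask, and a wrapper fitness driven by a kNN classifier's error. The sketch below illustrates both under stated assumptions: the plain logistic form 1/(1+e^(-x)) is assumed (some BGWO variants use a shifted, steeper sigmoid), and the weighted objective with alpha = 0.99 is a common convention in the feature selection literature, not a value confirmed by this abstract. The GWO/HHO position updates and the kNN itself are not shown, and the function names are hypothetical.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic (S-shaped) transfer function mapping x to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def binarize(position, rng):
    """Map a continuous position vector to a 0/1 feature mask:
    bit d is 1 (feature d selected) when a uniform draw falls below sigmoid(x_d)."""
    return [1 if rng.random() < sigmoid(x) else 0 for x in position]


def wrapper_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Objective to minimize (assumed form): weighted classification error of a
    wrapper classifier (e.g. kNN) on the selected features plus a penalty on
    the fraction of features kept."""
    return alpha * error_rate + (1 - alpha) * n_selected / n_total
```

With this objective, two masks that yield the same kNN error are ranked by subset size, which is how a wrapper method can favour smaller feature subsets without sacrificing accuracy.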


“…Many ML methods in detecting phishing websites have been used and reported with relatively low detection accuracy values and high false-positive rates [23,24]. This can be due to the existence of data quality issues such as class imbalance that have adverse effects on ML method performance [25,26,27]. The dynamism of phishing websites also calls for more sophisticated ML techniques with a high detection rate of phishing and low false-positive rates [28].…”

Section: Introduction (mentioning; confidence: 99%)

Improving the phishing website detection using empirical analysis of Function Tree and its variants

Balogun, Adewole, Raheem et al. 2021

Heliyon

Self Cite


The phishing attack is one of the most complex threats putting internet users and legitimate web resource owners at risk. The recent rise in the number of phishing attacks has instilled distrust in legitimate internet users, making them feel less safe even in the presence of powerful antivirus apps. Reports of rising financial damages caused by phishing website attacks have raised grave concern. Several methods, including blacklists and machine learning (ML)-based models, have been proposed to combat phishing website attacks. The blacklist anti-phishing method has been faulted for failing to detect new phishing URLs because it relies on compiled lists of blacklisted phishing URLs. Many ML methods for detecting phishing websites have been reported with relatively low detection accuracy and high false alarm rates. Hence, this research proposes Functional Tree (FT)-based meta-learning models for detecting phishing websites; that is, it investigates improving phishing website detection through an empirical analysis of FT and its variants. The proposed models outperformed the baseline classifiers, meta-learners and hybrid models used for phishing website detection in existing studies. Moreover, the proposed FT-based meta-learners are effective at detecting legitimate and phishing websites, with accuracy as high as 98.51% and a false positive rate as low as 0.015. Hence, the deployment and adoption of FT and its meta-learner variants for phishing website detection and other applicable cybersecurity attacks are recommended.
