Victor Elijah Adeyemo scite author profile

Feature selection (FS) is a feasible solution for mitigating the high-dimensionality problem, and many FS methods have been proposed in the context of software defect prediction (SDP). However, empirical studies on the impact and effectiveness of FS methods on SDP models often report contradictory experimental results and inconsistent findings. These contradictions can be attributed to limitations of the individual studies, such as small datasets, a limited range of FS search methods, and unsuitable prediction models. It is therefore critical to conduct an extensive empirical study that addresses these contradictions, guides researchers, and strengthens the robustness of experimental conclusions. In this study, we investigated the impact of 46 FS methods using Naïve Bayes and Decision Tree classifiers over 25 software defect datasets from 4 software repositories (NASA, PROMISE, ReLink, and AEEEM). The resulting prediction models were evaluated based on accuracy and AUC values. Scott–KnottESD and the novel Double Scott–KnottESD rank statistical methods were used to statistically rank the studied FS methods. The experimental results showed that there is no single best FS method, as performance depends on the choice of classifier, performance evaluation metric, and dataset. Nonetheless, we recommend statistical-based, probability-based, and classifier-based filter feature ranking (FFR) methods, in that order, for SDP. For filter subset selection (FSS) methods, correlation-based feature selection (CFS) with metaheuristic search methods is recommended. For wrapper feature selection (WFS) methods, the IWSS-based WFS method is recommended, as it outperforms the conventional SFS- and LHS-based WFS methods.
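
To make the idea of a filter feature ranking (FFR) method concrete, the sketch below scores each feature by the absolute Pearson correlation between its values and the defect label, sorts the features by that score, and keeps the top k. It is an illustrative assumption, not one of the 46 methods evaluated in the study; the metric names (`loc`, `cyclomatic`, `comments`), the toy values, and the helper names are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no ranking signal
    return cov / (sx * sy)


def filter_rank(rows, labels, names):
    """Rank features by |correlation| with the defect label, most relevant first."""
    scores = {}
    for j, name in enumerate(names):
        column = [row[j] for row in rows]
        scores[name] = abs(pearson(column, labels))
    return sorted(names, key=lambda name: scores[name], reverse=True)


def select_top_k(ranking, k):
    """Keep the k highest-ranked features; the cut-off k is a user choice."""
    return ranking[:k]


# Hypothetical toy data: one row per module, label 1 = defective.
names = ["loc", "cyclomatic", "comments"]
rows = [[10, 1, 5], [200, 9, 3], [15, 2, 8], [180, 7, 4]]
labels = [0, 1, 0, 1]

ranking = filter_rank(rows, labels, names)
print(ranking)                   # ['loc', 'cyclomatic', 'comments']
print(select_top_k(ranking, 2))  # ['loc', 'cyclomatic']
```

A ranking filter like this is classifier-independent, which is why the choice of downstream classifier (here, Naïve Bayes or Decision Tree) can still change which filter looks best.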


AI Meta-Learners and Extra-Trees Algorithm for the Detection of Phishing Websites

Alsariera, Adeyemo, Balogun, et al. 2020

IEEE Access


Phishing is a type of social web-engineering attack in cyberspace in which criminals steal valuable data or information from unsuspecting or uninformed internet users. Existing countermeasures, in the form of anti-phishing software and computational methods for detecting phishing activities, have proven effective. However, hackers deploy new methods to thwart these countermeasures. Because phishing attacks keep evolving, novel and efficient countermeasures are crucial, as the effects of phishing attacks are often severe and disastrous. Artificial Intelligence (AI) schemes have been the cornerstone of modern countermeasures for mitigating phishing attacks. However, AI-based phishing countermeasures have their own shortcomings, particularly high false-alarm rates and the inability to interpret how most phishing detection methods arrive at their decisions. This study proposed four meta-learner models, AdaBoost-Extra Tree (ABET), Bagging-Extra Tree (BET), Rotation Forest-Extra Tree (RoFBET), and LogitBoost-Extra Tree (LBET), developed using the extra-tree base classifier. The proposed AI-based meta-learners were fitted on phishing website datasets containing the newest features, and their performance was evaluated. The models achieved a detection accuracy of no less than 97% with a drastically low false-positive rate of no more than 0.028. In addition, the proposed models outperform existing ML-based models in phishing attack detection. Hence, we recommend the adoption of meta-learners when building phishing attack detection models.
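
The abstract reports two headline figures, detection accuracy and false-positive rate (FPR). As a reminder of how they relate, the sketch below computes both from binary labels, treating 1 as the phishing (positive) class; that encoding, the function name, and the toy labels are assumptions for illustration. FPR is FP / (FP + TN), the share of legitimate sites wrongly flagged as phishing. The sketch does not reproduce the proposed meta-learners themselves.

```python
def detection_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, false_positive_rate) for binary predictions.

    `positive` marks the phishing class (an assumed encoding).
    FPR = FP / (FP + TN): legitimate sites wrongly flagged as phishing.
    """
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(y_true)
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return accuracy, fpr


# Hypothetical labels: 1 = phishing, 0 = legitimate.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0, 1, 0]
acc, fpr = detection_metrics(y_true, y_pred)
print(acc, round(fpr, 3))  # 0.8 0.167
```

Reporting FPR alongside accuracy matters because a detector can score high accuracy on a dataset dominated by legitimate sites while still flagging a sizeable share of them.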


SMOTE-Based Homogeneous Ensemble Methods for Software Defect Prediction

Balogun, Lafenwa-Balogun, Mojeed, et al. 2020


Class imbalance is a prevalent problem in machine learning that degrades the prediction performance of classification algorithms, and Software Defect Prediction (SDP) is no exception. Solutions such as data sampling and ensemble methods have been proposed to address the class imbalance problem in SDP. This study proposes a combination of the Synthetic Minority Oversampling Technique (SMOTE) and homogeneous ensemble methods (Bagging and Boosting) for predicting software defects. The proposed approach was implemented using Decision Tree (DT) and Bayesian Network (BN) base classifiers on defect datasets from the NASA software corpus. The experimental results showed that the proposed approach outperformed the other methods tested. The high accuracy (86.8%) and area under the receiver operating characteristic curve (AUC) of 0.93 achieved by the proposed technique affirm its ability to distinguish between defective and non-defective modules without bias.
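
SMOTE balances a dataset by synthesising new minority-class samples along the line segments between a minority sample and its nearest minority neighbours. The pure-Python sketch below shows only that interpolation step; drawing seed samples at random (rather than cycling through every minority sample), the Euclidean distance, the function name, and the toy data are assumptions. In the approach described above, the oversampled data would then be passed to Bagging or Boosting ensembles of DT or BN, which this sketch does not cover.

```python
import random
from math import dist


def smote(minority, n_synthetic, k=5, seed=0):
    """Generate synthetic minority samples by SMOTE-style interpolation.

    Each synthetic point is x + gap * (nn - x), where x is a random minority
    sample, nn is one of its k nearest minority neighbours, and gap ~ U(0, 1).
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    k = max(1, min(k, len(minority) - 1))
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Nearest neighbours of x among the *other* minority samples.
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: dist(x, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


# Hypothetical minority (defective) modules described by two metrics.
defective = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
new_points = smote(defective, n_synthetic=6, k=2, seed=42)
print(len(new_points))  # 6
```

Because every synthetic point lies between two real minority samples, oversampling enlarges the minority region without copying samples verbatim, which is the property that distinguishes SMOTE from plain random oversampling.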


Phishing Website Detection: Forest by Penalizing Attributes Algorithm and Its Enhanced Variations

2020


Ensemble and Deep-Learning Methods for Two-Class and Multi-Attack Anomaly Intrusion Detection: An Empirical Study

Adeyemo, Abdullah, Jhanjhi, et al. 2019

IJACSA


Cyber-security, an emerging field of research, involves the development and management of techniques and technologies for protecting data, information, and devices. The need to protect network devices from internal and external attacks, threats, and vulnerabilities has driven sustained research into Network Intrusion Detection Systems (NIDS). This study therefore empirically evaluated the effectiveness of deep-learning and ensemble methods in NIDS, contributing a NIDS built by implementing machine- and deep-learning algorithms in various forms on a recent network dataset, UNSW-NB15, which contains more recent attack types and attacker behaviours. The research implements one deep-learning algorithm, Long Short-Term Memory (LSTM), and two ensemble methods: a homogeneous method using an optimised bagged Random Forest algorithm, and a heterogeneous method using an Averaged Probability Voting ensemble. The heterogeneous ensemble was based on four standard classifiers with different computational characteristics (Naïve Bayes, kNN, RIPPER, and Decision Tree). The models were applied to the UNSW-NB15 dataset in two forms: as a two-class attack dataset and as a multi-attack dataset. LSTM achieved a detection accuracy of 80% on the two-class attack dataset and 72% on the multi-attack dataset. The homogeneous method achieved 98% and 87.4% accuracy on the two-class and multi-attack datasets, respectively, and the heterogeneous model achieved 97% and 85.23%, respectively.
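
The heterogeneous ensemble combines its four base classifiers by averaging their predicted class probabilities and choosing the class with the highest mean. The sketch below shows that combination rule in isolation; the function name and the probability values attributed to NB, kNN, RIPPER, and DT are hypothetical, and the base classifiers themselves are not implemented.

```python
def soft_vote(prob_lists, classes):
    """Averaged-probability (soft) voting over several base classifiers.

    prob_lists: one probability vector per classifier, aligned with `classes`.
    Returns the winning class and the averaged probability vector.
    """
    n_models = len(prob_lists)
    averaged = [
        sum(probs[c] for probs in prob_lists) / n_models
        for c in range(len(classes))
    ]
    best = max(range(len(classes)), key=lambda c: averaged[c])
    return classes[best], averaged


classes = ["normal", "attack"]
# Hypothetical outputs of NB, kNN, RIPPER and DT for one network record.
probs = [[0.6, 0.4], [0.3, 0.7], [0.2, 0.8], [0.55, 0.45]]
label, averaged = soft_vote(probs, classes)
print(label)  # attack
```

On these numbers a hard majority vote would tie 2 to 2, whereas averaging the probabilities still yields a decision and lets confident classifiers (here RIPPER at 0.8) carry more weight.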


Empirical Analysis of Rank Aggregation-Based Multi-Filter Feature Selection Methods in Software Defect Prediction

et al. 2021


Selecting the filter method that will produce the best-performing subset of features remains an open problem, known as the filter rank selection problem. A viable solution is to independently apply a mixture of filter methods and evaluate the results. This study proposes novel rank aggregation-based multi-filter feature selection (FS) methods to address high dimensionality and the filter rank selection problem in software defect prediction (SDP). The proposed methods use rank aggregation mechanisms to combine the rank lists generated by individual filter methods into a single aggregated rank list. They aim to resolve the filter selection problem by using multiple filter methods with diverse computational characteristics to produce a disjoint and complete feature rank list that is superior to those of the individual filter rank methods. The effectiveness of the proposed methods was evaluated with Decision Tree (DT) and Naïve Bayes (NB) models on defect datasets from the NASA repository. The experimental results show that the proposed methods had a greater positive impact on the prediction performance of the NB and DT models than the other FS methods tested. This makes combining filter rank methods a viable solution to the filter rank selection problem and a means of enhancing prediction models in SDP.
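
One simple rank aggregation mechanism is to order features by their mean position across the individual filter rank lists (a Borda-style count). The sketch below assumes that mechanism; the paper's actual aggregation functions may differ, and the function name, filter outputs, and metric names are hypothetical.

```python
def aggregate_ranks(rank_lists):
    """Merge several filter rank lists into one by mean position (Borda-style).

    Every list must rank the same features, most relevant first.
    Ties in mean position are broken alphabetically so the output is deterministic.
    """
    features = rank_lists[0]
    mean_pos = {
        f: sum(ranks.index(f) for ranks in rank_lists) / len(rank_lists)
        for f in features
    }
    return sorted(features, key=lambda f: (mean_pos[f], f))


# Hypothetical rank lists from three different filter methods.
filter_a = ["loc", "cbo", "rfc", "wmc"]
filter_b = ["cbo", "loc", "wmc", "rfc"]
filter_c = ["cbo", "rfc", "loc", "wmc"]
print(aggregate_ranks([filter_a, filter_b, filter_c]))
# ['cbo', 'loc', 'rfc', 'wmc']
```

Here `cbo` tops the aggregated list even though one filter ranked `loc` first, which illustrates how aggregation smooths over the idiosyncrasies of any single filter.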


LCCspm: l-Length Closed Contiguous Sequential Patterns Mining Algorithm to Find Frequent Athlete Movement Patterns from GPS

Adeyemo, Palczewska, Jones 2021


Parameter tuning in KNN for software defect prediction: an empirical analysis

Mabayoje, Balogun, Jibril, et al. 2019

Jurnal Teknologi dan Sistem Komputer


Software Defect Prediction (SDP) provides insights that can help software teams allocate their limited resources when developing software systems. It predicts likely defective modules and helps avoid the pitfalls associated with such modules. However, these insights may be inaccurate and unreliable if the parameters of SDP models are not taken into consideration. In this study, the effect of parameter tuning on the k-nearest neighbor (k-NN) classifier in SDP was investigated; specifically, the impact of varying and selecting the optimal k value, the influence of distance weighting, and the impact of distance functions on k-NN. An experiment was designed to investigate this problem in SDP over six software defect datasets. The experimental results revealed that the k value should be greater than 1 (the default), as the average RMSE of k-NN with k > 1 (0.2727) is lower than with k = 1 (0.3296). In addition, distance weighting improved the predictive performance of k-NN by 8.82% and 1.7% in terms of AUC and accuracy, respectively. In terms of distance function, k-NN models based on the Dilca distance function performed better than those based on the Euclidean distance function (the default). Hence, we conclude that parameter tuning has a positive effect on the predictive performance of k-NN in SDP.
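
Two of the tuning knobs discussed above, the value of k and distance weighting, can be illustrated with a small k-NN classifier. This sketch assumes Euclidean distance and inverse-distance (1/d) weighting; it does not implement the Dilca distance function, and the function name and toy data are hypothetical.

```python
from collections import defaultdict
from math import dist


def knn_predict(train_x, train_y, query, k=3, weighted=True):
    """Classify `query` from its k nearest training points (Euclidean distance).

    With weighted=True each neighbour votes with weight 1/d, so closer
    neighbours count more; an exact match (d == 0) decides outright.
    """
    neighbours = sorted(zip(train_x, train_y), key=lambda p: dist(p[0], query))[:k]
    votes = defaultdict(float)
    for x, label in neighbours:
        d = dist(x, query)
        if weighted and d == 0:
            return label
        votes[label] += 1.0 / d if weighted else 1.0
    return max(votes, key=votes.get)


# Hypothetical modules: label 1 = defective.
train_x = [(0.0, 0.0), (3.0, 0.0), (3.0, 1.0), (10.0, 10.0)]
train_y = [1, 0, 0, 1]
query = (0.5, 0.0)
print(knn_predict(train_x, train_y, query, k=3, weighted=False))  # 0
print(knn_predict(train_x, train_y, query, k=3, weighted=True))   # 1
```

With k = 3 the unweighted vote lets two distant non-defective modules outvote the much closer defective one, while 1/d weighting restores the influence of the nearest neighbour; this is the kind of interaction between k and weighting that tuning has to account for.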



