Class imbalance is a prevalent problem in machine learning that degrades the predictive performance of classification algorithms, and Software Defect Prediction (SDP) is no exception. Solutions such as data sampling and ensemble methods have been proposed to address class imbalance in SDP. This study proposes combining the Synthetic Minority Oversampling Technique (SMOTE) with homogeneous ensemble methods (Bagging and Boosting) for predicting software defects. The proposed approach was implemented using Decision Tree (DT) and Bayesian Network (BN) as base classifiers on defect datasets acquired from the NASA software corpus. The experimental results showed that the proposed approach outperformed the other methods evaluated. An accuracy of 86.8% and an area under the receiver operating characteristic curve (AUC) of 0.93 indicate that the proposed technique can differentiate between the defective and non-defective labels without bias.
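To make the oversampling step concrete, the following is a minimal sketch of the SMOTE idea in plain Python (3.8+): each synthetic sample is placed at a random point on the line segment between a minority sample and one of its k nearest minority neighbours. The function name `smote`, its parameters, and the two toy software metrics are assumptions made for illustration and are not taken from the study; the study's Bagging and Boosting stage is not shown here and would be trained on the balanced data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    a minority sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical toy module metrics: [lines of code, cyclomatic complexity]
defective = [[120.0, 14.0], [150.0, 18.0], [135.0, 16.0]]
non_defective = [[20.0, 2.0], [35.0, 3.0], [40.0, 4.0], [25.0, 2.0],
                 [30.0, 3.0], [45.0, 5.0], [50.0, 4.0]]

# Generate enough synthetic defective rows to match the majority class size.
new_rows = smote(defective, n_new=len(non_defective) - len(defective), k=2)
balanced_defective = defective + new_rows
print(len(balanced_defective), len(non_defective))
```

Because every synthetic row is an interpolation between two real minority rows, the new samples stay inside the region already occupied by the defective class rather than duplicating existing rows.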
Software testing using software defect prediction aims to detect as many defects as possible before the software is released. This plays an important role in ensuring quality and reliability. Software defect prediction can be modeled as a classification problem that assigns software modules to one of two classes, defective and non-defective, and classification algorithms are used for this process. This study investigated the impact of feature selection methods on classification via clustering techniques for software defect prediction. Three clustering techniques (Farthest First Clusterer, K-Means, and Make-Density Clusterer) and three feature selection methods (Chi-Square, Clustering Variation, and Information Gain) were applied to software defect datasets from the NASA repository. The best software defect prediction model was Farthest First with Information Gain feature selection, with an accuracy of 78.69%, a precision of 0.804, and a recall of 0.788. The experimental results showed that clustering techniques used as classifiers gave good predictive performance and that feature selection methods further enhanced their performance. This indicates that classification via clustering can give results competitive with standard classification methods, with the advantage that no model has to be trained on a labeled dataset, so it can be used on unlabeled datasets.
Keywords: Classification, Clustering, Feature Selection, Software Defect Prediction
Vol. 26, No 1, June, 2019
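The idea of classification via clustering can be sketched as follows, assuming a simplified farthest-first traversal: centres are chosen so that each new centre is the point farthest from those already picked, every point joins its nearest centre, and each cluster is then named after the majority label of its members. This is an illustrative sketch only, not the study's implementation: the first centre is fixed to the first point for reproducibility, the function names and toy metrics are hypothetical, and the labels are used solely to name the clusters after they have been formed.

```python
import math
from collections import Counter


def farthest_first_centres(points, k):
    """Pick k centres: start from the first point, then repeatedly add the
    point farthest from its nearest already-chosen centre."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centres))
        )
    return centres


def nearest(point, centres):
    """Index of the centre closest to point."""
    return min(range(len(centres)), key=lambda i: math.dist(point, centres[i]))


def fit_cluster_classifier(points, labels, k=2):
    """Cluster the points, then name each cluster after its majority label."""
    centres = farthest_first_centres(points, k)
    votes = [Counter() for _ in centres]
    for p, y in zip(points, labels):
        votes[nearest(p, centres)][y] += 1
    cluster_label = [v.most_common(1)[0][0] for v in votes]
    return centres, cluster_label


def predict(point, centres, cluster_label):
    """Classify a point with the label of its nearest cluster."""
    return cluster_label[nearest(point, centres)]


# Hypothetical toy module metrics: [lines of code, cyclomatic complexity]
points = [[20.0, 2.0], [25.0, 3.0], [30.0, 2.0],
          [120.0, 14.0], [130.0, 16.0], [140.0, 15.0]]
labels = ["non-defective"] * 3 + ["defective"] * 3

centres, cluster_label = fit_cluster_classifier(points, labels, k=2)
print(predict([128.0, 15.0], centres, cluster_label))
print(predict([22.0, 2.0], centres, cluster_label))
```

The clustering step itself never looks at the labels, which is why the approach remains usable when only a small labeled sample (or none, if clusters are interpreted manually) is available.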
Banking is one of the sectors that pays close attention to its clients' behavior with a view to tracking their activities, especially as they relate to monetary transactions. Adding new customers to the existing fold is not only time-consuming but also expensive. This is why banks generally do everything within their means to keep customer retention consistently high. The objective of this study, therefore, is to create a model capable of predicting the retention of bank customers. In order to achieve this goal, the study proposed a machine learning predictive model created using a function that combines a number of base classifiers to produce an efficient model. The model was built from a dataset retrieved from an open repository, Kaggle. The data comprised demographic and psychological features, and the algorithms applied to it were KNN, CART, and Naïve Bayes as base classifiers, with Logistic Regression as the meta-classifier. The model was evaluated several times to determine its accuracy, and the resulting output shows an accuracy of 83%. A comparison of this result with existing related studies shows that the proposed ensemble classifier outperforms the existing models, which attain classification accuracies of 79% to 81%. The proposed model is reliable and can therefore be used as a benchmark for similar models created to predict customer retention in the banking sector.
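Combining base classifiers under a meta-classifier is commonly called stacking, and a minimal plain-Python (3.8+) sketch of it is given here under several stated assumptions. The base learners are a small KNN, a single-split decision stump used as a simplified stand-in for CART, and a Gaussian Naïve Bayes; their out-of-fold predictions become the inputs of a logistic regression meta-classifier. All function names, the two features (tenure in years, balance in thousands), and the toy rows are hypothetical and are not the study's data or code.

```python
import math
from collections import Counter


def knn_fit(X, y, k=3):
    """k-nearest neighbours: majority label among the k closest rows."""
    def predict(x):
        near = sorted(zip(X, y), key=lambda xy: math.dist(xy[0], x))[:k]
        return Counter(label for _, label in near).most_common(1)[0][0]
    return predict


def stump_fit(X, y):
    """Single-split decision stump: a simplified stand-in for CART."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for t in ((a + b) / 2 for a, b in zip(values, values[1:])):
            for above in (0, 1):  # label predicted when feature j exceeds t
                errors = sum((above if row[j] > t else 1 - above) != label
                             for row, label in zip(X, y))
                if best is None or errors < best[0]:
                    best = (errors, j, t, above)
    _, j, t, above = best
    return lambda x: above if x[j] > t else 1 - above


def nb_fit(X, y):
    """Gaussian Naive Bayes with a small variance floor."""
    stats = {}
    for c in set(y):
        rows = [row for row, label in zip(X, y) if label == c]
        means = [sum(col) / len(rows) for col in zip(*rows)]
        varis = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-6
                 for col, m in zip(zip(*rows), means)]
        stats[c] = (math.log(len(rows) / len(X)), means, varis)

    def predict(x):
        def score(c):
            prior, means, varis = stats[c]
            return prior + sum(
                -0.5 * math.log(2 * math.pi * s) - (v - m) ** 2 / (2 * s)
                for v, m, s in zip(x, means, varis))
        return max(stats, key=score)
    return predict


def logreg_fit(Z, y, lr=0.5, epochs=500):
    """Logistic regression meta-classifier trained by stochastic gradient steps."""
    w = [0.0] * (len(Z[0]) + 1)  # last weight is the bias
    for _ in range(epochs):
        for z, label in zip(Z, y):
            zb = list(z) + [1.0]
            p = 1 / (1 + math.exp(-sum(wi * zi for wi, zi in zip(w, zb))))
            w = [wi + lr * (label - p) * zi for wi, zi in zip(w, zb)]
    return lambda z: int(sum(wi * zi for wi, zi in zip(w, list(z) + [1.0])) > 0)


def stack_fit(X, y, folds=3):
    """Train base models, then a meta-classifier on their out-of-fold outputs."""
    fitters = [knn_fit, stump_fit, nb_fit]
    Z = [None] * len(X)
    for f in range(folds):
        train = [i for i in range(len(X)) if i % folds != f]
        models = [fit([X[i] for i in train], [y[i] for i in train])
                  for fit in fitters]
        for i in range(f, len(X), folds):
            Z[i] = [m(X[i]) for m in models]
    meta = logreg_fit(Z, y)
    models = [fit(X, y) for fit in fitters]  # refit base models on all data
    return lambda x: meta([m(x) for m in models])


# Hypothetical toy rows: [tenure in years, balance in thousands]; 1 = retained
churned = [[1.0, 2.0], [2.0, 1.0], [1.0, 3.0], [2.0, 3.0], [3.0, 2.0], [2.0, 2.0]]
retained = [[8.0, 20.0], [9.0, 25.0], [7.0, 22.0], [10.0, 18.0], [8.0, 24.0], [9.0, 21.0]]
X = [row for pair in zip(churned, retained) for row in pair]
y = [0, 1] * len(churned)

model = stack_fit(X, y)
print(model([2.0, 2.0]), model([9.0, 23.0]))
```

Training the meta-classifier on out-of-fold predictions, rather than on predictions for rows the base models have already seen, is the usual way to keep it from simply trusting whichever base model overfits the most.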