In this paper, we address the problem of customer credit card churn prediction via data mining. We developed a majority-voting ensemble whose constituents are Multilayer Perceptron (MLP), Logistic Regression (LR), decision tree (J48), Random Forest (RF), Radial Basis Function (RBF) network and Support Vector Machine (SVM). The dataset was taken from the Business Intelligence Cup organised by the University of Chile in 2004. Since the dataset is highly unbalanced, with 93% loyal and 7% churned customers, we balanced it using (1) undersampling, (2) oversampling, (3) a combination of undersampling and oversampling and (4) the Synthetic Minority Oversampling Technique (SMOTE). Furthermore, tenfold cross-validation was employed. The results indicated that SMOTE achieved good overall accuracy. Also, SMOTE and the combination of undersampling and oversampling improved the sensitivity and overall accuracy of majority voting. In addition, Classification and Regression Tree (CART) was used for feature selection, and the reduced feature set was fed to the classifiers mentioned above. Thus, this paper identifies the most important predictor variables for the credit card churn prediction problem. Moreover, the rules generated by the J48 decision tree act as an early warning expert system.
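As an illustration of the majority-voting step described above, the following is a minimal sketch, not the authors' implementation. It assumes each constituent classifier has already produced one label per customer; the function name `majority_vote` and the tie-breaking rule (the tied label voted for by the earliest-listed classifier wins) are assumptions, since the abstract does not say how ties among six voters are resolved.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def majority_vote(predictions: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Combine per-classifier label lists into one label per sample.

    predictions[i][j] is the label classifier i assigns to sample j.
    Tie-breaking is an assumption: among the labels with the highest
    vote count, the one voted for by the earliest classifier wins.
    """
    if not predictions:
        raise ValueError("at least one classifier's predictions are required")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must label the same number of samples")

    voted = []
    for sample_votes in zip(*predictions):  # votes for a single sample
        counts = Counter(sample_votes)
        top = max(counts.values())
        winner = next(label for label in sample_votes if counts[label] == top)
        voted.append(winner)
    return voted


# Hypothetical outputs of three classifiers for three customers.
example = [
    ["churn", "loyal", "loyal"],
    ["churn", "loyal", "churn"],
    ["loyal", "loyal", "churn"],
]
print(majority_vote(example))
```

With an even number of constituents, such as the six listed above, a 3-3 split is possible in a two-class problem, so some explicit tie-breaking rule is needed in practice.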
Banknotes are the currency a nation uses to carry out financial activities. They are an asset of every country, and every nation wants its banknotes to be genuine. Counterfeiters introduce fake notes into the market that closely resemble the original notes. Hence, there is a need for an efficient authentication system that accurately predicts whether a given note is genuine. We conducted exhaustive experiments with different machine learning techniques and found that Decision Tree and MLP are effective for banknote authentication, classifying the given banknote data efficiently. The rules produced by the Decision Tree were also tested and found to be accurate enough to be used for prediction.
Anemia is one of the most pressing public health issues in the world, and iron deficiency in particular is a major public health issue worldwide. The highest prevalence of anemia is in developing countries. The complete blood count (CBC) is a blood test used to diagnose anemia. While earlier studies have framed diagnosis as a binary classification problem, this paper frames it as a multi-class classification problem with three classes: mild, moderate and severe. These classes were chosen because the World Health Organization (WHO) guidelines formalize this categorization based on haemoglobin (HGB) values, here taken from the sampled patients in the CBC patient data set. The CBC test data was collected in an outpatient clinical setting in India. We used feature selection with majority voting to identify the key attributes in the input patient data set. In addition, since the original data set was imbalanced, we used the Synthetic Minority Oversampling Technique (SMOTE) to balance it. Four data sets, including the original data set, were used in the experiments. Six standard machine learning algorithms were applied to the four data sets to perform multi-class classification. The algorithms were benchmarked, and the results tabulated, using both 10-fold cross-validation and hold-out methods. The experimental results indicated that the multilayer perceptron network predominantly gave good recall values for the mild and moderate classes, which are the early and middle stages of the disease. With a good prediction model for the early stages, medical intervention can prevent further deterioration into the severe stage, or supplements can be recommended to overcome the problem.
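The core idea of SMOTE mentioned above can be sketched as follows. This is a simplified, standard-library illustration and not the implementation used in the paper: each synthetic sample is a random interpolation between a minority-class sample and one of its k nearest minority-class neighbours. The function name `smote_sketch`, the default `k=5`, the fixed seed and the toy feature vectors are all assumptions made for the example.

```python
import math
import random
from typing import List, Sequence


def smote_sketch(minority: Sequence[Sequence[float]],
                 n_synthetic: int,
                 k: int = 5,
                 seed: int = 0) -> List[List[float]]:
    """Generate n_synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (Euclidean)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest neighbours of `base`, excluding itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority-class samples (not real patient data).
minority_class = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_sketch(minority_class, n_synthetic=6, k=2)
print(len(new_points))
```

Because every synthetic point lies on a segment between two existing minority samples, the new points stay inside the region already occupied by that class, unlike plain oversampling, which only duplicates existing rows.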