This paper analyses the performance of classification models, both single classifiers and classifiers combined with ensemble methods, using the Breast Cancer Wisconsin and Hepatitis data sets as training data. It compares different classifiers based on 10-fold cross-validation using a data mining tool. The experiment implements various classifiers together with three popular ensemble methods, boosting, bagging and stacking, for the combination. The results show that, for the Breast Cancer Wisconsin data set, single-classifier Naïve Bayes (NB) and the bagging+NB combination both achieved the highest accuracy (97.51%) compared with the other ensemble combinations. For the Hepatitis data set, the stacking+Multi-Layer Perceptron (MLP) combination achieved a higher accuracy, at 86.25%. Ensemble classifiers may therefore improve the result. In future work, a multi-classifier approach will be proposed that introduces fusion at the classification level between these classifiers to obtain higher classification accuracy.
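The 10-fold cross-validation protocol mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' data mining tool or their classifiers: the function names `k_fold_indices`, `cross_val_accuracy` and `fit_majority_class` are hypothetical, the folds are shuffled but not stratified, and the baseline classifier is only a placeholder for NB, MLP or an ensemble.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Split range(n_samples) into k shuffled folds of near-equal size."""
    if n_samples < k:
        raise ValueError("need at least k samples to build k non-empty folds")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Striding by k keeps fold sizes within one sample of each other.
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(fit, X, y, k=10, seed=0):
    """Mean held-out accuracy over k folds.

    `fit(X_train, y_train)` must return a function predict(x) -> label.
    """
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def fit_majority_class(X_train, y_train):
    """Placeholder baseline: always predict the most common training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return lambda x: most_common
```

Any classifier that can be wrapped as a `fit` function returning a `predict` function could be passed to `cross_val_accuracy` in place of the baseline, which is how single and ensemble models would be compared on equal terms.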
The aim of this paper is to provide a comprehensive review of classification techniques and their alternative approaches in data mining. Classification is a data mining technique that assigns categories to a collection of data to aid more accurate prediction and analysis. It is one of several methods intended to make the analysis of very large datasets effective. The goal of classification is to accurately predict the target class for each case in the data. One classification approach is the ensemble method. In recent years, the use of ensemble methods in medical applications has been increasing. Beyond medicine, they can also help researchers solve modern problems in many fields, such as machine learning, data mining and other related areas.
Dirty water is the world's biggest health risk. When rainwater runs off roads into rivers, it picks up toxic chemicals, dirt, trash and disease-carrying organisms along the way. Many of our water resources lack basic protections, making them vulnerable to pollution from factory farms and industrial plants. A classification model is therefore needed to represent the quality of the water environment. This research applies data mining classification methods to water quality. Various classifiers were studied in order to find the most accurate classifier for the dataset. This paper compares the accuracies of five classifiers (NB, MLP, J48, SMO, and IBk), using 10-fold cross-validation as the test method, on water quality datasets from the Kinta River, Perak, Malaysia. This study also explores which classifier is suitable for classifying the dataset. The selected attributes used in this study were:
Breast cancer is a disease that threatens women around the world. It is one of the main killers of women, not only in Malaysia but worldwide. Medical diagnosis, such as that of breast cancer, is a significant but complicated task that needs to be carried out correctly and effectively. To improve prediction accuracy on breast cancer datasets, this study evaluates the performance of a multi-classifier-based deep learning approach. Five classifiers are involved: Sequential Minimal Optimization (SMO), decision tree (J48), random forests (RFs), Naïve Bayes (NB) and Instance-Based K-Nearest Neighbor (IBk). These classifiers are combined and analyzed using a deep learning approach. This strategy uses a deep neural network, a variant of the neural network that approximates the human brain more closely through a more advanced system than a straightforward neural network. The results show that combining different classifiers using the deep learning approach gives higher accuracy than single classification, with the combination SMO+RF+IBk+NB reaching 96.63%.
This paper presents a comparison among different classifiers, namely Sequential Minimal Optimization (SMO), decision tree (J48), random forests (RFs), Naïve Bayes (NB) and Instance-Based K-Nearest Neighbor (IBk), on medical data sets such as Breast Cancer Wisconsin and Hepatitis. Classification accuracy was measured using 10-fold cross-validation. A combination at the classification level between these classifiers using a deep learning approach was then applied to obtain the highest accuracy and to identify the most suitable Deep Multi-classifier Learning (DMCL) approach for the data sets. The medical data sets were taken from the UCI Repository. The results showed that the combination SMO+RF+IBk+NB achieved the highest accuracy for the Breast Cancer Wisconsin data set, at 96.63%, while for the Hepatitis data set the combination IBk+NB+J48+SMO achieved the highest accuracy, at 92.50%. This shows that the proposed method produces higher prediction accuracy than both single classifiers and classifier combinations that use majority voting on these medical data sets.
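The majority-voting baseline that the abstract compares against can be sketched as follows. This illustrates plain (unweighted) majority voting only, not the DMCL method itself. The function name `majority_vote`, the example labels and the tie-breaking rule (the tied label voted for by the earliest-listed classifier wins) are assumptions made for this sketch and are not taken from the paper.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    `predictions` holds one list per classifier, all of equal length.
    Ties are broken in favour of the tied label that appears first in
    classifier order (an assumption of this sketch).
    """
    if not predictions:
        raise ValueError("need predictions from at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same samples")

    combined = []
    for votes in zip(*predictions):  # votes for one sample, in classifier order
        counts = Counter(votes)
        best = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == best))
    return combined


# Hypothetical outputs of three classifiers on three samples.
smo_pred = ["benign", "malignant", "benign"]
nb_pred = ["benign", "benign", "malignant"]
ibk_pred = ["malignant", "benign", "benign"]
ensemble_pred = majority_vote([smo_pred, nb_pred, ibk_pred])
```

Because every classifier contributes one equal vote, this scheme ignores how reliable each classifier is, which is one reason a learned combination could plausibly score higher.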