Data mining plays an important role in healthcare because disease diagnosis and analysis involve very large volumes of data. This volume creates data handling problems that must be addressed effectively. Health datasets are uncertain and dynamic in nature, which makes them tedious to maintain and manipulate. To overcome these issues, several studies have introduced machine learning approaches for the diagnosis and prognosis of various diseases. In this paper, different data mining and machine learning techniques used in diabetes are analyzed and compared. The task of disease diagnosis and prognosis is a form of classification and prediction. Recent and popular data mining techniques used on clinical data include Bayesian methods, random forest algorithms, artificial neural networks, SVM, and decision trees. This paper presents the problems and findings associated with those techniques across various factors.
Diabetes is a chronic disease that causes a large number of deaths each year. Untreated diabetes disturbs the proper functioning of other organs in the human body. Hence early detection is important for a healthy lifestyle. Classification performance is usually degraded by the high dimensionality of medical data. In this study, a system model is proposed on the Pima dataset to enhance classification accuracy by eliminating irrelevant features. It is therefore important to choose a feature selection approach that provides better accuracy in disease prediction than prior studies. Hence novel techniques, Improved Firefly (IFF) and a hybrid random forest algorithm, are proposed for feature selection and classification. The present study achieves 96.3% accuracy. The efficiency of the present study is compared with prior classification approaches.
Sentiment analysis is a field of text mining in which reviews take the form of unstructured data from which opinions are extracted. This paper looks for approaches that produce output with good accuracy. Least squares twin support vector machine (LSTSVM) is a relatively new version of the support vector machine (SVM) based on non-parallel twin hyperplanes. LSTSVM is an extremely efficient and fast algorithm for binary classification, and its parameters depend on the nature of the problem. The goal of this paper is to improve accuracy through LSTSVM. Several benchmark datasets are used to train a sentiment classifier in order to demonstrate the accuracy of the proposed algorithm. N-grams and different weighting schemes were used to extract the most classical features. The paper also analyzes Chi-Square weighted features to select informative features for classification. Experimental analysis reveals that using Chi-Square feature selection in LSTSVM may provide a significant improvement in classification accuracy.
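As a rough illustration of Chi-Square feature selection over n-gram features, the sketch below scores each term with the 2x2 contingency form of the chi-square statistic (term present or absent against positive or negative class). This is one common formulation and not necessarily the one used in the paper; the function names `ngrams`, `chi_square_scores`, and `top_k`, and the four toy reviews, are assumptions made for the example. The LSTSVM classifier itself is not shown.

```python
from collections import Counter


def ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a lowercased text."""
    words = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return grams


def chi_square_scores(docs, labels):
    """Score each term by the chi-square statistic of its presence/class table.

    docs   : list of token lists (hypothetical input format)
    labels : list of 0/1 class labels, 1 = positive
    """
    n = len(docs)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n - n_pos
    pos_df, neg_df = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        # Count document frequency, so each term counts once per document.
        (pos_df if y == 1 else neg_df).update(set(tokens))

    scores = {}
    for term in set(pos_df) | set(neg_df):
        a = pos_df[term]   # positive docs containing the term
        b = neg_df[term]   # negative docs containing the term
        c = n_pos - a      # positive docs without the term
        d = n_neg - b      # negative docs without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def top_k(scores, k):
    """Return the k highest-scoring terms, ties broken alphabetically."""
    return [t for t, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]]


# Toy data, invented for illustration only.
reviews = ["good movie", "good plot", "bad movie", "bad plot"]
labels = [1, 1, 0, 0]
scores = chi_square_scores([ngrams(r) for r in reviews], labels)
selected = top_k(scores, 2)
```

On this toy data the polarity words score highest, while words that appear equally often in both classes score zero, which is the behaviour that makes the statistic useful for pruning uninformative n-grams before training a classifier.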
Sentiment analysis is the process of identifying a person's thoughts or feelings. Many methods have been developed for sentiment analysis. Machine learning is one of the most widely used approaches to sentiment classification. In this work, sentiment analysis is performed using a Relevance Vector Machine Classifier (RVMC) combined with Cuckoo Search Optimization (CSO) for better accuracy and performance. Experiments are conducted on movie and Twitter datasets. The accuracy, precision and recall of the other techniques are also evaluated and compared with the proposed approach. The results show that the RVMC-CSO algorithm gives better accuracy and performance than other algorithms such as SVM, ELM and RVM.
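For reference, the three evaluation measures named above can be computed from the counts of true and false positives and negatives. The sketch below uses the standard definitions; the function name `classification_metrics` and the example labels are assumptions for illustration and are not taken from the paper.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision and recall for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Guard against division by zero when nothing is predicted positive
        # or when there are no positive examples.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Invented labels: 1 = positive sentiment, 0 = negative sentiment.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
```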
Diabetes is the most common chronic disease in the world. Early prediction will help physicians provide improved treatment. Machine learning approaches are widely used for predicting the disease at an early stage. However, selecting the significant features and a suitable classifier remains difficult, and poor choices reduce diagnostic accuracy. In this paper, PCA-based feature transformation and a hybrid random forest classifier are used for diabetes prediction. PCA attempts to identify the best subset of transformed components that greatly improves the classification result. The system is compared with prior machine learning approaches to evaluate the efficiency of this work. The experimental results show that the present study enhances prediction accuracy.
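To make the PCA transformation step concrete, the following minimal sketch centers a small dataset, builds its covariance matrix, and finds the first principal component by power iteration. It is a generic illustration under stated assumptions, not the paper's pipeline: the helper names (`center`, `covariance`, `first_component`, `project`) and the toy data are hypothetical, only one component is extracted, and the hybrid random forest classifier is not shown because the abstract does not describe it.

```python
import math


def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (divides by n - 1) of centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(m, v):
    """Multiply a square matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def first_component(cov, iters=200):
    """Approximate the leading eigenvector and eigenvalue by power iteration.

    Assumes the uniform starting vector is not orthogonal to the leading
    eigenvector; a production implementation would use a linear algebra
    library instead.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = mat_vec(cov, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(cov, v)))
    return v, eigenvalue


def project(centered, v):
    """Project each centered row onto the component direction."""
    return [sum(a * b for a, b in zip(row, v)) for row in centered]


# Toy data in which the second feature is exactly twice the first.
toy = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
centered = center(toy)
cov = covariance(centered)
component, variance = first_component(cov)
explained_ratio = variance / sum(cov[i][i] for i in range(len(cov)))
scores = project(centered, component)
```

Because the two toy features are perfectly correlated, a single component captures all of the variance, so the projected `scores` could replace both original features as classifier input. On real data, the explained variance ratio is what guides how many components to keep.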