Data mining plays an important role in healthcare because disease diagnosis and analysis involve very large volumes of data. This volume creates many data-handling problems that must be addressed effectively. Health datasets are uncertain and dynamic in nature, which makes them tedious to maintain and manipulate. To overcome these issues, several studies have introduced machine learning approaches for the diagnosis and prognosis of various diseases. In this paper, different data mining and machine learning techniques used for diabetes are analyzed and compared. Disease diagnosis and prognosis are treated as classification and prediction tasks. Recent and popular data mining techniques applied to clinical data include Bayesian methods, random forest algorithms, artificial neural networks, support vector machines (SVM), and decision trees. This paper reports the problems and findings associated with these techniques across various factors.
Diabetes is a chronic disease that causes a large number of deaths each year. Untreated diabetes disturbs the proper functioning of other organs in the human body. Hence, early detection is important for maintaining a healthy lifestyle. Classification performance is usually degraded by the high dimensionality of medical data. In this study, a system model is proposed on the Pima dataset to enhance classification accuracy by eliminating irrelevant features. It is therefore important to choose a suitable feature selection approach that provides better accuracy in disease prediction than prior studies. Hence, novel techniques, Improved Firefly (IFF) and a hybrid random forest algorithm, are proposed for feature selection and classification. The present study provides a better result, with 96.3% accuracy. The efficiency of the present study is compared with prior classification approaches.
Sentiment analysis is a field of text mining in which reviews take the form of unstructured data and opinions are extracted from them. This paper seeks approaches that produce output with good accuracy. Least squares twin support vector machine (LSTSVM) is a relatively new variant of the support vector machine (SVM) based on non-parallel twin hyperplanes. LSTSVM is an efficient and fast algorithm for binary classification, and its parameters depend on the nature of the problem. The goal of this paper is to improve accuracy through LSTSVM. A sentiment classifier is trained on several benchmark datasets in order to demonstrate the accuracy of the proposed algorithm. N-grams and different weighting schemes were used to extract the most classical features. The paper also analyzes Chi-Square weighted features to select informative features for classification. Experimental analysis reveals that using Chi-Square feature selection with LSTSVM may provide a significant improvement in classification accuracy.
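The Chi-Square feature selection step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact weighting scheme, so the sketch assumes binary presence of word n-grams and the textbook chi-square statistic for a 2x2 table (term present or absent versus positive or negative label). The function names, the four toy reviews, and their labels are hypothetical, and the LSTSVM classifier itself is not shown.

```python
from collections import Counter


def ngrams(text, n_max=2):
    """Return the set of word n-grams (n = 1..n_max) that occur in a text."""
    tokens = text.lower().split()
    grams = set()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.add(" ".join(tokens[i:i + n]))
    return grams


def chi_square_scores(docs, labels, n_max=2):
    """Score every n-gram with the chi-square statistic of its 2x2 table.

    labels are 1 (positive) or 0 (negative). Presence is binary per document,
    which is an assumption of this sketch, not something the abstract states.
    """
    n_docs = len(docs)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = n_docs - n_pos

    # Document frequency of each n-gram within each class.
    pos_df, neg_df = Counter(), Counter()
    for text, y in zip(docs, labels):
        (pos_df if y == 1 else neg_df).update(ngrams(text, n_max))

    scores = {}
    for term in set(pos_df) | set(neg_df):
        a = pos_df[term]   # positive documents containing the term
        b = neg_df[term]   # negative documents containing the term
        c = n_pos - a      # positive documents without the term
        d = n_neg - b      # negative documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        # A zero denominator means the term carries no class information.
        scores[term] = n_docs * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Return the k highest-scoring n-grams, ties broken alphabetically."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy reviews, for illustration only.
docs = ["good movie", "good plot", "bad movie", "bad plot"]
labels = [1, 1, 0, 0]

scores = chi_square_scores(docs, labels)
top = select_top_k(scores, 2)
```

In this toy data the words that appear in only one class ("good", "bad") receive the highest scores, while words spread evenly across both classes ("movie", "plot") score zero. The selected n-grams would then form the feature vector passed to whichever classifier is used.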