Diabetes mellitus is a chronic condition characterized by hyperglycemia and remains a burdensome disease. Given its growing prevalence, it is estimated that by 2040 the number of people living with diabetes worldwide will exceed 642 million, meaning roughly one in ten adults will be affected. Diabetes can also lead to complications such as heart attacks, kidney damage, and even blindness. The value of predicting diabetes in advance motivated the development of a machine learning-based model in this work. A dataset was obtained from an online repository and was found to be imbalanced. Class imbalance poses a challenge for prediction and was addressed with resampling techniques such as Tomek links and SMOTE; these techniques also remove noisy, borderline samples from the dataset. Remaining outliers were managed using the interquartile range (IQR) method. Additionally, this research employed a two-stage model selection methodology. In the first stage, logistic regression, Support Vector Machine, k-nearest neighbors, gradient boosting, Naive Bayes, and Random Forest were applied to assess predictive performance based on patients' preconditions. At this stage, Random Forest performed best, with an accuracy of 80.7% after applying the SMOTE oversampling technique to balance the dataset. In the second stage, the three better-performing models, Naive Bayes, Gradient Boosting Classifier, and Random Forest, were combined through a voting algorithm. The results were encouraging: the ensemble achieved 82.0% accuracy on the original dataset and 81.7% accuracy on the balanced dataset.
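As a rough illustration of the pipeline described above, the sketch below combines IQR-based outlier filtering, SMOTE oversampling on the training split, and a hard-voting ensemble of Naive Bayes, Gradient Boosting, and Random Forest using scikit-learn and imbalanced-learn. The CSV file name and the `Outcome` label column are assumptions for illustration, not details given in the abstract.

```python
# Minimal sketch of the described pipeline: IQR outlier handling, SMOTE
# balancing, and a voting ensemble of the three stronger models.
# The dataset path and the "Outcome" label column are hypothetical.
import pandas as pd
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import (RandomForestClassifier,
                              GradientBoostingClassifier,
                              VotingClassifier)
from sklearn.metrics import accuracy_score

df = pd.read_csv("diabetes.csv")                      # hypothetical dataset file
X, y = df.drop(columns="Outcome"), df["Outcome"]

# IQR rule: keep rows whose features fall within [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
q1, q3 = X.quantile(0.25), X.quantile(0.75)
iqr = q3 - q1
mask = ~((X < q1 - 1.5 * iqr) | (X > q3 + 1.5 * iqr)).any(axis=1)
X, y = X[mask], y[mask]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Balance only the training split with SMOTE oversampling
X_train, y_train = SMOTE(random_state=42).fit_resample(X_train, y_train)

# Second stage: majority voting over the three better-performing models
ensemble = VotingClassifier(
    estimators=[("nb", GaussianNB()),
                ("gb", GradientBoostingClassifier(random_state=42)),
                ("rf", RandomForestClassifier(random_state=42))],
    voting="hard")
ensemble.fit(X_train, y_train)
print("accuracy:", accuracy_score(y_test, ensemble.predict(X_test)))
```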
In the Internet age, data have multiplied at an exponential rate, and large volumes of data from many sources can now be combined. Making sense of big data has become increasingly difficult because of its volume, velocity, veracity, and, most critically, variety (sometimes referred to as heterogeneity), which arises from the many different sources involved. Integrating data from such a variety of sources through big-data fusion brings both advantages and disadvantages. People increasingly rely on the Internet and web-based services to meet their daily needs, and storage media hold the resulting data in many formats; managing such vast volumes of data, commonly referred to as "big data," is a serious challenge for any organization, and technological advances have made collecting data and making decisions based on it even harder. These data must be rationally combined and integrated into a system, and data fusion is therefore the subject of this paper. The focus of this work is big-data fusion using deep learning approaches to combine datasets drawn from a variety of heterogeneous sources, and the study introduces several methods and techniques for semantically merging large datasets.
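As one possible reading of deep-learning-based fusion of heterogeneous sources, the sketch below shows a minimal feature-level (early) fusion network in PyTorch: each source gets its own encoder, and the concatenated representations feed a shared classifier. The layer sizes, input dimensions, and fusion strategy are illustrative assumptions, not the architecture used in the paper.

```python
# Minimal sketch of feature-level (early) fusion of two heterogeneous sources.
# Encoder sizes, dimensions, and the concatenation strategy are illustrative
# assumptions, not the paper's architecture.
import torch
import torch.nn as nn

class FusionNet(nn.Module):
    def __init__(self, dim_a: int, dim_b: int, hidden: int = 32, n_classes: int = 2):
        super().__init__()
        # Source-specific encoders map each modality into a shared space
        self.enc_a = nn.Sequential(nn.Linear(dim_a, hidden), nn.ReLU())
        self.enc_b = nn.Sequential(nn.Linear(dim_b, hidden), nn.ReLU())
        # Fusion head operates on the concatenated representations
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x_a: torch.Tensor, x_b: torch.Tensor) -> torch.Tensor:
        z = torch.cat([self.enc_a(x_a), self.enc_b(x_b)], dim=-1)
        return self.head(z)

# Toy usage: two sources with different feature widths for the same 8 records
x_a, x_b = torch.randn(8, 10), torch.randn(8, 5)
logits = FusionNet(dim_a=10, dim_b=5)(x_a, x_b)
print(logits.shape)  # torch.Size([8, 2])
```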