2022
DOI: 10.1155/2022/9220560

Diabetes Mellitus Disease Prediction Using Machine Learning Classifiers with Oversampling and Feature Augmentation

Abstract: Technical improvements in the healthcare sector have given rise to many new inventions in the field of artificial intelligence. Patterns for disease identification are extracted, and the onset of many diseases is predicted. These diseases include diabetes mellitus, fatal heart diseases, and symptomatic cancer. Many algorithms have played a critical role in the prediction of diseases. This paper proposes an ML-based approach for diabetes mellitus disease prediction. For dia…

citations
Cited by 11 publications
(12 citation statements)
references
References 32 publications
“…The suggested model yields typical outcomes in the identification of diabetes mellitus. Before applying machine learning techniques, it was advised to utilize data augmentation on the PIMA dataset to prevent under-sampling because of the tiny dataset [18]. When combined with augmented datasets, machine learning algorithms significantly improve their ability to forecast diabetes mellitus.…”
Section: Literature Survey
Citation type: mentioning
confidence: 99%
“…Based on research on the prediction of chronic disease, here are some of the methods used for handling imbalanced data. (1) Oversampling techniques: SMOTE [70], [85]-[88]; ADASYN [60], [89], [90]; ROS [5], [20]; orchard SMOTE [91]; SMOTE-Tomek [92]; TimeGAN49 [93]; and SVM-SMOTE [94]. (2) Undersampling techniques: Tversky similarity [95], near miss [60], RUS [60], and NCL [89].…”
Section: Imbalance Data
Citation type: mentioning
confidence: 99%
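Of the oversampling techniques named in this statement, random oversampling (ROS) is the simplest to illustrate. The sketch below is only an illustration under stated assumptions, not the method of any cited paper: the function name `random_oversample`, the seed, and the toy rows are all hypothetical. It duplicates randomly chosen minority-class rows until every class is as large as the majority class.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen minority-class rows
    until every class matches the size of the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the majority class

    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Pool of original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement until the class reaches the target size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (made-up values): four negative cases, two positive cases.
X = [[85, 26.6], [89, 28.1], [116, 25.6], [78, 31.0], [148, 33.6], [183, 23.3]]
y = [0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))  # both classes now have four rows
```

Because ROS only repeats existing rows, it adds no new information. Techniques such as SMOTE instead synthesize new minority samples by interpolation. In practice, oversampling is usually applied to the training split only, so duplicated rows do not leak into the evaluation data.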
“…More tolerant to overfitting compared to a single decision tree | XGBoost [63] | max_depth: 1-10, gamma: [0, 0.4-1], min_child_weight: [1-6, 8, 10] | Boosting for accuracy prediction | C4.5 [206] | Criterion: gini, max_depth: none, n_estimators: 150 | Automatically identifying key risk factors associated with stroke | CatBoost [10] | learning_rate: 0.03, class_weight: 1, iterations: 100, depth: 6…”
Section: Stroke
Citation type: mentioning
confidence: 99%
“…It is calculated using $(y_1 - y_1')^2$, where $y_1$ is the actual value and $y_1'$ is the final value predicted by the model. So $y_1'$ is replaced with $G_n(X)$, which represents the target value [24]. It is mathematically given as follows:…”
Section: Gradient Boosting
Citation type: mentioning
confidence: 99%
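The formula this statement introduces is cut off in the excerpt. As a rough illustration only, and assuming the loss is the usual squared error between the actual value and the model output $G_n(X)$, one boosting update can be sketched as follows. All function names and numbers are hypothetical, and the weak learner is idealized as one that fits the residuals exactly.

```python
def squared_error(y_true, y_pred):
    """Squared-error loss for one sample (assumed form of the elided loss)."""
    return (y_true - y_pred) ** 2


def residuals(ys, preds):
    """Residuals y - G(X); for squared error these are proportional to the
    negative gradient of the loss with respect to the prediction."""
    return [y - p for y, p in zip(ys, preds)]


def boost_step(preds, updates, learning_rate):
    """Move the current predictions a fraction of the way along the updates."""
    return [p + learning_rate * u for p, u in zip(preds, updates)]


# Toy targets (made-up values).
ys = [1.0, 0.0, 1.0, 1.0]

# G_0: start from the mean of the targets.
g0 = [sum(ys) / len(ys)] * len(ys)

# Idealized weak learner: assume it reproduces the residuals exactly.
r = residuals(ys, g0)
g1 = boost_step(g0, r, learning_rate=0.5)

loss0 = sum(squared_error(y, p) for y, p in zip(ys, g0))
loss1 = sum(squared_error(y, p) for y, p in zip(ys, g1))
print(loss0, loss1)  # the total loss shrinks after one step
```

A real gradient boosting implementation fits a small regression tree to the residuals at each stage instead of using them directly. That is why the learning rate and the tree depth both act as regularizers.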