2020
DOI: 10.3390/healthcare8030247

Prediction of Type 2 Diabetes Risk and Its Effect Evaluation Based on the XGBoost Model

Abstract: In view of the harm diabetes causes to the population, we introduce an ensemble learning algorithm, eXtreme Gradient Boosting (XGBoost), to predict the risk of type 2 diabetes and compare it with the Support Vector Machine (SVM), Random Forest (RF), and K-Nearest Neighbor (K-NN) algorithms in order to improve on the predictive performance of existing models. A combination of convenience sampling and snowball sampling in Xicheng District, Beijing was used to conduct a questionnaire survey on personal data, eating …
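As a rough illustration of the comparison the abstract describes, a minimal sketch using scikit-learn and the xgboost package might look like the following. This is not the authors' code: the synthetic data merely stands in for the survey features, since the Beijing questionnaire dataset is not public.

```python
# Minimal sketch (not the study's code): comparing XGBoost against the
# baselines named in the abstract on synthetic stand-in data.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

# Synthetic stand-in for the questionnaire features (diet, lifestyle, etc.).
X, y = make_classification(n_samples=1000, n_features=15, random_state=0)

models = {
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True)),
    "Random Forest": RandomForestClassifier(random_state=0),
    "K-NN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

# Five-fold cross-validated AUC, a common metric for risk prediction.
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```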

Cited by 78 publications (60 citation statements)
References 20 publications
“…Our study showed XGBoost performed best when predicting microbial sources in two categories. The result is not surprising as other studies have shown that XGBoost has advantages over other models ( Pan, 2018 ; Wang et al, 2020 ). Similar to Random Forest, it is an ensemble method that makes inferences based on multiple decision trees, thus reducing prediction errors.…”
Section: Discussion
confidence: 50%
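To make the ensemble point in that statement concrete, the sketch below scores an XGBoost model truncated to its first k boosted trees; held-out accuracy generally improves as trees are added. Data and parameters are illustrative, not from the cited studies, and the iteration_range argument assumes xgboost >= 1.4.

```python
# Sketch: boosting aggregates many shallow trees, and held-out error
# typically falls as more trees contribute to the prediction.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = XGBClassifier(n_estimators=200, max_depth=3, eval_metric="logloss")
model.fit(X_tr, y_tr)

# Score the ensemble truncated to its first k trees (iteration_range
# requires xgboost >= 1.4); accuracy generally improves with k.
for k in (1, 10, 50, 200):
    pred = model.predict(X_te, iteration_range=(0, k))
    print(f"{k:>3} trees: accuracy = {accuracy_score(y_te, pred):.3f}")
```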
“…These parameters include the number of boosting stages to perform (n_estimators), the maximum depth of a tree (max_depth), the minimum sum of instance weight required in a child (min_child_weight), the subsample ratio of the training instances (subsample), the random seed given to each estimator at each boosting iteration (random_state), and the rate of learning from training data (learning_rate). Similar XGBoost hyperparameter tuning has been done by the authors of the study [65].…”
Section: Results
confidence: 99%
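As a hedged illustration of the tuning that statement describes, a grid search over those same XGBoost parameters might look like the following; the grid values and synthetic data are illustrative, not the ones tuned in the cited study.

```python
# Sketch of hyperparameter tuning over the parameters named above,
# using scikit-learn's GridSearchCV with the xgboost sklearn wrapper.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

X, y = make_classification(n_samples=1000, n_features=15, random_state=0)

param_grid = {
    "n_estimators": [100, 300],     # number of boosting stages
    "max_depth": [3, 5],            # maximum depth of each tree
    "min_child_weight": [1, 5],     # minimum sum of instance weight in a child
    "subsample": [0.8, 1.0],        # subsample ratio of training instances
    "learning_rate": [0.05, 0.1],   # rate of learning from training data
}

# random_state is held fixed here rather than searched over.
search = GridSearchCV(
    XGBClassifier(random_state=0, eval_metric="logloss"),
    param_grid, cv=5, scoring="roc_auc", n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, f"AUC = {search.best_score_:.3f}")
```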
“…Unfortunately, in the case of methods such as deep learning, one cannot understand the decision-making process within the algorithm. Thus, it is hard to make grounded implications for medical purposes when multiple causes are at play [ 27 ].…”
Section: Discussion
confidence: 99%