Aims
This study aimed to develop a machine learning-based prediction model for gestational diabetes mellitus (GDM) in early pregnancy in Chinese women.
Materials and methods
We used an established population-based prospective cohort of 19,331 women in Tianjin, China, who registered their pregnancy before the 15th gestational week between October 2010 and August 2012. The dataset was randomly divided into a training set (70%) and a test set (30%). Risk factors collected at registration were examined and used to construct the prediction model in the training set. A machine learning method, extreme gradient boosting (XGBoost), was used to develop the model, and a traditional logistic model was developed for comparison. In the test set, model performance was assessed by calibration plots (calibration) and by the area under the receiver operating characteristic curve (AUR; discrimination).
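The evaluation steps described above (a random 70/30 split, discrimination by area under the ROC curve, and calibration by comparing predicted with observed risk) can be illustrated with a minimal standard-library Python sketch. This is not the study's code: the function names `train_test_split`, `roc_auc`, and `calibration_table` are hypothetical, the choice of ten risk groups is an assumption, and the model-fitting step (XGBoost or logistic) is deliberately left out.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=0):
    """Randomly split rows into (train, test); hypothetical helper."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney) identity.

    labels are 0/1; tied scores receive their average rank.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def calibration_table(labels, probs, n_groups=10):
    """Return (mean predicted risk, observed event rate) per risk group.

    Groups are formed by sorting on predicted probability; the number of
    groups is an assumption, not taken from the study.
    """
    pairs = sorted(zip(probs, labels))
    n = len(pairs)
    table = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        table.append((mean_pred, observed))
    return table
```

Plotting the pairs returned by `calibration_table` against the diagonal gives a simple calibration plot: points below the diagonal in the highest risk groups indicate overestimated risk.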
Results
In total, 1484 (7.6%) women developed GDM. Pre-pregnancy body mass index, maternal age, fasting plasma glucose at registration, and alanine aminotransferase were selected as risk factors. The machine learning XGBoost model-predicted probability of GDM was similar to the observed probability in the test dataset, while the logistic model tended to overestimate the risk at the highest risk level (Hosmer-Lemeshow test p value: 0.243 vs. 0.099). The XGBoost model achieved a higher AUR than the logistic model (0.742 vs. 0.663, p < 0.001). This XGBoost model was deployed through a free, publicly available software interface (https://liuhongwei.shinyapps.io/gdm_risk_calculator/).
Conclusion
For predicting GDM in early pregnancy, the XGBoost model achieved better calibration and discrimination than the logistic model.