Background: Diabetes mellitus (DM) and cardiovascular disease (CVD) impose a significant healthcare burden globally and often co-exist. Current approaches often fail to identify many people with co-occurring DM and CVD, leading to delayed healthcare seeking, increased complications, and greater morbidity. In this paper, we aimed to develop and evaluate a two-stage machine learning (ML) model to predict the co-occurrence of DM and CVD. Methods: We used the Diabetes Complications Screening Research Initiative (DiScRi) dataset, containing >200 variables from >2000 participants. In the first stage, we used two ML methods (logistic regression and the Evimp function implemented in the multivariate adaptive regression splines model) to infer the significant common risk factors for DM and CVD, and applied a correlation matrix to reduce redundancy. In the second stage, we used a classification and regression tree algorithm to develop our model. We evaluated the prediction models using prediction accuracy, sensitivity, and specificity as performance metrics. Results: The common risk factors for DM and CVD co-occurrence were family history of the diseases, gender, heart rate change on deep breathing, blood pressure change on lying to standing, HbA1c, HDL, and TC/HDL ratio. The predictive model showed that participants with HbA1c >6.45 and TC/HDL ratio >5.5 were at risk of developing both diseases (97.9% probability). In contrast, participants with HbA1c >6.45 and TC/HDL ratio ≤5.5 were more likely to have only DM (84.5% probability), and those with HbA1c ≤5.45 and HDL >1.45 were likely to be healthy (82.4% probability). Further, participants with HbA1c ≤5.45 and HDL <1.45 were at risk of only CVD (100% probability). The predictive accuracy of the ML model for detecting co-occurrence of DM and CVD was 94.09%, with 93.5% sensitivity and 95.8% specificity. Conclusions: Our ML model can predict the co-occurrence of DM and CVD with high accuracy in people attending a screening program.
This might help in the early detection of patients with DM and CVD who could benefit from preventive treatment, and thereby reduce the future healthcare burden.
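The decision rules reported in the Results above can be expressed as a small threshold function. This is an illustrative reconstruction from the abstract's stated cut-offs only, not the authors' published model; the function name and the handling of the uncovered HbA1c range (5.45–6.45) are assumptions.

```python
def classify(hba1c, tc_hdl_ratio, hdl):
    """Apply the threshold rules reported in the abstract's Results.

    Thresholds are taken verbatim from the abstract; this function is an
    illustrative sketch, not the authors' published model.
    """
    if hba1c > 6.45:
        # High HbA1c: the TC/HDL ratio separates DM+CVD from DM only.
        return "DM+CVD" if tc_hdl_ratio > 5.5 else "DM only"
    if hba1c <= 5.45:
        # Low HbA1c: HDL separates healthy from CVD only.
        return "healthy" if hdl > 1.45 else "CVD only"
    # The abstract does not state a rule for 5.45 < HbA1c <= 6.45.
    return "indeterminate"

print(classify(7.0, 6.0, 1.0))  # → DM+CVD
```

The nested thresholds mirror the structure of a classification tree, which is consistent with the two-stage modelling approach the abstract describes.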
Background: Cardiac autonomic neuropathy (CAN) is a diabetes-related complication with increasing prevalence that remains challenging to detect in clinical settings. Machine learning (ML) approaches have the potential to predict CAN using clinical data. In this study, we aimed to develop and evaluate the performance of an ML model to predict early CAN occurrence in patients with diabetes. Methods: We used the Diabetes Complications Screening Research Initiative dataset containing 200 CAN-related tests on more than 2000 participants with type 2 diabetes in Australia. Data were collected on peripheral nerve functions, Ewing's tests, blood biochemistry, demographics, and medical history. The ML model was validated using 10-fold cross-validation, in which 90% of the data were used to train the model and the remaining 10% to evaluate its performance. Predictive performance was assessed by the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value, and negative predictive value. Results: Of the 237 patients included, 105 were diagnosed with an early stage of CAN while the remaining 132 were healthy. The ML model showed outstanding performance for CAN prediction, with an AUC of 0.962 [95% confidence interval (CI) = 0.939–0.984], 87.34% accuracy, and 87.12% sensitivity. There was a significant positive association between the ML model and CAN occurrence (p < 0.001). Conclusion: Our ML model has the potential to detect CAN at an early stage using Ewing's tests. This model might be useful for healthcare providers in predicting the occurrence of CAN in patients with diabetes, monitoring its progression, and providing timely intervention.
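The 10-fold cross-validation scheme described in the Methods can be sketched as follows. This is a generic sketch of the validation procedure only (the abstract does not specify the splitting code); the fold construction and seed are assumptions, and the model-fitting step is omitted.

```python
import random

def ten_fold_indices(n, seed=0):
    """Partition n sample indices into 10 disjoint folds.

    Each cross-validation round trains on 9 folds (~90% of the data)
    and evaluates on the held-out fold (~10%), as the Methods describe.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::10] for i in range(10)]

folds = ten_fold_indices(237)  # 237 participants, as in the Results
for held_out in folds:
    train = [i for f in folds if f is not held_out for i in f]
    # Fit the model on `train`, evaluate on `held_out` (model code omitted).
    assert len(train) + len(held_out) == 237
```

Averaging the per-fold metrics (accuracy, sensitivity, AUC) across the 10 rounds yields the summary figures reported in the Results.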
In recent years, graph data analysis has become very important for modeling data distribution or structure in many applications, for example, social science, astronomy, computational biology, and social networks with massive numbers of nodes and edges. However, handling the high dimensionality of graph data remains a difficult task, mainly because analysis systems are not designed to deal with such large graph data. Therefore, graph-based dimensionality reduction approaches have been widely used in many machine learning and pattern recognition applications. This paper offers a novel dimensionality reduction approach for graph data. In particular, we focus on combining two linear methods: the Neighborhood Preserving Embedding (NPE) method, which aims to preserve the local neighborhood information of a given dataset, and the Principal Component Analysis (PCA) method, which aims to maximize the mutual information between the original high-dimensional data and its low-dimensional representation. The combination of NPE and PCA yields a new hybrid dimensionality reduction technique (HDR). HDR creates a transformation matrix by formulating a generalized eigenvalue problem and solving it with the Rayleigh quotient solution. Consequently, a greater reduction is achieved than with PCA or NPE used separately. We compared the results with conventional PCA, NPE, and other linear dimensionality reduction methods, and the proposed HDR method was found to perform better than the other techniques. The experimental results are based on two real datasets.
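A transformation of the kind the HDR abstract describes (a generalized eigenvalue problem whose solution maximizes a Rayleigh quotient) can be sketched in a few lines. The specific combined objective below is an assumption for illustration: the abstract does not give the exact formulation, so the NPE-style local-structure matrix `M` is a stand-in (the real method builds it from neighborhood reconstruction weights), and the toy data are random.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))   # toy data: 100 samples, 5 features
Xc = X - X.mean(axis=0)             # center the data

# PCA term: total scatter matrix, whose projection variance we maximize.
S = Xc.T @ Xc

# NPE-style term (hypothetical stand-in): the real method uses
# M = (I - W)^T (I - W) built from neighborhood reconstruction weights W.
M = np.eye(5) + 0.1 * S / np.trace(S)

# Generalized eigenvalue problem S a = lambda M a: maximizing the
# Rayleigh quotient (a^T S a) / (a^T M a) picks the top eigenvectors.
eigvals, eigvecs = np.linalg.eig(np.linalg.solve(M, S))
order = np.argsort(eigvals.real)[::-1]
A = eigvecs[:, order[:2]].real      # transformation matrix to 2 dimensions

Y = Xc @ A                          # reduced representation
print(Y.shape)                      # → (100, 2)
```

The key design point is that the two criteria are fused into a single eigenproblem, so one matrix `A` simultaneously reflects global variance (the PCA term) and local neighborhood structure (the NPE term).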
Summary: Many emerging applications such as social networks have prompted remarkable attention to graph data analysis. Graph data is typically high-dimensional in nature, and dimensionality reduction is critical for storing, analyzing, and querying such data efficiently. Although there are many dimensionality reduction methods, it is not clear to what extent their performances differ. In this article, we review some of the well-known linear dimensionality reduction methods and perform an empirical analysis of these approaches using large multidimensional graph datasets. Our results show that among linear unsupervised learning methods, principal component analysis, singular value decomposition, and neighborhood preserving embedding achieve better data retrieval performance than the other methods in the statistical, dictionary-based, and embedding categories, respectively. Regarding supervised learning methods, the experimental results demonstrate that linear discriminant analysis and partial least squares produced almost identical results.
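Two of the unsupervised methods the survey compares, PCA and SVD, are closely related: the right singular vectors of the centered data matrix are the principal axes, so both yield the same embedding up to sign. A minimal sketch on random toy data (an assumption; the survey's actual graph datasets and evaluation protocol are not reproduced here):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 8))    # toy data: 50 samples, 8 features
Xc = X - X.mean(axis=0)

# PCA via eigendecomposition of the covariance matrix.
cov = Xc.T @ Xc / (len(Xc) - 1)
w, V = np.linalg.eigh(cov)          # eigenvalues in ascending order
pca_2d = Xc @ V[:, ::-1][:, :2]     # project onto the top-2 axes

# The same projection via SVD: right singular vectors = principal axes.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_2d = Xc @ Vt[:2].T

# Up to a per-axis sign flip, the two embeddings coincide.
same = np.allclose(np.abs(pca_2d), np.abs(svd_2d))
print(same)  # → True
```

This equivalence explains why PCA and SVD land in the same performance tier in such comparisons; differences between them in practice come from numerical stability and cost rather than the embedding itself.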