Disease risk prediction is a rising challenge in the medical domain. Researchers have widely used machine learning algorithms to solve this challenge. The k-nearest neighbour (KNN) algorithm is the most frequently used among the wide range of machine learning algorithms. This paper presents a study on different KNN variants (Classic one, Adaptive, Locally adaptive, k-means clustering, Fuzzy, Mutual, Ensemble, Hassanat and Generalised mean distance) and their performance comparison for disease prediction. This study analysed these variants in depth through implementations and experiments using eight machine learning benchmark datasets obtained from Kaggle, the UCI Machine Learning Repository and OpenML. The datasets were related to different disease contexts. We considered the performance measures of accuracy, precision and recall for comparative analysis. The average accuracy values of these variants ranged from 64.22% to 83.62%. The Hassanat KNN showed the highest average accuracy (83.62%), followed by the ensemble approach KNN (82.34%). A relative performance index is also proposed based on each performance measure to assess each variant and compare the results. This study identified Hassanat KNN as the best performing variant based on the accuracy-based version of this index, followed by the ensemble approach KNN. This study also provided a relative comparison among KNN variants based on precision and recall measures. Finally, this paper summarises which KNN variant is the most promising candidate to follow under the consideration of three performance measures (accuracy, precision and recall) for disease prediction. Healthcare researchers and stakeholders could use the findings of this study to select the appropriate KNN variant for predictive disease risk analytics.
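For readers unfamiliar with the baseline, the classic KNN variant and the three performance measures named above can be sketched as follows. This is a minimal illustration only, not the implementation used in the study: it assumes Euclidean distance and unweighted majority voting, and the function names (`knn_predict`, `evaluate`) and the toy two-feature data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classic KNN: majority label among the k nearest training points.

    Assumes Euclidean distance; the variants compared in the abstract
    differ in how neighbours are chosen or weighted.
    """
    neighbours = sorted(
        zip(train_X, train_y),
        key=lambda pair: math.dist(pair[0], query),
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def evaluate(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical toy data: label 1 = disease present, label 0 = absent.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9), (2.8, 3.0)]
train_y = [0, 0, 0, 1, 1, 1]
test_X = [(1.1, 1.0), (3.0, 3.0), (2.6, 2.5)]
test_y = [0, 1, 0]

predictions = [knn_predict(train_X, train_y, x, k=3) for x in test_X]
accuracy, precision, recall = evaluate(test_y, predictions)
print(predictions, accuracy, precision, recall)
```

On this toy data the third test point is misclassified as positive, which lowers accuracy and precision while leaving recall untouched. This shows why the three measures can rank variants differently.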
Chronic disease prediction is a critical task in healthcare. Existing studies fulfil this requirement by employing machine learning techniques based on patient features, but they suffer from high-dimensional data problems and a high level of bias. We propose a framework for predicting chronic disease based on Graph Neural Networks (GNNs) to address these issues. We begin by projecting a patient-disease bipartite graph to create a weighted patient network (WPN) that extracts the latent relationship among patients. We then use GNN-based techniques to build prediction models. These models use features extracted from the WPN to create robust patient representations for chronic disease prediction. We compare the output of GNN-based models to that of machine learning methods on two prediction tasks: cardiovascular disease and chronic pulmonary disease. The results show that our framework enhances the accuracy of chronic disease prediction. The model with attention mechanisms achieves an accuracy of 93.49% for cardiovascular disease prediction and 89.15% for chronic pulmonary disease prediction. Furthermore, the visualisation of the last hidden layers of GNN-based models shows the pattern for the two cohorts, demonstrating the discriminative strength of the framework. The proposed framework can help stakeholders improve health management systems for patients at risk of developing chronic diseases and conditions.
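The projection step described above can be illustrated with a short sketch. The abstract does not state how edge weights in the WPN are computed, so the code below assumes the simplest choice: the weight between two patients is the number of diseases they share. The function name `project_patients`, the patient identifiers and the disease names are all hypothetical.

```python
from itertools import combinations


def project_patients(patient_diseases):
    """Project a patient-disease bipartite graph onto its patient side.

    patient_diseases maps each patient id to a set of disease labels.
    Returns a dict mapping (patient_a, patient_b) to an edge weight.
    Assumption: weight = number of shared diseases; pairs with no
    shared disease get no edge.
    """
    weights = {}
    for a, b in combinations(sorted(patient_diseases), 2):
        shared = len(patient_diseases[a] & patient_diseases[b])
        if shared:
            weights[(a, b)] = shared
    return weights


# Hypothetical bipartite graph: four patients, four diseases.
bipartite = {
    "p1": {"diabetes", "hypertension"},
    "p2": {"diabetes", "hypertension", "obesity"},
    "p3": {"asthma"},
    "p4": {"obesity"},
}

wpn = project_patients(bipartite)
print(wpn)
```

In this toy graph, p1 and p2 are linked with weight 2, p2 and p4 with weight 1, and p3 remains isolated. A GNN would then operate on this patient network, with node features taken from it, rather than on the raw bipartite graph.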
Graph machine-learning (ML) methods have recently attracted great attention and have made significant progress in graph applications. To date, most graph ML approaches have been evaluated on social networks, but they have not been comprehensively reviewed in the health informatics domain. This study presents a review of graph ML methods and their applications in the disease prediction domain based on electronic health data at two levels: node classification and link prediction. Commonly used graph ML approaches for these two levels are shallow embedding and graph neural networks (GNNs). This study performs comprehensive research to identify articles that applied or proposed graph ML models for disease prediction using electronic health data. We considered journals and conferences from four digital library databases (i.e., PubMed, Scopus, ACM Digital Library, and IEEE Xplore). Based on the identified articles, we review the present status of and trends in graph ML approaches for disease prediction using electronic health data. Even though GNN-based models have achieved outstanding results compared with traditional ML methods in a wide range of disease prediction tasks, they still confront interpretability and dynamic graph challenges. Though the disease prediction field using ML techniques is still emerging, GNN-based models have the potential to be an excellent approach for disease prediction, which can be used in medical diagnosis, treatment, and the prognosis of diseases.