The coronavirus disease 2019 (COVID-19), caused by the virus SARS-CoV-2, is an acute respiratory disease that has been classified as a pandemic by the World Health Organization (WHO). The sudden spike in the number of infections and high mortality rates have put immense pressure on the public healthcare systems. Hence, it is crucial to identify the key factors for mortality prediction to optimize patient treatment strategy. Different routine blood test results are widely available compared to other forms of data like X-rays, CT-scans, and ultrasounds for mortality prediction. This study proposes machine learning (ML) methods based on blood tests data to predict COVID-19 mortality risk. A powerful combination of five features: neutrophils, lymphocytes, lactate dehydrogenase (LDH), high-sensitivity C-reactive protein (hs-CRP), and age helps to predict mortality with 96% accuracy. Various ML models (neural networks, logistic regression, XGBoost, random forests, SVM, and decision trees) have been trained and performance compared to determine the model that achieves consistently high accuracy across the days that span the disease. The best performing method using XGBoost feature importance and neural network classification, predicts with an accuracy of 90% as early as 16 days before the outcome. Robust testing with three cases based on days to outcome confirms the strong predictive performance and practicality of the proposed model. A detailed analysis and identification of trends was performed using these key biomarkers to provide useful insights for intuitive application. This study provide solutions that would help accelerate the decision-making process in healthcare systems for focused medical treatments in an accurate, early, and reliable manner.
The coronavirus disease 2019 (COVID-19) is an acute respiratory disease that has been classified as a pandemic by World Health Organization (WHO). The sudden spike in the number of infections and high mortality rates have put immense pressure on the public medical systems. Hence, it is crucial to identify the key factors of mortality that yield high accuracy and consistency to optimize patient treatment strategy. This study uses machine learning methods to identify a powerful combination of five features that help predict mortality with 96% accuracy: neutrophils, lymphocytes, lactate dehydrogenase (LDH), high-sensitivity C-reactive protein (hs-CRP) and age. Various machine learning algorithms have been compared to achieve a consistent high accuracy across the days that span the disease. Robust testing with three cases confirm the strong predictive performance of the proposed model. The model predicts with an accuracy of 90% as early as 16 days before the outcome. This study would help accelerate the decision making process in healthcare systems for focused medical treatments early and accurately.
The variability of clinical course and prognosis of COVID-19 highlights the necessity of patient sub-group risk stratification based on clinical data. In this study, clinical data from a cohort of Indian COVID-19 hospitalized patients is used to develop risk stratification and mortality prediction models. We analyzed a set of 70 clinical parameters including physiological and hematological for developing machine learning models to identify biomarkers. We also compared the Indian and Wuhan cohort, and analyzed the role of steroids. A bootstrap averaged ensemble of Bayesian networks was also learned to construct an explainable model for discovering actionable influences on mortality and days to outcome. We discovered blood parameters, diabetes, co-morbidity and SpO2 levels as important risk stratification features, whereas mortality prediction is dependent only on blood parameters. XGboost and logistic regression model yielded the best performance on risk stratification and mortality prediction, respectively (AUC score 0.83, AUC score 0.92). Blood coagulation parameters (ferritin, D-Dimer and INR), immune and inflammation parameters IL6, LDH and Neutrophil (%) are common features for both risk and mortality prediction. Compared with Wuhan patients, Indian patients with extreme blood parameters indicated higher survival rate. Analyses of medications suggest that a higher proportion of survivors and mild patients who were administered steroids had extreme neutrophil and lymphocyte percentages. The ensemble averaged Bayesian network structure revealed serum ferritin to be the most important predictor for mortality and Vitamin D to influence severity independent of days to outcome. The findings are important for effective triage during strains on healthcare infrastructure.
Computational methods and recently modern machine learning methods have played a key role in structure-based drug design. Though several benchmarking datasets are available for machine learning applications in virtual screening, accurate prediction of binding affinity for a protein-ligand complex remains a major challenge. New datasets that allow for the development of models for predicting binding affinities better than the state-of-the-art scoring functions are important. For the first time, we have developed a dataset, PLAS-5k comprised of 5000 protein-ligand complexes chosen from PDB database. The dataset consists of binding affinities along with energy components like electrostatic, van der Waals, polar and non-polar solvation energy calculated from molecular dynamics simulations using MMPBSA (Molecular Mechanics Poisson-Boltzmann Surface Area) method. The calculated binding affinities outperformed docking scores and showed a good correlation with the available experimental values. The availability of energy components may enable optimization of desired components during machine learning-based drug design. Further, OnionNet model has been retrained on PLAS-5k dataset and is provided as a baseline for the prediction of binding affinities.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.