Summary The determination of endometrial carcinoma histological subtypes, molecular subtypes, and mutation status is critical for the diagnostic process, and directly affects patients’ prognosis and treatment. Sequencing, albeit slower and more expensive, can provide additional information on molecular subtypes and mutations that can be used to better select treatments. Here, we implement a customized multi-resolution deep convolutional neural network, Panoptes, that predicts not only the histological subtypes but also the molecular subtypes and 18 common gene mutations based on digitized H&E-stained pathological images. The model achieves high accuracy and generalizes well on independent datasets. Our results suggest that Panoptes, with further refinement, has the potential for clinical application to help pathologists determine molecular subtypes and mutations of endometrial carcinoma without sequencing.
Background The COVID-19 pandemic began in early 2020 and placed significant strains on health care systems worldwide. There remains a compelling need to analyze factors that are predictive for patients at elevated risk of morbidity and mortality. Objective The goal of this retrospective study of patients who tested positive for COVID-19 and were treated at NYU (New York University) Langone Health was to identify clinical markers predictive of disease severity in order to assist in clinical decision triage and to provide additional biological insights into disease progression. Methods The clinical activity of 3740 patients at NYU Langone Hospital was obtained between January and August 2020; patient data were deidentified. Models were trained on clinical data from different parts of the patients' hospital stay to predict three clinical outcomes: deceased, ventilated, or admitted to the intensive care unit (ICU). Results The XGBoost (eXtreme Gradient Boosting) model that was trained on clinical data from the final 24 hours excelled at predicting mortality (area under the curve [AUC]=0.92; specificity=86%; and sensitivity=85%). Respiration rate was the most important feature, followed by SpO2 (peripheral oxygen saturation) and being aged 75 years and over. This model's performance in predicting the deceased outcome held up to 5 days in advance, with AUC=0.81, specificity=70%, and sensitivity=75%. When only clinical data from the first 24 hours were used, AUCs of 0.79, 0.80, and 0.77 were obtained for the deceased, ventilated, and ICU-admitted outcomes, respectively. Although respiration rate and SpO2 levels offered the highest feature importance, other canonical markers, including diabetic history, age, and temperature, offered minimal gain. When lab values were incorporated, prediction of mortality benefited the most from blood urea nitrogen and lactate dehydrogenase (LDH). Features that were predictive of morbidity included LDH, calcium, glucose, and C-reactive protein.
Conclusions Together, this work summarizes efforts to systematically examine the importance of a wide range of features across different endpoint outcomes and at different hospitalization time points.
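The metrics reported in this abstract (AUC, sensitivity, specificity) can be illustrated with a minimal sketch. The labels and scores below are made-up toy values, not study data, and the 0.5 decision threshold is an assumption; the abstract does not state how the operating point was chosen.

```python
def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(y_true, y_score, threshold=0.5):
    """Sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_score))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    tn = sum(1 for y, s in pairs if y == 0 and s < threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 1 = deceased, 0 = survived (hypothetical values)
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

auc = roc_auc(y_true, y_score)
sens, spec = sensitivity_specificity(y_true, y_score)
print(f"AUC={auc:.4f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

AUC is independent of the threshold, whereas sensitivity and specificity trade off against each other as the threshold moves.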
Determining endometrial carcinoma histological subtype is a critical diagnostic process that directly affects prognosis and treatment options. Recently, molecular subtyping and mutation status have been gaining popularity because they offer more relevant information for evaluating severity and developing individualized therapies. However, compared to the histopathological approach, the availability of molecular subtyping is limited because it can only be obtained by genomic sequencing, which is more expensive. Here, we implemented deep convolutional neural network models that predict not only the histological subtypes, but also molecular subtypes and 18 common gene mutations based on digitized H&E-stained pathological images. Taking advantage of the multiresolution nature of whole slide images, we introduced a customized architecture, Panoptes, to integrate features from different magnifications. The model was trained and evaluated with images from The Cancer Genome Atlas and the Clinical Proteomic Tumor Analysis Consortium. Our models achieved an area under the receiver operating characteristic curve (AUROC) of 0.969 in predicting histological subtype and 0.934 to 0.958 in predicting the CNV-H molecular subtype. The prediction tasks for 4 mutations and the MSI-High molecular subtype also achieved high performance, with AUROC ranging from 0.781 to 0.873. Panoptes performed significantly better than InceptionResnet in most of these top predicted tasks, by up to 18%. Feature extraction and visualization revealed that the model relied on human-interpretable patterns. Our results suggest that Panoptes can help pathologists determine molecular subtypes and mutations without sequencing, and that the approach is generally applicable to any cancer type.
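One way to picture "integrating features of different magnification" is to concatenate per-magnification feature vectors and classify the fused vector. The sketch below is only an illustration under that assumption: the abstract does not describe how Panoptes fuses its branches, and the feature values, weights, and function names here are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(branch_features, weights, bias):
    """Concatenate feature vectors from each magnification branch,
    then apply a single linear layer followed by softmax."""
    fused = [x for feats in branch_features for x in feats]
    for row in weights:
        if len(row) != len(fused):
            raise ValueError("weight row length must match fused feature length")
    logits = [sum(w * x for w, x in zip(row, fused)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)


# Hypothetical 2-dimensional features from low, medium, and high magnification
low, mid, high = [0.2, 0.1], [0.5, 0.3], [0.9, 0.4]
weights = [
    [1.0, 0.0, 1.0, 0.0, 1.0, 0.0],  # class 0 reads the first feature of each branch
    [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],  # class 1 reads the second feature of each branch
]
bias = [0.0, 0.0]

probs = fuse_and_classify([low, mid, high], weights, bias)
print(probs)
```

In a real model each branch would be a convolutional network applied to co-located tiles at its own magnification; only the fusion step is sketched here.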
Studies have shown that STK11 mutation plays a critical role in shaping the lung adenocarcinoma (LUAD) tumor immune environment. By training an Inception-Resnet-v2 deep convolutional neural network model, we were able to classify STK11-mutated and wild-type LUAD tumor histopathology images with promising accuracy (per-slide AUROC=0.795). Dimensional reduction of the activation maps before the output layer for the test set images revealed that fewer immune cells accumulated around cancer cells in STK11-mutated cases. Our study demonstrated that a deep convolutional network model can automatically identify STK11 mutations based on histopathology slides and confirmed that immune cell density was the main feature used by the model to distinguish STK11-mutated cases.
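A per-slide AUROC implies that tile-level predictions are combined into one score per slide before evaluation. A minimal sketch of that step follows, assuming simple mean aggregation; the abstract does not say which aggregation rule was used, and the slide identifiers and probabilities are hypothetical.

```python
from collections import defaultdict


def slide_scores(tile_predictions):
    """Average tile-level probabilities into one score per slide.

    tile_predictions: iterable of (slide_id, probability) pairs.
    """
    grouped = defaultdict(list)
    for slide_id, prob in tile_predictions:
        grouped[slide_id].append(prob)
    return {sid: sum(ps) / len(ps) for sid, ps in grouped.items()}


# Hypothetical tile-level probabilities of STK11 mutation
tiles = [
    ("slide_A", 0.9), ("slide_A", 0.7),
    ("slide_B", 0.2), ("slide_B", 0.4), ("slide_B", 0.3),
]
scores = slide_scores(tiles)
print(scores)
```

The resulting per-slide scores can then be compared against slide-level mutation labels to compute an AUROC.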
Background Postpartum hemorrhage remains one of the largest causes of maternal morbidity and mortality in the United States. Objective The aim of this paper is to use machine learning techniques to identify patients at risk for postpartum hemorrhage at obstetric delivery. Methods Women aged 18 to 55 years delivering at a major academic center from July 2013 to October 2018 were included for analysis (N=30,867). A total of 497 variables were collected from the electronic medical record including the following: demographic information; obstetric, medical, surgical, and family history; vital signs; laboratory results; labor medication exposures; and delivery outcomes. Postpartum hemorrhage was defined as a blood loss of ≥1000 mL at the time of delivery, regardless of delivery method, with 2179 (7.1%) positive cases observed. Supervised learning with regression-, tree-, and kernel-based machine learning methods was used to create classification models based upon training (21,606/30,867, 70%) and validation (4630/30,867, 15%) cohorts. Models were tuned using feature selection algorithms and domain knowledge. An independent test cohort (4631/30,867, 15%) determined final performance by assessing accuracy, area under the receiver operating characteristic curve (AUROC), and sensitivity for proper classification of postpartum hemorrhage. Separate models were created using all collected data versus models limited to data available prior to the second stage of labor or at the time of decision to proceed with cesarean delivery. Additional models examined patients by mode of delivery. Results Gradient boosted decision trees achieved the best discrimination in the overall model. The model including all data slightly outperformed the second stage model (AUROC 0.979, 95% CI 0.971-0.986 vs AUROC 0.955, 95% CI 0.939-0.970). Optimal model accuracy was 98.1% with a sensitivity of 0.763 for positive prediction of postpartum hemorrhage.
The second stage model achieved an accuracy of 98.0% with a sensitivity of 0.737. Other selected algorithms returned models that performed with decreased discrimination. Models stratified by mode of delivery achieved good to excellent discrimination but lacked the sensitivity necessary for clinical applicability. Conclusions Machine learning methods can be used to identify women at risk for postpartum hemorrhage who may benefit from individualized preventative measures. Models limited to data available prior to delivery perform nearly as well as those with more complete data sets, supporting their potential utility in the clinical setting. Further work is necessary to create successful models based upon mode of delivery and to validate the findings of this study. An unbiased approach to hemorrhage risk prediction may be superior to human risk assessment and represents an area for future research.
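The relationship between the reported accuracy and sensitivity can be checked with a short calculation. The sketch below assumes the test cohort has the same hemorrhage prevalence as the full cohort (2179/30,867), which the abstract does not state; the implied specificity is therefore an estimate, not a reported result.

```python
def implied_specificity(accuracy, sensitivity, prevalence):
    """Solve accuracy = sens * prev + spec * (1 - prev) for spec."""
    return (accuracy - sensitivity * prevalence) / (1.0 - prevalence)


# Cohort sizes reported in the abstract
n_total, n_train, n_val, n_test, n_pos = 30867, 21606, 4630, 4631, 2179
assert n_train + n_val + n_test == n_total  # the 70/15/15 split adds up

prevalence = n_pos / n_total  # about 7.1%

# Reported figures for the model using all data
spec = implied_specificity(accuracy=0.981, sensitivity=0.763, prevalence=prevalence)

# Accuracy of a trivial classifier that always predicts "no hemorrhage"
baseline_accuracy = 1.0 - prevalence

print(f"prevalence={prevalence:.4f}")
print(f"implied specificity={spec:.4f}")
print(f"always-negative accuracy={baseline_accuracy:.4f}")
```

Because roughly 93% of deliveries are negative, accuracy alone is dominated by the negative class, which is why the abstract reports sensitivity and AUROC alongside it.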
Cell line perturbation data can be used as a reference for inferring underlying molecular processes in new gene expression profiles. It is important to develop accurate and computationally efficient algorithms that exploit the biological knowledge in the growing compendium of existing perturbation data and harness it for new predictions. We reframed the problem of inferring a possible gene perturbation from a reference perturbation database as a classification task and evaluated the application of deep neural network models to this problem. Our results showed that a fully connected multilayer neural network achieved up to 74.9% accuracy on a holdout test set, but model generalizability was limited by the consistency between training and testing data. Their capacity and flexibility enable neural network models to efficiently represent transcriptomic features associated with single-gene knockdown perturbations. With consistent signals between training and testing sets, neural networks may be trained to classify new samples into experimentally confirmed molecular phenotypes.
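Framing perturbation inference as classification means mapping an expression profile to one label from a fixed set of known perturbations. The sketch below shows a forward pass through a small fully connected network under that framing; the layer sizes, hand-set weights, gene values, and label names are all hypothetical and are not taken from the study.

```python
import math


def dense(x, W, b):
    """Fully connected layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_perturbation(expression, layers, labels):
    """Run hidden layers with ReLU, then a softmax output layer,
    and return the most probable perturbation label."""
    h = expression
    for W, b in layers[:-1]:
        h = relu(dense(h, W, b))
    W_out, b_out = layers[-1]
    probs = softmax(dense(h, W_out, b_out))
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs


# Hypothetical setup: 3 genes in, 2 hidden units, 2 candidate knockdowns
labels = ["GENE_A_knockdown", "GENE_B_knockdown"]
layers = [
    ([[-1.0, 0.0, 0.0], [0.0, -1.0, 0.0]], [0.0, 0.0]),  # hidden layer
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),              # output layer
]
profile = [-2.0, 0.1, 0.3]  # gene A strongly down-regulated

label, probs = predict_perturbation(profile, layers, labels)
print(label, probs)
```

In practice the weights would be learned from the reference perturbation database rather than set by hand.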