We present an interpretable machine learning algorithm called ‘eARDS’ for predicting ARDS in an ICU population comprising COVID-19 patients, up to 12-hours before satisfying the Berlin clinical criteria. The analysis was conducted on data collected from the Intensive care units (ICU) at Emory Healthcare, Atlanta, GA and University of Tennessee Health Science Center, Memphis, TN and the Cerner® Health Facts Deidentified Database, a multi-site COVID-19 EMR database. The participants in the analysis consisted of adults over 18 years of age. Clinical data from 35,804 patients who developed ARDS and controls were used to generate predictive models that identify risk for ARDS onset up to 12-hours before satisfying the Berlin criteria. We identified salient features from the electronic medical record that predicted respiratory failure among this population. The machine learning algorithm which provided the best performance exhibited AUROC of 0.89 (95% CI = 0.88–0.90), sensitivity of 0.77 (95% CI = 0.75–0.78), specificity 0.85 (95% CI = 085–0.86). Validation performance across two separate health systems (comprising 899 COVID-19 patients) exhibited AUROC of 0.82 (0.81–0.83) and 0.89 (0.87, 0.90). Important features for prediction of ARDS included minimum oxygen saturation (SpO2), standard deviation of the systolic blood pressure (SBP), O2 flow, and maximum respiratory rate over an observational window of 16-hours. Analyzing the performance of the model across various cohorts indicates that the model performed best among a younger age group (18–40) (AUROC = 0.93 [0.92–0.94]), compared to an older age group (80+) (AUROC = 0.81 [0.81–0.82]). The model performance was comparable on both male and female groups, but performed significantly better on the severe ARDS group compared to the mild and moderate groups. The eARDS system demonstrated robust performance for predicting COVID19 patients who developed ARDS at least 12-hours before the Berlin clinical criteria, across two independent health systems.
Robust generalization to new concepts has long remained a distinctive feature of human intelligence. However, recent progress in deep generative models has now led to neural architectures capable of synthesizing novel instances of unknown visual concepts from a single training example. Yet, a more precise comparison between these models and humans is not possible because existing performance metrics for generative models (i.e., FID, IS, likelihood) are not appropriate for the one-shot generation scenario. Here, we propose a new framework to evaluate one-shot generative models along two axes: sample recognizability vs. diversity (i.e., intra-class variability). Using this framework, we perform a systematic evaluation of representative one-shot generative models on the Omniglot handwritten dataset. We first show that GAN-like and VAE-like models fall on opposite ends of the diversity-recognizability space. Extensive analyses of the effect of key model parameters further revealed that spatial attention and context integration have a linear contribution to the diversity-recognizability trade-off. In contrast, disentanglement transports the model along a parabolic curve that could be used to maximize recognizability. Using the diversity-recognizability framework, we were able to identify models and parameters that closely approximate human data.Preprint. Under review.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.