IMPORTANCEThe National COVID Cohort Collaborative (N3C) is a centralized, harmonized, highgranularity electronic health record repository that is the largest, most representative COVID-19 cohort to date. This multicenter data set can support robust evidence-based development of predictive and diagnostic tools and inform clinical care and policy.OBJECTIVES To evaluate COVID-19 severity and risk factors over time and assess the use of machine learning to predict clinical severity. DESIGN, SETTING, AND PARTICIPANTSIn a retrospective cohort study of 1 926 526 US adults with SARS-CoV-2 infection (polymerase chain reaction >99% or antigen <1%) and adult patients without SARS-CoV-2 infection who served as controls from 34 medical centers nationwide between January 1, 2020, and December 7, 2020, patients were stratified using a World Health Organization COVID-19 severity scale and demographic characteristics. Differences between groups over time were evaluated using multivariable logistic regression. Random forest and XGBoost models were used to predict severe clinical course (death, discharge to hospice, invasive ventilatory support, or extracorporeal membrane oxygenation). MAIN OUTCOMES AND MEASURESPatient demographic characteristics and COVID-19 severity using the World Health Organization COVID-19 severity scale and differences between groups over time using multivariable logistic regression. RESULTSThe cohort included 174 568 adults who tested positive for SARS-CoV-2 (mean [SD] age, 44.4 [18.6] years; 53.2% female) and 1 133 848 adult controls who tested negative for SARS-CoV-2 (mean [SD] age, 49.5 [19.2] years; 57.1% female). Of the 174 568 adults with SARS-CoV-2, 32 472(18.6%) were hospitalized, and 6565 (20.2%) of those had a severe clinical course (invasive ventilatory support, extracorporeal membrane oxygenation, death, or discharge to hospice). Of the hospitalized patients, mortality was 11.6% overall and decreased from 16.4% in March to April 2020 to 8.6% in September to October 2020 (P = .002 for monthly trend). Using 64 inputs available on the first hospital day, this study predicted a severe clinical course using random forest and XGBoost models (area under the receiver operating curve = 0.87 for both) that were stable over time. The factor most strongly associated with clinical severity was pH; this result was consistent across machine learning methods. In a separate multivariable logistic regression model built for inference, (continued) Key Points Question In a US data resource large enough to adjust for multiple confounders, what risk factors are associated with COVID-19 severity and severity trajectory over time, and can machine learning models predict clinical severity? Findings In this cohort study of 174 568 adults with SARS-CoV-2, 32 472 (18.6%) were hospitalized and 6565 (20.2%) were severely ill, and first-day machine learning models accurately predicted clinical severity. Mortality was 11.6%
IMPORTANCE Suicide is a leading cause of mortality, with suicide-related deaths increasing in recent years. Automated methods for individualized risk prediction have great potential to address this growing public health threat. To facilitate their adoption, they must first be validated across diverse health care settings.OBJECTIVE To evaluate the generalizability and cross-site performance of a risk prediction method using readily available structured data from electronic health records in predicting incident suicide attempts across multiple, independent, US health care systems. DESIGN, SETTING, AND PARTICIPANTSFor this prognostic study, data were extracted from longitudinal electronic health record data comprising International Classification of Diseases, Ninth Revision diagnoses, laboratory test results, procedures codes, and medications for more than 3.7 million patients from 5 independent health care systems participating in the Accessible Research Commons for Health network. Across sites, 6 to 17 years' worth of data were available, up to 2018. Outcomes were defined by International Classification of Diseases, Ninth Revision codes reflecting incident suicide attempts (with positive predictive value >0.70 according to expert clinician medical record review). Models were trained using naive Bayes classifiers in each of the 5 systems. Models were cross-validated in independent data sets at each site, and performance metrics were calculated. Data analysis was performed from November 2017 to August 2019. MAIN OUTCOMES AND MEASURES The primary outcome was suicide attempt as defined by a previously validated case definition using International Classification of Diseases, Ninth Revision codes. The accuracy and timeliness of the prediction were measured at each site. RESULTS Across the 5 health care systems, of the 3 714 105 patients (2 130 454 female [57.2%])included in the analysis, 39 162 cases (1.1%) were identified. Predictive features varied by site but, as expected, the most common predictors reflected mental health conditions (eg, borderline personality disorder, with odds ratios of 8.1-12.9, and bipolar disorder, with odds ratios of 0.9-9.1) and substance use disorders (eg, drug withdrawal syndrome, with odds ratios of 7.0-12.9). Despite variation in geographical location, demographic characteristics, and population health characteristics, model performance was similar across sites, with areas under the curve ranging from 0.71 (95% CI, 0.70-0.72) to 0.76 (95% CI, 0.75-0.77). Across sites, at a specificity of 90%, the models detected a mean of 38% of cases a mean of 2.1 years in advance. CONCLUSIONS AND RELEVANCEAcross 5 diverse health care systems, a computationally efficient approach leveraging the full spectrum of structured electronic health record data was able to detect (continued) Key Points Question Can a process for training machine-learning algorithms based on electronic health records identify individuals at increased risk of suicide attempts across independent health care systems? Abstract (contin...
Objective The study sought to design, pilot, and evaluate a federated data completeness tracking system (CTX) for assessing completeness in research data extracted from electronic health record data across the Accessible Research Commons for Health (ARCH) Clinical Data Research Network. Materials and Methods The CTX applies a systems-based approach to design workflow and technology for assessing completeness across distributed electronic health record data repositories participating in a queryable, federated network. The CTX invokes 2 positive feedback loops that utilize open source tools (DQ e -c and Vue) to integrate technology and human actors in a system geared for increasing capacity and taking action. A pilot implementation of the system involved 6 ARCH partner sites between January 2017 and May 2018. Results The ARCH CTX has enabled the network to monitor and, if needed, adjust its data management processes to maintain complete datasets for secondary use. The system allows the network and its partner sites to profile data completeness both at the network and partner site levels. Interactive visualizations presenting the current state of completeness in the context of the entire network as well as changes in completeness across time were valued among the CTX user base. Discussion Distributed clinical data networks are complex systems. Top-down approaches that solely rely on technology to report data completeness may be necessary but not sufficient for improving completeness (and quality) of data in large-scale clinical data networks. Improving and maintaining complete (high-quality) data in such complex environments entails sociotechnical systems that exploit technology and empower human actors to engage in the process of high-quality data curating. Conclusions The CTX has increased the network’s capacity to rapidly identify data completeness issues and empowered ARCH partner sites to get involved in improving the completeness of respective data in their repositories.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.
hi@scite.ai
10624 S. Eastern Ave., Ste. A-614
Henderson, NV 89052, USA
Copyright © 2024 scite LLC. All rights reserved.
Made with 💙 for researchers
Part of the Research Solutions Family.