Understanding the temporal dynamics of COVID-19 patient phenotypes is necessary to derive fine-grained resolution of pathophysiology. Here we use state-of-the-art deep neural networks over an institution-wide machine intelligence platform for the augmented curation of 15.8 million clinical notes from 30,494 patients subjected to COVID-19 PCR diagnostic testing. By contrasting the Electronic Health Record (EHR)-derived clinical phenotypes of COVID-19-positive (COVIDpos, n=635) versus COVID-19-negative (COVIDneg, n=29,859) patients over each day of the week preceding the PCR testing date, we identify anosmia/dysgeusia (37.4-fold), myalgia/arthralgia (2.6-fold), diarrhea (2.2-fold), fever/chills (2.1-fold), respiratory difficulty (1.9-fold), and cough (1.8-fold) as significantly amplified in COVIDpos over COVIDneg patients. The specific combination of cough and diarrhea has a 3.2-fold amplification in COVIDpos patients during the week prior to PCR testing and, along with anosmia/dysgeusia, constitutes the earliest EHR-derived signature of COVID-19 (4-7 days prior to the typical PCR testing date). This study introduces an Augmented Intelligence platform for the real-time synthesis of institutional knowledge captured in EHRs. The platform holds tremendous potential for scaling up curation throughput, with minimal need for retraining underlying neural networks, thus promising EHR-powered early diagnosis for a broad spectrum of diseases.
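The fold-amplification figures above can be read as a ratio of phenotype prevalence between the two cohorts. The abstract does not spell out the exact estimator, so the following is a minimal sketch under that assumption: the function name `fold_amplification` and the symptom counts are hypothetical, and only the cohort sizes (635 and 29,859) come from the abstract.

```python
def fold_amplification(pos_with, pos_total, neg_with, neg_total):
    """Prevalence of a phenotype in COVIDpos divided by its prevalence in COVIDneg.

    Assumption: 'fold amplification' is a simple ratio of proportions;
    the paper may use a different or adjusted estimator.
    """
    if pos_total <= 0 or neg_total <= 0:
        raise ValueError("cohort sizes must be positive")
    neg_rate = neg_with / neg_total
    if neg_rate == 0:
        raise ValueError("phenotype never observed in the COVIDneg cohort")
    return (pos_with / pos_total) / neg_rate


# Cohort sizes are taken from the abstract.
POS_TOTAL = 635
NEG_TOTAL = 29_859

# Hypothetical counts of patients whose notes mention a given symptom.
hypothetical_pos_with = 127
hypothetical_neg_with = 2_986

ratio = fold_amplification(
    hypothetical_pos_with, POS_TOTAL, hypothetical_neg_with, NEG_TOTAL
)
print(f"fold amplification: {ratio:.2f}")
```

With these illustrative counts, roughly 20% of COVIDpos versus 10% of COVIDneg patients mention the symptom, which gives a ratio close to 2.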
Temporal inference from laboratory testing results and triangulation with clinical outcomes extracted from unstructured EHR provider notes is integral to advancing precision medicine. Here, we studied 246 SARS-CoV-2 PCR-positive (COVIDpos) patients and 2,460 propensity-matched SARS-CoV-2 PCR-negative (COVIDneg) patients subjected to around 700,000 lab tests cumulatively across 194 assays. Compared to COVIDneg patients at the time of diagnostic testing, COVIDpos patients tended to have higher plasma fibrinogen levels and lower platelet counts. However, as the infection evolves, COVIDpos patients distinctively show declining fibrinogen, increasing platelet counts, and lower white blood cell counts. Augmented curation of EHRs suggests that only a minority of COVIDpos patients develop thromboembolism, and rarely, disseminated intravascular coagulopathy (DIC), with patients generally not displaying platelet reductions typical of consumptive coagulopathies. These temporal trends provide fine-grained resolution into COVID-19 associated coagulopathy (CAC) and set the stage for personalizing thromboprophylaxis.
Case reports of patients infected with COVID-19 and influenza virus (“flurona”) have raised questions around the prevalence and severity of co-infection. Using data from HHS Protect Public Data Hub, NCBI Virus, and CDC FluView, we analyzed trends in SARS-CoV-2 and influenza hospitalized co-infection cases and strain prevalences. We also characterized co-infection cases across the Mayo Clinic Enterprise from January 2020 to April 2022. We compared expected and observed co-infection case counts across different waves of the pandemic and assessed symptoms and outcomes of co-infection and COVID-19 mono-infection cases after propensity score matching on clinically relevant baseline characteristics. In both the Mayo Clinic and nationwide datasets, the observed co-infection rate for SARS-CoV-2 and influenza was higher during the Omicron era (December 14, 2021 to April 2, 2022) than in previous waves, but no higher than expected assuming infection rates are independent. At Mayo Clinic, only 120 co-infection cases were observed among 197,364 SARS-CoV-2 cases. Co-infected patients were relatively young (mean age: 26.7 years) and had fewer serious comorbidities compared to mono-infected patients. While there were no significant differences in 30-day hospitalization, ICU admission, or mortality rates between co-infected and matched COVID-19 mono-infection cases, co-infection cases reported higher rates of symptoms including congestion, cough, fever/chills, headache, myalgia/arthralgia, pharyngitis, and rhinitis. While most co-infection cases observed at Mayo Clinic occurred among relatively healthy individuals, further observation is needed to assess outcomes among subpopulations with risk factors for severe COVID-19 such as older age, obesity, and immunocompromised status.
Significance Statement: Reports of COVID-19 and influenza co-infections (“flurona”) have raised concern in recent months as both COVID-19 and influenza cases have increased to significant levels in the US. Here, we analyze trends in co-infection cases over the course of the pandemic to show that these co-infection cases are expected given the independent background prevalences of COVID-19 and influenza. In addition, from an initial analysis of the co-infection cases observed at the Mayo Clinic, we find that these cases are extremely rare, have mostly been observed in relatively young, healthy patients, and do not carry an increased risk of hospitalization, ICU admission, or death, although they do present with more of the characteristic viral symptoms.
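The comparison of observed and expected co-infection counts rests on an independence assumption: if the two infections occur independently, the expected number of co-infections in a population is the product of the two prevalences times the population size. The sketch below illustrates that arithmetic only. The function name and all counts are hypothetical and are not taken from the study's data or from its exact procedure.

```python
def expected_coinfections(population, covid_cases, flu_cases):
    """Expected co-infection count if the two infections are independent.

    E[co-infections] = N * (covid/N) * (flu/N) = covid * flu / N
    """
    if population <= 0:
        raise ValueError("population must be positive")
    return covid_cases * flu_cases / population


# Hypothetical numbers for one time window (not from the paper).
population = 1_000_000
covid_cases = 50_000
flu_cases = 2_000
observed = 90

expected = expected_coinfections(population, covid_cases, flu_cases)
observed_to_expected = observed / expected

print(f"expected co-infections: {expected:.0f}")
print(f"observed / expected:    {observed_to_expected:.2f}")
```

A ratio near or below 1, as in this toy window, is the pattern the abstract describes as "no higher than expected"; a formal analysis would also need a confidence interval or test around that ratio.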
The natural language portions of an electronic health record (EHR) communicate critical information about disease and treatment progression. However, the presence of personally identifying information in this data constrains its broad reuse. In the United States, the Health Insurance Portability and Accountability Act of 1996 (HIPAA) provides a de-identification standard for the removal of protected health information (PHI). Despite continuous improvements in methods for the automated detection of PHI, the residual identifiers in clinical notes continue to pose significant challenges, often requiring manual validation and correction that does not scale to the amount of data needed for modern machine learning tools. In this paper, we describe an automated de-identification system that employs an ensemble architecture, incorporating attention-based deep learning models and rule-based methods, supported by heuristics for detecting PHI in EHR data. Upon detection of PHI, the system transforms the detected identifiers into plausible, though fictional, surrogates to further obfuscate any leaked identifier. We evaluated the system with a publicly available dataset of 515 notes from the I2B2 2014 de-identification challenge and a dataset of 10,000 notes from the Mayo Clinic. We compared our approach with other existing tools considered best-in-class. The results indicated a recall of 0.992 and 0.994 and a precision of 0.979 and 0.967 on the I2B2 and the Mayo Clinic data, respectively.
Highlights
• An ensemble approach to automated de-identification of unstructured clinical text
• Our approach leverages advances in deep learning along with heuristics
• Detected personally identifiable information is replaced with suitable surrogates
• Patient data are de-identified at scale to accelerate medical discovery
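The de-identification abstract reports precision and recall but no combined score. As a worked example, the harmonic mean of the two (F1) can be derived from the reported figures. The F1 values below are computed here for illustration and are not reported in the abstract; the helper name `f1_score` is our own.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract.
reported = {
    "I2B2 2014": {"precision": 0.979, "recall": 0.992},
    "Mayo Clinic": {"precision": 0.967, "recall": 0.994},
}

# Derived here, not stated in the source.
derived_f1 = {
    name: f1_score(m["precision"], m["recall"]) for name, m in reported.items()
}

for name, value in derived_f1.items():
    print(f"{name}: F1 = {value:.3f}")
```

For PHI detection, recall is usually the more safety-critical of the two numbers, since a missed identifier is a leak while a false positive only costs some readability.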
Highly transmissible or immuno-evasive SARS-CoV-2 variants have intermittently emerged and outcompeted previously circulating strains, resulting in repeated COVID-19 surges, reinfections, and breakthrough infections in vaccinated individuals. With over 5 million SARS-CoV-2 genomes sequenced globally over the last 2 years, there is unprecedented data to decipher how competitive viral evolution results in the emergence of fitter SARS-CoV-2 variants. Much attention has been directed to studying how specific mutations in the Spike protein impact its binding to the ACE2 receptor or viral neutralization by antibodies, but there is limited knowledge of genomic signatures shared primarily by dominant variants. Here we introduce a methodology to quantify the genome-wide distinctiveness of polynucleotide fragments of various lengths (3- to 240-mers) that constitute SARS-CoV-2 lineage genomes. Compared to standard phylogenetic distance metrics and overall mutational load, the quantification of distinctive 9-mer polynucleotides provides a higher resolution of separation between variants of concern (Reference = 89, IQR: 65-108; Alpha = 166, IQR: 150-182; Beta = 130, IQR: 113-147; Gamma = 165, IQR: 152-180; Delta = 234, IQR: 216-253; and Omicron = 294, IQR: 287-315). The similar scoring of the Alpha and Gamma variants by our methodology is consistent with these strains emerging at approximately the same time and circulating in distinct geographical regions as dominant strains. Furthermore, evaluation of genomic distinctiveness for 1,363 lineages annotated in GISAID highlights that polynucleotide diversity has increased over time (R² = 0.37) and that variants of concern (VOCs) show high distinctiveness compared to non-VOC contemporary lineages. To facilitate similar real-time assessments of the competitive fitness potential of future variants, we are launching a freely accessible resource for infusing pandemic preparedness with genomic inference ("GENI", https://academia.nferx.com/GENI).
This study demonstrates the value of characterizing new SARS-CoV-2 variants by their genome-wide polynucleotide distinctiveness and emphasizes the need to go beyond a narrow set of mutations at known functionally salient sites.
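To make the idea of polynucleotide distinctiveness concrete, the sketch below counts the k-mers of one sequence that do not occur in any sequence of a comparison set. This is an assumption about what "distinctive" means: the abstract does not give the exact definition, the comparison set, or any normalization, so the function names and the toy sequences are purely illustrative and far shorter than a real genome.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a nucleotide sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def distinctive_kmers(genome, background_genomes, k):
    """k-mers present in `genome` but in none of `background_genomes`.

    Assumption: distinctiveness is a simple set difference against a
    background; the published method may define it differently.
    """
    background = set()
    for other in background_genomes:
        background |= kmers(other, k)
    return kmers(genome, k) - background


# Toy sequences (hypothetical, not real SARS-CoV-2 data).
lineage = "ACGTACGTTT"
background = ["ACGTACGTAA"]

distinct = distinctive_kmers(lineage, background, k=3)
print(sorted(distinct), len(distinct))
```

On real genomes the same set-difference idea would be applied with k = 9 (or any length from 3 to 240, as in the abstract), with the count of distinctive k-mers serving as the per-genome score.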
Temporal inference from laboratory testing results and their triangulation with clinical outcomes, as described in the associated unstructured text from the provider notes in the Electronic Health Record (EHR), is integral to advancing precision medicine. Here, we studied 181 COVIDpos and 7,775 COVIDneg patients subjected to 1.3 million laboratory tests across 194 assays during a two-month observation period centered around their SARS-CoV-2 PCR testing dates. We found that, compared to COVIDneg patients at the time of clinical presentation and diagnostic testing, COVIDpos patients tended to have higher plasma fibrinogen levels and similarly low platelet counts, with approximately 25% of patients in both cohorts showing outright thrombocytopenia. However, these measures show opposite longitudinal trends as the infection evolves, with fibrinogen declining to levels lower than, and platelet counts increasing to levels higher than, those of the COVIDneg cohort. Our EHR augmented curation efforts suggest a minority of patients develop thromboembolic events after the PCR testing date, including rare cases with disseminated intravascular coagulopathy (DIC), with most patients lacking the platelet reductions typically observed in consumptive coagulopathies. These temporal trends present, for the first time, fine-grained resolution of COVID-19 associated coagulopathy (CAC), via a digital framework that synthesizes longitudinal lab measurements with structured medication data and neural network-powered extraction of outcomes from the unstructured EHR. This study demonstrates how a precision medicine platform can help contextualize each patient's specific coagulation profile over time, towards the goal of informing better personalization of thromboprophylaxis regimens.