Objectives The COVID-19 pandemic has underscored the importance of monitoring viral evolution and correlated clinical outcomes in directing patient care and health policy. Our institution sequences the viral genome from essentially all SARS-CoV-2 infected patients in our hospital system’s diverse patient population, allowing for continuous monitoring of the viral composition in a major US metroplex. Our objective was to support the vision of near real-time assessment using bioinformatics and data management methods designed to enable rapid and accurate analyses.
Methods Our bioinformatics pipeline is designed to analyze genomic data with paired patient data as rapidly as possible. To keep patient information current, all co-morbidity, therapeutic and outcomes data are extracted from the Electronic Medical Record (EMR) system every night. Essentially all positive SARS-CoV-2 samples in our healthcare system are sequenced on an Illumina NovaSeq 6000 instrument in batches of 768 genomes. As soon as sequencing of a batch is complete, the pipeline assembles the genomes using a Singularity container provided by the Bacterial and Viral Bioinformatics Resource Center (BV-BRC). Lineage calls are made using the latest available version of Pangolin. The genomic data are merged with the patient data, and 36 different statistical analyses (such as age, ethnicity, co-morbidity, admission rate, and mortality rate comparisons) are performed in 11 different patient cohorts (such as all Omicron vs Delta, vaccinated vs non-vaccinated, and monoclonal antibody recipients only). Thirty interactive Tableau figures are automatically refreshed with the latest data, and summary reports are emailed to key stakeholders. The pipeline is orchestrated using the Snakemake workflow management system on an on-premises high-performance computing (HPC) cluster.
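The merge-and-compare step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the field names (`sample_id`, `lineage`, `admitted`), the helper functions, and the lineage-prefix rule for assigning Delta and Omicron cohorts are all assumptions made for illustration, and the toy records are invented placeholders rather than study data.

```python
from collections import defaultdict


def merge_lineages_with_patients(lineage_rows, patient_rows):
    """Inner-join lineage calls to patient records on a shared sample ID.

    Both inputs are lists of dicts; 'sample_id' is a hypothetical join key.
    """
    patients_by_sample = {p["sample_id"]: p for p in patient_rows}
    merged = []
    for row in lineage_rows:
        patient = patients_by_sample.get(row["sample_id"])
        if patient is not None:  # drop genomes with no matching patient record
            merged.append({**patient, "lineage": row["lineage"]})
    return merged


def variant_cohort(row):
    """Assign a coarse variant label from the lineage name.

    Simplified assumption: AY.* and B.1.617.2 are treated as Delta,
    BA.* and B.1.1.529 as Omicron; everything else is left out (None).
    """
    lineage = row["lineage"]
    if lineage == "B.1.617.2" or lineage.startswith("AY."):
        return "Delta"
    if lineage == "B.1.1.529" or lineage.startswith("BA."):
        return "Omicron"
    return None


def admission_rate_by_cohort(merged_rows, cohort_of):
    """Return {cohort label: fraction of patients admitted}."""
    counts = defaultdict(lambda: [0, 0])  # cohort -> [admitted, total]
    for row in merged_rows:
        cohort = cohort_of(row)
        if cohort is None:
            continue
        counts[cohort][0] += int(row["admitted"])
        counts[cohort][1] += 1
    return {c: admitted / total for c, (admitted, total) in counts.items()}


if __name__ == "__main__":
    # Invented toy records, for illustration only.
    lineages = [
        {"sample_id": "S1", "lineage": "AY.4"},
        {"sample_id": "S2", "lineage": "BA.1"},
        {"sample_id": "S3", "lineage": "BA.2"},
    ]
    patients = [
        {"sample_id": "S1", "admitted": True},
        {"sample_id": "S2", "admitted": False},
        {"sample_id": "S3", "admitted": True},
    ]
    merged = merge_lineages_with_patients(lineages, patients)
    print(admission_rate_by_cohort(merged, variant_cohort))
```

Passing the cohort rule in as a function means the same rate calculation can be reused for other cohort definitions, such as vaccination status, by swapping in a different labelling function.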
Results The bioinformatics analysis for a batched run containing 768 genomes, including genome assembly, all patient cohort comparisons and dashboard updates, completes in four hours.
Conclusion Throughout the COVID-19 pandemic, but especially during the early weeks of each surge, our hospital leaders have been extremely interested in the analysis of each sequencing run. The use of a modular, container-based architecture orchestrated using a workflow manager on an on-premises HPC cluster enables us to deliver results within hours of each sequencing run. The performance of this design allowed for dynamic decision making by our organization as new strains emerged in the Houston area.