Background: Québec was the Canadian province most impacted by COVID-19, with 401,462 cases as of September 24th, 2021, and 11,347 deaths due mostly to a very severe first pandemic wave. In April 2020, we assembled the Coronavirus Sequencing in Québec (CoVSeQ) consortium to sequence SARS-CoV-2 genomes in Québec to track viral introduction events and transmission within the province. Methods: Using genomic epidemiology, we investigated the arrival of SARS-CoV-2 to Québec. We report 2921 high-quality SARS-CoV-2 genomes in the context of > 12,000 publicly available genomes sampled globally over the first pandemic wave (up to June 1st, 2020). By combining phylogenetic and phylodynamic analyses with epidemiological data, we quantify the number of introduction events into Québec, identify their origins, and characterize the spatiotemporal spread of the virus. Results: Conservatively, we estimated approximately 600 independent introduction events, the majority of which happened from spring break until 2 weeks after the Canadian border closed for non-essential travel. Subsequent mass repatriations did not generate large transmission lineages (> 50 sequenced cases), likely due to mandatory quarantine measures in place at the time. Consistent with common spring break and “snowbird” destinations, most of the introductions were inferred to have originated from Europe via the Americas. Once introduced into Québec, viral lineage sizes were overdispersed, with a few lineages giving rise to most infections. Consistent with founder effects, the earliest lineages to arrive tended to spread most successfully. Fewer than 100 viral introductions arrived during spring break, of which 7–12 led to the largest transmission lineages of the first wave (accounting for 52–75% of all sequenced infections). These successful transmission lineages dispersed widely across the province. Transmission lineage size was greatly reduced after March 11th, when a quarantine order for returning travellers was enacted. While this suggests the effectiveness of early public health measures, the biggest transmission lineages had already been ignited prior to this order. Conclusions: Combined, our results reinforce how, in the absence of tight travel restrictions or quarantine measures, fewer than 100 viral introductions in a week can ensure the establishment of extended transmission chains.
The genome of the Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), the pathogen that causes coronavirus disease 2019 (COVID-19), has been sequenced at an unprecedented scale, leading to a tremendous amount of viral genome sequencing data. To assist in tracing infection pathways and designing preventive strategies, a deep understanding of the viral genetic diversity landscape is needed. We present here a set of genomic surveillance tools from population genetics which can be used to better understand the evolution of this virus in humans. To illustrate the utility of this toolbox, we detail an in-depth analysis of the genetic diversity of SARS-CoV-2 in the first year of the COVID-19 pandemic. We analyzed 329,854 high-quality consensus sequences published in the GISAID database during the pre-vaccination phase. We demonstrate that, compared to standard phylogenetic approaches, haplotype networks can be computed efficiently on much larger datasets. This approach enables real-time lineage identification, a clear description of the relationship between variants of concern, and efficient detection of recurrent mutations. Furthermore, the time-series change in Tajima's D by haplotype provides a powerful metric of lineage expansion. Finally, principal component analysis (PCA) highlights key steps in variant emergence and facilitates the visualization of genomic variation in the context of SARS-CoV-2 diversity. The computational framework presented here is simple to implement and insightful for real-time genomic surveillance of SARS-CoV-2 and could be applied to any pathogen that threatens the health of populations of humans and other organisms.
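As an illustration of the Tajima's D metric mentioned in the abstract above, here is a minimal sketch of the standard statistic (Tajima, 1989) computed from a small alignment. It is not the authors' pipeline: the function name `tajimas_d` and the input format (equal-length aligned strings with no gaps or ambiguous bases) are assumptions made for this example.

```python
import math
from collections import Counter


def tajimas_d(seqs):
    """Tajima's D for aligned sequences.

    Assumption for this sketch: sequences have equal length and contain
    no gaps or ambiguous bases (every character counts as an allele).
    """
    n = len(seqs)
    if n < 4:
        # The variance term is zero for n < 4, so D is undefined.
        raise ValueError("need at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    n_pairs = n * (n - 1) / 2
    seg_sites = 0  # S: number of segregating (polymorphic) sites
    pi = 0.0       # mean number of pairwise differences
    for column in zip(*seqs):
        counts = Counter(column)
        if len(counts) > 1:
            seg_sites += 1
            # Pairs of sequences that differ at this site.
            differing = (n * n - sum(c * c for c in counts.values())) / 2
            pi += differing / n_pairs
    if seg_sites == 0:
        return math.nan  # no variation: D is undefined

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)
```

Negative values indicate an excess of rare variants, which is the pattern expected in an expanding population or lineage; positive values indicate an excess of intermediate-frequency variants. Tracking the statistic over successive time windows would give the kind of time series the abstract describes.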
Wastewater-based epidemiology has emerged as a promising tool to monitor pathogens in a population, particularly when clinical diagnostic capacities become overwhelmed. During the ongoing COVID-19 pandemic caused by Severe Acute Respiratory Syndrome Coronavirus-2 (SARS-CoV-2), several jurisdictions have tracked viral concentrations in wastewater to inform public health authorities. While some studies have also sequenced SARS-CoV-2 genomes from wastewater, there have been relatively few direct comparisons between viral genetic diversity in wastewater and matched clinical samples from the same region and time period. Here we report sequencing and inference of SARS-CoV-2 mutations and variant lineages (including variants of concern) in 936 wastewater samples and thousands of matched clinical sequences collected between March 2020 and July 2021 in the cities of Montreal, Quebec City, and Laval, representing almost half the population of the Canadian province of Quebec. We benchmarked our sequencing and variant-calling methods on known viral genome sequences to establish thresholds for inferring variants in wastewater with confidence. We found that variant frequency estimates in wastewater and clinical samples are correlated over time in each city, with similar dates of first detection. Across all variant lineages, wastewater detection is more concordant with targeted outbreak sequencing than with semi-random clinical swab sampling. Most variants were first observed in clinical and outbreak data due to a higher sequencing rate. However, wastewater sequencing is highly efficient, detecting more variants for a given sampling effort. This shows the potential for wastewater sequencing to provide useful public health data, especially at places or times when sufficient clinical sampling is infrequent or infeasible.
Using genomic epidemiology, we investigated the arrival of SARS-CoV-2 to Québec, the Canadian province most impacted by COVID-19, with >280,000 positive cases and >10,000 deaths in a population of 8.5 million as of March 1st, 2021. We report 2,921 high-quality SARS-CoV-2 genomes in the context of >12,000 publicly available genomes sampled globally over the first pandemic wave (up to June 1st, 2020). By combining phylogenetic and phylodynamic analyses with epidemiological data, we quantify the number of introduction events into Québec, identify their origins, and characterize the spatio-temporal spread of the virus. Conservatively, we estimated at least 500 independent introduction events, the majority of which happened from spring break until two weeks after the Canadian border closed for non-essential travel. Subsequent mass repatriations did not generate large transmission lineages (>50 cases), likely due to mandatory quarantine measures in place at the time. Consistent with common spring break and 'snowbird' destinations, most of the introductions were inferred to have originated from Europe via the Americas. Fewer than 100 viral introductions arrived during spring break, of which 5-10 led to the largest transmission lineages of the first wave (accounting for 36-58% of all sequenced infections). These successful viral transmission lineages dispersed widely across the province, consistent with founder effects and superspreading dynamics. Transmission lineage size was greatly reduced after March 11th, when a quarantine order for returning travelers was enacted. While this suggests the effectiveness of early public health measures, the biggest transmission lineages had already been ignited prior to this order. Combined, our results reinforce how, in the absence of tight travel restrictions or quarantine measures, fewer than 100 viral introductions in a week can ensure the establishment of extended transmission chains.
Pangenomes, the cumulative set of genes encoded by a population or species, arise from the interplay of horizontal gene transfer, drift, and selection. The balance of these forces in shaping pangenomes has been debated, and studies to date, focused on ancient evolutionary time scales, have suggested that pangenomes generally confer niche adaptation to their bacterial hosts. To shed light on pangenome evolution on shorter evolutionary time scales, we inferred the selective pressures acting on mobile genes within individual human microbiomes from 176 Fiji islanders. We mapped metagenomic sequence reads to a set of known mobile genes to identify single nucleotide variants (SNVs) and calculated population genetic metrics to infer deviations from a neutral evolutionary model. We found that mobile gene sequence evolution varied more by gene family than by human social attributes, such as household or village. Patterns of mobile gene sequence evolution could be qualitatively recapitulated with a simple evolutionary simulation without the need to invoke adaptive value of mobile genes to either bacterial or human hosts. These results stand in contrast with the apparent adaptive value of pangenomes over longer evolutionary time scales. In general, the most highly mobile genes (i.e., those present in more distinct bacterial host genomes) tend to have higher metagenomic read coverage and an excess of low-frequency SNVs, consistent with their rapid spread across multiple bacterial species in the gut. However, a subset of mobile genes, including those involved in defense mechanisms and secondary metabolism, showed a contrasting signature of intermediate-frequency SNVs, indicating species-specific selective pressures or negative frequency-dependent selection on these genes.
Together, our evolutionary models and population genetic data show that gene-specific selective pressures predominate over human or bacterial host-specific pressures during the relatively short time scales of a human lifetime.
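The contrast drawn above between an excess of low-frequency SNVs and a signature of intermediate-frequency SNVs can be made concrete with a folded site-frequency spectrum. The sketch below is illustrative only: the abstract does not specify which population genetic metrics were used, and the function names, the `(alt_read_count, read_depth)` input format, and the bin count are assumptions made for this example.

```python
def minor_allele_frequency(alt_count, depth):
    """Minor allele frequency at one SNV from read counts (hypothetical input)."""
    if depth <= 0:
        raise ValueError("read depth must be positive")
    freq = alt_count / depth
    return min(freq, 1 - freq)


def folded_sfs(snvs, n_bins=5):
    """Histogram of minor allele frequencies over equal-width bins on [0, 0.5].

    `snvs` is an iterable of (alt_read_count, read_depth) pairs for one gene.
    """
    counts = [0] * n_bins
    for alt_count, depth in snvs:
        maf = minor_allele_frequency(alt_count, depth)
        # Scale [0, 0.5] onto bin indices; a MAF of exactly 0.5 goes in the last bin.
        index = min(int(maf / 0.5 * n_bins), n_bins - 1)
        counts[index] += 1
    return counts


# Illustrative read counts, not data from the study.
example_gene = [(1, 100), (2, 100), (50, 100), (99, 100)]
print(folded_sfs(example_gene))
```

A spectrum piled up in the first bin corresponds to the low-frequency excess described for highly mobile genes, while weight in the upper bins corresponds to the intermediate-frequency signature reported for defense and secondary metabolism genes.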
The genome of the Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), the pathogen that causes coronavirus disease 2019 (COVID-19), has been sequenced at an unprecedented scale, leading to a tremendous amount of viral genome sequencing data. To understand the evolution of this virus in humans, and to assist in tracing infection pathways and designing preventive strategies, we present a set of computational tools that span phylogenomics, population genetics and machine learning approaches. To illustrate the utility of this toolbox, we detail an in-depth analysis of the genetic diversity of SARS-CoV-2 in the first year of the COVID-19 pandemic, using 329,854 high-quality consensus sequences published in the GISAID database during the pre-vaccination phase. We demonstrate that, compared to standard phylogenetic approaches, haplotype networks can be computed efficiently on much larger datasets, enabling real-time analyses. Furthermore, the time-series change in Tajima's D provides a powerful metric of population expansion. Unsupervised learning techniques further highlight key steps in variant detection and facilitate the study of the role of this genomic variation in the context of SARS-CoV-2 infection, with Multiscale PHATE methodology identifying fine-scale structure in the SARS-CoV-2 genetic data that underlies the emergence of key lineages. The computational framework presented here is useful for real-time genomic surveillance of SARS-CoV-2 and could be applied to any pathogen that threatens the health of worldwide populations of humans and other organisms.
Pangenomes -- the cumulative set of genes encoded by a species -- arise from evolutionary forces including horizontal gene transfer (HGT), drift, and selection. The relative importance of drift and selection in shaping pangenome structure has recently been debated, and the role of sequence evolution (point mutations) within mobile genes has been largely ignored, with studies focusing mainly on patterns of gene presence or absence. The effects of drift, selection, and HGT on pangenome evolution likely depend on the time scale being studied, ranging from ancient (e.g., between distantly related species) to recent (e.g., within a single animal host), and on the unit of selection being considered (e.g., the gene, whole genome, microbial species, or human host). To shed light on pangenome evolution within microbiomes on relatively recent time scales, we investigate the selective pressures acting on mobile genes using a dataset that previously identified such genes in the gut metagenomes of 176 Fiji islanders. We mapped the metagenomic reads to mobile genes to call single nucleotide variants (SNVs) and calculate population genetic metrics that allowed us to infer deviations from a neutral evolutionary model. We found that mobile gene sequence evolution varied more by gene family than by human social attributes, such as household or village membership, suggesting that selection at the level of gene function is most relevant on these short time scales. Patterns of mobile gene sequence evolution could be qualitatively recapitulated with a simple evolutionary simulation, without the need to invoke an adaptive advantage of mobile genes to their bacterial host genome. This suggests that, at least on short time scales, a majority of the pangenome need not be adaptive.
On the other hand, a subset of gene functions including defense mechanisms and secondary metabolism showed an aberrant pattern of molecular evolution, consistent with species-specific selective pressures or negative frequency-dependent selection not seen in prophages, transposons, or other gene categories. That mobile genes of different functions behave so differently suggests stronger selection at the gene level, rather than at the genome level. While pangenomes may be largely adaptive to their bacterial hosts on longer evolution time scales, here we show that, on shorter "human" time scales, drift and gene-specific selection predominate.