As COVID-19 has spread from the first reported cases into a global pandemic, there have been a number of efforts to understand the mutations and clusters of genetic lineages of the SARS-CoV-2 virus. The virus's high mutation rate and rapid spread make it possible for such analysis to track chains of infection and to put individual sequences in context. Whole genomes of the SARS-CoV-2 virus are being collected and shared from across the globe. With the advent of affordable and prolific Next Generation Sequencing, this is the first pandemic in which the genomic evolution of the pathogen can be tracked in near real time. So far, phylogenetic analysis methods have found broad application in this regard. Here we demonstrate that Principal Component Analysis (PCA), used heavily in population genetics, corroborates the existing findings while providing unique new capabilities to understand our public repositories of complete virus sequences. This novel application of PCA is demonstrated on all publicly available SARS-CoV-2 samples from GenBank and other open-access databases until mid-April. We show that PCA is a useful and easy-to-use tool for analyzing SARS-CoV-2 genomes, complementing phylogenetic analysis. It offers a previously untapped opportunity to analyze the dynamics of the current SARS-CoV-2 pandemic in a new way.
Since the beginning of the global SARS-CoV-2 pandemic, there have been a number of efforts to understand the mutations and clusters of genetic lineages of the SARS-CoV-2 virus. Until now, phylogenetic analysis methods have been used for this purpose. Here we show that Principal Component Analysis (PCA), which is widely used in population genetics, not only helps to explain existing findings about the mutation processes of the virus, but also provides deeper insights into these processes while being less sensitive to sequencing gaps. We describe a comprehensive analysis of a dataset of 46,046 SARS-CoV-2 genome sequences downloaded from the GISAID database in June of this year.
Summary: PCA provides deep insights into large data sets of SARS-CoV-2 genomes, revealing virus lineages that have thus far gone unnoticed.
As COVID-19 has spread from its origin in Wuhan, China, into a global pandemic, there have been a number of efforts to understand the mutations and clusters of genetic lineages of the SARS-CoV-2 virus. The virus's high mutation rate and rapid spread make it possible for such analysis to track chains of infection and to put individual sequences in context. So far, phylogenetic analysis methods have found broad application in this regard. Here we demonstrate that Principal Component Analysis (PCA), used heavily in population genetics, corroborates the existing findings while providing unique new capabilities to understand our public repositories of complete virus sequences. This novel application of PCA is demonstrated on all publicly available SARS-CoV-2 samples from GenBank and other open-access databases until mid-April. We show that PCA is a useful and easy-to-use tool for analyzing SARS-CoV-2 genomes, complementing phylogenetic analysis. It offers a previously untapped opportunity to analyze the dynamics of the current SARS-CoV-2 pandemic in a new way.
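To make the idea of applying PCA to genome sequences concrete, the following is a minimal sketch and not the authors' pipeline. The abstracts do not state how sequences were encoded or which software was used, so everything here is an assumption made for illustration: the sequences are taken to be already aligned to equal length, each site is one-hot encoded over A, C, G and T, ambiguous or missing characters are encoded as all zeros, and PCA is computed by singular value decomposition of the mean-centered matrix. The function names `one_hot_encode` and `pca` and the toy sequences are hypothetical.

```python
import numpy as np

BASES = "ACGT"  # assumed alphabet; anything else (N, '-') is treated as missing


def one_hot_encode(seqs):
    """Encode aligned sequences as an (n_samples, 4 * length) 0/1 matrix."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    X = np.zeros((len(seqs), length * len(BASES)), dtype=float)
    for i, seq in enumerate(seqs):
        for j, ch in enumerate(seq.upper()):
            k = BASES.find(ch)
            if k >= 0:  # missing or ambiguous characters stay all-zero
                X[i, j * len(BASES) + k] = 1.0
    return X


def pca(X, n_components=2):
    """Return sample scores and explained-variance ratios via SVD."""
    Xc = X - X.mean(axis=0)  # center each feature
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S[:n_components] ** 2 / np.sum(S ** 2)
    return scores, explained


# Toy data (invented): two groups that differ at three sites, plus one
# site that varies within each group.
toy_seqs = [
    "ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAA",
    "TCGAACGGAC", "TCGAACGGAC", "TCGAACGGAA",
]
scores, explained = pca(one_hot_encode(toy_seqs))
for seq, (pc1, pc2) in zip(toy_seqs, scores):
    print(f"{seq}  PC1={pc1:+.2f}  PC2={pc2:+.2f}")
```

In this toy example the first component separates the two groups of sequences and the second captures the remaining within-group variation. A real analysis of tens of thousands of genomes would additionally need an alignment step and a deliberate choice of how to handle sequencing gaps, neither of which is specified in the abstracts.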