2021
DOI: 10.1093/molbev/msab264

A Daily-Updated Database and Tools for Comprehensive SARS-CoV-2 Mutation-Annotated Trees

Abstract: The vast scale of SARS-CoV-2 sequencing data has made it increasingly challenging to comprehensively analyze all available data using existing tools and file formats. To address this, we present a database of SARS-CoV-2 phylogenetic trees inferred with unrestricted public sequences, which we update daily to incorporate new sequences. Our database uses the recently proposed mutation-annotated tree (MAT) format to efficiently encode the tree with branches labeled with parsimony-inferred mutations, as well as Nex…

Citations: cited by 89 publications (135 citation statements)
References: 25 publications
“…To avoid duplicate entries, we removed the Russian sequences present in the UShER tree, and then added the Russian GISAID sequences to the tree using UShER [19]. Branch lengths were corrected using mutation paths obtained by matUtils [20].…”
Section: Methods (mentioning)
confidence: 99%
“…This format is richer than the others, as it provides information regarding each mutation event, even those that might be over-written by other mutations at the same position; it is also more efficient than multiple sequence alignment formats in the scenario of short branch lengths considered here. We also allow a binary analogue of this annotated Newick tree, called a MAT (mutation annotated tree) [37], which is compatible with the phylogenetic software UShER [27].…”
Section: Output Formats (mentioning)
confidence: 99%
“…Therefore, from the early days of the pandemic, the global scientific community mobilized to monitor the viral mutations and the evolutionary dynamics with the help of genome sequencing (Lo and Jamrozy, 2020; Maxmen, 2021). The first SARS-CoV-2 genome sequence was deposited on an online database in January 2020 (Wu et al, 2020), and since then, over 4 million additional sequences have been shared through an extraordinary worldwide effort, with tens of thousands more being shared every day (Maxmen, 2021; McBroome et al, 2021). This vast volume of genomic data has provided invaluable insights into the evolution and spread of the virus, and has allowed public health officials and governments to respond to it in a timely fashion (Lam-Hine et al, 2021; Oude Munnink et al, 2020).…”
Section: Overview Of the Problem (mentioning)
confidence: 99%
“…We also parallelized our tools for use in a CPU cluster. For parallelizing UShER over CPU nodes, we had to implement a merge operation in matUtils (McBroome et al, 2021). Briefly, this operation accepts two input MATs, checks if the subtrees resulting from common samples are consistent (i.e.…”
Section: Innovations Realized (mentioning)
confidence: 99%