2022
DOI: 10.1101/2022.09.12.507613
Preprint

PARNAS: Objectively Selecting the Most Representative Taxa on a Phylogeny

Abstract: The use of next-generation sequencing technology has enabled phylogenetic studies with hundreds of thousands of taxa. Such large-scale phylogenies have become a critical component in genomic epidemiology in pathogens such as SARS-CoV-2 and influenza A virus. However, detailed phenotypic characterization of pathogens or generating a computationally tractable dataset for detailed phylogenetic analyses requires bias-free subsampling of taxa. To address this need, we propose parnas, an objective and flexible a…

Citations: cited by 5 publications (7 citation statements)
References: 40 publications
“…All H5N1 HPAIV clade 2.3.4.4b sequences available in the EpiFlu database between 1st September 2020 and 22nd January 2024 were downloaded to create a sequence dataset. As North America and Europe were over-represented in this dataset, these were sub-sampled to maintain representative sequences using PARNAS 78. The remaining dataset was separated by segment and aligned using Mafft v7.520 79, and manually trimmed to the open-reading frame using Aliview version 1.26 80. The trimmed alignments were then used to infer a maximum-likelihood phylogenetic tree using IQ-Tree version 2.2.3 81 along with ModelFinder…”
Section: Methods (mentioning)
confidence: 99%
“…As North America and Europe were over-represented in this dataset, these were sub-sampled to maintain representative sequences using PARNAS 78. The remaining dataset was separated by segment and aligned using Mafft v7.520 79, and manually trimmed to the open-reading frame using Aliview version 1.26 80. The trimmed alignments were then used to infer a maximum-likelihood phylogenetic tree using IQ-Tree version 2.2.3 81 along with ModelFinder 8 and 1,000 ultrafast bootstraps 82.…”
Section: Methods (mentioning)
confidence: 99%
“…Threshold distances of 0.24659 and 0.21677 were used for EATRO1125 and Lister427, respectively. Downsampling and subtree generation of representative sequences were performed using PARNAS (0.1.4) 59. The command used for PARNAS included the flags: --cover --radius <THRESHOLD distance>.…”
Section: Methods (mentioning)
confidence: 99%
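The radius-based coverage described in the statement above can be illustrated with a small sketch. This is not PARNAS's implementation or algorithm: it is a simple greedy heuristic over a pairwise distance matrix, and the function name `greedy_radius_cover`, the toy matrix, and the tie-breaking rule are all assumptions made for illustration. It picks representatives until every taxon lies within the chosen radius of at least one of them.

```python
from typing import List, Sequence


def greedy_radius_cover(dist: Sequence[Sequence[float]], radius: float) -> List[int]:
    """Pick representative indices so every taxon is within `radius` of one.

    Illustrative greedy heuristic (hypothetical helper, not PARNAS itself).
    `dist` is a symmetric pairwise distance matrix with a zero diagonal.
    """
    if radius < 0:
        raise ValueError("radius must be non-negative")
    n = len(dist)
    uncovered = set(range(n))
    representatives: List[int] = []
    while uncovered:
        # Choose the taxon that covers the most still-uncovered taxa;
        # ties are broken in favour of the lowest index.
        best = max(
            range(n),
            key=lambda c: (sum(1 for u in uncovered if dist[c][u] <= radius), -c),
        )
        representatives.append(best)
        uncovered -= {u for u in uncovered if dist[best][u] <= radius}
    return representatives


# Toy example: two tight pairs of taxa (0, 1) and (2, 3) that are far apart.
toy_dist = [
    [0.0, 0.1, 1.0, 1.0],
    [0.1, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.1],
    [1.0, 1.0, 0.1, 0.0],
]
tight = greedy_radius_cover(toy_dist, 0.2)   # one representative per pair
loose = greedy_radius_cover(toy_dist, 1.0)   # a single taxon covers everything
```

A smaller radius yields more representatives and a larger one fewer, which is the trade-off the threshold distances quoted above control. A greedy pass like this does not guarantee the smallest possible set of representatives.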
“…The remaining sequences were aligned and trimmed to the ORF for each segment before generating a concatenated alignment using SeqKit (44), which was then used to infer a maximum-likelihood phylogenetic tree using IQ-Tree version 2.2.3 (45). The resultant phylogenetic tree contained over 2,000 sequences and was therefore sub-sampled to cover 98% of the diversity within it using PARNAS (46), which reduced the dataset to approximately 300 sequences whilst still containing representatives of the predominant UK genotypes. The sub-sampled dataset was then used to infer maximum-likelihood phylogenies for each gene segment using IQ-Tree along with ModelFinder (47) and 1,000 ultrafast bootstraps (48).…”
Section: Methods (mentioning)
confidence: 99%