Fast and Accurate Distance-based Phylogenetic Placement using Divide and Conquer

Balaban, Metin; Roush, Daniel; Zhu, Quanyin; Mirarab, Siavash

doi:10.1101/2021.02.14.431150

Cited by 7 publications

(24 citation statements)

References 70 publications

(144 reference statements)

“…We use the published FastTree-2 results (tree and numeric parameters estimated under Minimum Evolution) on the nt78 datasets [21]. The scripts to calculate delta error was adapted from a published script [1], and borrow utilities from newick_utils. Further details are provided in the published scripts here.…”

Section: A Summary (mentioning)

confidence: 99%

“…The most accurate of the current phylogenetic placement methods operate using maximum likelihood, and so require the query sequences to already be aligned to the backbone sequences; examples of such methods include pplacer [9] and EPA-ng [3]. Other types of methods have been developed for phylogenetic placement that are not based on maximum likelihood; a recent example is APPLES-2 [1], which uses branch lengths and distances computed between the query sequence and the leaves in the backbone tree to place each query sequence. APPLES-2 is much faster than pplacer and EPA-ng and has lower memory requirements, but is not as accurate as pplacer.…”

Section: Introduction (mentioning)

confidence: 99%

“…The first is a software package [4], which computes the numerical model parameters on the backbone tree (required for the maximum likelihood placement of query sequences) using FastTree 2 [14] instead of RAxML. This substitution allows pplacer to run on larger datasets without producing negative infinity log likelihood values, as shown in [1, 21].…”

Section: Introduction (mentioning)

confidence: 99%

SCAMPP+FastTree: Improving Scalability for Likelihood-based Phylogenetic Placement

Chu

Warnow

2022

Preprint

Phylogenetic placement is the problem of placing 'query' sequences into an existing tree (called a 'backbone tree') whose leaves are aligned sequences, and has applications to updating large trees and microbiome analysis. While substantial advances have been made in developing methods for phylogenetic placement, to date the most accurate approaches (e.g., pplacer and EPA-ng) are based on maximum likelihood, and these methods tend to have computational challenges when the backbone tree is large. Of the two, EPA-ng can scale to larger backbone tree sizes than pplacer (which seems to be limited to about 5,000-leaf backbone trees), but pplacer seems to have better accuracy than EPA-ng when it can run. Divide-and-conquer methods have been developed to address the limited scalability of pplacer, which operate by finding a small subtree of the backbone tree for the given query sequence, and then placing into that small subtree; SCAMPP is a recent development that shows particular benefits. Another approach, which is specific for pplacer, is taxtastic, which provides numeric model parameters in a form that helps pplacer run on larger datasets. In this study, we examine the potential of using both these approaches for scaling pplacer to large datasets, exploring the impact on accuracy as well as on running time and memory usage. We show that the combination of techniques (i.e., pplacer-taxtastic-SCAMPP) produces the best accuracy of all placement methods to date, with excellent speed and reduced memory usage. Finally, we explore how changing the subtree size associated with the SCAMPP framework changes the runtime-accuracy trade-off, and discuss avenues for future research. Our software for pplacer-taxtastic-SCAMPP is available at https://github.com/gillichu/PLUSplacer-taxtastic.

“…Other phylogenetic placement methods have been developed that seek to improve scalability to larger trees or reduce running time (e.g., UShER (Turakhia et al, 2021), EPA-ng (Barbera et al, 2019), APPLES (Balaban et al, 2020), and APPLES-2 (Balaban et al, 2021)). EPA-ng is likelihood-based and has been optimized for "batch processing" of query sequences (so that the cost of performing phylogenetic placement of a large number of query sequences is much less than the cost of placing them one-by-one).…”

Section: Adding Sequences To Gene Trees (mentioning)

confidence: 99%

“…EPA-ng has slightly reduced accuracy compared to pplacer. APPLES is a very fast distance-based method; recent studies (Balaban et al, 2020, 2021; Wedell et al, 2021) showed that APPLES can run on trees with 200,000 leaves and is much faster than both pplacer and EPA-ng. APPLES-2 is an improvement on APPLES with respect to accuracy and running time, and also scales to at least 200,000 sequences.…”

Section: Adding Sequences To Gene Trees (mentioning)

confidence: 99%

Recent Progress on Methods for Estimating and Updating Large Phylogenies

Zaharias

Warnow

2021

Preprint

With the increased availability of sequence data and even of fully sequenced and assembled genomes, phylogeny estimation of very large trees (even of hundreds of thousands of sequences) is now a goal for some biologists. Yet, the construction of these phylogenies is a complex pipeline presenting analytical and computational challenges, especially when the number of sequences is very large. In the last few years, new methods have been developed that aim to enable highly accurate phylogeny estimations on these large datasets, including divide-and-conquer techniques for multiple sequence alignment and/or tree estimation, methods that can estimate species trees from multi-locus datasets while addressing heterogeneity due to biological processes (e.g., incomplete lineage sorting and gene duplication and loss), and methods to add sequences into large gene trees or species trees. Here we present some of these recent advances and discuss opportunities for future improvements.

Scalable and Accurate Phylogenetic Placement Using pplacer-XR

Wedell

Cai

Warnow

2021

Lecture Notes in Computer Science
